
Integrating Hadoop and Spark on Debian

小樊
2025-09-19 20:13:44
Category: Intelligent O&M

Prerequisites
Before integrating Hadoop and Spark on Debian, ensure you have:

- A Debian system with sudo access;
- Java 8 or 11 installed (e.g., sudo apt install openjdk-11-jdk), since both Hadoop 3.x and Spark 3.x run on the JVM;
- SSH set up for passwordless login to localhost, which the Hadoop start scripts require.

1. Install and Configure Hadoop
Download Hadoop (e.g., 3.3.6) from the Apache website and extract it to /opt:

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzvf hadoop-3.3.6.tar.gz -C /opt
ln -s /opt/hadoop-3.3.6 /opt/hadoop  # Create a symbolic link for easy access

Set environment variables in /etc/profile:

echo "export HADOOP_HOME=/opt/hadoop" >> /etc/profile
echo "export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin" >> /etc/profile
source /etc/profile

Configure the core Hadoop files in $HADOOP_HOME/etc/hadoop: core-site.xml (default filesystem URI), hdfs-site.xml (replication factor, data directories), mapred-site.xml (run MapReduce on YARN), and yarn-site.xml (NodeManager shuffle service). Also set JAVA_HOME in hadoop-env.sh.

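As a minimal single-node sketch, the two most important files can be written like this (the fs.defaultFS URI and replication factor are the common single-node defaults; adjust hostnames, ports, and paths for a real cluster):

```shell
# Minimal single-node Hadoop config sketch; values are illustrative defaults.
cat > "$HADOOP_HOME/etc/hadoop/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```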
Format HDFS (only once) and start services:

hdfs namenode -format
start-dfs.sh  # Start HDFS
start-yarn.sh  # Start YARN

Verify with hdfs dfsadmin -report (check DataNodes) and yarn node -list (check NodeManagers).

2. Install and Configure Spark
Download Spark (e.g., 3.3.2) pre-built for Hadoop (e.g., spark-3.3.2-bin-hadoop3.tgz) and extract it to /opt:

wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
tar -xzvf spark-3.3.2-bin-hadoop3.tgz -C /opt
ln -s /opt/spark-3.3.2-bin-hadoop3 /opt/spark  # Symbolic link

Set environment variables in /etc/profile:

echo "export SPARK_HOME=/opt/spark" >> /etc/profile
echo "export PATH=\$PATH:\$SPARK_HOME/bin:\$SPARK_HOME/sbin" >> /etc/profile
source /etc/profile

Configure Spark to integrate with Hadoop: point Spark at the Hadoop configuration by setting HADOOP_CONF_DIR (and, for YARN, YARN_CONF_DIR) in $SPARK_HOME/conf/spark-env.sh, so Spark can resolve hdfs:// URIs and locate the YARN ResourceManager.

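A common way to do this is to append the two variables to spark-env.sh (paths below assume the /opt layout used above):

```shell
# Point Spark at Hadoop's configuration directory.
cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_HOME/conf/spark-env.sh"
cat >> "$SPARK_HOME/conf/spark-env.sh" <<'EOF'
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop/etc/hadoop
EOF
```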
Start Spark’s master and worker nodes:

start-master.sh  # Start Spark Master (web UI at http://localhost:8080)
start-worker.sh spark://localhost:7077  # Start a Spark Worker (start-slave.sh is the deprecated old name)

3. Integrate Hadoop and Spark
The key to integration is ensuring Spark can access Hadoop’s resources (HDFS, YARN). The above configurations achieve this by:

- setting HADOOP_CONF_DIR/YARN_CONF_DIR, so Spark reads core-site.xml and yarn-site.xml and can resolve hdfs:// URIs and find the ResourceManager;
- adding both projects’ bin and sbin directories to PATH, so spark-submit and the Hadoop tools are available from any shell;
- submitting with --master yarn, so Spark executors are scheduled by YARN instead of Spark’s standalone master.

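Optionally, YARN can be made the default master in spark-defaults.conf so jobs need not pass --master each time. The keys below are standard Spark settings; the event-log location is an assumption and the HDFS directory must exist first:

```shell
# Optional defaults; the event-log path is illustrative.
# Create the HDFS directory first: hdfs dfs -mkdir /spark-logs
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.master            yarn
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://localhost:9000/spark-logs
EOF
```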
To validate the integration, run a Spark job on YARN (ideally one that also reads/writes data in HDFS):

# Sanity-check the Spark install locally, then submit SparkPi to YARN
/opt/spark/bin/run-example SparkPi 10  # Runs a sample job locally
/opt/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /opt/spark/examples/jars/spark-examples_2.12-3.3.2.jar 10
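SparkPi itself does not touch HDFS. A small word-count script exercises the HDFS path as well; the sketch below is hypothetical, and the input/output paths (hdfs:///input.txt, hdfs:///wordcount-output) are assumptions to replace with your own:

```shell
# Hypothetical word-count job reading from HDFS; paths are assumptions.
cat > /tmp/wordcount.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-wordcount").getOrCreate()
lines = spark.sparkContext.textFile("hdfs:///input.txt")
counts = (lines.flatMap(lambda l: l.split())   # split lines into words
               .map(lambda w: (w, 1))          # pair each word with 1
               .reduceByKey(lambda a, b: a + b))  # sum counts per word
counts.saveAsTextFile("hdfs:///wordcount-output")
spark.stop()
EOF

# Submit on the cluster (uncomment to run):
# /opt/spark/bin/spark-submit --master yarn --deploy-mode client /tmp/wordcount.py
```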

Troubleshooting Tips

- If daemons fail to start, check the logs under $HADOOP_HOME/logs and $SPARK_HOME/logs.
- A "JAVA_HOME is not set" error usually means JAVA_HOME is missing from hadoop-env.sh or the shell environment.
- If a YARN job stays in the ACCEPTED state, the NodeManagers may not have enough memory or vcores for the requested executors.
- Re-running hdfs namenode -format after data has been written causes cluster-ID mismatches on the DataNodes; format only during initial setup.