Integrating HDFS and YARN on Ubuntu involves several steps: installing Hadoop, configuring environment variables, editing the HDFS and YARN configuration files, and starting the corresponding services. The following is a step-by-step tutorial:
Install Java:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
java -version
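After installation, note where the JDK lives; that path will be used as JAVA_HOME later. One way to find it (the path shown is typical for the amd64 package, but verify on your machine):
readlink -f $(which java)
# prints something like /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java,
# so JAVA_HOME would be /usr/lib/jvm/java-8-openjdk-amd64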
Install SSH and enable passwordless login to localhost (Hadoop's start scripts connect over SSH):
sudo apt-get install openssh-server
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
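Verify that passwordless login works; the command should return without asking for a password (accept the host key if prompted on the first connection):
ssh localhost exit
echo $?
# 0 means the key-based login succeeded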
Download and extract Hadoop:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.4.0/hadoop-3.4.0.tar.gz
tar -xzf hadoop-3.4.0.tar.gz
sudo mv hadoop-3.4.0 /usr/local/hadoop
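Make sure the install tree is owned by the user that will run Hadoop (assumed here to be your current login), then confirm the binaries run:
sudo chown -R $USER:$USER /usr/local/hadoop
/usr/local/hadoop/bin/hadoop version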
Edit the configuration files (they live in /usr/local/hadoop/etc/hadoop):
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
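hadoop.tmp.dir must point at a directory that exists and is writable by the Hadoop user, so create it before starting any daemon (the chown assumes you run Hadoop as your current login):
sudo mkdir -p /opt/hadoop/tmp
sudo chown -R $USER:$USER /opt/hadoop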
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/dn</value>
  </property>
</configuration>
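Likewise, the NameNode and DataNode directories must exist and belong to the Hadoop user, or the daemons will exit at startup:
sudo mkdir -p /data/nn /data/dn
sudo chown -R $USER:$USER /data/nn /data/dn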
Configure the HDFS environment variables:
Edit the ~/.bashrc file and add:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
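Hadoop's daemon scripts also read JAVA_HOME from etc/hadoop/hadoop-env.sh, and the official docs recommend setting it there explicitly (adjust the path if your JDK lives elsewhere):
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh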
Format the NameNode (do this only once; reformatting an existing NameNode wipes its metadata):
hdfs namenode -format
Start HDFS:
start-dfs.sh
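A quick smoke test once the daemons are up; in this single-node setup the report should list exactly one live DataNode:
hdfs dfsadmin -report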
Edit the YARN configuration files:
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
</configuration>
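On Hadoop 3, the official single-node guide also whitelists a set of environment variables so that containers inherit them; if example jobs later fail with classpath errors, adding this property to yarn-site.xml may help (the value shown follows the 3.x documentation):
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>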
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
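Hadoop 3 MapReduce jobs also need to find the MapReduce jars at runtime; the official single-node guide does this with an extra property in mapred-site.xml (alternatively, HADOOP_MAPRED_HOME can be exported in the environment):
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>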
Configure the YARN environment variables:
HADOOP_HOME and PATH were already exported in the HDFS step above, so the only variable still missing from ~/.bashrc is JAVA_HOME (adjust the path to your JDK):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
source ~/.bashrc
Start YARN:
start-yarn.sh
Check the running processes (jps should show NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager):
jps
Verify HDFS:
hdfs dfs -ls /
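A short end-to-end check, creating the conventional per-user home directory and round-tripping a file (/user/$USER is an assumption of this sketch; any writable HDFS path works):
hdfs dfs -mkdir -p /user/$USER
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /user/$USER/
hdfs dfs -cat /user/$USER/hello.txt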
Verify YARN:
yarn application -list
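To exercise HDFS and YARN together, run one of the example jobs bundled with the release downloaded above (the jar name follows the 3.4.0 version used in this tutorial) and watch it show up in yarn application -list or in the ResourceManager web UI at http://localhost:8088; the NameNode web UI is at http://localhost:9870:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 2 5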
With the steps above, HDFS and YARN are integrated on Ubuntu. Keep the configuration files consistent across all nodes, and tune the parameters to match your actual cluster size and workload.