Efficient Hadoop Deployment Guide for CentOS
Install Java and set JAVA_HOME:
sudo yum install -y java-1.8.0-openjdk-devel
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /etc/profile
source /etc/profile
Set up passwordless SSH from the master node to every node:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-copy-id user@node2 # repeat for every worker node
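Before moving on, it can help to confirm that key-based login really works on every node, since the Hadoop start scripts rely on it. The following is a minimal sketch, not part of the original guide: `check_ssh` and the `SSH_CMD` override are hypothetical names introduced here, and the hostnames are assumed to match the `/etc/hosts` entries used in this guide.

```shell
#!/usr/bin/env bash
# check_ssh: report whether each host accepts key-based login without a prompt.
# SSH_CMD is a hypothetical override hook (defaults to ssh) so the loop can be
# exercised without a real cluster.
check_ssh() {
  local host
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of asking for a password.
    # A host whose key is not yet in known_hosts is also reported as FAILED.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "$host ok"
    else
      echo "$host FAILED"
    fi
  done
}

# Usage (hostnames assumed from this guide):
#   check_ssh node1 node2 node3
```

Any host reported as FAILED needs its key copied again, or one manual login to accept the host key, before the cluster is started.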
Stop the firewall and disable SELinux:
sudo systemctl stop firewalld && sudo systemctl disable firewalld
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
Map the hostnames to IP addresses on every node:
echo "192.168.1.101 node1
192.168.1.102 node2
192.168.1.103 node3" | sudo tee -a /etc/hosts
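Running the `tee -a` command more than once appends duplicate lines to `/etc/hosts`. A possible idempotent variant is sketched below; `add_host_entry` is a hypothetical helper name, and the file path is passed as an argument so the function can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE only if that exact mapping is not already present.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # awk exits 0 when a line starts with IP and lists NAME among its hostnames
  if ! awk -v ip="$ip" -v n="$name" '
         $1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
         END { exit !found }' "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Usage (as root, with the addresses from this guide):
#   add_host_entry /etc/hosts 192.168.1.101 node1
#   add_host_entry /etc/hosts 192.168.1.102 node2
#   add_host_entry /etc/hosts 192.168.1.103 node3
```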
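Download Hadoop, extract it to the target directory (e.g. /opt/hadoop), and set ownership:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
sudo tar -xzf hadoop-3.3.4.tar.gz -C /opt/
sudo mv /opt/hadoop-3.3.4 /opt/hadoop # the archive unpacks to hadoop-3.3.4
sudo chown -R user:user /opt/hadoop # replace user with the actual account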
Edit /etc/profile to add the Hadoop paths:
echo 'export HADOOP_HOME=/opt/hadoop' | sudo tee -a /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' | sudo tee -a /etc/profile
source /etc/profile
$HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value> <!-- logical name of the HA nameservice -->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value> <!-- ZooKeeper ensemble -->
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value> <!-- IDs of the two NameNodes -->
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value> <!-- shared edits directory on the JournalNodes -->
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/journal</value> <!-- JournalNode data directory -->
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value> <!-- enable automatic failover -->
</property>
<property>
<name>dfs.replication</name>
<value>3</value> <!-- number of data replicas -->
</property>
</configuration>
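HA settings are easy to get subtly wrong, so a quick consistency check of hdfs-site.xml can be useful. The sketch below is an assumption-laden helper, not a Hadoop tool: `get_prop` and `check_ha` are hypothetical names, and the awk parsing assumes the one-tag-per-line layout used in this guide (on a node with Hadoop installed, `hdfs getconf -confKey <name>` is the more robust way to read a value). In addition to the properties listed above, it checks `dfs.client.failover.proxy.provider.<nameservice>` and `dfs.ha.fencing.methods`, which the Hadoop HA documentation also requires and which the listing above does not set.

```shell
#!/usr/bin/env bash
# get_prop FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop XML config.
# Assumes each <name> and <value> element sits on a single line.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { hit = 1 }
    hit && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-character <value> and the 8-character </value>
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# check_ha FILE
# Print one "missing: ..." line per HA property without a value; return 1 if any.
check_ha() {
  local site="$1" ns key status=0
  ns=$(get_prop "$site" dfs.nameservices)
  for key in dfs.nameservices "dfs.ha.namenodes.$ns" \
             dfs.namenode.shared.edits.dir \
             "dfs.client.failover.proxy.provider.$ns" \
             dfs.ha.fencing.methods; do
    if [ -z "$(get_prop "$site" "$key")" ]; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_ha "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"
```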
$HADOOP_HOME/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value> <!-- node that runs the ResourceManager -->
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value> <!-- shuffle service -->
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value> <!-- maximum memory per container request, in MB -->
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
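Format HDFS and start the cluster. The format step is needed only on first deployment. Because this configuration stores edits on JournalNodes and enables automatic failover, the JournalNodes must be running before the format, and before the first start-dfs.sh the standby NameNode must be initialized with hdfs namenode -bootstrapStandby (on node2) and the failover state created in ZooKeeper with hdfs zkfc -formatZK:
hdfs --daemon start journalnode # on node1, node2 and node3, before formatting
hdfs namenode -format # on node1 only; erases existing HDFS metadata
start-dfs.sh
start-yarn.sh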
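For the HA layout configured in this guide, the complete first-time start order can be sketched as a script run on node1. This is an illustration under stated assumptions rather than a definitive procedure: ZooKeeper is assumed to be running already on node1 to node3, Hadoop is assumed to be installed at the same path on every node with JAVA_HOME set in hadoop-env.sh, and `run`, `first_start`, and `DRY_RUN` are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch of a first-time start for the HA configuration in this guide.
# Assumptions: run on node1; ZooKeeper already running; passwordless SSH works;
# Hadoop installed at the same path on all nodes.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
HDFS="$HADOOP_HOME/bin/hdfs"

# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

first_start() {
  local h
  # 1. JournalNodes must be up before the NameNode is formatted
  for h in node1 node2 node3; do
    run ssh "$h" "$HDFS --daemon start journalnode" || return 1
  done
  # 2. Format and start the first NameNode (node1)
  run "$HDFS" namenode -format || return 1
  run "$HDFS" --daemon start namenode || return 1
  # 3. Copy the freshly formatted metadata to the standby NameNode (node2)
  run ssh node2 "$HDFS namenode -bootstrapStandby" || return 1
  # 4. Create the failover znode in ZooKeeper
  run "$HDFS" zkfc -formatZK || return 1
  # 5. Start the remaining HDFS daemons, then YARN
  run "$HADOOP_HOME/sbin/start-dfs.sh" || return 1
  run "$HADOOP_HOME/sbin/start-yarn.sh"
}

# Usage:
#   DRY_RUN=1 first_start   # print the commands without running them
#   first_start             # run them for real
```

Printing the commands first with `DRY_RUN=1` is a cheap way to review the order before anything is formatted.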
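Run jps on each node to check the processes (the master node should show NameNode and ResourceManager; worker nodes should show DataNode and NodeManager).
Open http://node1:9870 (HDFS Web UI) and http://node1:8088 (YARN Web UI) to confirm that the cluster is healthy.
Test an upload:
hdfs dfs -mkdir -p /user/root
hdfs dfs -put /local/file.txt /user/root/
hdfs dfs -ls /user/root/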
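The jps check can also be scripted. The sketch below compares jps output against a list of expected daemon names and prints those that are absent; `missing_daemons` is a hypothetical helper, and it assumes the usual two-column jps output (PID followed by the main class name).

```shell
#!/usr/bin/env bash
# missing_daemons JPS_OUTPUT NAME...
# Print each expected daemon name that does not appear in the jps output.
missing_daemons() {
  local out="$1" d
  shift
  for d in "$@"; do
    # exact match on the second column, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$out" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$d"
  done
}

# Usage on the master node:
#   missing_daemons "$(jps)" NameNode ResourceManager
# Usage on a worker node:
#   missing_daemons "$(jps)" DataNode NodeManager
```

With the HA configuration in this guide, JournalNode and DFSZKFailoverController are further names that could be added to the expected list on the nodes that run them.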
Set the HDFS block size in hdfs-site.xml:
<property>
<name>dfs.blocksize</name>
<value>256M</value>
</property>
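Tune yarn.scheduler.maximum-allocation-mb (for example, 6 GB on a node with 8 GB of memory) so that a single task cannot take too large a share of resources.
Set yarn.nodemanager.resource.memory-mb (for example, to 80% of the node's memory) to improve resource utilization.
For high availability, the relevant parameters are dfs.ha.namenodes, dfs.namenode.shared.edits.dir and the related settings in hdfs-site.xml, together with ha.zookeeper.quorum in core-site.xml.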
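The memory figures above are simple percentages of node memory, which can be computed rather than guessed. A minimal sketch follows; `nm_memory_mb` and `node_total_mb` are hypothetical helper names, and the 80% default simply mirrors the example in the text.

```shell
#!/usr/bin/env bash
# nm_memory_mb TOTAL_MB [PERCENT]
# Print PERCENT (default 80) of TOTAL_MB, rounded down to a whole MB.
nm_memory_mb() {
  local total_mb="$1" pct="${2:-80}"
  echo $(( total_mb * pct / 100 ))
}

# node_total_mb: total physical memory of this Linux node in MB
node_total_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Usage:
#   nm_memory_mb 8192        # 80% of an 8 GB node
#   nm_memory_mb 8192 75     # 6 GB on an 8 GB node, as in the example above
#   nm_memory_mb "$(node_total_mb)"
```

The result is the value to place in yarn.nodemanager.resource.memory-mb; yarn.scheduler.maximum-allocation-mb should not exceed it.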