在Linux上实现Hadoop的高可用性(High Availability, HA)主要是通过配置HDFS的高可用性和YARN的高可用性来实现的。以下是详细的步骤和配置说明:
HADOOP_HOME
和 JAVA_HOME
。core-site.xml
:<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master:9870</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node1:9870</value>
</property>
<property>
<name>dfs.ha.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;node1:8485;node2:8485/mycluster</value>
</property>
</configuration>
hdfs-site.xml
:<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/soft/hadoop-3.1.2/data/namenode</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;node1:8485;node2:8485/mycluster</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>ssh</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
yarn-site.xml
:<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster-12345</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>node2:8088</value>
</property>
</configuration>
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps
命令检查NameNode和ResourceManager是否正常运行。通过以上步骤,可以在Linux上成功配置Hadoop的高可用性,确保集群在部分节点故障时仍能继续运行。