Configuring HDFS high availability (HA) on CentOS involves several steps: setting up the active and standby NameNodes, DataNodes, JournalNodes, and ZooKeeper, and editing the related configuration files. Note that an HA cluster does not run a SecondaryNameNode; the standby NameNode takes over its checkpointing role. The following is a basic guide to configuring HDFS HA on a CentOS system.
Download and extract ZooKeeper:
wget https://downloads.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
tar -xzf apache-zookeeper-3.8.0-bin.tar.gz
mv apache-zookeeper-3.8.0-bin /opt/zookeeper
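Before extracting, it is worth verifying the download. A minimal sketch, assuming the checksum file published next to the tarball is in `sha512sum -c` format (some older Apache checksum files use a different layout and need to be checked by eye):

```shell
# Sketch: verify a downloaded tarball against its .sha512 companion
# file before extracting it. Assumes the checksum file sits next to
# the tarball and uses the standard "HASH  filename" layout.
verify_download() {
  file="$1"
  sha512sum -c "${file}.sha512"
}

# e.g. after also fetching apache-zookeeper-3.8.0-bin.tar.gz.sha512:
# verify_download apache-zookeeper-3.8.0-bin.tar.gz
```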
Configure ZooKeeper:
Edit /opt/zookeeper/conf/zoo.cfg and add or modify the following:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
Create the myid file: on each node, create the ZooKeeper data directory and a myid file containing that node's id.
mkdir -p /var/lib/zookeeper
echo 1 > /var/lib/zookeeper/myid # on node1
echo 2 > /var/lib/zookeeper/myid # on node2
echo 3 > /var/lib/zookeeper/myid # on node3
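Hard-coding the id on each node is easy to get wrong when the same provisioning script runs everywhere. A small sketch that instead derives the id from the server.N entries in zoo.cfg (the get_zk_id helper name is illustrative, and it assumes the node1..node3 hostnames used above):

```shell
# Illustrative helper (not part of ZooKeeper): look up this host's
# server id from the server.N entries in zoo.cfg, so every node can
# run the same script. Assumes lines like server.2=node2:2888:3888.
get_zk_id() {
  host="$1"; cfg="$2"
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

# Usage on a node (hostname -s must match the zoo.cfg entry):
# id="$(get_zk_id "$(hostname -s)" /opt/zookeeper/conf/zoo.cfg)"
# echo "$id" > /var/lib/zookeeper/myid
```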
Start the ZooKeeper service (on every node):
/opt/zookeeper/bin/zkServer.sh start
Download and extract Hadoop:
wget https://downloads.apache.org/hadoop/core/hadoop-3.2.0/hadoop-3.2.0.tar.gz
tar -xzf hadoop-3.2.0.tar.gz
mv hadoop-3.2.0 /opt/hadoop
Configure environment variables:
Edit /etc/profile and append the following (sbin is included on the PATH so scripts such as start-dfs.sh can be run directly):
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Then reload the file with source /etc/profile.
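A quick sanity check can save time before moving on. The following sketch (the check_hadoop_env name is illustrative; it assumes the /opt/hadoop layout used above) confirms the expected directories exist and that the PATH was actually updated; once it passes, `hadoop version` is the definitive test:

```shell
# Sketch: verify the Hadoop install layout and PATH before continuing.
# HADOOP_HOME defaults to the /opt/hadoop path used in this guide.
check_hadoop_env() {
  home="${HADOOP_HOME:-/opt/hadoop}"
  # the tarball should have unpacked these three directories
  for d in bin sbin etc/hadoop; do
    if [ ! -d "$home/$d" ]; then
      echo "missing: $home/$d"
      return 1
    fi
  done
  # confirm /etc/profile was sourced and bin/ made it onto the PATH
  case ":$PATH:" in
    *":$home/bin:"*) echo "ok" ;;
    *) echo "PATH is missing $home/bin"; return 1 ;;
  esac
}
```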
Configure core-site.xml:
Edit /opt/hadoop/etc/hadoop/core-site.xml and add the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
  </property>
</configuration>
Configure hdfs-site.xml:
Edit /opt/hadoop/etc/hadoop/hdfs-site.xml and add the following:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>node1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>node2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/lib/hadoop/journalnode</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
Two points worth noting: the default NameNode web UI port changed from 50070 to 9870 in Hadoop 3.x, and sshfence only works if the user running HDFS can SSH from each NameNode host to the other without a password using the private key configured above, so distribute the key (for example with ssh-copy-id) before enabling automatic failover.
The DataNode storage directory also belongs in hdfs-site.xml. Since the same file is normally distributed to every node in the cluster, add this property inside the same <configuration> block above rather than repeating the rpc-address entries in a second fragment:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/lib/hadoop/datanode</value>
</property>
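Before starting any daemons it can help to double-check what value a given key resolves to. Once Hadoop is installed, `hdfs getconf -confKey dfs.nameservices` is the proper tool; the following is a naive stand-in (a sketch; the get_prop helper is illustrative and assumes the flat one-tag-per-line layout used in the snippets above) for eyeballing a file by hand:

```shell
# Naive property lookup (sketch): assumes one <name> and one <value>
# tag per line, as in the config snippets in this guide; for anything
# more complex, use "hdfs getconf -confKey <key>" instead.
get_prop() {
  key="$1"; file="$2"
  awk -v k="$key" '
    index($0, "<name>" k "</name>") { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, ""); print; exit
    }' "$file"
}

# e.g. get_prop dfs.nameservices /opt/hadoop/etc/hadoop/hdfs-site.xml
```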
Initialize and start the cluster: in an HA setup the order matters.
Create the JournalNode directory and start the JournalNodes (on node1, node2 and node3):
mkdir -p /var/lib/hadoop/journalnode
hdfs --daemon start journalnode
Format and start the first NameNode (on node1 only, and only once):
hdfs namenode -format
hdfs --daemon start namenode
Bootstrap the standby NameNode (on node2):
hdfs namenode -bootstrapStandby
Initialize the HA state in ZooKeeper (on either NameNode, once):
hdfs zkfc -formatZK
Start all HDFS services (NameNodes, DataNodes, JournalNodes and ZKFailoverControllers):
start-dfs.sh
Verify the HA state; one NameNode should report active and the other standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
With the steps above you can run a highly available HDFS cluster on CentOS. Adjust the configuration to match your own requirements and environment.