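The following are the core steps for configuring Hadoop high availability (HA) on Debian, compiled from the key points of the referenced sources:
Basic configuration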
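- Install Java: sudo apt install openjdk-8-jdk (if your Debian release does not package it, install a JDK version that your Hadoop release supports).
- Edit /etc/hosts on every node so that the nodes can reach one another by hostname.
- Disable the firewall while testing: sudo ufw disable (in production, open only the required ports instead).
Node planning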
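One way to make the node plan concrete is a hosts fragment annotated with roles. The sketch below assumes the three hostnames node1, node2 and node3 that the configuration values in this guide use. The NameNode, ResourceManager and JournalNode roles are read off those values; the IP addresses, the HOSTS_FRAGMENT variable, and the ZooKeeper placement on node2 and node3 are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the IP addresses are hypothetical placeholders from the
# RFC 5737 documentation range; replace them with your real node addresses.
set -eu

# Scratch file (hypothetical name); nothing is written to /etc/hosts here.
HOSTS_FRAGMENT="$(mktemp)"

cat > "$HOSTS_FRAGMENT" <<'EOF'
# node1: NameNode nn1, ResourceManager rm1, JournalNode, ZooKeeper server.1
192.0.2.11  node1
# node2: NameNode nn2, ResourceManager rm2, JournalNode, ZooKeeper (assumed)
192.0.2.12  node2
# node3: JournalNode, ZooKeeper (assumed)
192.0.2.13  node3
EOF

# Print the fragment for review.
cat "$HOSTS_FRAGMENT"

# After review, the entries could be appended on every node, for example:
#   sudo tee -a /etc/hosts < "$HOSTS_FRAGMENT"
```

Keeping the role comments next to each hostname makes it easier to check later settings, such as the NameNode RPC addresses, against the plan.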
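ZooKeeper cluster
- Download ZooKeeper: wget https://downloads.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz, then extract it to /opt/zookeeper.
- Create the data directory with mkdir -p /var/lib/zookeeper, point dataDir at it in conf/zoo.cfg, and list the cluster nodes there (server.1=node1:2888:3888, and so on).
- In /var/lib/zookeeper, create a myid file whose content is the node's ID (for example, 1 on node 1).
- Start ZooKeeper on each node with zkServer.sh start, then check its state with zkServer.sh status.
Hadoop HA configuration
core-site.xml: set the default file system to the HDFS nameservice, for example <value>hdfs://mycluster</value>, and set ha.zookeeper.quorum to the ZooKeeper addresses.
hdfs-site.xml: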
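Written out as XML, the hdfs-site.xml settings listed in this step might look like the sketch below. The property names and values are the ones given in this guide; the script form, the scratch directory, and the CONF_DIR variable are assumptions made so that the example can be run without touching a real Hadoop installation.

```shell
#!/bin/sh
# Sketch only: writes hdfs-site.xml into a scratch directory, which stands in
# for the real Hadoop configuration directory (an assumption of this example).
set -eu

CONF_DIR="$(mktemp -d)"   # hypothetical stand-in for $HADOOP_HOME/etc/hadoop

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Logical name of the HA nameservice and its two NameNodes -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of each NameNode -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node2:8020</value>
  </property>
  <!-- JournalNode quorum that stores the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
  </property>
  <!-- Let ZooKeeper-based failover controllers switch the active NameNode -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/hdfs-site.xml"
```

The Hadoop HA documentation additionally describes dfs.client.failover.proxy.provider.mycluster and dfs.ha.fencing.methods. They are not among the values given here, so check them against the documentation for your Hadoop version before relying on this file.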
- dfs.nameservices=mycluster, dfs.ha.namenodes.mycluster=nn1,nn2
- dfs.namenode.rpc-address.mycluster.nn1=node1:8020, dfs.namenode.rpc-address.mycluster.nn2=node2:8020
- dfs.namenode.shared.edits.dir=qjournal://node1:8485;node2:8485;node3:8485/mycluster
- dfs.ha.automatic-failover.enabled=true
- Start the JournalNode daemon on each JournalNode host: hdfs --daemon start journalnode
yarn-site.xml:
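The yarn-site.xml settings for this step can be written out the same way. This is a sketch that uses only the property names and values given in this guide; the scratch directory and the YARN_CONF_DIR_SKETCH variable are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Sketch only: writes yarn-site.xml into a scratch directory instead of the
# real Hadoop configuration directory (an assumption of this example).
set -eu

YARN_CONF_DIR_SKETCH="$(mktemp -d)"   # hypothetical scratch location

cat > "$YARN_CONF_DIR_SKETCH/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Enable ResourceManager HA and name the cluster -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rm-cluster</value>
  </property>
  <!-- Logical IDs of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Client-facing address of each ResourceManager -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>node1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>node2:8032</value>
  </property>
</configuration>
EOF

echo "Wrote $YARN_CONF_DIR_SKETCH/yarn-site.xml"
```

The YARN ResourceManager HA documentation also expects a ZooKeeper quorum address to be configured for the ResourceManagers. That setting is not among the values shown here, so confirm the exact property for your Hadoop version in the documentation.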
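- yarn.resourcemanager.ha.enabled=true
- yarn.resourcemanager.cluster-id=rm-cluster, yarn.resourcemanager.ha.rm-ids=rm1,rm2
- yarn.resourcemanager.address.rm1=node1:8032, yarn.resourcemanager.address.rm2=node2:8032
Startup and verification
- Start the services: start-dfs.sh (HDFS) and start-yarn.sh (YARN).
- Test automatic failover: on the active NameNode, run hdfs --daemon stop namenode and check whether the standby NameNode takes over automatically.
- Switch manually with hdfs haadmin -transitionToActive --forcemanual nn1 (restore the original state after testing).
- Check data integrity with hdfs fsck / (or the older hadoop fsck form), and back up the NameNode metadata regularly.
Key references: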