Hadoop Network Configuration Guide for Ubuntu
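Hadoop cluster nodes need static IP addresses so that inter-node communication stays stable. Ubuntu 18.04 and later configure static IPs through netplan.
Back up the existing configuration file:
sudo cp /etc/netplan/01-netcfg.yaml /etc/netplan/01-netcfg.yaml.bak
Edit the configuration file (for example /etc/netplan/01-netcfg.yaml) so that it contains the following, replacing <your_ip>, <your_gateway>, <your_dns>, and ens33 with your actual values. On newer netplan releases (Ubuntu 22.04 and later) gateway4 is deprecated in favor of a routes entry, although it is still accepted:
network: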
  version: 2
  ethernets:
    ens33:
      addresses: ["<your_ip>/24"]
      gateway4: "<your_gateway>"
      nameservers:
        addresses: ["<your_dns>"]
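The file above can also be generated from variables, which helps when several nodes differ only in their address. The following is a minimal sketch: render_netplan is a hypothetical helper (not part of netplan), and the interface name and addresses in the example call are placeholder assumptions.

```shell
#!/bin/sh
# Sketch: print a netplan file for one statically addressed interface.
# render_netplan is a hypothetical helper, not a netplan command.
# Arguments: interface, address in CIDR form, gateway, DNS server.
render_netplan() (
  iface=$1 ip_cidr=$2 gateway=$3 dns=$4
  # Same layout as the file shown in this guide.
  cat <<EOF
network:
  version: 2
  ethernets:
    $iface:
      addresses: ["$ip_cidr"]
      gateway4: "$gateway"
      nameservers:
        addresses: ["$dns"]
EOF
)

# Placeholder values (assumptions); substitute your own.
render_netplan ens33 192.168.1.100/24 192.168.1.1 192.168.1.1
```

After reviewing the output, it could be written to /etc/netplan/01-netcfg.yaml and tested with sudo netplan try, which rolls the change back unless it is confirmed within a timeout.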
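Apply the configuration:
sudo netplan apply
Verify the address:
ip a
Make sure nodes identify each other by hostname rather than relying on dynamic DNS.
Set the hostname on each node (for example, change ubuntu to namenode):
sudo hostnamectl set-hostname namenode
Edit /etc/hosts and add the IP-to-hostname mapping for every node (the entries must be identical on all nodes):
192.168.1.100 namenode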
192.168.1.101 datanode1
192.168.1.102 datanode2
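Because every node needs the same mappings, it can help to check a hosts file for missing names before starting Hadoop. The sketch below assumes the three hostnames used in this guide; has_host and check_hosts are hypothetical helper names.

```shell
#!/bin/sh
# has_host FILE NAME: succeed if NAME appears as a hostname (any field after
# the IP address) on a non-comment line of FILE.
has_host() {
  awk -v name="$2" '
    { sub(/#.*/, "") }   # strip comments, full-line or trailing
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# check_hosts FILE NAME...: print each missing name and return non-zero
# if any name is missing.
check_hosts() (
  file=$1; shift
  status=0
  for name in "$@"; do
    if ! has_host "$file" "$name"; then
      echo "missing: $name"
      status=1
    fi
  done
  exit "$status"
)
```

A possible use on each node is check_hosts /etc/hosts namenode datanode1 datanode2, which prints nothing and returns 0 when all three names are mapped.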
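Set up passwordless SSH between nodes; Hadoop's start scripts (start-dfs.sh, start-yarn.sh) use SSH to launch daemons on the worker nodes.
Generate a key pair on the NameNode (press Enter to accept the default path and an empty passphrase):
ssh-keygen -t rsa
Copy the public key to each DataNode (replace user with the actual username):
ssh-copy-id user@datanode1
ssh-copy-id user@datanode2
Test the login (no password prompt should appear):
ssh datanode1
Adjust the Hadoop configuration to match the network environment. The files below are located in $HADOOP_HOME/etc/hadoop/.
core-site.xml (default file system URI):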
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
hdfs-site.xml (replication factor and storage directories; replace the /path/to/... placeholders with real directories, and keep dfs.replication no higher than the number of DataNodes, otherwise blocks stay under-replicated):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/path/to/namenode/dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/path/to/datanode/dir</value>
  </property>
</configuration>
yarn-site.xml (ResourceManager host and shuffle service; the hostname resourcemanager must resolve on every node, so either add it to /etc/hosts or use the name of the host that runs the ResourceManager, such as namenode):
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
mapred-site.xml (run MapReduce jobs on YARN):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
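Since fs.defaultFS ties the configuration to a hostname and port, one way to sanity-check it is to split the URI and confirm that the host resolves. The sketch below is an illustration only; split_hdfs_uri is a hypothetical helper name.

```shell
#!/bin/sh
# split_hdfs_uri URI: print "host port" for a URI such as hdfs://namenode:9000.
# Prints only the host when the URI carries no explicit port.
split_hdfs_uri() (
  rest=${1#hdfs://}   # drop the scheme
  rest=${rest%%/*}    # drop any trailing path
  host=${rest%%:*}    # text before the first colon
  port=${rest##*:}    # text after the last colon
  if [ "$port" = "$rest" ]; then
    echo "$host"      # no colon present, so no port was given
  else
    echo "$host $port"
  fi
)

# Value taken from the core-site.xml example in this guide.
split_hdfs_uri hdfs://namenode:9000
```

On a configured node the input could come from hdfs getconf -confKey fs.defaultFS, and the resulting host could be passed to getent hosts to confirm that it resolves.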
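Allow the ports Hadoop needs through the firewall so that cluster traffic is not blocked. If you administer the nodes over SSH, also run sudo ufw allow 22/tcp before sudo ufw enable. The commands below open 9000 (NameNode RPC, as set in fs.defaultFS), 50010 (DataNode data transfer in Hadoop 2.x; Hadoop 3.x uses 9866 by default), and 8088 (ResourceManager web UI):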
sudo ufw allow 9000/tcp
sudo ufw allow 50010/tcp
sudo ufw allow 8088/tcp
sudo ufw enable
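The rules above open each port to any source. A tighter variant limits them to the cluster subnet. The sketch below only prints the commands so they can be reviewed first; ufw_rules is a hypothetical helper, and the subnet 192.168.1.0/24 is an assumption derived from the example addresses in /etc/hosts.

```shell
#!/bin/sh
# ufw_rules SUBNET PORT...: print one ufw command per port, each restricted
# to traffic originating from SUBNET. Nothing is executed here.
ufw_rules() (
  subnet=$1; shift
  for port in "$@"; do
    echo "sudo ufw allow from $subnet to any port $port proto tcp"
  done
)

# Assumed cluster subnet and the ports used in this guide.
ufw_rules 192.168.1.0/24 9000 50010 8088
```

Once the printed commands look right, they can be run by hand or piped to sh.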
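Disable SELinux if it is enabled (Ubuntu does not enable it by default): either temporarily (setenforce 0) or permanently (edit /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled).
Format the NameNode (warning: this erases any existing NameNode data):
hdfs namenode -format
Start the cluster:
start-dfs.sh  # start HDFS (NameNode, DataNode)
start-yarn.sh  # start YARN (ResourceManager, NodeManager)
Verify the cluster state:
hdfs dfsadmin -report
yarn node -list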
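The verification step can be scripted by reading the DataNode count out of the report. The sketch below assumes the report contains a line of the form "Live datanodes (N):", which may differ between Hadoop versions; live_datanodes is a hypothetical helper name.

```shell
#!/bin/sh
# live_datanodes: read `hdfs dfsadmin -report` output on stdin and print the
# number from the first "Live datanodes (N):" line (assumed format).
# Prints nothing if no such line is found.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1
}
```

A possible use is hdfs dfsadmin -report | live_datanodes, comparing the result with the number of DataNodes listed in /etc/hosts (two in this guide).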