Ubuntu Hadoop Network Configuration Guide

Configuring a reliable network is critical for a Hadoop cluster to ensure seamless communication between nodes (NameNode, DataNodes, ResourceManager, NodeManagers). Below is a structured guide tailored for Ubuntu systems, covering static IP setup, hostname configuration, hosts file modification, SSH setup, and essential Hadoop network configurations.

1. Set Static IP Addresses

Hadoop requires stable IP addresses for cluster nodes. Use Netplan (the default network configuration tool on Ubuntu 18.04 and later) to configure static IPs.

Steps:

  1. Identify your network interface: Run ip a to list all interfaces (e.g., ens33).
  2. Edit the Netplan configuration file:
    Open /etc/netplan/01-netcfg.yaml (filename may vary) in a text editor (e.g., sudo nano /etc/netplan/01-netcfg.yaml).
  3. Configure static IP settings:
Replace ens33, 192.168.1.100/24, 192.168.1.1, and 8.8.8.8 with your interface name, desired IP address and prefix length (the /24 suffix is the CIDR form of the 255.255.255.0 subnet mask), gateway, and DNS servers respectively.
    network:
      version: 2
      renderer: networkd
      ethernets:
        ens33:
          dhcp4: no
          addresses: [192.168.1.100/24]
          # Newer Ubuntu releases deprecate gateway4; if "netplan apply" warns about it,
          # use "routes: [{to: default, via: 192.168.1.1}]" instead.
          gateway4: 192.168.1.1
          nameservers:
            addresses: [8.8.8.8, 8.8.4.4]
    
  4. Apply changes: Run sudo netplan apply to activate the new configuration.
  5. Verify: Use ip a to confirm the static IP is assigned.
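
After applying the configuration, a quick sanity check like the one below can be run on each node. The interface name ens33, the gateway, and the DNS address 8.8.8.8 are the example values from step 3; substitute your own.

    # Confirm the static address is bound to the interface
    ip a show ens33
    # Confirm the default route points at the configured gateway
    ip route show default
    # Confirm outbound connectivity to the configured DNS server
    ping -c 3 8.8.8.8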

2. Configure Hostnames

Each node in the cluster should have a unique hostname (e.g., namenode, datanode1). This helps identify nodes in logs and commands.

Steps:

  1. Set the hostname:
    Run sudo hostnamectl set-hostname <your_hostname> (e.g., sudo hostnamectl set-hostname namenode).
  2. Verify the hostname file:
    hostnamectl already writes the new name to /etc/hostname; confirm the file contains <your_hostname>, and edit it manually (sudo nano /etc/hostname) only if it does not.
  3. Reboot (optional): hostnamectl applies the change immediately, but running sudo reboot ensures all services pick up the new name.
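
As a concrete example, assuming the example hostnames used throughout this guide (namenode, datanode1, and so on), the commands would be run on each machine roughly as follows:

    # On the master node
    sudo hostnamectl set-hostname namenode
    # On the first worker node
    sudo hostnamectl set-hostname datanode1
    # Confirm the new hostname took effect (no reboot required)
    hostnamectl status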

3. Modify the /etc/hosts File

The /etc/hosts file maps IP addresses to hostnames, enabling name-based communication between nodes (avoids relying on DNS).

Steps:

  1. Edit the hosts file:
    Open /etc/hosts in a text editor (e.g., sudo nano /etc/hosts).
  2. Add node mappings:
    Include entries for all cluster nodes (replace the IPs and hostnames with your own). Also remove or comment out any 127.0.1.1 <hostname> line that Ubuntu adds by default, as it can cause Hadoop daemons to bind to the loopback interface instead of the node's real IP. For example:
    192.168.1.100 namenode
    192.168.1.101 datanode1
    192.168.1.102 datanode2
    
  3. Save and propagate: Copy the file to every node so all hosts share the same mappings, e.g., with scp (scp /etc/hosts user@datanode1:/etc/hosts works if you can log in as root; see the sketch below for a non-root alternative).
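
Writing straight into /etc/hosts needs root privileges on the remote machine. One common approach, sketched below with the example user account user and the hostnames from this guide, is to copy to a temporary location first and then move the file with sudo:

    # Push the hosts file to each worker node (adjust user and hostnames)
    for node in datanode1 datanode2; do
      scp /etc/hosts user@"$node":/tmp/hosts
      ssh -t user@"$node" 'sudo mv /tmp/hosts /etc/hosts'
    done
    # Verify name-based resolution from the NameNode
    ping -c 2 datanode1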

4. Configure SSH for Passwordless Login

Hadoop requires secure, passwordless communication between nodes (e.g., for NameNode to manage DataNodes). Use SSH keys to achieve this.

Steps:

  1. Generate an SSH key pair:
    On the NameNode (or each node), run ssh-keygen -t rsa. Press Enter to accept default paths and skip passphrase (for automation).
  2. Copy the public key to other nodes:
    Use ssh-copy-id user@remote_node_ip (e.g., ssh-copy-id user@datanode1) to add the public key to the ~/.ssh/authorized_keys file of each remote node.
  3. Test passwordless login:
    Run ssh user@remote_node_ip (e.g., ssh user@datanode1). You should log in without entering a password.
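
Putting the steps together, a minimal sketch (assuming the example user account user and the hostnames namenode, datanode1, and datanode2) looks like this; the NameNode should also be able to SSH to itself, since the start-up scripts connect to every listed host, including the local one:

    # Generate a key pair once on the NameNode (default path, empty passphrase)
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
    # Distribute the public key to every node, including the NameNode itself
    for node in namenode datanode1 datanode2; do
      ssh-copy-id user@"$node"
    done
    # This login should now succeed without a password prompt
    ssh user@datanode1 hostname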

5. Configure Hadoop Core Network Files

Hadoop’s configuration files define how it interacts with the network. Key files include core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, all located in $HADOOP_HOME/etc/hadoop.

Steps:

  1. core-site.xml:
    Set the default file system to point to the NameNode’s hostname (or IP) and port; 9000 is the port conventionally used in tutorials, while 8020 is Hadoop’s default NameNode RPC port.
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value> <!-- Replace "namenode" with the NameNode's hostname -->
      </property>
    </configuration>
    
  2. hdfs-site.xml:
    Configure HDFS replication (typically 3 in production; it should not exceed the number of DataNodes) and the local storage directories.
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value> <!-- Number of replicas for each data block -->
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/path/to/namenode/dir</value> <!-- Local directory for NameNode metadata -->
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/path/to/datanode/dir</value> <!-- Local directory for DataNode data storage -->
      </property>
    </configuration>
    
  3. mapred-site.xml:
    Set the MapReduce framework to use YARN (recommended for Hadoop 2.x+).
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    
  4. yarn-site.xml:
    Configure the ResourceManager (YARN’s master) hostname and resource allocation.
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>resourcemanager</value> <!-- Replace with ResourceManager's hostname -->
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value> <!-- Memory allocated to each NodeManager (in MB) -->
      </property>
    </configuration>
    
  5. Environment variables:
    Add Hadoop and Java paths to ~/.bashrc (or /etc/profile for system-wide access):
    export HADOOP_HOME=/usr/local/hadoop   # Replace with your Hadoop installation path
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # Replace with your Java path
    
    Apply changes with source ~/.bashrc.
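
A quick way to confirm that the environment variables and configuration files are being picked up, assuming the paths above, is:

    # Should print your installation path and the Hadoop version
    echo "$HADOOP_HOME"
    hadoop version
    # Should print the value of fs.defaultFS from core-site.xml, e.g. hdfs://namenode:9000
    hdfs getconf -confKey fs.defaultFS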

6. Optional: Disable Firewall and SELinux

Firewalls and SELinux can block Hadoop’s network ports (e.g., 9000 for HDFS, 8088 for the YARN web UI). Disable them, or explicitly open the required ports, to avoid connectivity issues.

Steps:

  1. Stop and disable the firewall:
    sudo systemctl stop ufw
    sudo systemctl disable ufw
    
  2. Disable SELinux (if installed):
    Ubuntu uses AppArmor rather than SELinux by default, so this step usually does not apply. If SELinux has been installed and enabled, edit /etc/selinux/config, set SELINUX=disabled, and reboot the system to apply the change.
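
If disabling the firewall outright is not acceptable in your environment, an alternative is to allow only the ports the cluster actually uses. The sketch below uses the common defaults mentioned above plus the Hadoop 3.x NameNode web UI port; adjust the numbers to match your configuration:

    # Allow Hadoop ports instead of disabling ufw entirely
    sudo ufw allow 9000/tcp   # HDFS NameNode RPC (as set in core-site.xml)
    sudo ufw allow 9870/tcp   # NameNode web UI (Hadoop 3.x default)
    sudo ufw allow 8088/tcp   # YARN ResourceManager web UI
    sudo ufw status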

7. Verify Network Configuration

After completing the above steps, verify that all nodes can communicate and Hadoop services start correctly.

Verification Steps:

  1. Check node connectivity:
    From the NameNode, run ping datanode1 (replace with a DataNode’s hostname/IP). Ensure there is no packet loss.
  2. List HDFS nodes:
    Run hdfs dfsadmin -report on the NameNode. You should see all DataNodes listed.
  3. Check YARN nodes:
    Run yarn node -list on the ResourceManager. You should see all NodeManagers listed.
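
A compact check, assuming the example hostnames from this guide and that the Hadoop daemons have been started, might look like this:

    # From the NameNode: connectivity plus HDFS cluster membership
    ping -c 3 datanode1
    hdfs dfsadmin -report | grep '^Name:'
    # From the ResourceManager: registered NodeManagers
    yarn node -list
    # On any node: list the running Hadoop JVM daemons (NameNode, DataNode, etc.)
    jps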

By following these steps, you’ll establish a robust network foundation for your Ubuntu-based Hadoop cluster, ensuring reliable communication between nodes and optimal performance for distributed data processing.
