CentOS HDFS Deployment Guide

This guide provides a step-by-step approach to deploying HDFS (Hadoop Distributed File System) on CentOS, covering both standalone and cluster setups. Follow these steps to set up a robust distributed file system.

Prerequisites

Before starting, ensure the following requirements are met:

  1. A CentOS host (CentOS 7 or later) with sudo privileges
  2. Network access for yum and for downloading the Hadoop tarball
  3. For cluster setups: every node reachable by hostname, with the same installation paths and user on each machine

Step 1: Install Java

Hadoop depends on Java. Install OpenJDK 8 using yum:

sudo yum install -y java-1.8.0-openjdk-devel

Verify installation:

java -version

Ensure the output shows Java 1.8.0.
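
Step 3 will reference the JDK install path for JAVA_HOME. On CentOS, the java-1.8.0-openjdk-devel package typically provides a version-independent symlink under /usr/lib/jvm; a quick check, assuming the default package layout:

ls -ld /usr/lib/jvm/java-1.8.0-openjdk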

Step 2: Download and Extract Hadoop

Download the latest stable Hadoop release from the Apache website. For example, to download Hadoop 3.3.4:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
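
Optionally, verify the download before extracting. Apache publishes a .sha512 checksum file next to each release; a quick integrity check, assuming the standard mirror layout:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
sha512sum hadoop-3.3.4.tar.gz    # compare the printed digest with the contents of the .sha512 file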

Extract the tarball to /usr/local and rename the directory for simplicity:

sudo tar -xzvf hadoop-3.3.4.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-3.3.4 /usr/local/hadoop

Step 3: Configure Hadoop Environment Variables

Set up environment variables to make Hadoop commands accessible globally. Create a new file /etc/profile.d/hadoop.sh:

sudo nano /etc/profile.d/hadoop.sh

Add the following lines (adjust paths if Hadoop is installed elsewhere):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the file executable and apply changes:

sudo chmod +x /etc/profile.d/hadoop.sh
source /etc/profile.d/hadoop.sh

Verify Hadoop installation:

hadoop version
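
Hadoop's control scripts launch daemons over SSH, where the login shell does not always source /etc/profile.d, so it is common practice to also set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh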

Step 4: Configure HDFS Core Files

Edit Hadoop configuration files in $HADOOP_HOME/etc/hadoop to define HDFS behavior.

4.1 core-site.xml

This file configures the default file system and NameNode address. Replace namenode with your NameNode’s hostname:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
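
The hostname namenode must resolve on every node, including the NameNode itself. One way is an /etc/hosts entry; the address below is a placeholder, so substitute your NameNode's actual IP:

echo '192.168.1.10  namenode' | sudo tee -a /etc/hosts    # 192.168.1.10 is a hypothetical IP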

4.2 hdfs-site.xml

This file sets HDFS-specific parameters like replication factor and data directories. Create directories for NameNode and DataNode data:

sudo mkdir -p /usr/local/hadoop/data/namenode
sudo mkdir -p /usr/local/hadoop/data/datanode
sudo chown -R $(whoami):$(whoami) /usr/local/hadoop/data

Add the following configurations to hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value> <!-- Adjust based on your cluster size (e.g., 1 for standalone) -->
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/data/datanode</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value> <!-- Disable permissions for testing (enable in production) -->
    </property>
</configuration>
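
For a multi-node cluster, Hadoop 3.x also reads the DataNode host list from $HADOOP_HOME/etc/hadoop/workers (one hostname per line); start-dfs.sh uses it to launch DataNodes remotely. A sketch with hypothetical hostnames (keep the default localhost for a standalone setup):

datanode1
datanode2
datanode3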

Optional: mapred-site.xml and yarn-site.xml

If using YARN for resource management, configure these files:
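
A minimal sketch (both files live in $HADOOP_HOME/etc/hadoop). In mapred-site.xml, tell MapReduce to run on YARN:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

In yarn-site.xml, enable the shuffle service and point NodeManagers at the ResourceManager (shown here running on the NameNode host):

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>namenode</value>
    </property>
</configuration>

After HDFS is up (Step 6), start YARN with start-yarn.sh.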

Step 5: Format the NameNode

The NameNode must be formatted before first use to initialize its storage. Run this command on the NameNode:

hdfs namenode -format

If prompted to confirm, answer Y. Formatting creates the necessary directory structure and initial metadata under dfs.namenode.name.dir. Format only once, on first setup: re-formatting generates a new clusterID, and existing DataNodes will refuse to join until their data directories are cleared.

Step 6: Start HDFS

The start-dfs.sh script launches each daemon over SSH, even on a single node, so passwordless SSH to localhost (and to every worker in a cluster) must be configured first. A minimal sketch for a single-node setup:
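
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa          # key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost exit                                # should log in without a password prompt

With passwordless SSH in place, start HDFS from the NameNode: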

start-dfs.sh

Check the status of HDFS daemons with:

jps

You should see NameNode, DataNode, and SecondaryNameNode processes running.

Step 7: Verify HDFS

Confirm HDFS is operational by:

  1. Web UI: Open a browser and navigate to http://<namenode-ip>:9870 (replace <namenode-ip> with your NameNode’s IP; Hadoop 3.x serves the NameNode UI on port 9870, whereas Hadoop 2.x used 50070). You should see the HDFS dashboard with cluster information.
  2. Command Line: List the root directory to verify HDFS is accessible (a short read/write round trip is sketched after this list):
    hdfs dfs -ls /
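
As a further check, a short read/write round trip (paths are illustrative):

echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -put /tmp/hello.txt /user/$(whoami)/
hdfs dfs -cat /user/$(whoami)/hello.txt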
    

Step 8: Stop HDFS (Optional)

To stop HDFS services, run:

stop-dfs.sh

Troubleshooting Tips

  1. A daemon is missing from jps: check its log under $HADOOP_HOME/logs; the final stack trace usually names the misconfigured property or path.
  2. A DataNode fails to start after the NameNode was re-formatted: the stored clusterIDs no longer match. Clear the DataNode data directory (/usr/local/hadoop/data/datanode) and restart HDFS.
  3. "JAVA_HOME is not set" errors from start-dfs.sh: set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, as shown in Step 3.
  4. The web UI is unreachable from other machines: firewalld may be blocking the NameNode ports; see the commands below.
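
A sketch for opening those ports with firewalld (adjust to match your configuration):

sudo firewall-cmd --permanent --add-port=9870/tcp    # NameNode web UI
sudo firewall-cmd --permanent --add-port=9000/tcp    # HDFS RPC port from fs.defaultFS
sudo firewall-cmd --reload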

By following these steps, you’ll have a fully functional HDFS deployment on CentOS, ready to store and manage large datasets in a distributed environment.
