Concise Steps to Set Up and Configure HDFS on Debian
1. Environment Preparation and Installation
Install OpenJDK 11 and verify it:
sudo apt update && sudo apt install -y openjdk-11-jdk
java -version
javac -version

Create a dedicated hadoop user:
sudo adduser --disabled-password --gecos "" hadoop && sudo usermod -aG sudo hadoop

Download and install Hadoop (adjust the version to the current release as needed):
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
sudo tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
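A quick sanity check that the unpacked distribution runs (using the full path, since PATH has not been updated yet):
/usr/local/hadoop/bin/hadoop version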
Add the environment variables to ~/.bashrc or /etc/profile and reload:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc

Add hostname mappings for all nodes to /etc/hosts (e.g. 192.168.1.10 namenode, 192.168.1.11 datanode1), then generate an SSH key for passwordless login:
ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/id_rsa
ssh-copy-id hadoop@datanode1
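A quick check that passwordless login actually works (datanode1 as mapped above):
ssh hadoop@datanode1 hostname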
Keep clocks in sync across nodes:
sudo apt install -y ntp && sudo timedatectl set-ntp true

2. Core Configuration
In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, set JAVA_HOME explicitly:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
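If the exact JDK path differs on your system, one way to look it up (assuming the openjdk-11-jdk package installed above):
readlink -f /usr/bin/javac | sed 's:/bin/javac::'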
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
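After saving the file, a quick way to confirm the value HDFS will actually use (assuming HADOOP_HOME/bin is on PATH as set earlier):
hdfs getconf -confKey fs.defaultFS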
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB -->
  </property>
</configuration>
Create the local storage directories and give them to the hadoop user:
sudo mkdir -p /usr/local/hadoop/hdfs/{namenode,datanode}
sudo chown -R hadoop:hadoop /usr/local/hadoop
fs.defaultFS sets the default filesystem URI (the RPC port is typically 8020). dfs.replication is the replication factor; use 1 for a single machine or testing. dfs.blocksize is the block size, 128 MB by default.

3. Startup and Verification
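Before starting the cluster, it also helps to list the DataNode hostnames (one per line) in $HADOOP_HOME/etc/hadoop/workers so that start-dfs.sh can launch the remote DataNodes; with the example hosts above the file would simply contain:
datanode1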
Format the NameNode and start HDFS:
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh

Check the daemons and cluster state:
jps            # expect NameNode and DataNode, plus SecondaryNameNode if configured
hdfs dfsadmin -report
hdfs dfs -ls /
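The NameNode web UI is another quick check; in Hadoop 3.x it listens on port 9870 by default (hostname as mapped in /etc/hosts above):
http://namenode:9870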
Run a quick end-to-end test:
hdfs dfs -mkdir -p /user/hadoop/input
echo "Hello, HDFS" > test.txt
hdfs dfs -put test.txt /user/hadoop/input/
hdfs dfs -cat /user/hadoop/input/test.txt
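To confirm block placement and replication for the file just written, hdfs fsck is useful (path as created above):
hdfs fsck /user/hadoop/input/test.txt -files -blocks -locations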
If startup fails, check JAVA_HOME in hadoop-env.sh, the /etc/hosts mappings, passwordless SSH, and the firewall policy.

4. Common Issues and Optimization
Keep dfs.replication consistent with the number of DataNodes; set it to 1 for single-machine testing.
For large files, raise dfs.blocksize (e.g. 256 MB or 512 MB) and keep the number of small files down.
Point hadoop.tmp.dir and HADOOP_LOG_DIR at disk partitions with enough capacity, which makes troubleshooting and scaling easier.

5. Optional Extension: High Availability (HA)
The key properties (in hdfs-site.xml):
dfs.nameservices=ns1
dfs.ha.namenodes.ns1=nn1,nn2
dfs.namenode.rpc-address.ns1.nnX and dfs.namenode.http-address.ns1.nnX for each NameNode
dfs.namenode.shared.edits.dir=qjournal://nn1:8485;nn2:8485;jn1:8485/ns1
dfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.fencing.methods=sshfence, together with dfs.ha.fencing.ssh.private-key-files
dfs.ha.automatic-failover.enabled=true
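A minimal hdfs-site.xml sketch of the properties listed above, assuming NameNodes nn1/nn2 and the JournalNode quorum shown; the addresses and the private-key path are placeholders to adjust:
<!-- HA sketch: hostnames, ports and key path are example values -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>nn1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>nn2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>nn1:9870</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>nn2:9870</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://nn1:8485;nn2:8485;jn1:8485/ns1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
Note that dfs.ha.automatic-failover.enabled=true additionally requires a ZooKeeper quorum (ha.zookeeper.quorum in core-site.xml) and running ZKFC daemons, which are beyond the scope of this sketch.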