A Practical Guide to Hadoop Resource Management on Debian
1. Core Concepts and Overall Architecture
2. Environment Preparation and Basic Configuration
- Install Java: sudo apt update && sudo apt install openjdk-11-jdk (OpenJDK 8 also works: sudo apt install openjdk-8-jdk)
- Download Hadoop: wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
- Unpack and install: tar -xzvf hadoop-3.3.6.tar.gz -C /usr/local/ && sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
- Set environment variables (append to ~/.bashrc, then run source ~/.bashrc):
  export HADOOP_HOME=/usr/local/hadoop
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- Map each node's IP address and hostname in /etc/hosts
- Set up passwordless SSH between nodes:
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
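For the /etc/hosts step above, the entries on a small cluster might look like the following sketch; the IP addresses and the datanode hostnames are placeholders, while namenode matches the fs.defaultFS value used later in this guide.

  # /etc/hosts (illustrative; replace the IPs and hostnames with your own nodes)
  192.168.1.10  namenode
  192.168.1.11  datanode1
  192.168.1.12  datanode2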
- Edit the core configuration files (under $HADOOP_HOME/etc/hadoop/):
  core-site.xml: fs.defaultFS=hdfs://namenode:9000
  mapred-site.xml: mapreduce.framework.name=yarn
  yarn-site.xml: yarn.nodemanager.aux-services=mapreduce_shuffle, plus the corresponding shuffle handler class
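In XML form, the core-site.xml and mapred-site.xml settings above correspond to the following snippets; this is a minimal sketch that uses only the values given in this section (a matching yarn-site.xml example appears in Section 5).

  <!-- core-site.xml -->
  <configuration>
    <property><name>fs.defaultFS</name><value>hdfs://namenode:9000</value></property>
  </configuration>

  <!-- mapred-site.xml -->
  <configuration>
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  </configuration>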
3. YARN Resource Configuration and Tuning

Node and scheduler limits (the example values below fit a worker with roughly 32 GB of RAM and 8 cores):
- yarn.nodemanager.resource.memory-mb=28672 (leave headroom for the OS and container overhead, typically 2–4 GB)
- yarn.nodemanager.resource.cpu-vcores=7 (reserve 1 core for the system)
- yarn.scheduler.minimum-allocation-mb=2048, yarn.scheduler.maximum-allocation-mb=28672
- yarn.scheduler.minimum-allocation-vcores=1, yarn.scheduler.maximum-allocation-vcores=7

MapReduce container sizing:
- mapreduce.map.memory.mb=4096, mapreduce.reduce.memory.mb=8192
- mapreduce.map.java.opts=-Xmx3072m, mapreduce.reduce.java.opts=-Xmx6144m
- yarn.app.mapreduce.am.resource.mb=4096, yarn.app.mapreduce.am.command-opts=-Xmx3072m

Scheduling and other tuning:
- Capacity Scheduler: define the queue hierarchy, capacity percentages, maximum capacities, ACLs, and preemption in capacity-scheduler.xml to support multi-tenancy and guaranteed resources.
- Fair Scheduler: define queues, weights, minimum/maximum resources, preemption, and SLAs in fair-scheduler.xml for fair sharing across jobs.
- yarn.nodemanager.local-dirs: list directories on multiple disks, comma-separated, to improve I/O throughput.
- Verify that yarn.nodemanager.aux-services=mapreduce_shuffle and the class org.apache.hadoop.mapred.ShuffleHandler are configured.
- mapreduce.job.ubertask.enable=true (run small jobs in a single container to reduce scheduling overhead)
- Run yarn rmadmin -refreshQueues to reload queue configuration (Capacity/Fair) without restarting the ResourceManager.

An XML sketch of the memory and vcore settings follows this list.
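As a rough illustration, the sizing values above would land in yarn-site.xml and mapred-site.xml as shown below. The sketch covers only the properties listed in this section; note that the -Xmx heaps sit at about 75% of each container's memory, the ratio the example values already follow.

  <!-- yarn-site.xml (excerpt): NodeManager resources and scheduler allocation limits -->
  <configuration>
    <property><name>yarn.nodemanager.resource.memory-mb</name><value>28672</value></property>
    <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>7</value></property>
    <property><name>yarn.scheduler.minimum-allocation-mb</name><value>2048</value></property>
    <property><name>yarn.scheduler.maximum-allocation-mb</name><value>28672</value></property>
    <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
    <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>7</value></property>
  </configuration>

  <!-- mapred-site.xml (excerpt): per-task container sizes and JVM heaps -->
  <configuration>
    <property><name>mapreduce.map.memory.mb</name><value>4096</value></property>
    <property><name>mapreduce.map.java.opts</name><value>-Xmx3072m</value></property>
    <property><name>mapreduce.reduce.memory.mb</name><value>8192</value></property>
    <property><name>mapreduce.reduce.java.opts</name><value>-Xmx6144m</value></property>
    <property><name>yarn.app.mapreduce.am.resource.mb</name><value>4096</value></property>
    <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx3072m</value></property>
  </configuration>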
4. Runtime Monitoring and Day-to-Day Operations

- jps to check processes: ResourceManager, NodeManager, NameNode, and DataNode should be present (plus JobHistoryServer, if enabled)
- yarn queue -status <queue_name> to inspect a queue's capacity and current usage
- yarn application -list -appStates ALL to list applications and their states
- yarn node -list to review each node's resources and how much is allocated vs. available
- yarn rmadmin -refreshNodes and hdfs dfsadmin -refreshNodes to apply node include/exclude list changes (e.g. when decommissioning)
- hdfs balancer to rebalance data across DataNodes

The checks above can be bundled into a small script, sketched below.
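A possible wrapper around these commands for a quick routine health check; the queue name default is a placeholder, so substitute the queues you actually define.

  #!/usr/bin/env bash
  # Quick YARN health check built from the commands above (illustrative sketch).
  # The queue name "default" is a placeholder.
  set -euo pipefail

  echo "== Hadoop daemons on this host =="
  jps

  echo "== Nodes and their allocated/available resources =="
  yarn node -list

  echo "== Applications (all states) =="
  yarn application -list -appStates ALL

  echo "== Queue status =="
  yarn queue -status default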
5. Quick Verification and Example Configuration

- Format HDFS before the first start: hdfs namenode -format
- Start the daemons: start-dfs.sh, then start-yarn.sh
- Run a sample job: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 1000
- Example yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager-host</value>
  </property>
</configuration>
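Section 3 refers to queue definitions in capacity-scheduler.xml without showing one. The snippet below is a hypothetical sketch only: the queues prod and dev, the 70/30 capacity split, and the maximum capacities are placeholders rather than recommendations. After editing the file, apply it with yarn rmadmin -refreshQueues as noted in Section 3.

  <!-- capacity-scheduler.xml (illustrative sketch; queue names and percentages are placeholders) -->
  <configuration>
    <property><name>yarn.scheduler.capacity.root.queues</name><value>prod,dev</value></property>
    <property><name>yarn.scheduler.capacity.root.prod.capacity</name><value>70</value></property>
    <property><name>yarn.scheduler.capacity.root.prod.maximum-capacity</name><value>100</value></property>
    <property><name>yarn.scheduler.capacity.root.dev.capacity</name><value>30</value></property>
    <property><name>yarn.scheduler.capacity.root.dev.maximum-capacity</name><value>50</value></property>
  </configuration>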