# How to Set Up a Hadoop-Zookeeper Environment
## I. Foreword

In the big-data ecosystem, Hadoop and Zookeeper are two core components: Hadoop provides distributed storage and computation, while Zookeeper provides distributed coordination. This article walks through building a Hadoop-Zookeeper cluster from scratch.
### Prerequisites

- 3 CentOS 7 servers (4 cores / 8 GB RAM or more recommended)
- JDK 1.8+
- Hadoop 3.3.4
- Zookeeper 3.7.1
- Passwordless SSH configured between the servers
## II. Basic Environment Configuration

### 1. Set hostnames and the hosts file

```bash
# Run one of these on each server respectively
hostnamectl set-hostname hadoop01
hostnamectl set-hostname hadoop02
hostnamectl set-hostname hadoop03

# Edit /etc/hosts on all three servers and add:
192.168.1.101 hadoop01
192.168.1.102 hadoop02
192.168.1.103 hadoop03
```
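To avoid typos when replicating the same hosts entries across all three nodes, the lines can be generated from a single list. A small sketch (the helper name and the `ip=hostname` argument format are illustrative, not part of any tool):

```shell
# Emit /etc/hosts lines from "ip=hostname" pairs (helper for illustration)
hosts_entries() {
  local pair
  for pair in "$@"; do
    echo "${pair%%=*} ${pair#*=}"
  done
}

hosts_entries 192.168.1.101=hadoop01 192.168.1.102=hadoop02 192.168.1.103=hadoop03
```

On a real node the output could be appended with `hosts_entries ... >> /etc/hosts`.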
### 2. Disable the firewall and SELinux

```bash
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
```
### 3. Install the JDK

```bash
# Download the JDK
wget https://download.oracle.com/java/18/latest/jdk-18_linux-x64_bin.tar.gz

# Extract and configure environment variables
tar -zxvf jdk-18_linux-x64_bin.tar.gz -C /usr/local/
# Adjust JAVA_HOME to match the actual extracted directory name
echo 'export JAVA_HOME=/usr/local/jdk-18
export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
```
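Because Java's release naming changed (old-style `1.8.0_345` vs new-style `18.0.2`), a quick check of the installed major version avoids surprises. A small helper (illustrative; it parses the quoted version string from the first line of `java -version` output):

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles both old-style "1.8.0_345" and new-style "18.0.2" version strings.
java_major() {
  local v=${1#*\"}      # drop everything up to the first double quote
  v=${v%%\"*}           # keep only the quoted version string
  local major=${v%%.*}  # first numeric component
  if [ "$major" = "1" ]; then
    v=${v#1.}           # old 1.x scheme: the real major is the second component
    major=${v%%.*}
  fi
  echo "$major"
}

java_major 'openjdk version "18.0.2" 2022-07-19'   # prints 18
java_major 'java version "1.8.0_345"'              # prints 8
```

On a live node: `java_major "$(java -version 2>&1 | head -1)"`.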
## III. Zookeeper Cluster Installation

### 1. Download and extract

```bash
wget https://downloads.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zxvf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/local/
mv /usr/local/apache-zookeeper-3.7.1-bin /usr/local/zookeeper
```
### 2. Configure Zookeeper

```bash
# Create data and log directories
mkdir -p /data/zookeeper/{data,logs}

# Copy the sample configuration
cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg

# Edit zoo.cfg
vim /usr/local/zookeeper/conf/zoo.cfg
```
Example zoo.cfg contents:

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
```
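The server.N lines must be identical on all nodes, and each N must match that node's myid (set below). To keep them consistent, they can be generated from one host list. A sketch (the helper name is illustrative; 2888/3888 are the quorum/election ports from the config above):

```shell
# Emit server.N=host:2888:3888 lines for a given host list (helper for illustration)
zk_server_lines() {
  local i=1 h
  for h in "$@"; do
    echo "server.$i=$h:2888:3888"
    i=$((i + 1))
  done
}

zk_server_lines hadoop01 hadoop02 hadoop03
```

The output could then be appended to zoo.cfg on each node with `>> /usr/local/zookeeper/conf/zoo.cfg`.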
### 3. Create the myid files

```bash
# Run the matching line on each server
# hadoop01
echo "1" > /data/zookeeper/data/myid
# hadoop02
echo "2" > /data/zookeeper/data/myid
# hadoop03
echo "3" > /data/zookeeper/data/myid
```
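Since the hostnames already encode the node number, the myid value can also be derived from the hostname instead of being typed by hand on each machine. A sketch (assumes the hadoopNN naming convention used above; the helper name is illustrative):

```shell
# Derive the Zookeeper myid from a hadoopNN-style hostname (helper for illustration)
myid_for_host() {
  local id=${1#hadoop}   # hadoop02 -> 02
  echo $((10#$id))       # force base 10 so "08"/"09" are not parsed as octal
}

# On a real node: myid_for_host "$(hostname)" > /data/zookeeper/data/myid
myid_for_host hadoop02   # prints 2
```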
### 4. Configure environment variables

```bash
echo 'export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
```
### 5. Start the cluster

```bash
# Start on each of the three servers
zkServer.sh start

# Check status
zkServer.sh status
```

In a healthy cluster, one node should report leader and the other two follower.
## IV. Hadoop Cluster Installation

### 1. Download and extract

```bash
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
tar -zxvf hadoop-3.3.4.tar.gz -C /usr/local/
mv /usr/local/hadoop-3.3.4 /usr/local/hadoop
echo 'export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' >> /etc/profile
source /etc/profile
```
### 2. Configure core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>
```
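After editing the XML files it is handy to sanity-check a single property from the command line. A naive grep/sed helper (illustrative only; it assumes the `<name>`/`<value>` pairs sit on consecutive lines as in the snippets here, and is no substitute for `hdfs getconf -confKey` on a configured node):

```shell
# Naively extract a Hadoop property value from a *-site.xml file.
# Assumes <name> and <value> appear on consecutive lines, as in the snippets above.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Demo against a temporary file:
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
</configuration>
EOF

get_prop /tmp/core-site-demo.xml fs.defaultFS   # prints hdfs://hadoop01:9000
```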
### 3. Configure hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

Note: dfs.ha.automatic-failover.enabled only takes effect in a full NameNode HA setup, which additionally requires dfs.nameservices, per-NameNode RPC/HTTP addresses, and JournalNode configuration (omitted here for brevity).
### 4. Configure yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>
```
### 5. Configure mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
### 6. Configure the workers file

Edit $HADOOP_HOME/etc/hadoop/workers and list all worker hosts:

```
hadoop01
hadoop02
hadoop03
```
### 7. Create data directories

```bash
mkdir -p /data/hadoop/{tmp,hdfs/{name,data}}
```
### 8. Configure passwordless SSH

```bash
# Generate a key pair on every node
ssh-keygen -t rsa

# Copy the public key to every node (including the local one)
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
```
### 9. Distribute Hadoop to the other nodes

```bash
scp -r /usr/local/hadoop hadoop02:/usr/local/
scp -r /usr/local/hadoop hadoop03:/usr/local/
scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
```
## V. Starting and Verifying the Cluster

```bash
# Start Zookeeper on each of the three servers
zkServer.sh start

# Format the NameNode (hadoop01 only, first start only)
hdfs namenode -format

# Start HDFS and YARN (on hadoop01)
start-dfs.sh
start-yarn.sh

# Check HDFS
hdfs dfsadmin -report
# Check YARN
yarn node -list
# Check Zookeeper
zkCli.sh -server hadoop01:2181 ls /
```
## VI. Common Problems

- **Zookeeper cannot elect a leader**: verify that each node's myid matches its server.N entry in zoo.cfg, and that ports 2888/3888 are reachable between nodes.
- **Hadoop nodes fail to start**: check JAVA_HOME in hadoop-env.sh and inspect the logs under $HADOOP_HOME/logs.
- **HDFS replication warnings**: dfs.replication must not exceed the number of live DataNodes.
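The leader-election issue usually comes down to quorum: Zookeeper needs a strict majority of servers, floor(n/2) + 1, to elect a leader. That is why 2 of the 3 nodes here must be up, and why even-sized ensembles add no extra resilience. As a one-line helper (illustrative):

```shell
# Minimum number of Zookeeper servers that must be up in an ensemble of n nodes
zk_quorum() { echo $(( $1 / 2 + 1 )); }

zk_quorum 3   # prints 2
zk_quorum 5   # prints 3
```

Note that `zk_quorum 4` is 3, the same as for a 5-node ensemble, so a 4th server only adds load without improving fault tolerance.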
## VII. Summary

This article covered the full process of building a Hadoop-Zookeeper cluster:

1. Basic environment configuration
2. Zookeeper cluster installation and configuration
3. Hadoop cluster installation and configuration
4. Cluster startup and verification
5. Common problems and fixes

Following these steps, you should be able to stand up a working Hadoop-Zookeeper cluster as a foundation for big-data application development.

Note: a real production deployment also needs security hardening, monitoring and alerting, and backup/recovery; this article covers only the basic setup.