Quick Troubleshooting and Fixes for Kubernetes Installation Errors on CentOS
I. Locate the error first
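- `sudo journalctl -f -u kubelet`: follow the kubelet log in real time.
- `kubeadm init --config=kubeadm.yaml --ignore-preflight-errors=SystemVerification`: skips only the system-verification preflight check so that init gets far enough to produce more detailed logs. Remove this flag for the real installation once troubleshooting is done.
- `kubectl get nodes` (once kubectl can reach the cluster) and `kubectl get pods -A`; `top`, `vmstat` and `df -h` to check CPU/memory load and free disk space.
- `systemctl status kubelet` and `systemctl status containerd` (or `docker`).

II. Base environment and kernel parameters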
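- Disable swap (by default kubelet refuses to start while swap is on):
swapoff -a && sed -i '/swap/s/^/#/' /etc/fstab
- SELinux and firewall: `setenforce 0` lasts only until the next reboot (set `SELINUX=permissive` in /etc/selinux/config to make it persistent); `systemctl stop firewalld`. In production, open the required ports or interfaces instead of turning the firewall off entirely.
- Load the kernel modules the bridge settings depend on, then set the kernel parameters:
modprobe overlay && modprobe br_netfilter
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
(`sysctl -p` without arguments only reads /etc/sysctl.conf, so it would not apply /etc/sysctl.d/k8s.conf. To load the modules at every boot, list overlay and br_netfilter in a file under /etc/modules-load.d/.)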
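The swap and kernel-parameter steps can be made idempotent and checkable. A minimal sketch, assuming a Linux host with awk: `comment_swap_entries` (a hypothetical helper name) comments out only uncommented fstab lines whose filesystem-type field is `swap`, whereas a bare `/swap/s/^/#/` also hits any line that merely contains the word "swap" and re-comments lines that are already commented. `verify_k8s_sysctl` (also hypothetical) reads the live values under /proc/sys and reports any key that is missing or not 1.

```shell
# Hypothetical helper: comment out active swap entries in an fstab-format file.
# Only lines whose 3rd field (filesystem type) is "swap" and that are not
# already comments are touched; a backup is kept as FILE.bak.
comment_swap_entries() {
  local fstab="${1:-/etc/fstab}" tmp
  tmp=$(mktemp) || return 1
  cp -p "$fstab" "$fstab.bak" || { rm -f "$tmp"; return 1; }
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$fstab"   # rewrite in place, keeping the file's owner and mode
  rm -f "$tmp"
}

# Hypothetical helper: check the three kernel parameters set above.
# ROOT defaults to /proc/sys; key a.b.c is read from file ROOT/a/b/c.
# Prints one line per key and returns 1 if any value is missing or not 1.
verify_k8s_sysctl() {
  local root="${1:-/proc/sys}" key file val rc=0
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward; do
    file="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$file" ]; then
      echo "MISSING $key (is br_netfilter loaded?)"
      rc=1
      continue
    fi
    val=$(cat "$file")
    if [ "$val" = "1" ]; then
      echo "OK      $key = 1"
    else
      echo "WRONG   $key = $val (want 1)"
      rc=1
    fi
  done
  return "$rc"
}
```

Typical use on a node: `swapoff -a && comment_swap_entries /etc/fstab`, and after `sysctl --system`, `verify_k8s_sysctl` should print three OK lines. The ROOT argument exists only so the check can be tried against a scratch directory.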
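- Give each node a unique hostname: `hostnamectl set-hostname master` (or `nodeX` on worker nodes). Synchronize the clock: `yum install -y ntpdate && ntpdate time.windows.com` (CentOS 8 no longer ships ntpdate; use chronyd there).

III. Container runtime and image pulls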
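- containerd (recommended: Kubernetes 1.24 removed dockershim, so kubelet no longer talks to Docker directly). The containerd.io package comes from the Docker CE yum repository:
yum install -y containerd.io && systemctl enable --now containerd
With containerd, the cgroup driver is configured in /etc/containerd/config.toml (`SystemdCgroup = true` under the runc options) so that containerd and kubelet both use systemd; Docker's daemon.json has no effect on it.
- Docker (only for Kubernetes older than 1.24, or for 1.24+ through cri-dockerd): switch to the systemd cgroup driver and add a registry mirror (replace the URL with the mirror you actually use):
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://mirrors.aliyun.com/dockerhub"]
}
EOF
systemctl daemon-reload && systemctl restart docker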
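For the containerd path, the systemd cgroup driver is switched on in /etc/containerd/config.toml. A minimal sketch, assuming the file was generated with `containerd config default` on containerd 1.x (which writes a `SystemdCgroup = false` line under the runc options) and GNU sed; `enable_systemd_cgroup` is a hypothetical helper name. It also warns when the CRI plugin appears in `disabled_plugins`, as it does in the config.toml shipped with the containerd.io package, because kubelet cannot use containerd in that state.

```shell
# Hypothetical helper: make containerd's runc runtime use the systemd cgroup driver.
# Assumes CONFIG came from `containerd config default` (containerd 1.x), which
# contains a "SystemdCgroup = false" line; GNU sed edits the file in place.
enable_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}"

  # The config shipped with the containerd.io package disables the CRI plugin;
  # in that case the file should be regenerated before anything else.
  if grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$cfg"; then
    echo "warning: CRI plugin disabled in $cfg; regenerate it with 'containerd config default'" >&2
  fi

  if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=' "$cfg"; then
    echo "no SystemdCgroup key in $cfg" >&2
    return 1
  fi

  # Flip "SystemdCgroup = false" to "true", keeping the original indentation.
  sed -E -i 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' "$cfg"
}
```

On a real node this would typically follow `containerd config default > /etc/containerd/config.toml` and be followed by `systemctl restart containerd`.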
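- Pull the control-plane images from the Aliyun mirror instead of the default registry, which is often unreachable from mainland China:
kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
# If individual images still fail to pull, pull and re-tag them manually (docker pull / docker tag), e.g. coredns:1.8.0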
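The same settings can live in the kubeadm.yaml passed to `kubeadm init --config`. A minimal sketch, assuming kubeadm 1.22 or later (config API `kubeadm.k8s.io/v1beta3`) and a Kubernetes version supplied by the caller; `write_kubeadm_config` and `check_flannel_cidr` are hypothetical helper names. Because the pod subnet must agree with the CNI's own setting, the second function compares it with the `"Network"` value in the net-conf.json section of a downloaded kube-flannel.yml.

```shell
# Hypothetical helper: write a kubeadm config equivalent to
#   kubeadm init --image-repository=... --pod-network-cidr=...
# Assumes kubeadm >= 1.22 (kubeadm.k8s.io/v1beta3). VERSION is required so
# kubeadm does not need to look up the latest release online.
write_kubeadm_config() {
  local out="$1" version="$2" podcidr="${3:-10.244.0.0/16}"
  if [ -z "$out" ] || [ -z "$version" ]; then
    echo "usage: write_kubeadm_config OUT VERSION [POD_CIDR]" >&2
    return 2
  fi
  cat > "$out" <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: $version
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: $podcidr
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
}

# Hypothetical helper: compare a pod CIDR with the "Network" value found in
# a kube-flannel.yml manifest. Returns 1 on a mismatch.
check_flannel_cidr() {
  local manifest="$1" podcidr="$2" net
  net=$(sed -nE 's/.*"Network":[[:space:]]*"([^"]+)".*/\1/p' "$manifest" | head -n 1)
  if [ "$net" = "$podcidr" ]; then
    echo "OK: pod CIDR $podcidr matches flannel"
  else
    echo "MISMATCH: kubeadm uses $podcidr, flannel uses ${net:-<not found>}"
    return 1
  fi
}
```

For example, `write_kubeadm_config kubeadm.yaml v1.28.2 && kubeadm config images pull --config kubeadm.yaml` pre-pulls every image, so a failing image shows up before `kubeadm init --config kubeadm.yaml` is run.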
- If containerd is not running, start and enable it, then check its status:
sudo systemctl enable --now containerd && systemctl status containerd

IV. Network and firewall essentials
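# Control-plane node, if firewalld stays enabled:
firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd
firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler (10251 on older releases)
firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager (10252 on older releases)
firewall-cmd --reload
# Ports go into the default zone: adding them to the trusted zone (which already accepts everything) does not open them on interfaces that belong to the default zone.
# Worker nodes need 10250/tcp and the NodePort range 30000-32767/tcp.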
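When firewalld stays enabled, the port list can be kept in one place. A minimal sketch, assuming firewalld's default zone and the control-plane ports from the current Kubernetes port reference (6443, 2379-2380, 10250, 10259, 10257; older releases used 10251/10252 for the scheduler and controller manager). `open_control_plane_ports` is a hypothetical helper, and the `DRY_RUN=1` switch is likewise an assumption of this sketch: it prints the firewall-cmd commands instead of running them.

```shell
# Hypothetical helper: open the control-plane ports in firewalld's default zone.
# Port list assumed from the Kubernetes "Ports and Protocols" reference:
# 6443 apiserver, 2379-2380 etcd, 10250 kubelet, 10259 scheduler, 10257 controller-manager.
# With DRY_RUN=1 the firewall-cmd commands are printed, not executed.
open_control_plane_ports() {
  local runner="" port
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  for port in 6443/tcp 2379-2380/tcp 10250/tcp 10259/tcp 10257/tcp; do
    $runner firewall-cmd --permanent --add-port="$port" || return 1
  done
  $runner firewall-cmd --reload
}
```

Preview with `DRY_RUN=1 open_control_plane_ports`, then run it as root without DRY_RUN. Keeping the loop over a single list means a port is never opened on one node and forgotten on another.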
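- Install a CNI plugin (pick one):
Flannel: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Calico: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
- `--pod-network-cidr` must match the CNI's pod network: 10.244.0.0/16 is the Flannel default; for Calico, configure it as described in the Calico documentation. Both projects have since moved these manifests (Flannel now lives under the flannel-io GitHub organization), so confirm the current URL in the official docs.

V. Common errors and fixes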
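| Error keyword | Typical cause | Quick fix |
|---|---|---|
| CRI v1 runtime API is not implemented … containerd.sock | containerd is not running or not ready, or its CRI plugin is disabled (the config.toml shipped with the containerd.io package contains `disabled_plugins = ["cri"]`) | `systemctl start containerd && systemctl enable containerd`; if CRI is disabled, regenerate the config with `containerd config default > /etc/containerd/config.toml` and restart containerd |
| containerd: failed to start | Configuration, permission or kernel prerequisites not in place | Check /etc/containerd/config.toml and the kernel modules (br_netfilter/overlay), then restart the service |
| kubelet keeps restarting / unhealthy | cgroup driver mismatch between the container runtime and kubelet | Use systemd on both sides: Docker `"exec-opts": ["native.cgroupdriver=systemd"]`, containerd `SystemdCgroup = true`; then restart the runtime |
| Image pull failure (gcr.io/google_containers/…) | Registry unreachable from mainland China networks | `--image-repository=registry.aliyuncs.com/google_containers`, or pull manually and re-tag |
| Node NotReady | CNI not installed or pod network not working | Install Flannel/Calico; confirm nodes can reach each other and routes are correct |
| Port 6443/10250 is in use | Port held by an existing process (often a previous kubeadm init) or blocked by the firewall | Find the owner with `ss -lntp \| grep 6443`; stop it or run `kubeadm reset -f`, or open the port in the firewall |
| MountVolume.SetUp failed | Kernel, driver or permission problem | Upgrade the kernel, check mounts and permissions, review SELinux/AppArmor policy |
| kubeadm init preflight failure | Swap still on, kernel parameters missing, or firewall blocking | Disable swap, set the kernel parameters, open the ports, then retry |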
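The table can also be applied mechanically to a saved log. A minimal sketch: `suggest_fix` (a hypothetical helper) greps a log file, or standard input, for the table's keywords and prints the matching quick fix. The patterns come from the table's keyword column, except `cgroup driver` and `Port (6443|10250) is in use`, which are assumed message wordings; real messages vary between versions, so a miss means "read the full log", not "nothing is wrong".

```shell
# _k8s_hint TEXT PATTERN HINT: print HINT and succeed if the ERE PATTERN occurs in TEXT.
_k8s_hint() {
  if printf '%s\n' "$1" | grep -Eq -- "$2"; then
    printf '%s\n' "$3"
    return 0
  fi
  return 1
}

# Hypothetical helper: map error keywords from the table to quick fixes.
# Reads FILE, or standard input when no FILE is given.
# "cgroup driver" and "Port ... is in use" are assumed message wordings.
suggest_fix() {
  local text found=0
  text=$(cat "${1:--}") || return 2
  _k8s_hint "$text" 'CRI v1 runtime API is not implemented' \
    'containerd: systemctl enable --now containerd; if CRI is disabled, regenerate config.toml' && found=1
  _k8s_hint "$text" 'cgroup driver' \
    'cgroup: use the systemd driver for both the runtime and kubelet' && found=1
  _k8s_hint "$text" 'gcr\.io/google_containers' \
    'images: --image-repository=registry.aliyuncs.com/google_containers' && found=1
  _k8s_hint "$text" 'NotReady' \
    'node: install a CNI plugin (Flannel/Calico) and check node-to-node connectivity' && found=1
  _k8s_hint "$text" 'Port (6443|10250) is in use' \
    'port: find the owner with ss -lntp, stop it or run kubeadm reset -f' && found=1
  _k8s_hint "$text" 'MountVolume\.SetUp failed' \
    'volume: check mounts, permissions and SELinux policy' && found=1
  if [ "$found" -eq 0 ]; then
    echo "no known keyword found; read the full log"
  fi
}
```

For example, `journalctl -u kubelet --no-pager | suggest_fix` or `suggest_fix kubeadm-init.log`. Each hint is checked independently, so a log that shows several symptoms prints several lines.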
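To start over, run `kubeadm reset -f`, make sure the environment is clean (reset does not remove /etc/cni/net.d, iptables/IPVS rules, or $HOME/.kube/config), and run kubeadm init again. If it still fails, keep the full logs (`journalctl -xeu kubelet` and the kubeadm init output) for further diagnosis.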
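Before re-running kubeadm init, the leftovers that `kubeadm reset` does not remove (/etc/cni/net.d and $HOME/.kube/config) can be checked, and the logs bundled for whoever helps diagnose. A minimal sketch: `check_reset_leftovers`, `_save_output` and `collect_k8s_logs` are hypothetical helpers. The ROOT and HOME_DIR parameters exist only so the check can be tried against a scratch directory; commands missing from the host are skipped rather than treated as errors, and iptables/IPVS rules are not checked at all.

```shell
# Hypothetical helper: report files that `kubeadm reset` leaves behind.
# ROOT (default: empty, i.e. /) and HOME_DIR (default: $HOME) are parameters
# so the check can run against a test directory. Returns 1 if anything is left.
check_reset_leftovers() {
  local root="${1:-}" home="${2:-$HOME}" dirty=0 d
  for d in "$root/etc/cni/net.d" "$root/etc/kubernetes/manifests"; do
    if [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
      echo "leftover: $d is not empty"
      dirty=1
    fi
  done
  if [ -e "$home/.kube/config" ]; then
    echo "leftover: $home/.kube/config (kubeconfig of the old cluster)"
    dirty=1
  fi
  return "$dirty"
}

# _save_output OUTDIR NAME CMD [ARGS...]: run CMD (max 30 s) into OUTDIR/NAME.txt,
# or record that CMD is not installed.
_save_output() {
  local outdir="$1" name="$2"
  shift 2
  if command -v "$1" >/dev/null 2>&1; then
    timeout 30 "$@" > "$outdir/$name.txt" 2>&1 || true
  else
    echo "skipped: $1 not installed" > "$outdir/$name.txt"
  fi
}

# Hypothetical helper: gather the diagnostics from this guide into OUTDIR.
collect_k8s_logs() {
  local outdir="$1"
  if [ -z "$outdir" ]; then
    echo "usage: collect_k8s_logs OUTDIR" >&2
    return 2
  fi
  mkdir -p "$outdir" || return 1
  _save_output "$outdir" kubelet-journal journalctl -xeu kubelet --no-pager
  _save_output "$outdir" kubelet-status systemctl status kubelet --no-pager
  _save_output "$outdir" containerd-status systemctl status containerd --no-pager
  _save_output "$outdir" pods kubectl get pods -A --request-timeout=10s
  _save_output "$outdir" disk df -h
  echo "logs saved in $outdir"
}
```

The kubeadm init output cannot be recovered afterwards, so it is worth capturing when running it, e.g. `kubeadm init ... 2>&1 | tee kubeadm-init.log`. A zero exit status from `check_reset_leftovers` only means the listed paths are clean.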