Kubernetes on CentOS: Common Deployment Problems and Troubleshooting Checklist
1. Environment Preparation and System Requirements
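Before changing anything, it can help to see which of this section's prerequisites are already met. The sketch below is read-only and only inspects files. The function names has_active_swap_entry and selinux_config_mode are hypothetical helpers, not part of any CentOS or Kubernetes tool, and the default paths assume a stock CentOS layout.

```shell
#!/bin/sh
# Read-only pre-flight helpers (hypothetical names); nothing here modifies the system.

# Exit status 0 if the fstab-style file in $1 still has an uncommented swap entry.
# Assumes the standard fstab layout where the third field is the filesystem type.
has_active_swap_entry() {
    awk '$1 !~ /^#/ && $3 == "swap" { found = 1 } END { exit !found }' \
        "${1:-/etc/fstab}"
}

# Print the SELINUX= value from the config file in $1
# (enforcing, permissive or disabled on a typical system).
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*$/\1/p' "${1:-/etc/selinux/config}"
}
```

For example, has_active_swap_entry && echo "swap is still configured in /etc/fstab" reports a leftover entry, and selinux_config_mode should print permissive or disabled once the SELinux item in this section has been applied.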
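Disable the firewall: systemctl stop firewalld && systemctl disable firewalld
Or keep firewalld running and open the required ports: firewall-cmd --permanent --zone=trusted --add-port=6443/tcp and so on, then firewall-cmd --reload
SELinux: run setenforce 0 (effective immediately) and set SELINUX=permissive or SELINUX=disabled in /etc/selinux/config (so the change survives a reboot)
Swap: run swapoff -a and comment out the swap line in /etc/fstab
Hostname: hostnamectl set-hostname master, and maintain name resolution for every node in /etc/hosts
2. Component Installation and Key Initialization Points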
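The kernel-parameter and cgroup-driver items in this section are easy to get subtly wrong, so a small file check can save a failed kubeadm init. This is a minimal sketch that only reads files. The function names missing_k8s_sysctl_keys and docker_cgroup_is_systemd are hypothetical, and the check looks at file contents rather than the live kernel or Docker state.

```shell
#!/bin/sh
# Hypothetical helpers that inspect config files only; they change nothing.

# Print each required sysctl key that is missing or not set to 1 in the file $1.
missing_k8s_sysctl_keys() {
    for key in net.bridge.bridge-nf-call-iptables \
               net.bridge.bridge-nf-call-ip6tables \
               net.ipv4.ip_forward; do
        # Accept both "key = 1" and "key=1" by stripping spaces and tabs.
        awk -F= -v want="$key" '
            { k = $1; v = $2; gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v) }
            k == want && v == "1" { ok = 1 }
            END { exit !ok }
        ' "${1:-/etc/sysctl.d/k8s.conf}" 2>/dev/null || echo "$key"
    done
}

# Exit status 0 if the Docker daemon config in $1 selects the systemd cgroup driver.
docker_cgroup_is_systemd() {
    grep -q 'native\.cgroupdriver=systemd' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}
```

Empty output from missing_k8s_sysctl_keys means all three keys are present in the file. To compare against the running kernel, sysctl net.ipv4.ip_forward prints the live value.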
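Install Docker: yum install -y docker-ce docker-ce-cli containerd.io && systemctl enable --now docker
Install the Kubernetes tools: yum install -y kubelet kubeadm kubectl && systemctl enable --now kubelet
Cgroup driver: add {"exec-opts": ["native.cgroupdriver=systemd"]} to /etc/docker/daemon.json, then systemctl restart docker
Kernel parameters: write the following to /etc/sysctl.d/k8s.conf:
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
Apply them with sysctl --system (a bare sysctl -p reads only /etc/sysctl.conf and would skip this file). The net.bridge.* keys exist only while the br_netfilter module is loaded (modprobe br_netfilter).
Initialize the control plane: kubeadm init --image-repository=registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
Configure kubectl: mkdir -p $HOME/.kube && cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && chown $(id -u):$(id -g) $HOME/.kube/config
Install one network plugin, either Flannel: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
or Calico: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Print the join command on the master: kubeadm token create --print-join-command
Join each worker node by running the printed command on it: kubeadm join ...
3. Common Failures and Quick Fixes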
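Several failures in this section come back to the same few causes (cgroup driver, images, swap), so a first pass over the kubelet log can be sketched as a keyword filter. The function name classify_kubelet_log is hypothetical, and the keyword patterns are assumptions chosen from the terms used in this checklist, not an exhaustive or authoritative list of kubelet messages.

```shell
#!/bin/sh
# Hypothetical triage helper: read log text on stdin and print one hint per
# suspected cause, in a fixed order. The patterns are illustrative assumptions.
classify_kubelet_log() {
    awk '
        { line = tolower($0) }
        line ~ /cgroup driver/                 { cgroup = 1 }
        line ~ /imagepullbackoff|errimagepull/ { image = 1 }
        line ~ /swap/                          { swap = 1 }
        END {
            if (cgroup) print "cgroup-driver: compare the kubelet and Docker cgroup drivers"
            if (image)  print "image: check the image repository or pre-pull the images"
            if (swap)   print "swap: run swapoff -a and check /etc/fstab"
        }
    '
}
```

A possible invocation is journalctl -u kubelet --no-pager -n 200 | classify_kubelet_log. No output only means none of these keywords appeared, so the full log still needs to be read.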
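ImagePullBackOff/ErrImagePull: switch to a reachable mirror registry (such as registry.aliyuncs.com/google_containers), or manually docker pull and docker tag the required images; at initialization, specify the registry with --image-repository
kubectl get nodes shows NotReady: check that the network plugin pods are running (kubectl get pods -n kube-system); check that the kernel parameters and IP forwarding are in effect; confirm that the nodes can reach each other over the network
kubeadm join fails or times out: run kubeadm reset on the affected node, then retry the join
systemctl status kubelet reports errors or kubelet keeps restarting: use journalctl -u kubelet -f to see the specific error; common causes are a cgroup driver mismatch, missing images, swap still enabled, or kernel parameters not in effect; fix them one by one, then systemctl restart kubelet
4. Common Commands and Verification Steps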
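Check cluster state: kubectl get nodes and kubectl get pods -A
Follow the logs: journalctl -u kubelet -f, journalctl -u docker -f
Reset a node: kubeadm reset (after cleanup, run kubeadm init and kubeadm join again)
Start a test pod: kubectl run nginx --image=nginx --port=80 --restart=Never
Expose it: kubectl expose pod nginx --type=NodePort --port=80 (with --restart=Never, kubectl run creates a Pod rather than a Deployment, so the resource to expose is pod nginx)
Verify by accessing NodeIP:NodePort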
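The node check above can be turned into a quick filter that lists only the nodes needing attention. This is a minimal sketch: the function name not_ready_nodes is hypothetical, and it assumes the default column layout of kubectl get nodes, where the second column is STATUS.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS column does not start with "Ready".
# A status such as "Ready,SchedulingDisabled" is treated as ready here.
not_ready_nodes() {
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}
```

Typical use is kubectl get nodes --no-headers | not_ready_nodes. For the NodePort check, assuming the Service is named nginx, kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}' prints the port to combine with a node IP.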