Hands-On Guide to Kubernetes Monitoring on Ubuntu
1 Monitoring Architecture and Component Selection
2 Quick Deployment Steps (Ubuntu 20.04/22.04)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create ns monitoring
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=YourStrongPassw0rd
kubectl get pods -n monitoring
kubectl get svc -n monitoring
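# The Service names below assume the Helm release is named "prometheus" (as in the install command); confirm them against the kubectl get svc output.
# The chart's Grafana Service listens on port 80 by default, so local port 3000 is mapped to 80.
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093
kubectl patch svc -n monitoring prometheus-kube-prometheus-prometheus --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"}]'
kubectl patch svc -n monitoring prometheus-grafana --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"}]'
kubectl patch svc -n monitoring prometheus-kube-prometheus-alertmanager --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"}]'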
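Once the three port-forwards in this section are running in separate terminals, a quick reachability check can confirm that each component answers. The following is a minimal sketch, not part of the original procedure: the helper name check_endpoint is hypothetical, the local ports 9090, 3000 and 9093 are assumed to match the port-forward commands, and the paths are the standard health endpoints (/-/healthy for Prometheus and Alertmanager, /api/health for Grafana).

```shell
#!/bin/sh
# check_endpoint NAME URL (hypothetical helper)
# Prints "NAME: OK" when the URL answers with a 2xx status, otherwise
# "NAME: UNREACHABLE (URL)" and returns a non-zero status.
check_endpoint() {
  name="$1"
  url="$2"
  if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
    echo "$name: OK"
  else
    echo "$name: UNREACHABLE ($url)"
    return 1
  fi
}

# Assumed local ports; "|| true" keeps going so all three results are reported.
check_endpoint prometheus   http://localhost:9090/-/healthy || true
check_endpoint grafana      http://localhost:3000/api/health || true
check_endpoint alertmanager http://localhost:9093/-/healthy || true
```

When the Services are exposed as NodePort instead, the same helper should work with http://<node-ip>:<node-port> substituted for the localhost URLs.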
3 Key Configuration and Verification
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
    - role: node
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
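In a kube-prometheus-stack deployment, most targets are discovered through ServiceMonitor and PodMonitor objects, and raw scrape jobs like the ones above are usually handed to the chart through the prometheus.prometheusSpec.additionalScrapeConfigs value. The sketch below shows one way that might look. The file name extra-scrape-values.yaml is hypothetical, and the relabel rule that keeps only pods annotated prometheus.io/scrape: "true" is a common convention added here as an assumption; it is not part of the original configuration.

```shell
#!/bin/sh
# Write a Helm values file that adds a pod scrape job to the chart-managed Prometheus.
# The quoted 'EOF' keeps the shell from expanding anything inside the YAML.
cat > extra-scrape-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Assumption: only scrape pods that opt in via the annotation
          # prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
EOF

echo "wrote extra-scrape-values.yaml"

# To apply it to the release installed earlier (left commented so the file
# can be reviewed first):
# helm upgrade prometheus prometheus-community/kube-prometheus-stack \
#   --namespace monitoring --reuse-values -f extra-scrape-values.yaml
```

After an upgrade, the new job should appear under Status > Targets in the Prometheus UI if the configuration was accepted.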
groups:
  - name: node.rules
    rules:
      - alert: HighNodeLoad1
        # node_load1 is the absolute 1-minute load average; it is not normalized by CPU count.
        expr: node_load1 > 0.7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High 1m load on {{ $labels.instance }}"
          description: "1m load is above 0.7 (current: {{ $value }})"
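With the Prometheus Operator that kube-prometheus-stack installs, alert rules are normally loaded from PrometheusRule objects rather than from a rule file on disk. The sketch below wraps the rule group above in such an object. The object name node-load-rules and the file name node-load-rule.yaml are hypothetical, and the release: prometheus label assumes the chart's default rule selector together with the release name used in the install command; adjust it if your selector differs.

```shell
#!/bin/sh
# Write a PrometheusRule manifest containing the HighNodeLoad1 alert.
# The quoted 'EOF' prevents the shell from touching {{ $labels.instance }} and {{ $value }}.
cat > node-load-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-load-rules          # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus          # assumed to match the chart's default rule selector
spec:
  groups:
    - name: node.rules
      rules:
        - alert: HighNodeLoad1
          expr: node_load1 > 0.7
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High 1m load on {{ $labels.instance }}"
            description: "1m load is above 0.7 (current: {{ $value }})"
EOF

echo "wrote node-load-rule.yaml"

# To load the rule into the cluster (left commented so the file can be reviewed first):
# kubectl apply -f node-load-rule.yaml
# kubectl get prometheusrules -n monitoring
```

If the rule is picked up, it should be listed on the Alerts page of the Prometheus UI within a minute or two.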
4 Common Problems and Troubleshooting Tips
kubectl describe pod <pod> -n monitoring; confirm that the image can be pulled and that the node has sufficient resources.
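Before describing individual pods, it can help to list only those that are not healthy. The following is a small sketch under stated assumptions: the helper name not_ready_pods is hypothetical, and it relies on the default column order of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# not_ready_pods (hypothetical helper)
# Reads `kubectl get pods --no-headers` output on stdin and prints each pod
# whose STATUS column (third field) is neither Running nor Completed.
not_ready_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " (" $3 ")" }'
}

# Only query the cluster when kubectl is installed; errors are silenced so the
# script stays quiet on machines without cluster access.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n monitoring --no-headers --request-timeout=5s 2>/dev/null | not_ready_pods
fi
```

Each name it prints is a candidate for kubectl describe pod <pod> -n monitoring, where the Events section at the bottom typically shows image pull failures or scheduling problems caused by insufficient node resources.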