After deploying a Kubernetes cluster on CentOS, log management needs to cover the full pipeline: collection → storage → viewing/analysis → rotation/cleanup → monitoring/alerting. The sections below walk through concrete methods and best practices.
Choose a log collection stack according to cluster size, resource budget, and how complex your requirements are:
EFK Stack (Elasticsearch + Fluentd + Kibana)
Fluentd runs on every node and collects container logs (/var/log/containers/*.log) and system logs (e.g., /var/log/kubelet.log), forwarding them to Elasticsearch; Elasticsearch handles storage and indexing; Kibana provides visualization dashboards (e.g., error-log trends, per-Pod log volume rankings).
Loki + Promtail + Grafana
A lighter-weight alternative: Loki indexes only labels rather than full log text, and logs are queried by label in Grafana (e.g., namespace=prod, pod_name=payment-service).
Filebeat + Elasticsearch + Kibana (simplified ELK)
Filebeat collects /var/log/containers/*.log and forwards it to Elasticsearch, with Kibana for visualization.
Taking EFK as the example, the deployment steps in detail:
Deploy Elasticsearch
Run a single-node Elasticsearch as a StatefulSet with persistent storage, and cap the JVM heap (e.g., -Xms512m -Xmx512m) to prevent OOM kills:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: "elasticsearch"
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
        env:
        - name: discovery.type
          value: "single-node"
        - name: xpack.security.enabled
          value: "false"   # allow plain HTTP inside the cluster; 8.x enables TLS + auth by default
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: nfs-client  # replace with your actual StorageClass
      resources:
        requests:
          storage: 10Gi
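The StatefulSet's serviceName, and the DNS name the Fluentd output below points at (elasticsearch.default.svc.cluster.local), both presuppose a matching Service. A minimal headless-Service sketch, assuming the default namespace and that the Pods carry an app: elasticsearch label:

```yaml
# Headless Service backing the StatefulSet; gives Fluentd and Kibana a
# stable DNS name (elasticsearch.default.svc.cluster.local).
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None        # headless: DNS resolves directly to Pod IPs
  selector:
    app: elasticsearch   # assumes the StatefulSet Pods carry this label
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
```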
Deploy Fluentd (DaemonSet)
Fluentd runs as a DaemonSet so every node tails its local /var/log/containers/*.log files and forwards them to Elasticsearch. Example configuration (fluentd.conf):
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  format json
  read_from_head true
</source>
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.default.svc.cluster.local
  port 9200
  index_name kubernetes-logs
  include_timestamp true
</match>
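A minimal DaemonSet sketch for running this config on every node. The image tag and the ConfigMap name fluentd-config are assumptions, and the /var/lib/docker/containers mount assumes the Docker runtime (the /var/log/containers files are symlinks into it); adjust both to your environment:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:               # also collect logs on control-plane nodes
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch8-1  # assumed tag
        resources:
          limits:
            memory: 500Mi        # cap per-node resource usage
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: dockerlogs
          mountPath: /var/lib/docker/containers
          readOnly: true         # targets of the /var/log/containers symlinks
        - name: config
          mountPath: /fluentd/etc   # fluentd.conf mounted from the ConfigMap
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockerlogs
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluentd-config   # assumed ConfigMap holding fluentd.conf
```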
Deploy Kibana
Point Kibana at Elasticsearch (elasticsearch.hosts: ["http://elasticsearch:9200"]); after startup, access it at http://<Kibana-Service-IP>:5601. Create an index pattern (kubernetes-logs-*) and you can browse real-time logs on the Discover page.
Command-line tools
kubectl logs: the basic log-viewing command, with options such as -f (follow in real time), --tail (limit the number of lines), and -c (select a container in a multi-container Pod). Example: kubectl logs -f payment-service-abcde -n prod -c main (follow the main container of payment-service in the prod namespace).
Visualization tools
Kibana: full-text search and KQL filters (e.g., kubernetes.namespace: "prod" AND log_level: "ERROR"). Grafana + Loki: label-based LogQL queries (e.g., {namespace="prod", pod_name="payment-service"}), rendered as tables, line charts, and other panels.
Third-party tools
kubetail: aggregates logs from multiple Pods of the same service (e.g., kubetail payment-service -n prod), convenient for watching all replicas of one service together. stern: matches Pods by regex (e.g., stern 'payment-service.*' -n prod), well suited to dynamically generated Pod names.
Storage selection
Log rotation (keep disks from filling up)
Configure logrotate on each node to rotate the container log files (the files under /var/lib/docker/containers/ that the /var/log/containers/*.log symlinks point to). Example config (/etc/logrotate.d/kubernetes-containers):
/var/lib/docker/containers/*/*.log {
    daily          # rotate daily
    rotate 7       # keep 7 rotations
    compress       # compress old logs
    delaycompress  # delay compression by one cycle (skip the most recent rotation)
    missingok      # no error if a file is missing
    notifempty     # skip empty files
    copytruncate   # copy then truncate in place, so writers keep their file handles
}
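The copytruncate option matters because the container runtime keeps its log file handle open; renaming the file would leave the runtime writing to the old inode. A minimal Python sketch of the copy-then-truncate idea (an illustration of the mechanism, not logrotate's actual implementation):

```python
import shutil

def copytruncate(path: str, rotated_path: str) -> None:
    """Rotate a log file the way logrotate's copytruncate does: copy the
    current contents aside, then truncate the original in place, so any
    process holding the file open keeps writing to the same inode."""
    shutil.copy2(path, rotated_path)  # snapshot the current contents
    with open(path, "r+") as f:
        f.truncate(0)                 # empty the original without replacing it

# Demo: the writer's path stays valid and simply becomes empty.
with open("app.log", "w") as f:
    f.write("line1\nline2\n")
copytruncate("app.log", "app.log.1")
print(repr(open("app.log").read()))    # -> ''
print(repr(open("app.log.1").read()))  # -> 'line1\nline2\n'
```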
Combine Prometheus + Alertmanager to alert on log anomalies in real time:
Use exporters to scrape log-related metrics from Elasticsearch (e.g., elasticsearch_indices_indexing_slowlog_total, the count of slow-indexing events) and Loki (e.g., loki_dropped_chunks_total, the number of dropped log chunks). Example Prometheus alert rule:
groups:
- name: k8s-log-alerts
  rules:
  - alert: HighErrorLogs
    expr: rate(elasticsearch_indices_indexing_slowlog_total[5m]) > 100  # slow-log events per second, averaged over 5m
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Excessive error logs in the K8s cluster (instance {{ $labels.instance }})"
      description: "Abnormal log volume sustained for 5 minutes; investigate immediately"
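For Alertmanager to actually deliver that alert, a matching route and receiver are needed. A minimal alertmanager.yml sketch; the receiver names and webhook URL are placeholder assumptions:

```yaml
route:
  receiver: default
  routes:
  - receiver: oncall-webhook
    matchers:
    - severity = "critical"   # catches the severity label set by the rule above
receivers:
- name: default
- name: oncall-webhook
  webhook_configs:
  - url: http://alert-gateway.example.local/hook   # placeholder endpoint
```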
Best practices: attach labels to Pods (e.g., kubernetes.labels.app: "payment-service") so logs can be filtered per service, and set resource limits on the log collectors (e.g., resources.limits.memory: "500Mi") so they cannot starve the node. With the approaches above, you can build an efficient, reliable log-management system on CentOS + Kubernetes that supports troubleshooting, monitoring and alerting, and compliance auditing.