Monitoring and alerting for Kafka on Debian can be set up with several tools. The following are commonly used approaches:
Start several kafka_exporter instances, each monitoring one Kafka cluster through its list of broker addresses. The Docker Compose file below runs two instances and maps them to different host ports:
version: '3.1'
services:
  kafka-exporter-opslogs:
    image: bitnami/kafka-exporter:latest
    # One --kafka.server=host:port flag per broker of the cluster
    command: '--kafka.server=10.2.19.43:9092 --kafka.server=10.2.24.62:9092 --kafka.server=10.5.98.190:9092 --kafka.version=3.2.1'
    restart: always
    ports:
      - "9310:9308"
  kafka-exporter-prod:
    image: bitnami/kafka-exporter:latest
    command: '--kafka.server=192.168.53.99:9092 --kafka.server=192.168.53.53:9092 --kafka.server=192.168.53.96:9092'
    restart: always
    ports:
      - "9311:9308"
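Once the containers are up, each exporter should serve metrics on its mapped host port (9310 and 9311 here). The following is a minimal Python sketch for a quick sanity check, assuming the exporter exposes a `kafka_brokers` gauge in the Prometheus text format without trailing timestamps. The helper names `metric_values` and `fetch_metrics` and the sample text are illustrative only, not output from a real cluster.

```python
import urllib.request


def metric_values(text, name):
    """Return the values of all samples of metric `name` in Prometheus text format."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample looks like `name{labels} value` or `name value`
        # (assumption: no trailing timestamp).
        head, _, value = line.rpartition(" ")
        metric = head.split("{", 1)[0]
        if metric == name:
            values.append(float(value))
    return values


def fetch_metrics(url, timeout=5):
    """Download the raw metrics page from an exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample, shaped like exporter output.
SAMPLE = """# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 3
kafka_topic_partitions{topic="orders"} 6
"""

if __name__ == "__main__":
    print(metric_values(SAMPLE, "kafka_brokers"))
    # Against a live exporter (host and port are assumptions; adjust to your setup):
    # print(metric_values(fetch_metrics("http://10.0.0.26:9310/metrics"), "kafka_brokers"))
```

If the returned list is empty for a live exporter, the exporter is probably unable to reach any of the brokers passed via `--kafka.server`.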
In the Prometheus configuration, add a job for kafka_exporter and make sure each kafka_exporter instance has a unique name label:
- job_name: 'kafka-exporter'
  metrics_path: /metrics
  scrape_interval: 15s
  scrape_timeout: 10s
  static_configs:
    - targets:
        - 10.0.0.26:9310
      labels:
        name: kafka-opslogs
    - targets:
        - 10.0.0.26:9311
      labels:
        name: kafka-prod
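The requirement that every exporter target carries its own name label can be checked mechanically before reloading Prometheus. The sketch below is one possible way to do it, assuming the `static_configs` section has already been loaded into plain Python dictionaries (for example with a YAML parser). The function name `name_label_problems` is hypothetical.

```python
def name_label_problems(static_configs):
    """Return a list of human-readable problems with the `name` labels."""
    problems = []
    seen = {}  # name label -> first target that used it
    for group in static_configs:
        name = group.get("labels", {}).get("name")
        # Every target in a group inherits the group's labels.
        for target in group.get("targets", []):
            if name is None:
                problems.append(f"{target}: missing name label")
            elif name in seen:
                problems.append(f"{target}: name '{name}' already used by {seen[name]}")
            else:
                seen[name] = target
    return problems


# Mirrors the scrape configuration in this article.
STATIC_CONFIGS = [
    {"targets": ["10.0.0.26:9310"], "labels": {"name": "kafka-opslogs"}},
    {"targets": ["10.0.0.26:9311"], "labels": {"name": "kafka-prod"}},
]

if __name__ == "__main__":
    for problem in name_label_problems(STATIC_CONFIGS):
        print(problem)
```

An empty result means each exporter instance can be told apart by its name label in queries and alert messages.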
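Install Grafana to visualize the metrics collected by Prometheus. The commands below assume that an APT source providing the grafana package (for example Grafana's own repository) is already configured:
sudo apt-get update
sudo apt-get install grafana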
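Kafka's own JMX metrics can also be exposed. Edit the kafka-server-start.sh script to add the JMX port setting:
export JMX_PORT="9999"
The JMX metrics can then be viewed with jconsole. The hostname below is the example machine's name; replace it with the broker's own hostname:
# Edit kafka-run-class.sh and add the JMX server setting
-Djava.rmi.server.hostname=LAPTOP-3B77RHGG3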
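Before opening jconsole it can help to confirm that the broker is actually listening on the JMX port. The following is a minimal sketch of such a check, assuming port 9999 as in the JMX_PORT setting. The function name `port_open` is illustrative, and a successful TCP connection only shows that something is listening, not that the JMX/RMI handshake will succeed.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host is an assumption; use the broker's hostname or IP address.
    print(port_open("localhost", 9999))
```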
To detect and handle problems promptly, configure Prometheus alerting rules. For example:
- alert: KAFKA_brokers_exception
  expr: kafka_broker_info != 1
  for: 2m
  labels:
    severity: critical
  annotations:
    description: "Current brokers abnormal:"
- alert: kafka_message_backpressure
  expr: sum(kafka_consumergroup_lag_sum{job="kafka-exporter"}) by (name, consumergroup, topic) > 5000
  for: 2m
  labels:
    severity: critical
  annotations:
    description: '[Environment] {{ $labels.name }} [Consumer group] {{ $labels.consumergroup }} [Topic] {{ $labels.topic }} [Lag]: {{ $value | printf "%.2f" }}'
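To make the lag rule's logic concrete, the sketch below reproduces its aggregation in plain Python: samples are summed per (name, consumergroup, topic) and compared with the 5000 threshold. It is an illustration under simplifying assumptions, not how Prometheus evaluates rules internally; in particular it does not model the `for: 2m` hold period. The function name `lag_alerts` and the sample data are hypothetical.

```python
from collections import defaultdict


def lag_alerts(samples, threshold=5000):
    """Sum lag by (name, consumergroup, topic) and keep groups above threshold.

    `samples` is an iterable of (labels_dict, value) pairs.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = (labels.get("name"), labels.get("consumergroup"), labels.get("topic"))
        totals[key] += value
    return {key: total for key, total in totals.items() if total > threshold}


# Hypothetical samples: the same group and topic seen via two series.
SAMPLES = [
    ({"name": "kafka-prod", "consumergroup": "billing", "topic": "orders"}, 3200.0),
    ({"name": "kafka-prod", "consumergroup": "billing", "topic": "orders"}, 2100.0),
    ({"name": "kafka-opslogs", "consumergroup": "indexer", "topic": "logs"}, 40.0),
]

if __name__ == "__main__":
    for (name, group, topic), lag in lag_alerts(SAMPLES).items():
        print(f"[Environment] {name} [Consumer group] {group} [Topic] {topic} [Lag]: {lag:.2f}")
```

Trying different thresholds against real lag values this way can help pick a limit that avoids alerting on normal short bursts.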
With the methods and tools above, you can monitor a Kafka cluster on Debian effectively and keep it running reliably.