linux

如何监控Zookeeper的运行状态

小樊
47
2025-11-22 22:02:47
栏目: 大数据

监控 Zookeeper 运行状态的实用方案

一 快速健康检查与命令行

二 指标采集与可视化

三 JMX 深度监控

四 关键指标与告警建议

五 一键健康检查脚本示例

#!/usr/bin/env bash
set -euo pipefail

HOST="${1:-localhost}"
PORT="${2:-2181}"
TIMEOUT=5

# 四字命令封装
zk_cmd() {
  local cmd="$1"
  echo "$cmd" | nc -w "$TIMEOUT" "$HOST" "$PORT" 2>/dev/null
}

# 1) 存活探测
if [[ "$(zk_cmd ruok)" != "imok" ]]; then
  echo "CRITICAL: ruok not ok from $HOST:$PORT"
  exit 2
fi

# 2) 角色检查
ROLE=$(zk_cmd stat 2>/dev/null | awk '/Mode:/ {print $2}')
if [[ -z "$ROLE" ]]; then
  echo "CRITICAL: Cannot get role from $HOST:$PORT"
  exit 2
fi
echo "OK: Role=$ROLE"

# 3) 关键指标阈值示例:平均/最大延迟
read -r _ _ _ _ _ _ _ AVG_LATENCY MAX_LATENCY < <(zk_cmd mntr 2>/dev/null | egrep '^avgLatency|^maxLatency')
if [[ -n "$AVG_LATENCY" && "$AVG_LATENCY" -gt 100 ]]; then
  echo "WARNING: avgLatency=$AVG_LATENCY ms > 100 ms"
  exit 1
fi
if [[ -n "$MAX_LATENCY" && "$MAX_LATENCY" -gt 1000 ]]; then
  echo "CRITICAL: maxLatency=$MAX_LATENCY ms > 1000 ms"
  exit 2
fi

echo "OK: Zookeeper $HOST:$PORT healthy"
exit 0

0
看了该问题的人还看了