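Setting Up GitLab Monitoring and Alerting on Debian
1 Architecture and Port Planning
- Components and their roles
  - GitLab Omnibus: bundles Prometheus scrape endpoints and a self-monitoring project, exposing metrics for GitLab itself and for the node.
  - Prometheus: scrapes GitLab and node metrics and evaluates alerting rules.
  - Alertmanager: deduplicates and groups alerts, then sends notifications (email, Slack, webhook, and so on).
  - Grafana: dashboards for visualization, plus optional alerting.
- Common ports
  - GitLab bundled Prometheus: 9090
  - Alertmanager: 9093
  - Grafana: 3000
- Recommendation: run scraping, alerting, and visualization on separate hosts or containers to avoid a single point of failure and port conflicts.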
2 Enabling GitLab's Built-in Monitoring and Self-Monitoring
- Enable the monitoring components (Omnibus configuration)
  - Edit /etc/gitlab/gitlab.rb and make sure the following settings are present and uncommented:
    - gitlab_rails['monitoring_whitelist'] = ['127.0.0.1', '::1', '<your Prometheus subnet>'] # allow Prometheus to scrape
    - prometheus['enable'] = true
    - node_exporter['enable'] = true
    - redis_exporter['enable'] = true
    - postgres_exporter['enable'] = true
    - gitlab_exporter['enable'] = true
  - Apply the configuration and restart:
    - sudo gitlab-ctl reconfigure
    - sudo gitlab-ctl restart
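After the restart it helps to confirm that every bundled service came back up. The sketch below filters the output of `sudo gitlab-ctl status` for services reported as down. The helper name `list_down_services` is hypothetical, and the filter assumes the runit-style `run:` / `down:` line prefixes that `gitlab-ctl status` normally prints.

```shell
# Hypothetical helper: read `gitlab-ctl status` output on stdin and print
# the name of every service whose status line starts with "down:".
list_down_services() {
  awk '$1 == "down:" { sub(/:$/, "", $2); print $2 }'
}

# Usage (run on the GitLab host):
#   sudo gitlab-ctl status | list_down_services
```

An empty result means no service was reported as down. Any name printed is a candidate for `sudo gitlab-ctl tail <service>`.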
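- Verify the built-in metrics
  - Request http://<gitlab-host>/-/metrics (GitLab Rails metrics) and http://<gitlab-host>:9090/metrics (the bundled Prometheus's own metrics). Both should return a large body of metrics text. The Rails endpoint only answers clients whose address is listed in monitoring_whitelist.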
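"A large body of metrics text" can be checked mechanically. The following sketch counts sample lines in the Prometheus text exposition format, skipping blank lines and `# HELP` / `# TYPE` comments. The function name `count_metric_samples` is hypothetical, `<gitlab-host>` is a placeholder, and `/-/metrics` is the path GitLab documents for Rails metrics.

```shell
# Hypothetical helper: count metric samples read from stdin.
# Comment lines (starting with '#') and blank lines are not counted.
count_metric_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage (host is a placeholder):
#   curl -fsS "http://<gitlab-host>/-/metrics" | count_metric_samples
```

A result of 0, or a curl error, usually points to a whitelist or listen-address problem rather than a missing exporter.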
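- Turn on the self-monitoring project
  - As an administrator, go to Admin Area → Settings → Metrics and profiling → Self monitoring, enable it, and open the self-monitoring project dashboard to view instance health metrics such as CPU, memory, and request latency. This feature exists only in GitLab releases before 16.0, where the self-monitoring project was removed.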
3 Deploying Prometheus and Alertmanager
- Install Prometheus (Debian package or binary)
  - Example (binary release, which keeps the version under your control):
    - wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
    - tar xvf prometheus-2.53.0.linux-amd64.tar.gz
    - cd prometheus-2.53.0.linux-amd64
- Configure Prometheus (key parts of prometheus.yml; replace <gitlab-host> and <port> with your own values)
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'gitlab-omnibus'
        static_configs:
          - targets: ['<gitlab-host>:<port>']  # GitLab metrics endpoint
      - job_name: 'node'
        static_configs:
          - targets: ['<gitlab-host>:9100']  # node_exporter
      - job_name: 'redis'
        static_configs:
          - targets: ['<gitlab-host>:9121']  # redis_exporter
      - job_name: 'postgres'
        static_configs:
          - targets: ['<gitlab-host>:9187']  # postgres_exporter
      - job_name: 'gitlab-exporter'
        static_configs:
          - targets: ['<gitlab-host>:9168']  # gitlab-exporter
    rule_files:
      - "/etc/prometheus/rules/*.rules.yml"
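The prometheus.yml fragment above has no `alerting` section, and Prometheus forwards firing alerts only to the Alertmanagers listed there. The sketch below appends a minimal `alerting` block if the file does not already have one. The helper name `ensure_alerting_block` and the default target `localhost:9093` are assumptions for illustration. Adjust the target to wherever Alertmanager actually runs.

```shell
# Hypothetical helper: append an `alerting:` section to a Prometheus config
# file unless one is already present at the top level.
#   $1 = path to prometheus.yml
#   $2 = Alertmanager host:port (assumed default: localhost:9093)
ensure_alerting_block() {
  cfg="$1"
  am_target="${2:-localhost:9093}"
  if grep -q '^alerting:' "$cfg"; then
    return 0   # already configured, leave the file untouched
  fi
  cat >> "$cfg" <<EOF

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['$am_target']
EOF
}

# Usage:
#   ensure_alerting_block prometheus.yml localhost:9093
#   ./promtool check config prometheus.yml   # promtool ships in the Prometheus tarball
```

Running `promtool check config` afterwards also validates the rule files referenced by `rule_files`.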
- Start Prometheus
  - ./prometheus --config.file=prometheus.yml --storage.tsdb.path=/var/lib/prometheus
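Starting Prometheus from a terminal is fine for a first test, but the process ends with the shell session. One common option is a systemd unit. The sketch below only prints a unit file so it can be reviewed before installation. The function name `render_prometheus_unit`, the install directory `/opt/prometheus`, and the `prometheus` service user are all assumptions, not something the steps above create.

```shell
# Hypothetical helper: print a systemd unit for Prometheus to stdout.
#   $1 = directory holding the prometheus binary and prometheus.yml
#        (assumed default: /opt/prometheus)
#   $2 = TSDB data directory (default matches the command above)
render_prometheus_unit() {
  bin_dir="${1:-/opt/prometheus}"
  data_dir="${2:-/var/lib/prometheus}"
  cat <<EOF
[Unit]
Description=Prometheus
After=network-online.target
Wants=network-online.target

[Service]
User=prometheus
ExecStart=$bin_dir/prometheus --config.file=$bin_dir/prometheus.yml --storage.tsdb.path=$data_dir
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
#   render_prometheus_unit | sudo tee /etc/systemd/system/prometheus.service
#   sudo systemctl daemon-reload && sudo systemctl enable --now prometheus
```

The service user must exist and own the data directory before the unit is started.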
- Install and configure Alertmanager
  - wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
  - tar xvf alertmanager-0.27.0.linux-amd64.tar.gz
  - cd alertmanager-0.27.0.linux-amd64
- Configure alertmanager.yml (example: SMTP email)
    route:
      receiver: 'email'
      group_by: ['alertname', 'severity']
      group_wait: 10s
      group_interval: 5m
      repeat_interval: 4h
    receivers:
      - name: 'email'
        email_configs:
          - to: 'ops@example.com'
            from: 'gitlab-alert@example.com'
            smarthost: 'smtp.example.com:587'
            auth_username: 'gitlab-alert@example.com'
            auth_password: 'YOUR_SMTP_PASS'
            require_tls: true
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'instance']
- Start Alertmanager
  - ./alertmanager --config.file=alertmanager.yml --storage.path=/var/lib/alertmanager
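Before relying on real alerts, the email route can be exercised by posting a hand-made alert to Alertmanager's v2 API. The sketch below builds the JSON payload. The function name `build_test_alert` and the label values in the usage line are illustrative, and the function does no JSON escaping, so it assumes its arguments contain no quotes or backslashes.

```shell
# Hypothetical helper: print a one-element alert array for POST /api/v2/alerts.
#   $1 = alertname, $2 = severity, $3 = instance
# Assumes the arguments contain no characters that need JSON escaping.
build_test_alert() {
  name="$1"; severity="$2"; instance="$3"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$name" "$severity" "$instance"
}

# Usage (Alertmanager assumed to listen on localhost:9093):
#   build_test_alert TestAlert warning gitlab-01 |
#     curl -fsS -X POST -H 'Content-Type: application/json' \
#          --data-binary @- http://localhost:9093/api/v2/alerts
```

With the route above, the notification should arrive after the 10s group_wait. `./amtool check-config alertmanager.yml` from the same tarball validates the file without sending anything.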
- Open the firewall
  - sudo ufw allow 9090,3000,9093,9100,9121,9187,9168/tcp
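The rule above opens the listed ports to every source address. If only the Prometheus host needs to reach the exporters, a narrower variant is to allow each port from that one address. The sketch below prints the corresponding ufw commands for review instead of running them. The function name `print_ufw_rules` and the address `10.0.0.5` are placeholders.

```shell
# Hypothetical helper: print one source-restricted ufw rule per port.
#   $1      = source address or CIDR allowed to connect
#   $2...   = TCP ports to open for that source
print_ufw_rules() {
  source_cidr="$1"; shift
  for port in "$@"; do
    printf 'ufw allow from %s to any port %s proto tcp\n' "$source_cidr" "$port"
  done
}

# Usage: review the output, then run each line with sudo.
#   print_ufw_rules 10.0.0.5 9100 9121 9187 9168
```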
4 Alerting Rules and Notification Examples
- Prometheus rule file: /etc/prometheus/rules/gitlab.rules.yml
    groups:
      - name: gitlab
        rules:
          - alert: GitLabUnavailable
            expr: up{job="gitlab-omnibus"} == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "GitLab instance unreachable"
              description: "GitLab scrape target has been down for more than 1 minute: {{ $labels.instance }}"
          - alert: GitLabRailsHighLatency
            # Share of requests finishing within 0.5 s over the last 5 minutes
            expr: sum(rate(gitlab_transaction_duration_seconds_bucket{le="0.5", environment="production"}[5m])) / sum(rate(gitlab_transaction_duration_seconds_count{environment="production"}[5m])) < 0.95
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "GitLab Rails P95 latency is high"
              description: "Fewer than 95% of requests complete within 500 ms (currently {{ $value | humanizePercentage }})"
          - alert: NodeHighCPU
            expr: 1 - avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: "Node CPU usage is high"
              description: "Instance {{ $labels.instance }} CPU usage is above 80% (currently {{ $value | humanizePercentage }})"
          - alert: NodeHighMemory
            expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.85
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Node memory usage is high"
              description: "Instance {{ $labels.instance }} memory usage is above 85% (currently {{ $value | humanizePercentage }})"
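- In the Prometheus web UI, check Status → Rules to confirm the rules loaded. Once an alert fires it appears on the Prometheus /alerts page and in the Alertmanager UI (/#/alerts), and notifications go out according to the routing and inhibition rules.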
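The GitLabRailsHighLatency rule compares the share of requests that finish within 0.5 s against 0.95. A small worked example makes the threshold concrete: if 940 of 1000 requests land in the 0.5 s bucket, the ratio is 0.94 and the condition holds. The sketch below reproduces that arithmetic. The function name `evaluate_latency_slo` and the sample counts are illustrative only.

```shell
# Hypothetical helper: compute fast/total and compare it with a target ratio.
#   $1 = requests within the latency bucket
#   $2 = total requests
#   $3 = target ratio (default 0.95, as in the rule)
evaluate_latency_slo() {
  awk -v fast="$1" -v total="$2" -v target="${3:-0.95}" 'BEGIN {
    if (total <= 0) { print "no-data"; exit }
    ratio = fast / total
    printf "%.4f %s\n", ratio, (ratio < target ? "firing" : "ok")
  }'
}

# Usage:
#   evaluate_latency_slo 940 1000   # ratio below 0.95, condition holds
#   evaluate_latency_slo 970 1000   # ratio above 0.95, condition does not hold
```

In Prometheus the condition must also hold continuously for the `for: 5m` window before the alert moves from pending to firing.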
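5 Grafana Visualization and Optional Alerting
- Install and start
  - wget https://dl.grafana.com/oss/release/grafana-11.2.0.linux-amd64.tar.gz
  - tar xvf grafana-11.2.0.linux-amd64.tar.gz
  - ./bin/grafana-server -config /etc/grafana/grafana.ini -homepath /usr/share/grafana
  - The -config and -homepath values above follow the Debian package layout. For a tarball install, run the command from the extracted directory and point both options at it.
- Add a data source
  - Open http://<grafana-host>:3000 and log in with the default account admin/admin, then add a Prometheus data source with the URL http://<prometheus-host>:9090.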
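The data source can also be declared in a provisioning file instead of being added by hand, which keeps the setup reproducible. The sketch below writes a minimal data source definition in Grafana's provisioning format. The function name `write_grafana_datasource` is hypothetical, and the directory in the usage line is the Debian package default, which is an assumption here. A tarball install keeps provisioning files under its own conf directory.

```shell
# Hypothetical helper: write a Prometheus data source provisioning file.
#   $1 = provisioning directory for data sources
#   $2 = Prometheus URL (assumed default: http://localhost:9090)
write_grafana_datasource() {
  dir="$1"
  url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Usage (as root, then restart Grafana so it reads the file):
#   write_grafana_datasource /etc/grafana/provisioning/datasources http://<prometheus-host>:9090
```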
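- Import dashboards
  - Import ready-made dashboards, such as the ones GitLab publishes and community dashboards like Node Exporter Full, PostgreSQL, Redis, and GitLab Runners, to get a broad view of CPU, memory, disk I/O, database, cache, and Runner queues quickly.
- Grafana alerting (optional)
  - Create a threshold alert on a panel: select the Prometheus data source, then set Evaluate every, For, Conditions, and Notifications (for example Email, Slack, or Webhook). This suits teams that already have Grafana notification channels in place.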