Monitoring and Alerting for GitLab on CentOS: A Setup Guide
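To monitor a GitLab instance, first enable its built-in metrics endpoint. Edit the main GitLab configuration file, /etc/gitlab/gitlab.rb, and add or modify the following settings:

    gitlab_rails['gitlab_metrics_enabled'] = true
    gitlab_rails['gitlab_metrics_port'] = 9090  # default port; adjust as needed
    global['monitoring_enabled'] = true

Setting names differ between GitLab versions, so verify these keys against the gitlab.rb template shipped with your release before applying them. After saving, run sudo gitlab-ctl reconfigure to apply the configuration, then sudo gitlab-ctl restart to restart the GitLab services so that the metrics endpoint takes effect.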
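A quick way to confirm that the endpoint is live is to fetch it and list the metric names it exposes. The following is a minimal Python sketch, not part of GitLab's tooling: the URL http://localhost:9090/metrics is an assumption (the port comes from the configuration above, and /metrics is the path Prometheus scrapes by default), and the function names are illustrative.

```python
import urllib.request


def metric_names(exposition_text: str) -> set:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or the first space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metrics(url: str = "http://localhost:9090/metrics", timeout: float = 5.0) -> str:
    """Download the raw metrics page. The URL is an assumption; adjust it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    names = metric_names(fetch_metrics())
    print(f"{len(names)} metric names exposed")
    for name in sorted(names)[:10]:
        print(name)
```

If the request is refused or the list comes back empty, the endpoint is not serving metrics on that address, which is worth resolving before configuring the scraper.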
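Prometheus is the monitoring tool GitLab recommends. Install it and configure it to scrape the GitLab metrics.

Install Prometheus (yum install -y prometheus; this assumes a repository that provides the package is already configured). Then edit /etc/prometheus/prometheus.yml and add a scrape job for GitLab:

    scrape_configs:
      - job_name: 'gitlab'
        static_configs:
          - targets: ['your_gitlab_server_ip:9090']  # replace with the GitLab server IP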
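Once Prometheus is running with a scrape job like this one, target health can also be checked without the web UI. Prometheus records an up series for every target (1 when the last scrape succeeded, 0 otherwise) and exposes instant queries at /api/v1/query. The Python sketch below assumes Prometheus listens on http://localhost:9090 and that the job is named gitlab as above; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def targets_up(response: dict) -> dict:
    """Map each instance to True/False from an instant-query result for `up`."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    health = {}
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "")
        # Each sample value is a [timestamp, "value-as-string"] pair.
        health[instance] = sample["value"][1] == "1"
    return health


if __name__ == "__main__":
    # Assumed address of the Prometheus server; adjust to your setup.
    url = build_query_url("http://localhost:9090", 'up{job="gitlab"}')
    with urllib.request.urlopen(url, timeout=5) as resp:
        print(targets_up(json.load(resp)))
```

An empty result means Prometheus has no target for that job at all, which usually points at the scrape configuration rather than at GitLab.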
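Start Prometheus (systemctl start prometheus) and enable it at boot (systemctl enable prometheus). Then open the Prometheus web UI (by default http://<IP>:9090) and confirm that GitLab metrics such as gitlab_rails_database_queries_seconds and gitlab_workhorse_http_requests_total are being scraped. If Prometheus runs on the GitLab host itself, its default listen port 9090 collides with the GitLab metrics port configured earlier, so one of the two has to move.

Grafana turns the metrics stored in Prometheus into dashboards: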
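Install Grafana (yum install -y grafana), start the service (systemctl start grafana-server), and enable it at boot. Log in at http://<IP>:3000 (default account admin, password admin), go to "Configuration → Data Sources", and add a Prometheus data source (URL http://localhost:9090); test the connection, then save. Finally, import a GitLab dashboard (for example ID 4379, which covers CPU, memory, request latency, and similar metrics) and select the Prometheus data source to generate the panels.

Alertmanager handles the alerts that Prometheus fires and sends the notifications. Install and configure it first: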
Install Alertmanager (yum install -y alertmanager), start the service (systemctl start alertmanager), and enable it at boot. Edit /etc/alertmanager/alertmanager.yml and add a notification configuration (email is used here as the example):

    route:
      receiver: 'email-notifications'
    receivers:
      - name: 'email-notifications'
        email_configs:
          - to: 'admin@example.com'
            from: 'gitlab-alerts@example.com'
            smarthost: 'smtp.example.com:587'
            auth_username: 'gitlab_alerts'
            auth_password: 'your_email_password'
            send_resolved: true  # also notify when the alert resolves
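After saving, restart Alertmanager (systemctl restart alertmanager). Prometheus also has to know where Alertmanager listens: list its address (port 9093 by default) under alerting.alertmanagers in prometheus.yml, otherwise fired alerts never reach it.

Alert rules can be defined in two ways: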
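The first is a Prometheus rule file. Create an alert.rules file and add a rule such as the following, which fires when memory usage exceeds 80%:

    groups:
      - name: gitlab_alerts
        rules:
          - alert: HighMemoryUsage
            expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.8
            for: 1m  # the condition must hold for 1 minute before the alert fires
            labels:
              severity: warning
            annotations:
              summary: "High memory usage in GitLab"
              description: "GitLab instance memory usage is above 80% (current: {{ $value }})"

The node_memory_* metrics come from node_exporter, so it must be running on the GitLab host and be scraped by Prometheus for this rule to evaluate.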
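To make the rule's expression concrete, the sketch below computes the same ratio, (MemTotal - MemAvailable) / MemTotal, from /proc/meminfo, which is where node_exporter reads these values on Linux. It is only an illustration of the arithmetic: the function names are hypothetical, and the for: 1m hold time is not modeled.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_usage_ratio(total: int, available: int) -> float:
    """Same arithmetic as the rule: (total - available) / total."""
    if total <= 0:
        raise ValueError("total memory must be positive")
    return (total - available) / total


def exceeds_threshold(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio is above the rule's threshold (strictly greater)."""
    return ratio > threshold


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    ratio = memory_usage_ratio(info["MemTotal"], info["MemAvailable"])
    print(f"memory usage: {ratio:.1%}, above threshold: {exceeds_threshold(ratio)}")
```

Because both values share the same unit, the ratio is identical whether it is computed from kilobytes in /proc/meminfo or from the byte-valued node_exporter metrics. Note that {{ $value }} in the rule's description is this ratio (for example 0.85), not a percentage.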
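Reference the file from prometheus.yml (rule_files: - "alert.rules") and restart Prometheus so that the rules take effect.

The second way starts from a project's .gitlab-ci.yml, where a job writes a metric for Prometheus to pick up:

    monitoring:
      script:
        # assumes a Debian-based job image; on a CentOS shell runner, install node_exporter with yum instead
        - apt-get update && apt-get install -y prometheus-node-exporter
        - echo "gitlab_metrics{project=\"$CI_PROJECT_PATH\", ref=\"$CI_COMMIT_REF_NAME\"} 1" >> gitlab_metrics.prom
      artifacts:
        paths:
          - gitlab_metrics.prom  # artifact paths must be relative to the project directory
        expire_in: 1 week

For Prometheus to see the metric, the file has to end up in the directory that node_exporter's textfile collector reads (/etc/prometheus/exporters in this setup). Alert rules and notification targets cannot be declared in .gitlab-ci.yml: top-level keys such as alerting or notify would be parsed as jobs and fail validation. The matching rule goes into the Prometheus rule file instead, and email delivery to admin@example.com is handled by the Alertmanager receiver configured earlier:

          - alert: HighMemoryUsage
            expr: sum by (project) (memory_usage) / sum by (project) (memory_total) > 0.8
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Critical memory usage in {{ $labels.project }}"
              description: "Memory usage in project {{ $labels.project }} is above 80% for 5 minutes."

Here memory_usage and memory_total are placeholder metric names. The sums are grouped by project so that the project label used in the annotations survives aggregation.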
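The echo line in the monitoring job writes one sample in the Prometheus text exposition format. A project path or branch name that contains a double quote or backslash would produce an unparseable line, because label values must have backslashes, double quotes, and newlines escaped. A minimal Python sketch of a safer formatter follows; format_sample and escape_label_value are hypothetical helper names, not part of any GitLab or Prometheus tool.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text format."""
    # Backslashes first, so the escapes added next are not escaped again.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{label="value",...} value."""
    pairs = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    # Example values standing in for $CI_PROJECT_PATH and $CI_COMMIT_REF_NAME.
    print(format_sample("gitlab_metrics", {"project": "group/app", "ref": "main"}, 1))
```

A CI job could call such a script in place of the raw echo, passing the CI variables in as arguments or reading them from the environment.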
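With this in place, the CI/CD pipeline produces the metric, and Alertmanager sends the notification once the rule fires.

To confirm that the alerting setup works, test it as follows: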
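Simulate high memory usage with a stress tool (for example stress-ng --vm 1 --vm-bytes 80% --timeout 10m; the classic stress utility expects an absolute size such as 6G rather than a percentage), then watch whether Prometheus fires the alert and whether Alertmanager sends the email. Follow the alert-handling log with journalctl -u alertmanager -f and check it for errors. If Prometheus cannot reach the GitLab metrics, make sure the firewall allows access to the metrics port (9090).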
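The notification path can also be exercised on its own, without loading the machine, by posting a synthetic alert straight to Alertmanager's v2 API (POST /api/v2/alerts). The Python sketch below assumes Alertmanager listens on its default port 9093 on localhost; the label values and helper names are illustrative, and the alert is addressed to whatever receiver the route selects.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(now: datetime, duration_minutes: int = 5) -> list:
    """Build the JSON payload for one synthetic alert."""
    return [{
        "labels": {
            "alertname": "HighMemoryUsage",
            "severity": "warning",
            "instance": "manual-test",  # marks the alert as synthetic
        },
        "annotations": {"summary": "Test alert sent by hand"},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=duration_minutes)).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_test_alert(datetime.now(timezone.utc))
    print("Alertmanager answered with HTTP", post_alerts(payload))
```

If the email arrives for this synthetic alert but not for a real one, the problem lies between Prometheus and Alertmanager (rule evaluation or the alerting.alertmanagers setting) rather than in the SMTP configuration.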