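Guide to Nginx Monitoring and Alerting on Debian

To monitor Nginx, first enable its built-in stub_status module, which exposes basic connection counters.
Add the following server block (inside the http block of /etc/nginx/nginx.conf, or in /etc/nginx/sites-available/default):
server {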
    listen 80;
    server_name localhost;
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;  # allow local access only, for security
        deny all;
    }
}
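Restart Nginx to apply the change:
sudo systemctl restart nginx
Then request http://localhost/nginx_status from the server itself (for example with curl); the output looks similar to:
Active connections: 3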
server accepts handled requests
 100 100 200 
Reading: 0 Writing: 1 Waiting: 2
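
The counters in this output are easy to pull apart for ad hoc checks. The following is a minimal sketch, not part of the original setup: the function name parse_stub_status and the derived dropped value (accepts minus handled) are illustrative assumptions, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Parse stub_status text from stdin into name=value lines.
# parse_stub_status is a hypothetical helper name used for illustration.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active=" $3 }
        # The counters line is the only line made of exactly three integers.
        NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            print "accepts=" $1
            print "handled=" $2
            print "requests=" $3
            # Connections accepted but not handled (assumed derived value).
            print "dropped=" ($1 - $2)
        }
        /^Reading:/ {
            print "reading=" $2
            print "writing=" $4
            print "waiting=" $6
        }
    '
}

# Sample copied from the output above. On a live server one could instead run:
#   curl -s http://localhost/nginx_status | parse_stub_status
sample='Active connections: 3
server accepts handled requests
 100 100 200
Reading: 0 Writing: 1 Waiting: 2'

printf '%s\n' "$sample" | parse_stub_status
```

A non-zero dropped value would suggest that Nginx hit a resource limit while accepting connections.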
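Field reference:
Active connections: current number of active client connections (including those in the Reading, Writing, and Waiting states)
accepts: total number of accepted connections
handled: total number of handled connections (normally equal to accepts unless a resource limit such as worker_connections was reached)
requests: total number of client requests
Reading / Writing / Waiting: connections currently reading the request header, writing a response back to the client, or idle (keep-alive) waiting for a request

Prometheus is an open-source monitoring system that collects metrics using a pull model. Download and unpack a release: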
wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
tar xvfz prometheus-2.30.3.linux-amd64.tar.gz
cd prometheus-2.30.3.linux-amd64
Edit prometheus.yml and add a job for Nginx Exporter under scrape_configs:
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']  # address of Nginx Exporter
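Start Prometheus:
./prometheus --config.file=prometheus.yml
The Prometheus web interface is then available at http://localhost:9090.

Nginx Exporter converts the stub_status counters into a format Prometheus can scrape. Download and unpack it: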
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
tar xvfz nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
# The release archive is expected to unpack the nginx-prometheus-exporter binary directly into the current directory.
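Start the exporter and point it at the stub_status URL:
./nginx-prometheus-exporter -nginx.scrape-uri=http://localhost/nginx_status
The exporter listens on port 9113 by default. Sample output from http://localhost:9113/metrics:
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 200
Because stub_status carries no per-status or per-path breakdown, these metrics have no status, method, or handler labels.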
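
For a quick check without Prometheus, a single value can be read straight from the exporter's text output. This is a rough sketch: metric_value is a hypothetical helper, it only handles metrics without labels, and the sample lines are illustrative rather than captured from a real exporter.

```shell
#!/bin/sh
# Print the value of an unlabeled metric from Prometheus text-format input on stdin.
# metric_value is a hypothetical helper name; labeled series are not handled.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Illustrative sample. On a live host one could instead run:
#   curl -s http://localhost:9113/metrics | metric_value nginx_http_requests_total
sample='# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 3
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 200'

printf '%s\n' "$sample" | metric_value nginx_connections_active
printf '%s\n' "$sample" | metric_value nginx_http_requests_total
```

Comment lines start with "#", so their first field never equals the metric name and they are skipped.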
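Grafana visualizes the metrics stored in Prometheus and can manage alert notifications.
Install and start Grafana (the grafana package comes from Grafana's own APT repository, which must be added to the system first):
sudo apt update && sudo apt install -y grafana
sudo systemctl enable --now grafana-server
Open http://localhost:3000 and log in with the default account admin/admin.
Add the data source: go to Configuration → Data Sources → Add data source and select Prometheus; set the URL to http://localhost:9090 and click Save & Test (it should report "Data source is working").
Import a dashboard: go to + → Dashboard → Import, enter a dashboard ID (for example 12708, a basic Nginx dashboard), and click Import.
The dashboard then shows live trends for key metrics such as Active connections, Requests per second, and 5xx error rate.

To define alert rules, edit the Prometheus rules.yml file (referenced from prometheus.yml through rule_files) and add the following rules: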
groups:
  - name: nginx_alerts
    rules:
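      - alert: High5xxErrorRate
        # Needs a request counter with a status label. The stub_status-based exporter
        # does not provide one, so this rule assumes a log-based exporter supplies it.
        expr: sum by (instance) (rate(nginx_http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(nginx_http_requests_total[5m])) > 0.01  # 5xx error ratio above 1%
        for: 5m  # must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "High Nginx 5xx error rate (instance {{ $labels.instance }})"
          description: "5xx share over the last 5 minutes is {{ $value }}, above the 1% threshold"
      - alert: HighRequestRate
        expr: sum(rate(nginx_http_requests_total[1m])) by (instance) > 1000  # more than 1000 requests per second
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Nginx request rate (instance {{ $labels.instance }})"
          description: "Current request rate is {{ $value }}, above the threshold of 1000"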
In prometheus.yml, reference the rules file, then restart Prometheus so the rules are loaded:
rule_files:
  - "rules.yml"
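
To make the High5xxErrorRate expression concrete, the arithmetic behind it can be worked through by hand. The sketch below is a simplification: Prometheus's rate() also handles counter resets and extrapolates to the window edges, which this ignores. The helper names and the counter samples are made up for illustration.

```shell
#!/bin/sh
# per_second_rate START END WINDOW_SECONDS
# Simplified stand-in for rate(): counter increase divided by the window length.
per_second_rate() {
    awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN { printf "%.2f\n", (b - a) / w }'
}

# error_ratio ERROR_RATE TOTAL_RATE
error_ratio() {
    awk -v e="$1" -v t="$2" 'BEGIN { if (t == 0) print "NaN"; else printf "%.4f\n", e / t }'
}

# Hypothetical counter samples taken 300 seconds (5 minutes) apart.
total_rate=$(per_second_rate 120000 150000 300)   # (150000 - 120000) / 300
error_rate=$(per_second_rate 400 1000 300)        # (1000 - 400) / 300
ratio=$(error_ratio "$error_rate" "$total_rate")

echo "total=$total_rate req/s, 5xx=$error_rate req/s, ratio=$ratio"
```

With these numbers the ratio is 0.02, which is above the 0.01 threshold, so the rule would fire once the condition has held for the 5 minutes set in its for clause.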
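To configure notifications in Grafana (these menu names come from older Grafana releases that use legacy alerting):
Go to Alerting → Notification channels and click New channel, then configure the notification method (such as Email or Slack).
For example, set the name to Email Alerts and the type to Email.
Go to Alerting → Alert rules, find the rule you created (such as High5xxErrorRate), click Edit → Notifications, and select the matching notification channel (such as Email Alerts).

For log monitoring, edit /etc/nginx/nginx.conf and add a structured log format (such as JSON):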
http {
    log_format json_analytics escape=json '{"time":"$time_iso8601","host":"$host","status":"$status","request_time":"$request_time","remote_addr":"$remote_addr","request":"$request"}';
    access_log /var/log/nginx/access.log json_analytics;
    error_log /var/log/nginx/error.log;
}
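Restart Nginx to apply the configuration:
sudo systemctl restart nginx
Log field reference:
status: HTTP status code (such as 200 or 500)
request_time: request processing time in seconds
remote_addr: client IP address
request: the request line, including method and path (such as GET /api/payment)

ngxtop is a real-time log analysis tool that helps locate abnormal requests quickly. Install it with pip: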
sudo apt install -y python3-pip
pip3 install ngxtop
List requests that returned a 5xx status:
ngxtop -i 'status >= 500' print request_path status request_time
Running for 10 seconds, 123 records processed: 12.3 req/sec
request_path       status  request_time
/api/payment       500     1.23
/api/user/create   502     0.45
Show requests that took longer than 1 second:
ngxtop -i 'request_time > 1' top request_path request_time
Running for 10 seconds, 456 records processed: 45.6 req/sec
request_path       request_time
/api/upload        3.45
/api/report        2.12
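
Where ngxtop is not available, a similar 5xx filter can be approximated directly on the JSON access log with sed and awk. This is only a sketch: it assumes the exact field order of the json_analytics format defined earlier and request values without embedded quotes, and a real JSON parser would be more robust. The function name failed_requests and the sample log lines are hypothetical.

```shell
#!/bin/sh
# Print "status request_time request" for every log line with status >= 500.
# Relies on the field order status, request_time, remote_addr, request.
failed_requests() {
    sed -n 's/.*"status":"\([0-9]*\)","request_time":"\([0-9.]*\)".*"request":"\([^"]*\)".*/\1 \2 \3/p' |
        awk '$1 >= 500 { print }'
}

# Hypothetical log lines in the json_analytics format. On a live server:
#   failed_requests < /var/log/nginx/access.log
sample='{"time":"2024-01-01T12:00:00+00:00","host":"example.com","status":"200","request_time":"0.012","remote_addr":"203.0.113.5","request":"GET / HTTP/1.1"}
{"time":"2024-01-01T12:00:01+00:00","host":"example.com","status":"500","request_time":"1.230","remote_addr":"203.0.113.5","request":"GET /api/payment HTTP/1.1"}
{"time":"2024-01-01T12:00:02+00:00","host":"example.com","status":"502","request_time":"0.450","remote_addr":"198.51.100.7","request":"POST /api/user/create HTTP/1.1"}'

printf '%s\n' "$sample" | failed_requests
```

Changing the awk condition to compare the second field (for example $2 > 1) would give a rough equivalent of the slow-request query.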
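Fail2Ban watches the Nginx logs and automatically bans IP addresses that repeatedly send malicious requests. Install it:
sudo apt update && sudo apt install -y fail2ban
Add the following to /etc/fail2ban/jail.local:
[nginx-http-auth]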
enabled = true
filter = nginx-http-auth
action = iptables[name=HTTP, port=80, protocol=tcp]
logpath = /var/log/nginx/error.log
# ban after 3 failed attempts
maxretry = 3
# ban for 1 hour (value in seconds)
bantime = 3600
Restart Fail2Ban and check the jail status:
sudo systemctl restart fail2ban
sudo fail2ban-client status nginx-http-auth
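
To get a feel for what the jail reacts to, failed basic-auth attempts can be counted per client IP in the error log. This is a simplified sketch and not Fail2Ban's actual filter: it ignores the time window Fail2Ban applies, the function name auth_failures_by_ip is hypothetical, and the sample lines are illustrative error log entries.

```shell
#!/bin/sh
# auth_failures_by_ip THRESHOLD
# Read Nginx error log lines from stdin and print "ip count" for every client
# with at least THRESHOLD failed basic-auth attempts.
auth_failures_by_ip() {
    threshold="$1"
    grep -E 'password mismatch|was not found in' |
        sed -n 's/.*client: \([0-9a-fA-F.:]*\),.*/\1/p' |
        sort | uniq -c |
        awk -v max="$threshold" '$1 >= max { print $2, $1 }'
}

# Illustrative sample. On a live server:
#   auth_failures_by_ip 3 < /var/log/nginx/error.log
sample='2024/01/01 12:00:01 [error] 1234#1234: *1 user "admin": password mismatch, client: 203.0.113.5, server: localhost, request: "GET /admin HTTP/1.1", host: "example.com"
2024/01/01 12:00:02 [error] 1234#1234: *2 user "root" was not found in "/etc/nginx/.htpasswd", client: 203.0.113.5, server: localhost, request: "GET /admin HTTP/1.1", host: "example.com"
2024/01/01 12:00:03 [error] 1234#1234: *3 user "admin": password mismatch, client: 198.51.100.7, server: localhost, request: "GET /admin HTTP/1.1", host: "example.com"
2024/01/01 12:00:04 [error] 1234#1234: *4 user "admin": password mismatch, client: 203.0.113.5, server: localhost, request: "GET /admin HTTP/1.1", host: "example.com"'

printf '%s\n' "$sample" | auth_failures_by_ip 3
```

With a threshold of 3 (matching maxretry above), only 203.0.113.5 is listed in this sample.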
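Grade alerts by severity (such as critical, warning, info) so that important alerts are not drowned out. For automatic recovery, a watchdog script like the following restarts Nginx when no nginx process is running:
#!/bin/bash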
if ! pgrep nginx > /dev/null; then
    systemctl restart nginx
    echo "Nginx restarted at $(date)" >> /var/log/nginx_monitor.log
fi
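Schedule the script with cron (for example, once per minute):
* * * * * /path/to/script.sh

Together, these steps give a Debian system a complete Nginx monitoring and alerting setup that covers service status, performance metrics, and log anomalies, so problems can be detected and resolved promptly and the service stays highly available.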