A Practical Guide to Troubleshooting RabbitMQ on CentOS
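Troubleshooting RabbitMQ works best in a fixed order: read the logs first, check the metrics next, and verify with tools last. Following that order keeps diagnosis accurate and fast.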
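Commonly used diagnostic commands:

    sudo rabbitmqctl status
    sudo rabbitmqctl list_connections peer_host peer_port state
    sudo rabbitmqctl list_queues name messages consumers state
    sudo rabbitmqctl list_permissions -p /
    sudo rabbitmq-diagnostics listeners
    sudo rabbitmq-diagnostics node_health_check
    sudo rabbitmq-diagnostics memory_breakdown --unit MB

Note that node_health_check is deprecated in recent RabbitMQ releases; rabbitmq-diagnostics check_running and rabbitmq-diagnostics check_port_connectivity are the finer-grained replacements.

After enabling the management plugin (sudo rabbitmq-plugins enable rabbitmq_management), open http://<server-IP>:15672 in a browser. The default username/password is guest/guest, which is accepted only for connections from localhost. Core features: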
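- Queues: message counts (watch for a high ready count) and consumer count (0 means nothing is consuming the queue).

If the service will not start, check its state with sudo systemctl status rabbitmq-server (if it is not running, try sudo systemctl start rabbitmq-server). Then read the log at /var/log/rabbitmq/rabbit@<hostname>.log; common errors include: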
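- Configuration errors: check the syntax of /etc/rabbitmq/rabbitmq.conf (or rabbitmq-env.conf), looking for problems such as port conflicts or wrong paths.
- Erlang incompatibility: verify the installed Erlang version with erl -version.
- Port conflicts: run sudo netstat -tulnp | grep 5672 (AMQP port) or sudo ss -tulnp | grep 15672 (management UI port) to see what holds the port, then stop the conflicting process (for example sudo systemctl stop <conflicting-service>).
- SELinux: switch to permissive mode temporarily (sudo setenforce 0) or change the configuration (sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config).
- Disk space: check df -h /var/lib/rabbitmq/ (the data directory) and clean up old logs or data (for example sudo rm -rf /var/log/rabbitmq/*.old).

Connections closed unexpectedly (for example connection_closed_abruptly):
- Network: run ping <client-IP> and telnet <server-IP> 5672 to confirm that the network path is reachable.
- Resources: run sudo rabbitmq-diagnostics memory_breakdown to check memory (mem_used / mem_limit > 0.8 means more capacity is needed) and df -h to check disk (disk_free < disk_free_limit means cleanup is needed).
- TLS certificates: check the certificate validity dates (openssl x509 -in /path/to/cert.pem -noout -dates) and make sure the client trusts the CA certificate.

Flow control:
- A log entry such as flow control initiated means flow control has been triggered. Confirm the node state with sudo rabbitmq-diagnostics node_health_check; throttled connections also report the state flow in sudo rabbitmqctl list_connections peer_host peer_port state.
- High memory use (mem_used / mem_limit > 0.8) or low disk space (disk_free < disk_free_limit) triggers flow control; add capacity or free up space.
- Run sudo rabbitmqctl list_queues name messages_ready consumers. If messages_ready grows quickly and consumers is 0, the consumers have not started; if consumers is above 0, they are too slow (optimize the consumer code or add instances).

Message backlog (high ready count):
- Run sudo rabbitmqctl list_queues name messages_ready to find the queues with the largest backlog.
- Run sudo rabbitmqctl list_queues name consumers; if consumers=0, add consumers or repair the consumer service.
- Compare messages_ready with messages_unacknowledged. If messages_ready keeps growing, consumption is slower than publishing (optimize the consumer logic or increase parallelism).

Permissions and virtual hosts:
- Run sudo rabbitmqctl list_permissions -p / to confirm that the user has configure, write, and read permissions on the virtual host in question.
- Run sudo rabbitmqctl list_vhosts; if the virtual host does not exist, create it with sudo rabbitmqctl add_vhost <vhost_name>.
- Grant permissions with sudo rabbitmqctl set_permissions -p <vhost_name> <username> ".*" ".*" ".*" (this grants everything; in production, grant only what is needed).

Routine maintenance:
- Log rotation: configure it in /etc/rabbitmq/rabbitmq.conf (for example with the log.file.rotation.* settings) so that logs cannot fill the disk.
- Monitoring: track metrics such as rabbitmq_node_mem_used and rabbitmq_queue_messages_ready, and set threshold alerts (for example, alert when memory use exceeds 80%).
- Backups: back up the /var/lib/rabbitmq/mnesia directory (the Mnesia database, which holds metadata such as queues and exchanges) to avoid data loss.
- Upgrades: update regularly (sudo yum update rabbitmq-server) to pick up fixes for known vulnerabilities and bugs.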
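
The backlog checks described above (a growing messages_ready count, or consumers equal to 0) can be sketched as a small shell filter. This is a minimal illustration rather than an official tool: the function name flag_queues and the default threshold of 1000 are hypothetical, and it assumes rabbitmqctl prints tab-separated columns in the order name, messages_ready, consumers. Rows whose counts are not numeric, such as a header line, are skipped.

```shell
#!/bin/sh
# flag_queues (hypothetical helper): reads tab-separated rows of
#   name <TAB> messages_ready <TAB> consumers
# on stdin and prints one line per queue that needs attention.
# The threshold argument (default 1000) is an assumed value; tune it per workload.
flag_queues() {
  limit="${1:-1000}"
  awk -F'\t' -v limit="$limit" '
    # Only consider rows with exactly three fields and numeric counts,
    # which also skips any header or banner line.
    NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      if (($3 + 0) == 0)
        print $1 "\tno_consumers\t" $2      # nothing is consuming this queue
      else if (($2 + 0) > (limit + 0))
        print $1 "\tbacklog\t" $2           # consumers exist but fall behind
    }'
}

# Usage on the broker host (requires permission to run rabbitmqctl):
#   sudo rabbitmqctl -q list_queues name messages_ready consumers | flag_queues 500
```

Keeping the parsing separate from the rabbitmqctl call means the filter can be tried against saved output before it is wired into a cron job or an alerting script. A queue with no consumers is reported even when it is empty, so idle queues may need to be filtered out by name.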