RabbitMQ Troubleshooting Handbook for CentOS
1 Quick Triage Flow
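As a companion to the port check in this flow, here is a minimal sketch that reports which of the RabbitMQ ports named in this handbook already have a listener. The function name listening_rabbit_ports is hypothetical, and the sketch assumes `ss -lnt`-style output where the local address column ends in `:<port>`.

```shell
# Ports named in this handbook: AMQP, management UI, epmd, inter-node distribution.
RABBIT_PORTS="5672 15672 4369 25672"

# listening_rabbit_ports (hypothetical helper): read `ss -lnt`-style output on
# stdin and print, one per line, each port from RABBIT_PORTS that has a listener.
listening_rabbit_ports() {
    input=$(cat)
    for port in $RABBIT_PORTS; do
        # Require a colon right before the port so 5672 does not match 15672.
        if printf '%s\n' "$input" | grep -Eq ":$port[[:space:]]"; then
            printf '%s\n' "$port"
        fi
    done
}

# Typical use on a live host:
#   ss -lnt | listening_rabbit_ports
```

A port that should be listening but is absent from the output points at a stopped service or a changed listener configuration. A port that shows up while RabbitMQ is stopped points at a conflict.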
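Service status: systemctl status rabbitmq-server; if needed, check the startup log: journalctl -xe | tail -n 200.
Local connectivity: telnet localhost 5672, curl -I http://localhost:15672; remote connectivity: telnet <server IP> 5672.
Port listening and conflicts: ss -lntp | egrep ':(5672|15672|4369|25672)'; on a conflict, adjust the configuration or free the port, then restart.
Resource alarms: rabbitmqctl status | egrep 'mem_alarm|disk_free'.
Disk space: df -h; usage of the log and data directories: du -sh /var/log/rabbitmq /var/lib/rabbitmq.
Configuration and versions: /etc/rabbitmq/rabbitmq.conf, /etc/rabbitmq/conf.d/*.conf; the Erlang/OTP and RabbitMQ versions must be compatible.
Management plugin: rabbitmq-plugins list | grep management; enable it with: rabbitmq-plugins enable rabbitmq_management.
Users and permissions: rabbitmqctl list_users, rabbitmqctl list_vhosts, rabbitmqctl list_permissions -p <vhost>.
2 Common Faults and Fixes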
Service fails to start
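Low disk space is one resource condition worth ruling out when the service will not start or raises alarms. The sketch below compares the free space reported by `df -Pk` with a limit in megabytes. Both function names are hypothetical, and the comparison treats 1 MB as 1024 KiB, which is an approximation of how the broker interprets its own limit.

```shell
# free_kb_from_df (hypothetical helper): read `df -Pk <path>` output on stdin
# and print the "Available" column (KiB) of the first filesystem line.
free_kb_from_df() {
    awk 'NR == 2 { print $4 }'
}

# below_limit_mb FREE_KB LIMIT_MB (hypothetical helper): succeed (exit 0)
# when the free space is under the limit. Assumes 1 MB = 1024 KiB.
below_limit_mb() {
    [ "$1" -lt $(( $2 * 1024 )) ]
}

# Typical use on a live host:
#   free=$(df -Pk /var/lib/rabbitmq | free_kb_from_df)
#   below_limit_mb "$free" 500 && echo "free space is under 500MB"
```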
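Port conflict: change the listeners.tcp.default port or free the occupied port, then run systemctl restart rabbitmq-server.
Configuration errors: check the syntax and parameters in /etc/rabbitmq/rabbitmq.conf and conf.d/*.conf; restart after fixing them.
Resource thresholds: rabbitmqctl set_vm_memory_high_watermark 0.4, rabbitmqctl set_disk_free_limit 500MB.
Management page reachable locally but not remotely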
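For the remote-access case, it can help to confirm which ports the firewall already opens before adding rules. A minimal sketch, assuming the space-separated `port/proto` format printed by `firewall-cmd --list-ports`; the function name missing_fw_ports is hypothetical, and port ranges such as 5000-6000/tcp are not expanded.

```shell
# missing_fw_ports PORT/PROTO... (hypothetical helper): read
# `firewall-cmd --list-ports` output on stdin and print each requested
# entry that is not in the opened list.
missing_fw_ports() {
    # Pad with spaces so every entry can be matched as a whole word.
    opened=" $(tr '\n' ' ') "
    for want in "$@"; do
        case "$opened" in
            *" $want "*) ;;              # already opened
            *) printf '%s\n' "$want" ;;  # not opened yet
        esac
    done
}

# Typical use on a live host:
#   firewall-cmd --zone=public --list-ports | missing_fw_ports 5672/tcp 15672/tcp
```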
Open the management port: firewall-cmd --zone=public --add-port=15672/tcp --permanent && firewall-cmd --reload; AMQP port: firewall-cmd --add-port=5672/tcp --permanent && firewall-cmd --reload.
Listen address: make sure listeners.tcp.default or inet_dist_listen_min/max in rabbitmq.conf is not bound to 127.0.0.1 only.
Client connections refused or timing out
Vhost and permissions: rabbitmqctl add_vhost <vhost>, rabbitmqctl set_permissions -p <vhost> <user> ".*" ".*" ".*"; if necessary, add capacity or close idle connections.
Cluster node cannot join
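Cookie problems are a frequent reason a node cannot join. The sketch below checks that the cookie file is accessible by its owner only and prints a digest that can be compared between nodes without exposing the secret. Both function names are hypothetical, and the sketch assumes GNU `stat -c` and `sha256sum` as shipped with CentOS coreutils.

```shell
# cookie_mode_ok FILE (hypothetical helper): succeed when FILE exists and its
# mode is 400 or 600, i.e. only the owner has access.
cookie_mode_ok() {
    [ -f "$1" ] || return 1
    mode=$(stat -c '%a' "$1")
    [ "$mode" = "400" ] || [ "$mode" = "600" ]
}

# cookie_digest FILE (hypothetical helper): print a SHA-256 digest of FILE so
# cookies can be compared across nodes without printing the cookie itself.
cookie_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Typical use on each node, then compare the digests by eye:
#   cookie_mode_ok /var/lib/rabbitmq/.erlang.cookie || echo "fix cookie permissions"
#   cookie_digest /var/lib/rabbitmq/.erlang.cookie
```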
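Name resolution and cookie: /etc/hosts must map every node's hostname to its IP correctly; the Erlang cookie must be identical on all nodes and have correct permissions (/var/lib/rabbitmq/.erlang.cookie).
Port connectivity: telnet <target IP> 4369, telnet <target IP> 25672.
Startup reports a corrupted Mnesia/recovery.dets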
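Because recovering from this fault involves clearing the node's data directory, taking a copy first is the safer order. A minimal sketch of a timestamped backup step; the function name backup_dir and the destination layout are hypothetical, and `cp -a` is assumed to be the GNU version.

```shell
# backup_dir SRC DEST_PARENT (hypothetical helper): copy SRC, preserving
# ownership and modes, to DEST_PARENT/<basename>.bak.<timestamp> and print
# the path of the backup.
backup_dir() {
    src=${1%/}
    dest="$2/$(basename "$src").bak.$(date +%Y%m%d%H%M%S)"
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Typical use, with the service already stopped:
#   backup_dir "/var/lib/rabbitmq/mnesia/<node>" /var/lib/rabbitmq
```

The function prints nothing and returns non-zero when the copy fails, so a calling script can refuse to continue with the cleanup.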
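Symptom: errors such as not_a_dets_file, recovery.dets and similar.
Fix: systemctl stop rabbitmq-server → back up and then clear the Mnesia data directory (/var/lib/rabbitmq/mnesia/<node>), which discards that node's local data → systemctl start rabbitmq-server; if it still fails, check disk and filesystem health.
Plugin enable fails (e.g. rabbitmq_management)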
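Timeouts from the CLI tools often trace back to the local hostname not resolving. The sketch below checks whether a hosts file lists a given name on a non-comment line. The function name hosts_has_name is hypothetical.

```shell
# hosts_has_name FILE NAME (hypothetical helper): succeed when a non-comment
# line of FILE lists NAME among its hostnames (any field after the address).
hosts_has_name() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }            # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Typical use on a live host:
#   hosts_has_name /etc/hosts "$(hostname -s)" || echo "hostname missing from /etc/hosts"
```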
Symptom: {:badrpc, :timeout}.
Fix: correct /etc/hosts (e.g. 127.0.0.1 localhost <hostname>), or write [rabbitmq_management]. into /etc/rabbitmq/enabled_plugins and then enable the plugin.
3 Logs and Key Command Quick Reference
Logs and diagnostics
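When a log is too long to read in full, a count of suspicious lines gives a quick first signal. A minimal sketch using the same error|crash|alarm pattern as this handbook's log filter; the function name count_log_issues is hypothetical.

```shell
# count_log_issues FILE (hypothetical helper): print how many lines of FILE
# mention error, crash or alarm, ignoring case.
count_log_issues() {
    grep -Eic 'error|crash|alarm' "$1"
}

# Typical use on a live host:
#   count_log_issues "/var/log/rabbitmq/rabbit@$(hostname -s).log"
```

Note that `grep -c` still prints 0 when nothing matches but returns a non-zero exit status, so check the printed number rather than the status.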
Log files: /var/log/rabbitmq/rabbit@<hostname>.log, /var/log/rabbitmq/rabbit@<hostname>-sasl.log; follow in real time: tail -f /var/log/rabbitmq/rabbit@<hostname>.log | egrep -i 'error|crash|alarm'.
Service diagnostics: systemctl status rabbitmq-server -l, journalctl -xe | tail -n 200.
Common operations commands
Service control: systemctl start|stop|restart rabbitmq-server; status: systemctl status rabbitmq-server.
Node status and thresholds: rabbitmqctl status, rabbitmqctl set_vm_memory_high_watermark <0.0-1.0>, rabbitmqctl set_disk_free_limit <limit>.
Users and permissions: rabbitmqctl add_user <u> <p>, rabbitmqctl set_user_tags <u> administrator, rabbitmqctl add_vhost <v>, rabbitmqctl set_permissions -p <v> <u> ".*" ".*" ".*".
Cluster: rabbitmqctl cluster_status, rabbitmqctl join_cluster <node>; after a change, rabbitmqctl await_online_nodes <n>.
4 Network and Security Configuration Notes
Firewall and security groups
Apply rule changes with firewall-cmd --reload.
Listening and binding
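Check listeners.tcp.default and management.tcp.port; if the management interface must be reachable from the public internet, restrict the source IPs.
Hostnames and name resolution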
/etc/hosts must contain the mapping between the local IP and hostname; names used between cluster nodes must be resolvable and consistent.
Proxies and paths
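The default vhost is named "/", so it has to appear percent-encoded as %2F when it is used as a path segment of a management API URL. A minimal sketch of that encoding step; the function name encode_vhost is hypothetical, and it encodes only the slash, not other reserved characters.

```shell
# encode_vhost NAME (hypothetical helper): percent-encode every "/" in NAME so
# the vhost can be used as a single path segment.
encode_vhost() {
    printf '%s' "$1" | sed 's|/|%2F|g'
}

# Example (credentials are placeholders):
#   curl -u <user>:<password> "http://localhost:15672/api/queues/$(encode_vhost /)"
```

A proxy that decodes %2F back to "/" before forwarding changes the path the management API sees, which is one way the errors mentioned in this subsection can arise.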
Handle / and %2F correctly to avoid 405 responses or routing errors.
5 Client Connection Errors and Reconnection Advice
Common causes
Troubleshooting and tuning
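One common reconnection pattern is to wait longer after each failed attempt, up to a ceiling, so that a recovering broker is not flooded with retries. The sketch below shows only the delay calculation. The function name backoff_delay and the base and ceiling values are illustrative assumptions, not settings taken from this handbook.

```shell
# backoff_delay ATTEMPT BASE MAX (hypothetical helper): print the wait in
# seconds before retry number ATTEMPT (counting from 0), computed as
# BASE * 2^ATTEMPT and capped at MAX.
backoff_delay() {
    attempt=$1
    delay=$2
    max=$3
    # Double the delay once per attempt, stopping early once the cap is reached.
    while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$max" ]; do
        delay=$(( delay * 2 ))
        attempt=$(( attempt - 1 ))
    done
    if [ "$delay" -gt "$max" ]; then
        delay=$max
    fi
    printf '%s\n' "$delay"
}

# Typical use in a retry loop (the probe command is a placeholder):
#   n=0
#   until <probe command>; do sleep "$(backoff_delay "$n" 1 30)"; n=$(( n + 1 )); done
```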