Diagnosing abnormal Trigger behavior on Ubuntu
1. Define the problem and scope
2. Logs first: build a timeline
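As a rough sketch of the timeline idea in this section: journal output in `short-iso` format starts each line with a sortable timestamp, so a small filter can keep only the suspicious lines and drop the hostname column. The function name `timeline_filter` and the unit name `myapp.service` are hypothetical, chosen only for illustration.

```shell
# Read log lines in `journalctl -o short-iso` layout on stdin:
#   <timestamp> <host> <unit>[pid]: <message>
# Keep lines mentioning error/fail/trigger (case-insensitive) and print
# "<timestamp> <unit>[pid]: <message>", dropping the hostname column.
timeline_filter() {
    grep -iE 'error|fail|trigger' |
        awk '{ ts = $1; $1 = $2 = ""; sub(/^ +/, ""); print ts, $0 }'
}

# Example (hypothetical unit name, not run here):
#   journalctl -u myapp.service -b -o short-iso --no-pager | timeline_filter | sort
```

Because every line begins with an ISO timestamp, output from several units can be concatenated and passed through `sort` to get a single ordered timeline.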
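Live and recent journal: journalctl -xe; journalctl -f
Per-service journal: journalctl -u <service_name> -b (current boot only); journalctl -u <service_name> --since "2025-12-10 10:00:00" --until "2025-12-10 11:00:00"
By priority: journalctl -u <service_name> -p err..alert
System logs: /var/log/syslog; /var/log/messages
Authentication log: /var/log/auth.log
Apache logs: /var/log/apache2/error.log; /var/log/apache2/access.log
Kernel messages: dmesg -T | tail -n 200; cat /var/log/kern.log
Keyword search: journalctl | grep -i "error\|fail\|trigger" (use awk/sed to extract fields when needed)
Journal cleanup: sudo journalctl --vacuum-time=7d; sudo journalctl --vacuum-size=100M
3. Resources and dependencies: rule out system-wide bottlenecks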
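The disk-space check in this section can be turned into a simple threshold filter. The sketch below assumes POSIX-format `df -P` output on stdin and mount points without spaces; the function name `disk_over_threshold` and the default of 90% are illustrative choices, not fixed recommendations.

```shell
# Read `df -P` output on stdin and print every mount point whose usage is
# at or above the given percentage (default 90).
# Columns in -P format: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
disk_over_threshold() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "95%" -> "95"
        if (use + 0 >= t + 0) print $6, use "%"
    }'
}

# Example (not run here):
#   df -P | disk_over_threshold 85
```

An empty result means no filesystem crossed the threshold, which lets a full disk be ruled out before looking at the service itself.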
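CPU, memory, and I/O: top or htop; vmstat 1; iostat -x 1
Processes: ps auxf; pstree -p <pid>
Open files and ports: lsof -p <pid>; ss -lntp | grep <port>
Disk space: df -h; du -sh /var/log /var/lib/* | sort -h
Filesystem health: look for I/O errors in /var/log/syslog; run fsck from rescue mode if necessary
Service state: systemctl status <service>; journalctl -u <service> -xe
Dependencies: systemctl list-dependencies <service>
Restart: systemctl restart <service> (back up the configuration before making changes)
Package state: sudo dpkg --configure -a
Updates and repair: sudo apt update && sudo apt full-upgrade -y; sudo apt --fix-broken install
Versions and changes: apt policy <pkg>; apt changelog <pkg>
4. Targeted checks for Trigger scenarios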
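For timer and path units, `systemctl show <unit> -p Triggers` prints a single `Triggers=` line listing the units that get activated. A minimal sketch for splitting that line into one unit per line, so each can be inspected in turn, follows; the function name `parse_triggers` and the unit `backup.timer` are hypothetical.

```shell
# Read `systemctl show <unit> -p Triggers` output on stdin, for example
#   Triggers=a.service b.service
# and print each triggered unit on its own line. Prints nothing when the
# property is empty or absent.
parse_triggers() {
    sed -n 's/^Triggers=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Example (hypothetical timer name, not run here):
#   systemctl show backup.timer -p Triggers | parse_triggers |
#   while read -r unit; do
#       journalctl -u "$unit" -b --no-pager | tail -n 20
#   done
```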
Locate units: systemctl list-units --type=service,timer,path | grep -i trigger
Trigger and ordering relationships: systemctl show <unit> -p Triggers; systemctl show <unit> -p After -p Requires
Timers: systemctl list-timers --all; journalctl -u <timer>.timer -u <timer>.service
Path units: systemctl status <path_unit>; journalctl -u <path_unit> -f
Kernel: dmesg -T | grep -i "trigger\|watchdog\|oops\|panic"
Kernel modules: lsmod | grep <mod>; modinfo <mod> (if needed, adjust /etc/modprobe.d/*.conf and run update-initramfs -u)
Tracing: strace -f -o /tmp/strace.log <cmd>; ltrace -e 'malloc+free+open' <cmd>
Core dumps: gdb <binary> <core> (requires ulimit -c unlimited and a configured core_pattern)
Network capture: tcpdump -ni any -w /tmp/trigger.pcap 'tcp port <port>' (analyze with Wireshark if needed)
Connectivity: curl -Iv <url>; ping, traceroute, mtr
5. Preserve evidence and ask for help
Bundle logs and configuration: sudo tar czf /tmp/diag-$(date +%F).tgz /var/log /etc/<service> /run/<service> /tmp/*.log /tmp/*.pcap
System information: uname -a; lsb_release -a; lshw; df -h; free -m; systemctl list-units --failed
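One caveat with the archive command: tar reports an error and exits non-zero when a listed path does not exist, which happens if, say, no /tmp/*.pcap file was ever captured. A minimal sketch that filters the list down to existing paths first is shown below. The function name `existing_paths` and the variable `svc` are hypothetical, and paths containing newlines are not handled.

```shell
# Print only those arguments that exist, one per line, so that tar is not
# handed missing paths or unmatched glob patterns.
existing_paths() {
    for p in "$@"; do
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# Example (hypothetical service name in $svc, not run here; GNU tar reads
# the file list from stdin with `-T -`):
#   existing_paths /var/log "/etc/$svc" "/run/$svc" /tmp/*.log /tmp/*.pcap |
#       sudo tar czf "/tmp/diag-$(date +%F).tgz" -T -
```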