fix(ops): route container pressure alerts to controller

Your Name
2026-07-01 23:38:34 +08:00
parent d658f03ac5
commit fffebf9597
6 changed files with 85 additions and 7 deletions


@@ -324,7 +324,7 @@ groups:
annotations:
summary: "110 sustained pressure needs triage"
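description: "110 load5/core > 0.75, or a key Gitea / StockPlatform container using > 2.0 CPU cores, sustained for 1 minute; this is proactive detection ahead of the critical alert, so the response does not wait until load5/core > 1.5."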
-auto_repair_action: "ssh 192.168.0.110 '/home/wooo/scripts/host-sustained-load-controller.py --host 110 --load5-per-core-threshold 0.75 --metrics-file /home/wooo/node_exporter_textfiles/host_runaway_process.prom --docker-stats-file /home/wooo/node_exporter_textfiles/docker_stats.prom --json'"
+auto_repair_action: "ssh 192.168.0.110 '/home/wooo/scripts/host-sustained-load-controller.py --host 110 --load5-per-core-threshold 0.75 --container-cpu-threshold 2.0 --metrics-file /home/wooo/node_exporter_textfiles/host_runaway_process.prom --docker-stats-file /home/wooo/node_exporter_textfiles/docker_stats.prom --json'"
runbook: "The controller only produces a controlled packet; it does not read secrets or restart services. If the classification is gitea_queue_or_hook_backlog, first run host-sustained-load-evidence.py to get the redacted top family / container, then choose the Gitea queue/hook backlog playbook; a gated SIGTERM is allowed only for an orphan browser; for StockPlatform postgres/API, switch to the Stock hot-query/source freshness playbook. Forbidden: Docker / systemd / Nginx / DB restart, reboot, and firewall changes."
- alert: HostCiRunnerLoadSaturation
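
Reviewer note: the hunk above only shows the new `--container-cpu-threshold 2.0` flag being passed; the controller's own logic is not part of this diff. As a rough illustration of the condition the alert description states (load5/core over 0.75, or any key container over 2.0 cores), here is a minimal sketch. The metric name `docker_container_cpu_cores`, the `name` label, and both function names are hypothetical, since the real layout of `docker_stats.prom` is not shown, and the "sustained for 1 minute" window is left to the alert rule rather than modeled here.

```python
import re

# Hypothetical metric line shape; the actual docker_stats.prom format is an assumption.
SAMPLE_RE = re.compile(r'^docker_container_cpu_cores\{name="([^"]+)"\}\s+([0-9.eE+-]+)$')


def parse_container_cpu(text):
    """Return {container_name: cpu_cores} from Prometheus text-format lines."""
    usage = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:  # comment lines and other metrics are skipped
            usage[match.group(1)] = float(match.group(2))
    return usage


def needs_triage(load5, cores, container_cpu,
                 load5_per_core_threshold=0.75, container_cpu_threshold=2.0):
    """Evaluate the two conditions: host load5/core OR any container CPU over threshold."""
    load_hit = cores > 0 and (load5 / cores) > load5_per_core_threshold
    hot = sorted(name for name, cpu in container_cpu.items()
                 if cpu > container_cpu_threshold)
    return {"load_pressure": load_hit,
            "hot_containers": hot,
            "triage": load_hit or bool(hot)}
```

Both comparisons are strict, matching the `>` in the description, so a container sitting exactly at 2.0 cores would not trigger on its own in this sketch.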