fix(ops): route container pressure alerts to controller
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 1s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
@@ -324,7 +324,7 @@ groups:
 annotations:
 summary: "110 sustained pressure needs triage"
 description: "110 load5/core > 0.75, or CPU of a key Gitea / StockPlatform container > 2.0 cores, sustained for 1 minute; this is proactive detection ahead of the critical alert, so we do not wait until load5/core > 1.5 to react."
-auto_repair_action: "ssh 192.168.0.110 '/home/wooo/scripts/host-sustained-load-controller.py --host 110 --load5-per-core-threshold 0.75 --metrics-file /home/wooo/node_exporter_textfiles/host_runaway_process.prom --docker-stats-file /home/wooo/node_exporter_textfiles/docker_stats.prom --json'"
+auto_repair_action: "ssh 192.168.0.110 '/home/wooo/scripts/host-sustained-load-controller.py --host 110 --load5-per-core-threshold 0.75 --container-cpu-threshold 2.0 --metrics-file /home/wooo/node_exporter_textfiles/host_runaway_process.prom --docker-stats-file /home/wooo/node_exporter_textfiles/docker_stats.prom --json'"
 runbook: "The controller only produces a controlled packet; it does not read secrets or restart services. If classified as gitea_queue_or_hook_backlog, first run host-sustained-load-evidence.py to get the redacted top family / container, then choose the Gitea queue/hook backlog playbook; a gated SIGTERM is allowed only for an orphan browser; for StockPlatform postgres/API, switch to the Stock hot-query/source freshness playbook. Do not perform Docker / systemd / Nginx / DB restarts, reboots, or firewall changes."
 
 - alert: HostCiRunnerLoadSaturation
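The trigger in the description can be sketched in Python as below. This is only an illustration, assuming per-scrape samples of load5, core count, and CPU cores already filtered to the key Gitea / StockPlatform containers; `Sample`, `is_pressured`, and `needs_triage` are hypothetical names, and the real rule is presumably a PromQL expression whose `for:` duration is not visible in this hunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LOAD5_PER_CORE_THRESHOLD = 0.75  # matches --load5-per-core-threshold 0.75
CONTAINER_CPU_THRESHOLD = 2.0    # cores; matches --container-cpu-threshold 2.0
SUSTAIN_SECONDS = 60             # "sustained for 1 minute"


@dataclass
class Sample:
    """Hypothetical scrape of host 110 (assumed shape, not the real metrics)."""
    ts: float                 # scrape timestamp in seconds
    load5: float              # 5-minute load average
    cores: int                # CPU core count
    container_cpu: dict[str, float] = field(default_factory=dict)  # key containers only


def is_pressured(s: Sample) -> bool:
    """True if load5/core or any key container's CPU exceeds its threshold."""
    if s.load5 / s.cores > LOAD5_PER_CORE_THRESHOLD:
        return True
    return any(cpu > CONTAINER_CPU_THRESHOLD for cpu in s.container_cpu.values())


def needs_triage(samples: list[Sample]) -> bool:
    """True if every sample in the trailing SUSTAIN_SECONDS window is pressured.

    Roughly approximates a Prometheus `for: 1m` clause: a single
    non-pressured sample inside the window keeps the alert from firing.
    """
    if not samples:
        return False
    end = samples[-1].ts
    window = [s for s in samples if s.ts >= end - SUSTAIN_SECONDS]
    # The window must cover the whole duration, not just the latest scrape.
    if end - window[0].ts < SUSTAIN_SECONDS:
        return False
    return all(is_pressured(s) for s in window)
```

Either condition alone is enough, which is why the command line must carry both thresholds.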
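The substantive change in this hunk is the added `--container-cpu-threshold 2.0` flag: the old auto_repair_action did not pass the container CPU threshold that the description promises. A hypothetical lint along the lines below could catch that kind of drift. `controller_flags` and `check_thresholds` are illustrative names, not part of the repository, and the parser assumes each `--name` takes the next token as its value unless that token is another flag.

```python
from __future__ import annotations

import shlex

# New auto_repair_action value from this commit, copied verbatim.
AUTO_REPAIR_ACTION = (
    "ssh 192.168.0.110 '/home/wooo/scripts/host-sustained-load-controller.py"
    " --host 110 --load5-per-core-threshold 0.75 --container-cpu-threshold 2.0"
    " --metrics-file /home/wooo/node_exporter_textfiles/host_runaway_process.prom"
    " --docker-stats-file /home/wooo/node_exporter_textfiles/docker_stats.prom --json'"
)


def controller_flags(action: str) -> dict[str, str | bool]:
    """Extract --flag [value] pairs from the remote controller command."""
    outer = shlex.split(action)     # ['ssh', host, '<remote command line>']
    remote = shlex.split(outer[2])  # controller argv on the remote host
    flags: dict[str, str | bool] = {}
    i = 1  # skip the script path
    while i < len(remote):
        tok = remote[i]
        if not tok.startswith("--"):
            i += 1
            continue
        name = tok[2:]
        if i + 1 < len(remote) and not remote[i + 1].startswith("--"):
            flags[name] = remote[i + 1]  # flag followed by its value
            i += 2
        else:
            flags[name] = True           # bare switch such as --json
            i += 1
    return flags


def check_thresholds(action: str, load5_per_core: float, container_cpu: float) -> list[str]:
    """Return the names of flags that are missing or disagree with the description."""
    flags = controller_flags(action)
    problems = []
    if float(flags.get("load5-per-core-threshold", "nan")) != load5_per_core:
        problems.append("load5-per-core-threshold")
    if float(flags.get("container-cpu-threshold", "nan")) != container_cpu:
        problems.append("container-cpu-threshold")
    return problems
```

Run against the pre-fix command (the same string without `--container-cpu-threshold 2.0`), such a check would report `container-cpu-threshold`.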
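The runbook's decision rules can be summarized as a lookup table. In this sketch only `gitea_queue_or_hook_backlog` is a classification name taken from the runbook; `orphan_browser`, `stockplatform_postgres_or_api`, the `manual triage` fallback, and the action tokens in `FORBIDDEN_ACTIONS` are hypothetical labels chosen for illustration.

```python
from __future__ import annotations

# Actions the runbook forbids regardless of classification (hypothetical tokens).
FORBIDDEN_ACTIONS = frozenset({
    "docker_restart", "systemd_restart", "nginx_restart", "db_restart",
    "reboot", "firewall_change",
})

# Classification -> ordered next steps, paraphrasing the runbook.
ROUTES: dict[str, list[str]] = {
    "gitea_queue_or_hook_backlog": [
        "run host-sustained-load-evidence.py (redacted top family / container)",
        "Gitea queue/hook backlog playbook",
    ],
    "orphan_browser": ["gated SIGTERM"],
    "stockplatform_postgres_or_api": ["Stock hot-query/source freshness playbook"],
}


def next_steps(classification: str) -> list[str]:
    """Return ordered steps; the fallback for unknown classes is an assumption."""
    return ROUTES.get(classification, ["manual triage"])


def may_send_sigterm(classification: str) -> bool:
    """Per the runbook, a gated SIGTERM is permitted only for an orphan browser."""
    return classification == "orphan_browser"


def is_allowed(action: str) -> bool:
    """Reject any action on the runbook's forbidden list."""
    return action not in FORBIDDEN_ACTIONS
```

Keeping the forbidden list separate from the routes means adding a new classification cannot quietly grant a restart.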