Your Name
8fb0c5df33
feat(heartbeat): noise reduction — silent 6h + warnings hash dedup
...
Code Review / ai-code-review (push) Successful in 47s
CD Pipeline / tests (push) Successful in 2m11s
CD Pipeline / build-and-deploy (push) Failing after 31m12s
CD Pipeline / post-deploy-checks (push) Has been skipped
P0 #4 (徹底長期修系列) — 統帥鐵證:「INFO | AWOOOI 系統報告」每 30 分鐘
推一次,一天 48 條同樣內容,即使我修了 P0 #3 假警報,每天的「全系統正常」
重複推送本身就是噪音,讓統帥誤以為告警還在重複。
修法(不違反「監控工具必須被監控」鐵律 — 健康狀態仍每 6h 推 1 次「我活著」):
| 狀況 | 推送行為 |
|------|---------|
| 健康(無 warnings)| 6h 內最多 1 次「我活著」訊號 |
| 有 warnings 跟上次同 hash | 跳過 |
| 有 warnings 跟上次不同 | 立即推送(新狀況不漏)|
| 健康 ↔ 有事 切換 | 自動清掉相反 marker |
Redis keys:
- `heartbeat:silent_last_sent` — 健康狀態 silent marker, TTL=6h
- `heartbeat:warnings_hash` — 上次 warnings 的 md5[:12], TTL=24h
效果:統帥每天從 48 條 heartbeat → ~4 條(健康狀態 4×6h),有事立即推。
Tests: 6 passed (test_heartbeat_dedup_p0_4.py)
- healthy_first_send_goes_through
- healthy_second_send_within_6h_skipped
- warnings_unchanged_skipped
- warnings_changed_pushes
- warnings_to_healthy_clears_warnings_hash
- healthy_to_warnings_clears_silent_marker
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-03 01:48:57 +08:00
Your Name
2ce722bda9
feat(heartbeat): full K8s pod lifecycle state machine + regression tests
...
Code Review / ai-code-review (push) Successful in 51s
CD Pipeline / tests (push) Successful in 2m59s
CD Pipeline / build-and-deploy (push) Has started running
CD Pipeline / post-deploy-checks (push) Has been cancelled
P0 #3 (徹底長期修系列) — 把 daily report 的 pod 健康判斷從「ready=False 一律告警」
升級到完整 K8s pod lifecycle state machine:
| Phase | 行為 |
|-------|------|
| Succeeded / Completed | 跳過(CronJob/Job 跑完正常) |
| Failed | 必告警 |
| Unknown | 必告警 |
| Pending <5min | 跳過(剛 schedule 合理) |
| Pending >=5min | 告警「image pull / scheduling 卡住」|
| Running ready=True | 健康,跳過 |
| Running ready=False <2min | 跳過(剛起來 probe 還沒過)|
| Running ready=False >=2min | 告警「readiness probe fail / 啟動異常」|
| restarts >=3 | 必告警(無論 phase)|
實作:
- PodInfo 加 start_time: Optional[str](從 .status.startTime)
- _get_pod_status kubectl custom-columns 加 STARTTIME
- _build_warnings 完整 state machine + 閾值常數
regression test (test_heartbeat_pod_state_machine.py 13 個) 覆蓋每個 phase
+ 邊界條件,含 2026-05-02 統帥截圖鐵證重現(3 個 drift-scanner Succeeded
pod 不該觸發「需關注 3 項」假警報)。
Tests: 13 passed (新增 test_heartbeat_pod_state_machine.py)
接續 a38d9112(單純 Succeeded skip),這次徹底處理 Pending/Failed/Unknown
+ 時間閾值 + 沒 start_time 的保守告警。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-03 01:44:58 +08:00
Your Name
f1362fcc8d
fix(governance): 修治理告警 4 個 silent failure + Prom sentinel 連鎖
...
Code Review / ai-code-review (push) Successful in 49s
CD Pipeline / tests (push) Successful in 2m9s
CD Pipeline / build-and-deploy (push) Failing after 31m11s
CD Pipeline / post-deploy-checks (push) Has been skipped
【全景檢測:12-agent 並行掃描定位 4 大 bug 與 1 個 P0 連鎖回歸】
Bug 1(P0 silent failure)— governance_agent.check_trust_drift
原 `await db.commit()` 縮排錯在 async with 區塊外(8 空格 vs 12),
session 已 auto-commit 關閉,二次 commit 拋 InvalidRequestError 被吞,
governance_trust_drift_auto_deprecated log 從不出現。修:commit/log 移回 with 內。
附 AST regression guard test 擋退化。
Bug 2 — flywheel_stats_service / W-3 fresh deploy 假告警
Redis 空時 total_exec=0 → rate=0.0 → watchdog `< 0.30` 立即觸發
「飛輪成功率 0%」假告警。修:total_exec < FLYWHEEL_MIN_SAMPLE(10) 回 None,
watchdog 判 None 跳過 W-3。Prometheus sentinel 用 NaN(非 -1.0)
避免觸發 ops/monitoring/alerts.yml:775 等 3 份 prom rule 的 `< 0.1`
條件造成 2h 後假告警連鎖。前端 type 同步 number | null。
Bug 3 — failover_alerter dedup key
原 key 只看 event_type 不看 payload,trust_drift 4→25 IDs 變動全被
1h dedup 吞掉。修:dedup key 加 sha256(impact subdict)[:8],event_type
sanitize 防特殊字元污染 Redis key。
Bug 4 — ai_slo_watchdog_job W-4 evolver 全封存初始化誤報
原邏輯 approved==0 即告警,未排除「playbooks 表初始化中」場景。
修:_count_approved_playbooks 回 (approved, total),total==0 → skip。
【執行結果】
- 39 個相關 unit test 全過(test_failover_alerter / test_governance_agent /
test_trust_drift_watchdog / test_check_trust_drift_commit_outside_context_poc)
- 6 個關鍵路徑實測:NaN sentinel / float 渲染 / hash 區分性 / dedup 同 impact
相同 hash / datetime 容錯 / 4 檔 py_compile 全過
【調度教訓 — 留作未來改進】
- 12-agent 並行調度時,vuln-verifier 與 fullstack-engineer 競態
導致 vuln-verifier 讀到已修代碼誤判 NOT REPRODUCIBLE。
未來:vuln-verifier 應在 fullstack 之前執行,或用 git show HEAD~1 對比修復前。
- fullstack-engineer 引入 P0 regression(f-string 內嵌 ternary 非法 format spec),
critic 抓到 + Prom sentinel 連鎖 — 證明 critic 審查必要不可省。
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-03 00:18:57 +08:00
Your Name
314cb0e079
fix(test): align governance self_failure assertions with nested payload schema
...
Code Review / ai-code-review (push) Successful in 48s
CD Pipeline / tests (push) Successful in 2m18s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
Codex commits dedb1208 + b710f3f3 (governance enrich + normalize) 把
_alert("governance_self_failure", ...) 的 payload structure 重構成嵌套:
{status, impact: {failed_checks, total_checks, errors}, remediation, actionable}
(governance_agent.py:604-624,2026-04-29 critic M6 修),
但 3 個 test 還用舊路徑 `payload["total_checks"]` 直讀,KeyError 後 RuntimeError 模擬 cascading 失敗。
修法:3 個 assertion 改為讀正確嵌套路徑:
- test_governance_agent.py:601 → payload["impact"]["total_checks"|"failed_checks"]
- test_wave8_remaining_blockers.py:223 → 同
- test_wave8_remaining_blockers.py:268 → 同
Tests: 30 passed (test_governance_agent + test_wave8_remaining_blockers 全部)
效果:解開 dedb1208 / b710f3f3 / a38d9112 三個 commit 因 governance test fail
被擋在 build-and-deploy 之前的卡點,恢復 CD 鏈通暢。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-03 00:05:04 +08:00
Your Name
b5adf77a9f
fix(ci): make Telegram notifications non-blocking on CD pipeline
...
CD Pipeline / tests (push) Failing after 1m27s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 48s
統帥鐵證:tests/build-and-deploy 步驟內 'Notify Pipeline Start/Failure'
curl 400 → 整個 job exit 22 → 從 5/1 起連續 14 個 commit 部署被擋。
根本問題:通知步驟是觀察用,不該成為 CI 主流程的 hard requirement。
curl -fS 預設 fail-on-HTTP-error,配上 Telegram bot 任何短暫故障
(token revoke、bot 被踢出 chat、API rate limit)就把整條 pipeline 擊垮。
修法:對齊 line 922 既有正確 pattern,5 處 curl 全部加
`|| echo "TG notify failed (non-fatal): exit=$?"`
涉及 step:
- Notify Pipeline Start (line 79)
- Notify Pipeline Failure × tests (line 236)
- Notify Pipeline Failure × build-and-deploy (line 779)
- Notify Pipeline Failure × post-deploy-checks (line 938)
- (line 924 已是正確 pattern, 不動)
副效應:notification 失敗從此只會在 log 留 warning,不擋 CI。
真正的 telegram 故障由系統其他監控機制(alertmanager_health 等)負責。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-03 00:00:20 +08:00
Your Name
b710f3f38f
feat(governance): normalize AI治理告警輸出與元告警解析度
CD Pipeline / tests (push) Failing after 25s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 46s
2026-05-02 23:49:59 +08:00
Your Name
a38d911213
fix(heartbeat): exclude Succeeded/Completed CronJob pods from warnings
...
Code Review / ai-code-review (push) Successful in 50s
CD Pipeline / tests (push) Failing after 1m22s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
統帥 23:30 截圖鐵證:每日系統報告永遠列「需關注 3 項:
Pod drift-scanner-* 未就緒 (Succeeded)」,讓人誤以為告警重複。
實際上 Succeeded/Completed 是 CronJob/Job 跑完的成功狀態,
ready=False 是設計(容器已退出)— 不該算 warning。
修法:heartbeat_report_service.py:704 加判斷跳過 Succeeded/Completed pods。
預期效果:今天 23:30 的「需關注 3 項」明天起會降為 0 項,daily report
header 從「需關注 N 項」變回「全系統正常」。
Tests: 50 passed (heartbeat 相關)
注意:working tree 還有 statq Codex 未 commit 的 7 個檔案改動
(approval_execution.py 有 indentation error 半成品),本 commit 只動
heartbeat_report_service.py 單檔,不誤碰其他。
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 23:48:31 +08:00
Your Name
ed0553c337
docs(governance): add AI governance alert schema and consolidation playbook
2026-05-02 23:47:00 +08:00
Your Name
dedb12085b
chore(governance,watchdog): enrich alerts and enable prometheus multiproc
CD Pipeline / tests (push) Failing after 1m22s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped
Code Review / ai-code-review (push) Successful in 43s
Deploy Alert Rules / Deploy Prometheus Alert Rules (push) Successful in 57s
2026-05-02 23:44:12 +08:00
Your Name
b371edb70c
fix host alert auto-repair routing and backup false positives
2026-05-02 23:44:12 +08:00
AWOOOI CD
68e182381f
chore(cd): deploy da772a1 [skip ci]
2026-05-02 17:58:22 +08:00
Your Name
da772a1605
fix(decision): block kubectl actions on bare_metal host alerts
...
Code Review / ai-code-review (push) Successful in 54s
CD Pipeline / tests (push) Successful in 3m47s
CD Pipeline / build-and-deploy (push) Successful in 13m26s
CD Pipeline / post-deploy-checks (push) Successful in 5m45s
When HostHighCpuLoad / HostOutOfMemory fire on a bare-metal host
(192.168.0.110 et al, where Sentry / ClickHouse / Snuba are eating
CPU), the LLM kept proposing "kubectl rollout restart awoooi-api",
which is a wrong-domain action — restarting awoooi cannot fix a
third-party process's CPU usage on the host. Auto-execute would then
either run the no-op kubectl restart (wasted) or escalate after
ssh_diagnose because no safe action was found, producing the
"AI 自動修復失敗" Telegram noise the user just complained about.
Adds a guard at the top of DecisionManager._auto_execute: if the
incident's primary signal carries host_type=bare_metal AND the
proposed action starts with "kubectl", refuse to execute. The
incident is marked READY with a clear blocked_reason so human
operators see why automation declined, and emergency_escalation
records the event in AOL for audit.
Also patches /home/wooo/monitoring/alerts.yml on 110 (and the new
ops/monitoring/alerts.yml in repo) to add an explicit
auto_repair_action annotation on HostHighCpuLoad / HostOutOfMemory
that hints LLM toward `ssh ... ps aux` rather than kubectl restart.
Prometheus reload returned 200.
Tests: tests/test_decision_manager_bare_metal_kubectl_guard.py
covers (1) bare_metal+kubectl blocked, (2) kubectl get also blocked,
(3) bare_metal+ssh NOT blocked, (4) k8s host_type+kubectl NOT
blocked, (5) missing host_type label NOT blocked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 17:41:28 +08:00
Your Name
47342dfb34
fix(escalation): dedup escalation card by fingerprint + 24h TTL
...
Code Review / ai-code-review (push) Successful in 55s
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
接續 b3a0f0d7(decision card dedup)—— 統帥 17:35 鐵證:4 條 ESCALATION P0
連發(HostOutOfDiskSpace + 3×HostDiskUsageHigh,全 target=node-exporter-110,
全不同 INC ID C9CD6E/FB7944/559B54/C1BBF3)。
decision card 修了但 escalation card 走另一條路徑,根因相同:
- emergency_escalation_service.py:31 dedup key 綁 incident_id (uuid4 隨機)
- TTL 900s 比 sweeper 重觸週期 1h 短
修法:
- escalate_auto_repair_unavailable() 改用 alertname+target fingerprint dedup
- TTL 900s → 86400s,與 decision_manager.py:574 對齊
drift_auto_adopt 路徑暫不動(TTL 已 3600s + report_id 非隨機,非當前問題)。
Tests: 7 passed (escalation/emergency 相關用例)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 17:38:54 +08:00
AWOOOI CD
697e13b23a
chore(cd): deploy 297afb6 [skip ci]
2026-05-02 17:28:56 +08:00
Your Name
297afb6998
fix(ci): require all 4 host keys before overwriting ssh-mcp-key secret
...
Code Review / ai-code-review (push) Successful in 44s
CD Pipeline / tests (push) Successful in 2m17s
CD Pipeline / build-and-deploy (push) Successful in 12m44s
CD Pipeline / post-deploy-checks (push) Successful in 4m26s
When ssh-keyscan partially fails (e.g. one host is unreachable for a
moment) the previous logic still considered the file non-empty, so it
patched ssh-mcp-key/known_hosts with an incomplete set. asyncssh then
rejected any SSH to the missing host with "Host key is not trusted",
which routed every host disk-full / docker alert into the emergency
escalation channel and spammed Telegram (today's regression for 110).
Now we explicitly verify all four target IPs (110/120/121/188) appear
in the scan output before patching. Missing any of them aborts the
patch and keeps the previously-good secret untouched, plus logs the
ssh-keyscan stderr to help debug intermittent network issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 17:14:30 +08:00
AWOOOI CD
a6409c39e2
chore(cd): deploy b3a0f0d [skip ci]
2026-05-02 16:49:00 +08:00
Your Name
b3a0f0d766
fix(telegram): dedup by fingerprint + 24h TTL to stop repeat alerts
...
CD Pipeline / tests (push) Successful in 2m22s
Code Review / ai-code-review (push) Successful in 57s
CD Pipeline / build-and-deploy (push) Successful in 21m3s
CD Pipeline / post-deploy-checks (push) Successful in 5m2s
Telegram 重複發告警鐵證(4 個 agent 真實數據):
- INC-6FE3BD (HostBackupFailed) 24h 內被推 15 次
- INC-FD6E21 (HostHighCpuLoad) 24h 內被推 6 次
- 06:44:18 同秒兩送 = pod 並發 race
根因:
1. `telegram_sent:{incident_id}` dedup key 綁 uuid4 隨機 INC ID,
同 fingerprint 換新 INC 完全不去重
2. dedup TTL=600s 比 incident_analysis_sweeper 重觸週期 1h、
alertmanager repeat_interval 4h 都短 → 每輪都過期通過
3. pod restart 走 _resend_unconfirmed_ready_tokens 用同一 incident_id key
→ 重啟必炸一波
修法(不消音、是「AI 認得這是同一事故」):
- decision_manager.py:207-225 dedup key 改 alertname+target fingerprint
- decision_manager.py:573-578 TTL 600s → 86400s (蓋住 sweeper 1h × alertmanager 4h)
- decision_manager.py:3189-3208 pod restart resend 路徑同步改 fingerprint
- incident_analysis_sweeper.py:37-42 sweeper_done TTL 3600s → 86400s
預期:同症狀 24h 內最多發 1 張 decision card;resolved 後 line 220-226
status check 會 early return,不影響復發偵測。
Tests: 35 passed (test_telegram_adr050 + test_decision_manager_docker_prune_routing)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-02 16:25:48 +08:00
Your Name
202071f7a8
chore(ci): force CD rebuild via .dockerignore touch
...
CD Pipeline / tests (push) Successful in 2m17s
CD Pipeline / build-and-deploy (push) Failing after 31m17s
CD Pipeline / post-deploy-checks (push) Has been skipped
Empty commits don't match cd.yaml paths filter (apps/** etc).
This adds a comment to .dockerignore to trigger build for sha
84ba3216's commits stack.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 15:46:05 +08:00
Your Name
5c27bac686
chore(ci): retrigger build after runner restart
...
Previous build (task#1396) failed when act_runner daemon was restarted
to clear stuck job state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 15:44:42 +08:00
Your Name
899bfdb6d1
chore(ci): trigger build after Gitea restart
2026-05-02 15:38:24 +08:00
Your Name
1a09b0250a
chore(ci): trigger Gitea Actions again
2026-05-02 15:32:55 +08:00
Your Name
ed726253e2
chore(ci): trigger Gitea Actions
2026-05-02 15:20:54 +08:00
Your Name
ec5eaef31c
chore(ci): enable Gitea Actions workflows
2026-05-02 15:20:01 +08:00
Your Name
84ba3216ee
feat(notifications): tag autonomous repair actions with [AUTO] prefix
...
Code Review / ai-code-review (push) Successful in 57s
CD Pipeline / tests (push) Successful in 2m36s
CD Pipeline / build-and-deploy (push) Failing after 31m11s
CD Pipeline / post-deploy-checks (push) Has been skipped
Per user request: every AI-driven repair must surface a Telegram trace
even when it succeeds, so nobody can later deny what the autonomy did.
Adds 🤖 [AUTO] markers and an explicit `Actor: leWOOOgo (autonomous)`
line to both success and failure status messages emitted by
_push_auto_repair_result, making them clearly distinguishable from
human-clicked approval cards.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 12:49:43 +08:00
Your Name
3059897318
feat(governance): auto-deprecate low-trust unused playbooks (>30d)
...
Code Review / ai-code-review (push) Successful in 41s
CD Pipeline / tests (push) Successful in 3m29s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
trust_drift previously fired alerts forever for playbooks stuck below
the 0.2 threshold. With user authorization for governance-class
auto-fixes, check_trust_drift now retires playbooks that have been
unused for 30+ days (or never used and created 30+ days ago) by
flipping status to 'deprecated' before alerting.
Alerts now report drifted_count, auto_deprecated_count, and the kept
playbook_ids that still need human review (those in their 30d trial
window). Existing alert noise from the four currently-drifted
playbooks should drop to whatever fraction is genuinely in trial.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 12:31:37 +08:00
Your Name
607358c4dd
fix(approval): route SSH actions through SSHProvider on manual approve
...
parse_operation_from_action only knew kubectl and Chinese restart phrases,
so any "ssh host '...'" action approved via Telegram fell through to
"Could not parse operation type" and reported a fake failure even though
the LLM had proposed a valid host repair.
Adds OperationType.SSH_HOST, makes the parser detect ssh prefixes (with
optional flags / user@host) before kubectl patterns, and routes the
SSH_HOST branch in approval_execution.execute_in_background through
SSHProvider with the same tool keywords decision_manager uses
(ssh_docker_prune / ssh_docker_restart / ssh_systemctl_restart /
ssh_diagnose). Unroutable SSH actions now fail loudly with a descriptive
error instead of silently breaking.
Trigger: 2026-05-02 incidents INC-20260502-D6D0B7 / E12EE4 / 557055
were approved by the user but executor reported "Could not parse" and
left the alerts pending.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 12:31:37 +08:00
Your Name
3156ff1c69
feat(aiops): add ssh_docker_prune to auto-repair flywheel for disk-full alerts
...
Adds Group B SSH MCP tool ssh_docker_prune (image+volume+builder prune
with ≥75% disk usage gate) and routes "docker prune" actions through it.
Flips HostDiskUsageHigh from auto_repair=false to true with mcp_provider
routing labels so the flywheel can self-heal next disk-full event without
hitting the emergency_channel Telegram path.
Trigger: 2026-05-01 → 05-02 Telegram alert storm (peak 53/hr) caused by
empty ssh-mcp-key/known_hosts secret rejecting all SSH and forcing every
disk-full alert through "Host key is not trusted → escalate" loop.
known_hosts patched live; this commit closes the playbook gap so the
next occurrence resolves without manual intervention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-05-02 12:31:37 +08:00
Your Name
8cf559215c
docs(awooop): add Phase 1 Isolation Foundation implementation plan (ADR-106 P1)
2026-05-02 12:28:33 +08:00
Your Name
443947ffa1
fix(ci): avoid code review sigpipe on large diffs [skip ci]
2026-05-01 20:59:14 +08:00
AWOOOI CD
329849a559
chore(cd): deploy 7795f02 [skip ci]
2026-05-01 20:53:02 +08:00
Your Name
7795f027d2
fix(aiops): persist emergency intervention traces
CD Pipeline / tests (push) Successful in 2m56s
Code Review / ai-code-review (push) Failing after 39s
CD Pipeline / build-and-deploy (push) Successful in 12m54s
CD Pipeline / post-deploy-checks (push) Successful in 4m40s
2026-05-01 20:34:33 +08:00
Your Name
8e49f2ea88
fix(ci): preserve ssh mcp known hosts [skip ci]
2026-05-01 17:18:32 +08:00
AWOOOI CD
b72eac0712
chore(cd): deploy 433f7b0 [skip ci]
2026-05-01 17:08:42 +08:00
Your Name
433f7b068e
fix(aiops): close ssh and telegram remediation gaps
CD Pipeline / tests (push) Successful in 2m7s
Code Review / ai-code-review (push) Successful in 42s
CD Pipeline / build-and-deploy (push) Successful in 13m14s
CD Pipeline / post-deploy-checks (push) Successful in 4m29s
2026-05-01 16:53:02 +08:00
Your Name
3650fc727a
docs(ci): record runner user service takeover state
Code Review / ai-code-review (push) Successful in 45s
2026-05-01 16:30:54 +08:00
Your Name
e7991b8e6c
fix(ci): keep runner installer idempotent without restart
Code Review / ai-code-review (push) Successful in 42s
2026-05-01 16:27:37 +08:00
Your Name
bc295eaec2
fix(ci): allow user service for gitea host runner
Code Review / ai-code-review (push) Has been cancelled
2026-05-01 16:24:45 +08:00
Your Name
cb5ab900c4
fix(ci): preserve gitea runner jobs on shutdown
Code Review / ai-code-review (push) Successful in 46s
2026-05-01 16:16:27 +08:00
AWOOOI CD
f72419dd17
chore(cd): deploy b0da6da [skip ci]
2026-05-01 15:27:48 +08:00
Your Name
b0da6da1e9
feat(aiops): structure agent loop shadow output
CD Pipeline / tests (push) Successful in 2m50s
Code Review / ai-code-review (push) Successful in 33s
CD Pipeline / build-and-deploy (push) Failing after 25m48s
CD Pipeline / post-deploy-checks (push) Has been cancelled
2026-05-01 15:09:57 +08:00
AWOOOI CD
f53d7e5584
chore(cd): deploy f8e4497 [skip ci]
2026-05-01 14:41:18 +08:00
Your Name
f8e44971c1
feat(aiops): enable read-only agent loop canary
CD Pipeline / tests (push) Successful in 1m43s
Code Review / ai-code-review (push) Successful in 31s
CD Pipeline / build-and-deploy (push) Successful in 10m22s
CD Pipeline / post-deploy-checks (push) Successful in 4m3s
2026-05-01 14:20:16 +08:00
AWOOOI CD
33a7148916
chore(cd): deploy b6cf616 [skip ci]
2026-05-01 14:02:59 +08:00
Your Name
b6cf616707
fix(aiops): harden agent tool permission names
CD Pipeline / tests (push) Successful in 1m32s
Code Review / ai-code-review (push) Successful in 27s
CD Pipeline / build-and-deploy (push) Successful in 8m26s
CD Pipeline / post-deploy-checks (push) Successful in 3m37s
2026-05-01 13:52:33 +08:00
AWOOOI CD
1fe75e9f99
chore(cd): deploy 6ec3f11 [skip ci]
2026-05-01 13:45:55 +08:00
Your Name
6ec3f116fd
fix(ci): normalize migration database url for psql
CD Pipeline / tests (push) Successful in 1m30s
Code Review / ai-code-review (push) Successful in 27s
CD Pipeline / build-and-deploy (push) Successful in 13m20s
CD Pipeline / post-deploy-checks (push) Successful in 3m36s
2026-05-01 13:30:32 +08:00
Your Name
7e4d995e4b
feat(aiops): add mcp agent loop foundation
CD Pipeline / tests (push) Successful in 1m59s
Code Review / ai-code-review (push) Successful in 28s
run-migration / migrate (push) Failing after 24s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 13:21:19 +08:00
Your Name
9db87f177e
fix(aiops): suppress repeated llm alert loops
CD Pipeline / tests (push) Successful in 1m37s
Code Review / ai-code-review (push) Successful in 28s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 13:02:07 +08:00
Your Name
3691402561
chore(cd): deploy 11673d80 api [skip ci]
2026-05-01 12:52:23 +08:00
Your Name
11673d80ea
fix(aiops): route backup decisions through ssh
CD Pipeline / tests (push) Successful in 1m35s
Code Review / ai-code-review (push) Successful in 34s
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / build-and-deploy (push) Has been cancelled
2026-05-01 12:50:01 +08:00