diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index e7521d47..5ec011f1 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -7751,3 +7751,87 @@ auto_repair_24h=6
 - The production smoke test has no new auto-repair event against which to verify the fallback write, so a fully closed loop still cannot be claimed; this is the correct, conservative reading.
 - Next step, T14b: wait for the next `auto_repair=true` event or design a safe live-fire test to verify that the full chain `auto_repair_executions -> incident_evidence.verification_result -> learning/KM -> truth-chain auto_repaired_verified` holds end to end; also add incident linkage and a durable execution record for auto-approved approval executions.
 - Overall progress update: about 80%.
+
+### 2026-05-13 - AwoooP truth-chain T14b: add incident linkage and durable evidence for auto-approved execution (production deployed)
+
+**live diagnosis**:
+
+- The high-confidence auto-execution paths, CS2 `auto_approve_rule_engine` and CS3 `auto_approve_llm_cs3`, call `ApprovalExecutionService.execute_approved_action()` first and create the incident afterwards.
+- The executor has no `incident_id` at execution time, so the post-execution verifier, KM writeback, incident resolve, and `auto_repair_executions` cannot be linked back to the same alert.
+- CS3 has one more real break: the auto approval does not pass the database `approval.id` to the executor, so the execution status is written back to the wrong, transient id.
+
+**changes**:
+
+- `ApprovalExecutionService.finalize_auto_approved_execution()` is added as the convergence point that does not re-run the action and only backfills the evidence chain:
+  - Writes `auto_repair_executions` with `triggered_by=auto_approve_*`.
+  - Adds an incident-linked timeline event.
+  - Writes KM in auto-repair mode.
+  - Calls `PostExecutionVerifier` with `action_taken=auto_repair_playbook:*` so that fallback evidence can obtain `matched_playbook_id`.
+  - Resolves the incident on success.
+  - `NO_ACTION` / `OBSERVE` / `INVESTIGATE` do not count as auto-repair, which avoids polluting the KPI.
+- CS2 / CS3 call finalize after the incident is created and `update_incident_id()` has run.
+- CS3 adds `_cs3_auto_approval.id = approval.id` and `service.update_execution_status()`.
+- The `requested_by` check changes from accepting only `auto_approve` to accepting `auto_approve*`, so that `auto_approve_rule_engine` / `auto_approve_llm_cs3` are not mislabeled by KM as manual repair.
+
+**local verification**:
+
+```text
+python3 -m py_compile apps/api/src/services/approval_execution.py apps/api/src/api/v1/webhooks.py apps/api/tests/test_approval_execution_auto_approved_finalize.py
+OK
+
+ruff check --select F821 apps/api/src/services/approval_execution.py apps/api/src/api/v1/webhooks.py apps/api/tests/test_approval_execution_auto_approved_finalize.py
+OK
+
+pytest tests/test_approval_execution_auto_approved_finalize.py tests/test_approval_execution_no_action.py tests/test_learning_chain_e2e.py tests/test_awooop_truth_chain_service.py -q
+26 passed
+
+pytest tests/test_post_execution_verifier.py tests/test_learning_chain_e2e.py tests/test_awooop_truth_chain_service.py tests/test_platform_router_order.py tests/test_cs1_auto_execute.py tests/test_cs3_auto_execute.py tests/test_approval_execution_auto_approved_finalize.py -q
+77 passed
+
+pytest tests/test_rule_engine_auto_execute.py tests/test_alertmanager_rule_bypass.py tests/test_approval_execution_auto_approved_finalize.py -q
+31 passed
+```
+
+**production deploy / smoke (done)**:
+
+```text
+Commit: 596f2f68 fix(awooop): link auto approved execution evidence
+Gitea:
+2066 code-review 596f2f68 -> success
+2065 CD Pipeline 596f2f68 -> success
+  tests -> success
+  build-and-deploy -> success
+  post-deploy-checks -> success
+Deploy marker: edba52f4 chore(cd): deploy 596f2f6 [skip ci]
+
+K8s image:
+awoooi-api 192.168.0.110:5000/awoooi/api:596f2f682094d0916f6a18a6f50e7667e4ca86ff
+awoooi-worker 192.168.0.110:5000/awoooi/api:596f2f682094d0916f6a18a6f50e7667e4ca86ff
+awoooi-web 192.168.0.110:5000/awoooi/web:596f2f682094d0916f6a18a6f50e7667e4ca86ff
+
+health:
+https://awoooi.wooo.work/api/v1/health -> 200
+
+quality summary, hours=24, limit=30:
+verified_auto_repair_total=0
+production_claim.can_claim_full_auto_repair=false
+by_verdict:
+  manual_required_no_action=17
+  received_only=12
+  approval_required=1
+
+DB baseline after deploy time 2026-05-13T11:19:27Z:
+auto_repair_since_deploy=0
+auto_approved_since_deploy=0
+verified_evidence_since_deploy=0
+auto_repair_24h=5
+auto_approved_24h=0
+verified_evidence_24h=0
+```
+
+Reading:
+
+- T14b is complete and released: the next real CS2/CS3 auto-approved execution will leave incident-linked `auto_repair_executions`, timeline, KM, and verifier evidence instead of existing only in Telegram / logs.
+- The production smoke test has not yet seen a new auto-approved or auto-repair live event since the deploy, so it still cannot be claimed that the full closed loop has been proven by production live-fire.
+- Next step, T14c: use a safe live-fire test or wait for a natural alert to verify that `auto_approve_* -> auto_repair_executions -> incident_evidence.verification_result -> learning/KM -> truth-chain auto_repaired_verified` actually works end to end; also change the Telegram card to show explicitly which node the flow has reached, whether auto-repair has happened, and whether it was handed off to a human.
+- Overall progress update: about 82%.
diff --git a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
index f210d36f..fba07b08 100644
--- a/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
+++ b/docs/superpowers/specs/2026-04-15-MASTER-ai-autonomous-flywheel-v2.md
@@ -2040,6 +2040,14 @@ After Phase 6 completes
 - Smoke: the quality summary is still `verified_auto_repair_total=0`, `production_claim=false`; there has been no new auto-repair event since the deploy (`auto_repair_since_deploy=0`), so a full closed loop cannot be claimed. The only supportable claim is that future auto-repair verifier results will have a durable evidence target.
 - Next step, T14b: wait for the next `auto_repair=true` event or design a safe live-fire test to verify the full chain `auto_repair_executions -> incident_evidence.verification_result -> learning/KM -> truth-chain auto_repaired_verified`; also add incident linkage and a durable execution record for auto-approved approval executions.
 
+**T14b auto-approved execution incident linkage production deployed (2026-05-13, Taipei)**:
+- Trigger: CS2 `auto_approve_rule_engine` and CS3 `auto_approve_llm_cs3` execute the action first and create the incident afterwards; the executor has no `incident_id` at that point, so `auto_repair_executions`, timeline, KM, PostExecutionVerifier, and incident resolve cannot be linked back to the same event. CS3 also lacked `_cs3_auto_approval.id = approval.id`, which caused the execution status to be written back to a transient id.
+- Fix: added `ApprovalExecutionService.finalize_auto_approved_execution()`, which backfills the durable trace after the incident is created without re-executing the action; it covers `auto_repair_executions(triggered_by=auto_approve*)`, the incident-linked timeline, KM, `PostExecutionVerifier(action_taken=auto_repair_playbook:*)`, and resolving the incident on success. `NO_ACTION` / `OBSERVE` / `INVESTIGATE` do not count as auto-repair.
+- Webhook: CS2 / CS3 call finalize after `update_incident_id()`; CS3 adds the DB approval id and `update_execution_status()`; the `requested_by` check now accepts `auto_approve*`, so that `auto_approve_rule_engine` / `auto_approve_llm_cs3` are not mislabeled as manual repair.
+- Production: `596f2f68 fix(awooop): link auto approved execution evidence` has been pushed to Gitea main; Gitea run `2066` code-review succeeded, and run `2065` tests/build-and-deploy/post-deploy-checks all succeeded; deploy marker `edba52f4`; API/Worker image `192.168.0.110:5000/awoooi/api:596f2f682094d0916f6a18a6f50e7667e4ca86ff`, Web image `192.168.0.110:5000/awoooi/web:596f2f682094d0916f6a18a6f50e7667e4ca86ff`, health 200.
+- Smoke: the quality summary is still `verified_auto_repair_total=0`, `production_claim=false`; there has been no new auto-approved or auto-repair live event since the deploy (`auto_repair_since_deploy=0`, `auto_approved_since_deploy=0`, `verified_evidence_since_deploy=0`), so it still cannot be claimed that the full closed loop is production live-fire verified.
+- Next step, T14c: use a safe live-fire test or wait for a natural alert to verify `auto_approve_* -> auto_repair_executions -> incident_evidence.verification_result -> learning/KM -> truth-chain auto_repaired_verified`; also change the Telegram card to show explicitly the flow node, whether auto-repair happened, and whether it was handed off to a human.
+
 ---
 
 ### 2026-04-20 evening (Taipei) - C1-C4 end-to-end flow integration - Playbook chain protection (commit de2d34d)