docs(reboot): record cd readback closure [metadata-only]
All checks were successful
CD Pipeline / workflow-shape (push) Successful in 1s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Successful in 1m8s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped

Your Name
2026-07-03 10:46:49 +08:00
parent 12af61a580
commit da1f20ffcb
3 changed files with 26 additions and 10 deletions


@@ -1,3 +1,19 @@
## 2026-07-03 — 10:44 CD bounded wrapper production readback succeeded
**Completed**
- Gitea CD `#4558` ran through this round's `run_docker_step` wrapper: API build / API push sha / API push latest / Web build / Web push sha / Web push latest all returned `rc=0`, and `cd_docker_step_timeout` was not triggered.
- CD pushed deploy marker `bc8d3af3f chore(cd): deploy dacdd90 [skip ci]`; the public queue log shows `deploy_marker=dacdd90`, `job_succeeded=true`, and `production_deploy_readback_matched=true`. Gitea remote `main` has fast-forwarded to the deploy marker.
- Production `/api/v1/agents/reboot-auto-recovery-slo-scorecard` now reads back the latest API behavior: `active_blocker_count=10`, `readiness_percent=60`, `runtime_metric_runtime_readback_added_blockers=["windows99_vmware_autostart_config_not_ready","windows99_update_no_auto_reboot_policy_not_ready"]`, `windows99_update_no_auto_reboot_ready=false`.
- This is not SLO completion; it is correct fail-closed behavior. The Windows99 VMware service / autostart config / Windows Update policy blockers that were previously suppressed by nested evidence are now in `active_blockers` and in the fixed next action. The next step is fixed as `restore_windows99_missing_vmx_source_for_aliases_then_rerun_no_secret_collector_and_scorecard_no_vm_power_change`.
**Verification run**
- Gitea remote ref: `main=bc8d3af3f`, deploy marker subject `chore(cd): deploy dacdd90 [skip ci]`.
- Public queue readback: CD `#4558` log success marker / production deploy readback matched.
- Production scorecard readback: active blockers `10`, newly added Windows99 config / policy blockers, `status=blocked_reboot_auto_recovery_slo_not_ready`.
**Still in force**
- Did not read secrets / tokens / `.env` / raw sessions / SQLite / auth; did not use GitHub / gh; no workflow_dispatch; did not start or shut down any VM; did not restart any host / service; no Docker / Nginx / K3s / DB / firewall restart; no DROP / TRUNCATE / restore / prune / delete / force push.
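The fail-closed behavior described in this entry can be sketched roughly as follows. This is a minimal illustration, not the production API code: the function name `apply_fail_closed_overlay`, the `NESTED_TO_BLOCKER` mapping, and the exact shape of the nested evidence are assumptions made for the example. Only the blocker names and the `service_blockers` / `policy_blockers` field names come from the log above.

```python
# Hypothetical mapping from nested verifier fields to named blockers.
# Which nested field raises which blocker is an assumption for this sketch.
NESTED_TO_BLOCKER = {
    "service_blockers": "windows99_vmware_autostart_config_not_ready",
    "policy_blockers": "windows99_update_no_auto_reboot_policy_not_ready",
}


def apply_fail_closed_overlay(active_blockers, nested_evidence, metric_policy_ready):
    """Promote nested evidence to active blockers instead of trusting a stale metric.

    Returns (all_blockers, added_blockers, policy_ready).
    """
    added = [
        blocker
        for field, blocker in NESTED_TO_BLOCKER.items()
        if nested_evidence.get(field) and blocker not in active_blockers
    ]
    # Fail closed: any nested policy blocker overrides a "ready" metric value.
    policy_ready = bool(metric_policy_ready) and not nested_evidence.get("policy_blockers")
    return list(active_blockers) + added, added, policy_ready


if __name__ == "__main__":
    blockers, added, ready = apply_fail_closed_overlay(
        ["reboot_event_required_host_unreachable"],
        {
            "service_blockers": ["VMAuthdService", "VMnetDHCP"],
            "policy_blockers": ["windows_update_policy_readback_missing"],
        },
        metric_policy_ready=True,
    )
    print(added, ready)
```

The design point is that the older metric value is only ever allowed to lower readiness, never to mask a blocker that the nested verifier reports.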
## 2026-07-03 — 10:30 CD Docker build/push bounded timeout and queue classifier
**Completed**


@@ -13,20 +13,20 @@
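This section supersedes the old "manual triage after each individual reboot" approach. All subsequent status reports must proceed in this order; if noise would mask a P0, attach it to the same row rather than opening a separate branch.
### 2026-07-03 10:30 Latest P0 override ordering
### 2026-07-03 10:44 Latest P0 override ordering
The table below supersedes the initial 2026-06-30 incident rows; the old table is kept for historical tracking. Every newly inserted requirement must be attached to this table and must not be scattered into ad hoc branches again.
| Priority | Status | Work item | Latest evidence | Next step / completion criteria |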
|------|------|--------|----------|-------------------|
| P0-1 | BLOCKED_HOST_WINDOWS | All-host reboot auto-detection / auto-trigger / 10-minute recovery SLO | 2026-07-03 08:59 production scorecard: `status=blocked_reboot_auto_recovery_slo_not_ready`, `active_blocker_count=8`, `readiness_percent=67`, `primary_blocker=reboot_event_required_host_unreachable`, `can_claim_all_services_recovered_within_target=false`. No-write artifact `/tmp/awoooi-reboot-continue-20260703-085205`: 111 socket probe still timeout / unreachable; 99 RDP / VMConnect reachable but uptime unknown; 188 socket probe reachable, and the previous host probe still showed `systemd_state=degraded` / `startup_active=failed`. Production readback also shows `windows99_vmware_readback_present=true`, `windows99_vmware_config_ready=false`, `windows99_vmware_power_ready=false`, `windows99_missing_vmx_aliases=["111"]`, `windows99_powered_off_aliases=["111","112","120","121","188"]`. The 09:22 source/API contract additionally projects the unreliable console artifact as a `windows99_console_clipboard_unreliable`-class active blocker. | First converge 99 / Windows99 / VMware and 111: restore the no-secret management channel or obtain a complete console Verify stdout that the validator accepts, read back VMX / VM power / host uptime, then rerun host probe + reboot-event detector. No reboot, no VM power change, no reading the Windows password, and no using RDP clipboard fragments as completion evidence. |
| P0-1 | BLOCKED_HOST_WINDOWS | All-host reboot auto-detection / auto-trigger / 10-minute recovery SLO | 2026-07-03 10:44 production scorecard: `status=blocked_reboot_auto_recovery_slo_not_ready`, `active_blocker_count=10`, `readiness_percent=60`; active blockers still include reboot event / host unreachable / 99+111+Windows99 VMX / guest power, plus the newly added `windows99_vmware_autostart_config_not_ready` and `windows99_update_no_auto_reboot_policy_not_ready`. No-write artifact `/tmp/awoooi-reboot-continue-20260703-085205`: 111 socket probe still timeout / unreachable; 99 RDP / VMConnect reachable but uptime unknown; 188 socket probe reachable, and the previous host probe still showed `systemd_state=degraded` / `startup_active=failed`. | First converge 99 / Windows99 / VMware and 111: restore the no-secret management channel or obtain a complete console Verify stdout that the validator accepts, read back VMX / VM power / host uptime, then rerun host probe + reboot-event detector. No reboot, no VM power change, no reading the Windows password, and no using RDP clipboard fragments as completion evidence. |
| P0-2 | BLOCKED_EDGE_PRIVILEGED_APPLY | Public 502 maintenance page and external fallback during deploy / reboot | Gitea CD `#4519` pushed deploy marker `3aca484 -> a94ddd5`, but after the marker the public probe still read raw `502` from `https://awoooi.wooo.work/api/v1/health` with empty fallback header/body; on live 188, `/etc/nginx/sites-enabled/awoooi.wooo.work.conf` lacks the maintenance fallback, `/var/www/maintenance/maintenance.html` is missing, and `ollama@188` has no passwordless sudo. | First run `scripts/reboot-recovery/public-maintenance-edge-fallback-apply.sh --check` to leave a drift receipt; once a privileged channel is available, run `--apply`, requiring backup, `nginx -t`, reload, and public route probe to all be green. Do not read passwords, and do not mask edge fallback drift with an app restart. |
| P0-3 | BLOCKED_DEPLOY_READBACK | Version and data freshness of all products / websites | Gitea `main=647d81163` already contains the Windows99 collected verifier / next safe action source, but the production SLO route has not yet read back that source: `active_blocker_count=8`, `runtime_metric_runtime_readback_added_blockers=[]`, `windows99_update_no_auto_reboot_ready=true`. Public Gitea CD `#4556` is still `Running`; the build log shows it waited on the Docker build lock, cleared the empty lock itself, and entered API/Web build/push, but there is no deploy marker / production deploy readback yet. The 10:30 source adds the CD Docker build/push bounded timeout and queue classifier so that the next round is not left with only a `Running` state and no ETA. Stock freshness remains `status=ok`, `latest_trading_date=2026-07-02`, blockers `[]`. | First let CD finish deploy marker / production readback; the completion criterion is four-layer consistency across source SHA, deploy marker, runtime endpoint, public route watch, and freshness. CD `Running`, Gitea main being latest, or green tests alone must not be used to claim production is up to date. |
| P0-4 | BLOCKED_WINDOWS99_AUTOSTART | VMware autostart on 192.168.0.99 and VM guests 111 / 188 / 120 / 121 / 112 | 10:15 source/API patch: the production Windows99 nested readback still shows `missing_vmx_aliases=["111"]` and `powered_off_aliases=["111","112","120","121","188"]`, and the nested `service_blockers=["VMAuthdService","VMnetDHCP"]` and `policy_blockers=["windows_update_policy_readback_missing"]` may no longer be flattened into a single guest power blocker; the API overlay adds `windows99_vmware_autostart_config_not_ready` / `windows99_update_no_auto_reboot_policy_not_ready` as runtime-readback-added blockers. 09:53 no-secret management probe: 99 reachable, RDP / Hyper-V VMConnect reachable, WinRM timeout; the Mac vantage collector is still `ssh_batchmode_auth_ready=0` / `blocked_ssh_publickey_auth_missing`; the no-secret collector readback from the production vantage once returned collected, but completion still depends on VMX / service / policy / power all being green. | First fix the 111 VMX source and the VMware service / policy readback so that the check-mode package can explain the powered-off aliases; do not restart Windows services, edit the registry, start / shut down VMs, or reboot the host directly from the scorecard. Completion criteria: VMX config ready, guest power ready, 99 uptime known, all required hosts reachable, and the active blocker matrix shows service / policy / VMX / power converging separately. |
| P0-3 | PARTIAL_GREEN_DEPLOYED_SOURCE | Version and data freshness of all products / websites | Gitea `main=bc8d3af3f chore(cd): deploy dacdd90 [skip ci]`; the CD `#4558` log shows all API/Web build/push wrappers at `rc=0`, deploy marker `dacdd90`, and production deploy readback matched. The production SLO route now reads back the latest API behavior: `runtime_metric_runtime_readback_added_blockers=["windows99_vmware_autostart_config_not_ready","windows99_update_no_auto_reboot_policy_not_ready"]`, `windows99_update_no_auto_reboot_ready=false`. Stock freshness remains `status=ok`, `latest_trading_date=2026-07-02`, blockers `[]`. | The version layer is now caught up on AWOOOI production; completing the overall P0 still requires converging the Windows99 VMX / service / policy / power and host reboot-event blockers. Every public product must continue to keep four-layer consistency across source SHA, deploy marker, runtime endpoint, public route watch, and freshness. |
| P0-4 | BLOCKED_WINDOWS99_AUTOSTART | VMware autostart on 192.168.0.99 and VM guests 111 / 188 / 120 / 121 / 112 | The 10:44 production readback has rolled the nested Windows99 evidence up into active blockers: `windows99_vmware_vmx_missing`, `windows99_vmware_guest_power_not_ready`, `windows99_vmware_autostart_config_not_ready`, `windows99_update_no_auto_reboot_policy_not_ready`; `runtime_metric_runtime_readback_added_blockers` contains the config / policy pair, and `windows99_update_no_auto_reboot_ready=false`. 09:53 no-secret management probe: 99 reachable, RDP / Hyper-V VMConnect reachable, WinRM timeout; the Mac vantage collector is still `ssh_batchmode_auth_ready=0` / `blocked_ssh_publickey_auth_missing`; completion still depends on VMX / service / policy / power all being green. | First fix the 111 VMX source and the VMware service / policy readback so that the check-mode package can explain the powered-off aliases; do not restart Windows services, edit the registry, start / shut down VMs, or reboot the host directly from the scorecard. Completion criteria: VMX config ready, guest power ready, 99 uptime known, all required hosts reachable, and the active blocker matrix shows service / policy / VMX / power converging separately. |
| P0-5 | RUNTIME_READY_BACKUP_RECEIPT_GAP | Backup monitoring and alerting for Gitea / hosts / DB / websites / services / packages / tools / logs | Gitea repo bundle readback ready: expected `12`, rows `12`, missing `0`, failed `0`, sample restore dry-run ok; backup core green. On 2026-07-03, source / runtime deployed the `awoooi_backup_alert_receipt_*` metrics and Prometheus rules: the 110 exporter reads back 88 stage requirements and 188 reads back 12; `BackupAlertReceiptMetricMissing*` is inactive; `BackupAlertReceiptStageMissing` has been fixed to aggregate pending alerts per `host / receipt_channel`, one for 110 and one for 188. | Add the redacted `/backup/alert-receipts/*.last_success` markers; the next layer still needs backup monitoring for the Gitea full dump, DB/settings/issues/packages/LFS, and full backups of all tools and logs. |
| P0-6 | RUNTIME_READY_ALERT_RECEIPT_GAP | Telegram alerts for host shutdown / reboot / SLO miss / backup failure | Reboot per-blocker alerts and backup receipt alert rules are deployed and read back; missing backup receipt stages no longer generate 100 stage-level noise alerts and are now aggregated into two host-level pending alerts for 110 / 188. The scorecard still has 8 reboot active blockers; the full matrix of redacted production delivery receipts for shutdown / reboot / backup alerts is not yet complete. | Add alert receipt readback for host down, host up, SLO miss, Windows99 blocker, backup stale/failed, deploy 502, and freshness stale; the completion criterion is that every alert class has sent / received / dedup / escalation evidence. |
| P0-6 | RUNTIME_READY_ALERT_RECEIPT_GAP | Telegram alerts for host shutdown / reboot / SLO miss / backup failure | Reboot per-blocker alerts and backup receipt alert rules are deployed and read back; missing backup receipt stages no longer generate 100 stage-level noise alerts and are now aggregated into two host-level pending alerts for 110 / 188. The scorecard currently has 10 reboot active blockers, of which the Windows99 config / policy blockers are now available for per-blocker alert routing; the full matrix of redacted production delivery receipts for shutdown / reboot / backup alerts is not yet complete. | Add alert receipt readback for host down, host up, SLO miss, Windows99 blocker, backup stale/failed, deploy 502, and freshness stale; the completion criterion is that every alert class has sent / received / dedup / escalation evidence. |
| P0-7 | SOURCE_READY_SLA_AUTOMATION | Fixed triage order, ETA / wait reason, automated diagnosis and repair | The scorecard has fixed `current_phase=host_boot_detection_blocked`, `eta_or_wait_reason=reboot_event_readback_missing_eta_unavailable`, `primary_blocker=reboot_event_required_host_unreachable`, the fixed triage order, and the next safe action; the 08:23 artifact fixes the next step as no-secret Windows99 verify / host probe rerun. As of 09:22, `active_blocker_action_matrix` can point the unreliable console artifact to `windows99_console_or_no_secret_management_channel` and a fixed next safe action. As of 10:30, `read-public-gitea-actions-queue.py` can roll `BLOCKER cd_docker_step_timeout` up into `blocked_cd_docker_step_timeout` / `latest_visible_cd_docker_step_timeout*` / fixed safe-next-action; automatic recovery within 10 minutes is still not met. | Wire each blocker's next_safe_action, post_verifier, and forbidden_actions into automatic work items / Telegram / scorecard; the completion criterion is that after a reboot the system diagnoses automatically, alerts proactively, and reruns verifiers proactively, with no more improvised manual triage. |
| P0-8 | PARTIAL_READY_POLICY | Prevent Windows Update from rebooting Windows99 without warning | 10:15 source/API patch: if the Windows99 nested verifier returns `policy_blockers=["windows_update_policy_readback_missing"]`, the production API must fail closed and re-raise `windows99_update_no_auto_reboot_policy_not_ready`; it must not trust only the stale Prometheus `windows99_update_no_auto_reboot_ready=true`. | Keep the no-secret verifier, and add periodic policy readback and a Telegram drift alert; the completion criterion is that Windows Update policy drift / missing readback alerts automatically without reading any secret; do not apply registry changes directly from the scorecard. |
| P0-8 | BLOCKED_POLICY_READBACK | Prevent Windows Update from rebooting Windows99 without warning | The 10:44 production readback has failed closed: `windows99_update_no_auto_reboot_ready=false`, active blockers include `windows99_update_no_auto_reboot_policy_not_ready`, and the source is the nested verifier's `policy_blockers=["windows_update_policy_readback_missing"]`. | Keep the no-secret verifier, and add periodic policy readback and a Telegram drift alert; the completion criterion is that Windows Update policy drift / missing readback alerts automatically without reading any secret; do not apply registry changes directly from the scorecard. |
| Priority | Status | Work item | 2026-06-30 evidence | Next step / completion criteria |
|------|------|--------|------------------|-------------------|


@@ -65,11 +65,11 @@
| 6 | CIR-P0-RBT-006 | P0 | "Every host shutdown must trigger an immediate Telegram alert, reboots must alert too, and all other alerts need to be thought through completely as well" | Alert matrix: down / shutdown suspected / reboot detected / reboot recovered / SLO missed / backup failed / freshness stale / CPU pressure / Gitea queue | HostDown / HostRebootEventDetected / RebootAutoRecoverySLOMissed already exist; per-blocker reboot alerts and backup receipt rules are deployed and read back. Missing backup receipt stages have converged from 100 stage-level noise alerts to two host-level pending alerts for 110 / 188; a complete shutdown/up E2E receipt is still needed | Add Prometheus / Alertmanager active/resolved and outbound receipts; for backup alerts, first add the redacted `/backup/alert-receipts/*.last_success` markers; do not send test secrets, do not reboot hosts |
| 7 | CIR-P0-RBT-007 | P0 | "None of the backups, including hosts, DB, websites, services, packages, tools, and logs, have monitoring or alerting" | Backup observability coverage: backup job inventory, last success, freshness, offsite, restore drill, Telegram/AwoooP receipt | Backup health exporter / alert rules / Gitea bundle restore dry-run already exist; the 2026-07-03 runtime readback shows 88 receipt stage requirements on 110 and 12 on 188, `BackupAlertReceiptMetricMissing*` inactive, and `BackupAlertReceiptStageMissing` aggregated to one pending alert each for 110 / 188 | Add `/backup/alert-receipts/*.last_success`; then add backup monitoring for Gitea full dump / DB / settings / issues / packages / LFS and full backups of all tools/logs |
| 8 | CIR-P0-RBT-008 | P0 | "Every reboot is triaged differently and nobody knows how long recovery takes, which does not meet the SLA" | Standardized reboot runbook: fixed triage order, ETA, active blocker, remaining seconds, owner lane, next command | Production scorecard readback has fixed `status=blocked_reboot_auto_recovery_slo_not_ready`, readiness `67%`, active blockers `8`, primary `reboot_event_required_host_unreachable`; the 09:22 source/API contract wires the unreliable console artifact into `active_blocker_action_matrix`, owner lane, and next safe action | Prioritize converging 99 no-secret Verify / 111 reachability / 188 startup failed/degraded; do not bypass the scorecard with a different triage path |
| 9 | CIR-P0-RBT-009 | P0 | "All products and websites must be on the latest version; verify that versions and data are up to date" | Product freshness/version matrix: source commit, deploy marker, runtime image, public health, data freshness, latest source availability | AWOOOI Gitea `main=647d81163` already contains the Windows99 collected verifier next-step source, but the production SLO route has not yet read back that source: `active_blocker_count=8`, `runtime_metric_runtime_readback_added_blockers=[]`, `windows99_update_no_auto_reboot_ready=true`; Gitea CD `#4556` is still Running, with no deploy marker / production deploy readback yet. StockPlatform public freshness / ingestion reads back `ok`, latest trading date `2026-07-02`; at 10:30 the CD Docker build/push bounded timeout and queue classifier source were added so that the next round is not left Running with no ETA. | First read back the AWOOOI deploy marker / production runtime SHA / scorecard behavior; then build the all-product readback table: product, canonical repo, main SHA, deploy marker, public URL, data freshness, blocked reason |
| 9 | CIR-P0-RBT-009 | P0 | "All products and websites must be on the latest version; verify that versions and data are up to date" | Product freshness/version matrix: source commit, deploy marker, runtime image, public health, data freshness, latest source availability | AWOOOI Gitea `main=bc8d3af3f chore(cd): deploy dacdd90 [skip ci]`; the CD `#4558` log shows all API/Web build/push wrappers at `rc=0`, deploy marker `dacdd90`, and production deploy readback matched. The production SLO route now reads back the latest API behavior: `active_blocker_count=10`, `runtime_metric_runtime_readback_added_blockers=["windows99_vmware_autostart_config_not_ready","windows99_update_no_auto_reboot_policy_not_ready"]`, `windows99_update_no_auto_reboot_ready=false`. StockPlatform public freshness / ingestion remains `ok`, latest trading date `2026-07-02` | The version layer is now caught up on AWOOOI production; the next step is still to build the all-product readback table and to converge the Windows99 VMX / service / policy / power and host reboot-event blockers |
| 10 | CIR-P0-GIT-001 | P0 | "The Gitea repositories have all disappeared; doesn't Gitea have a complete backup?" | Gitea repository identity + backup proof + restore drill: UI visibility alone is not enough; compare SSH heads, repo path, bundle backup, and restore sample | On 2026-07-02, production `/api/v1/agents/gitea-repo-bundle-backup-readback` was ready: 9 expected repos present/ok, missing=0, failed=0, checksum_missing=0, bundle_fresh=true, all_expected_ok=true, sample_restore_dry_run_ok=true; the repo bundle / restore dry-run layer is closed, and this is not a repo-missing case. | Maintain daily bundle backup + restore dry-run monitoring; additionally add backup readback for Gitea full dump / DB / settings / issues / packages / LFS. Do not delete repos / change visibility / read tokens / restore into production |
| 11 | CIR-P0-CPU-001 | P0 | "CPU load on 110 / 188 stays too high; why is there no monitoring alert and no proactive repair?" | Sustained CPU pressure automation: Alertmanager → controller → evidence → service playbook → verifier → KM writeback | 110 already has `Host110SustainedModeratePressure`, the Gitea playbook, and Stock/Postgres evidence; 188 still needs equivalent controller/alerts readback | Next, wire in `postgres_hot_query_or_backup_export_playbook` and add the equivalent readback for 188; do not close the case on a single drop in load |
| 12 | CIR-P0-CPU-002 | P0 | "Noise affects the real problems, so handle them together" | Alert noise / real issue correlation: backup aggregate noise, CPU pressure, Gitea queue, and Stock freshness must be separated into primary and secondary causes | Partly noted in the SOP; a unified correlation scorecard is still needed | Build an incident correlation readback: primary_blocker, secondary_noise, ignored_noise_reason, evidence_ref |
| 13 | CIR-P0-CD-001 | P0 | "None of the projects can ship a release / I want to see the implemented result" | Gitea-only CD baseline: every main push must have a visible run, deploy marker, and production readback; GitHub is not a solution | AWOOOI main can be pushed, but the latest CD `#4556` is still Running and production has not yet read back the latest source; the source adds the `cd_docker_step_timeout` bounded marker and queue classifier so that a stuck CD always has a named blocker | First push / read back the bounded CD classifier, then read the deploy marker / production runtime; wire the product governance matrix into each product's Gitea CD readiness |
| 13 | CIR-P0-CD-001 | P0 | "None of the projects can ship a release / I want to see the implemented result" | Gitea-only CD baseline: every main push must have a visible run, deploy marker, and production readback; GitHub is not a solution | AWOOOI latest source has been deployed successfully through Gitea CD `#4558`: the bounded API/Web build/push wrappers ran with `rc=0`, deploy marker `bc8d3af3f chore(cd): deploy dacdd90 [skip ci]`, production deploy readback matched; the queue reader also has the `cd_docker_step_timeout` classifier | Wire the product governance matrix into each product's Gitea CD readiness; while CD is `Running` and not matched, do not claim the latest version is on production |
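| 14 | CIR-P1-AI-001 | P1 | "Where is the AI expertise? It must detect and repair problems proactively" | AI controlled repair loop: detect → classify → candidate → check-mode → controlled apply → post verifier → KM / PlayBook trust | CPU / Gitea / Telegram receipt have partly landed; the global AI loop is not fully wired up | Add `candidate_action`, `controlled_apply_allowed`, `post_verifier`, and `trust_writeback` to every P0 runbook |
| 15 | CIR-P1-KM-001 | P1 | "Capture the repair process and lessons fully in the SOP and integrate them into the current version" | Every P0 repair must be synced to LOGBOOK, SOP, PlayBook, and the workplan ledger; it must not live only in conversation | This ledger, LOGBOOK, and SOP have started to be filled in; at 09:22 the lesson that the Windows99 console clipboard is unreliable was written into SOP v1.108, the P0 workplan, and the scorecard regression; an API/UI read model is still needed | Turn this ledger into read-only API / governance UI rows, and add `last_updated` / `evidence_count` |
| 16 | CIR-P1-WORK-001 | P1 | "Make all started, in-progress, and completed work clearly visible" | Work status inventory: Done / In Progress / Blocked / Deferred / Next Action + evidence | This ledger has a first version of Done/In Progress/Blocked; the new P0 items in this section need to be included | Update Done/In Progress/Blocked below to list all of reboot/backup/VMware/maintenance/CPU |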
@@ -115,7 +115,7 @@
| Public maintenance fallback runtime readback | Gitea CD `#4459` / deploy marker `8d7a6faaf` let the production scorecard read back `public_maintenance_fallback.ready=true`, raw 5xx=`0`, P0 blockers `11`, readiness `47` |
| Reboot SLO per-blocker alert projection | Source adds `awoooi_reboot_auto_recovery_slo_active_blocker{blocker=...}`, `RebootAutoRecoveryActiveBlocker`, `RebootAutoRecoveryActiveBlockerMetricMissing`, and contract tests |
| Backup alert receipt runtime contract | Source / runtime add `awoooi_backup_alert_receipt_expected_info`, `awoooi_backup_alert_receipt_stage_fresh`, `BackupAlertReceiptMetricMissing*`, `BackupAlertReceiptStageMissing`, baseline contract, live visibility checker, and focused tests; the Prometheus rule is deployed and missing-stage alerts are aggregated into 110 / 188 host-level pending; no Telegram sent, no token read |
| CD Docker build/push timeout classifier source | `.gitea/workflows/cd.yaml` wraps API/Web `docker build` / `docker push` in `run_docker_step`, and a timeout emits `BLOCKER cd_docker_step_timeout`; `read-public-gitea-actions-queue.py` rolls this up into `blocked_cd_docker_step_timeout` / `latest_visible_cd_docker_step_timeout*` / fixed safe-next-action; local focused tests `92 passed` |
| CD Docker build/push timeout classifier production proof | `.gitea/workflows/cd.yaml` wraps API/Web `docker build` / `docker push` in `run_docker_step`, and a timeout emits `BLOCKER cd_docker_step_timeout`; `read-public-gitea-actions-queue.py` rolls this up into `blocked_cd_docker_step_timeout` / `latest_visible_cd_docker_step_timeout*` / fixed safe-next-action; local focused tests `92 passed`. The CD `#4558` live log proves all API/Web build/push wrappers returned `rc=0`, deploy marker `dacdd90`, production deploy readback matched |
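The bounded-step and classifier idea in the rows above can be illustrated with a small Python sketch. The real `run_docker_step` lives in `.gitea/workflows/cd.yaml` and is not shown in this diff, so everything here is an assumption made for illustration: the function names `run_bounded_step` and `classify_log`, the `step=` / `timeout_s=` fields appended to the marker line, and the use of exit code 124 (borrowed from the GNU `timeout` convention). Only the `BLOCKER cd_docker_step_timeout` marker and the `blocked_cd_docker_step_timeout` status come from the source.

```python
import subprocess
import sys


def run_bounded_step(name, argv, timeout_s):
    """Run one step with a hard time limit and emit a named marker on timeout."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        proc = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"BLOCKER cd_docker_step_timeout step={name} timeout_s={timeout_s}")
        return 124  # assumed convention, not taken from cd.yaml
    print(f"{name} rc={proc.returncode}")
    return proc.returncode


def classify_log(lines):
    """Roll a raw CD log up into a named status, or None if no marker is present."""
    for line in lines:
        if "BLOCKER cd_docker_step_timeout" in line:
            return "blocked_cd_docker_step_timeout"
    return None


if __name__ == "__main__":
    # A stand-in for a slow docker build: sleeps longer than the bound allows.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_bounded_step("api_build", slow, timeout_s=0.5))
```

The point of the pairing is that a stuck step ends with a greppable, named blocker instead of an open-ended `Running` state, so the queue reader has something concrete to classify.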
### In Progress
@@ -135,7 +135,7 @@
| OpenClaw / Gather-style persistent animation studio | Route already exists and is listed as a P1 work item | Add production desktop/mobile smoke, AwoooP traffic routing, and screenshot evidence |
| AI professional UI / non-wall-of-text cockpit | Listed as P2 UX acceptance | Condense long text blocks into a first-viewport cockpit, cards, flow rows, and expandable details |
| 10-minute reboot auto-recovery SLA | 2026-07-03 08:23 production scorecard fixed at `8` blockers / readiness `67%`, but `can_claim_all_services_recovered_within_target=false`; artifact `/tmp/awoooi-reboot-verify-only-20260703-082310` retains host probe / reboot detector / scorecard | Converge 99 no-secret Verify, 111 reachability, and 188 startup failed/degraded, then rerun host probe + reboot-event detector |
| AWOOOI latest source production deploy readback | Gitea `main=647d81163` has advanced and public CD `#4556` is Running; the production SLO route still shows the old blocker behavior and has not read back the latest source; this round's source adds the CD timeout classifier and is waiting on push / CD / deploy marker / production readback | After pushing the bounded classifier commit, read `read-public-gitea-actions-queue.py --json`, the deploy marker, and the production scorecard; until matched, do not claim the latest version is on production |
| AWOOOI latest source production deploy readback | Gitea `main=bc8d3af3f`, deploy marker `dacdd90`; the public CD `#4558` log shows job succeeded / production deploy readback matched; the production scorecard reads back the latest API behavior and rolls up the Windows99 config / policy blockers | The next step returns to the Windows99 VMX / service / policy / power and host reboot-event blockers; do not claim the 10-minute SLO is complete because deploy/readback succeeded |
| 99 Windows / VMware autostart | Source verifier / parser / API readback / collection packet are complete; live 99 Verify output has not been collected; the 08:23 management probe shows RDP / Hyper-V VMConnect reachable but SSH BatchMode / WinRM blocked; collector `blocked_ssh_publickey_auth_missing` | Collect the 99 no-secret Verify output, and confirm VMs 111/188/120/121/112 are running and scheduled task / services / Windows Update policy are all green |
| Reboot SLO blocker convergence | Production scorecard has fixed `8` blockers / readiness `67%`; source / runtime add per-blocker metric/alert; the remaining main blockers are host boot detection + Windows99 VMware VMX / guest power | Converge 99/111/188 by named blocker; do not claim SLA completion because a route is green or RDP is visible |
| Full backup monitoring and alerting coverage | Exporter/rule already cover host/DB/site/service/package/tool/log and the backup alert receipt requirement; runtime rules are deployed and missing-stage alerts are aggregated into 110 / 188 host-level pending; production receipt markers / full Gitea dump / DB/settings/issues/packages/LFS are not yet all green | Add `/backup/alert-receipts/*.last_success`, Gitea full dump, and restore drill verifier |
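The deploy readback rows in this table insist that no single green signal may be used to claim production is current. A minimal sketch of that gate follows. The `ProductReadback` type, its field names, and the reason strings are hypothetical and exist only for this example; the checked items mirror the list the workplan gives (source SHA, deploy marker, runtime endpoint, public route watch, freshness).

```python
from dataclasses import dataclass


@dataclass
class ProductReadback:
    # Hypothetical record of one product's readback; field names are assumptions.
    source_sha: str
    deploy_marker_sha: str
    runtime_sha: str
    public_route_ok: bool
    freshness_ok: bool


def can_claim_production_latest(r):
    """Return (ok, reasons); ok is True only when every item agrees."""
    reasons = []
    if r.deploy_marker_sha != r.source_sha:
        reasons.append("deploy_marker_mismatch")
    if r.runtime_sha != r.source_sha:
        reasons.append("runtime_sha_mismatch")
    if not r.public_route_ok:
        reasons.append("public_route_not_ok")
    if not r.freshness_ok:
        reasons.append("freshness_stale")
    return (not reasons, reasons)


if __name__ == "__main__":
    matched = ProductReadback("dacdd90", "dacdd90", "dacdd90", True, True)
    stale = ProductReadback("dacdd90", "dacdd90", "647d811", True, True)
    print(can_claim_production_latest(matched))
    print(can_claim_production_latest(stale))
```

Returning the list of reasons alongside the boolean keeps the "blocked reason" column of an all-product readback table easy to fill in.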