fix(reboot): surface windows99 console channel readback
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
@@ -57,9 +57,9 @@
| Order | ID | Priority | User-inserted request | Normalized work item | Current status | Next verifiable action |
| --- | --- | --- | --- | --- | --- | --- |
-| 1 | CIR-P0-RBT-001 | P0 | "After a host reboot, everything must recover within 10 minutes, and every host reboot must be detected automatically" | Build a reboot event detector for 99/110/111/112/120/121/188 + 10-minute SLO scorecard + fixed triage order | Live scorecard updated 2026-07-02 15:08: readiness `43%`, active blockers `11`; `windows99_verify_collection` and `windows99_management_channel` are now in the API / scorecard; still missing fresh all-host 10-minute proof, 111 is unreachable, 99 uptime / VMware verifier loop not closed | Prioritize closing out the 99 no-secret management channel / verifier readback and 111 reachability; do not claim the 10-minute SLA is proven |
+| 1 | CIR-P0-RBT-001 | P0 | "After a host reboot, everything must recover within 10 minutes, and every host reboot must be detected automatically" | Build a reboot event detector for 99/110/111/112/120/121/188 + 10-minute SLO scorecard + fixed triage order | Live scorecard updated 2026-07-02 18:28: readiness `43%`, active blockers `11`; `windows99_verify_collection`, `windows99_management_channel`, and `windows99_local_console_channel_reachable` are now in the API / scorecard; still missing fresh all-host 10-minute proof, 111 is unreachable, 99 uptime / VMware verifier loop not closed | Prioritize closing out the 99 local console Verify output / no-secret management channel and 111 reachability; do not claim the 10-minute SLA is proven |
| 2 | CIR-P0-RBT-002 | P0 | "Host reboots are not being detected" | Fix host reboot/shutdown/up detection: boot_id / uptime / node exporter / Windows exporter / VMware VM power state must all feed the same event | Scorecard is now wired to the collection packet + management probe; 99 host is reachable but uptime is unknown, 111 is unreachable, stale hosts remain | Feed the 99 verifier / Windows exporter or an equivalent no-secret readback into the host boot event, and add evidence of 111 reachability |
-| 3 | CIR-P0-RBT-003 | P0 | "192.168.0.99 VMware must start automatically, and 111/188/120/121/112 inside it must start automatically too" | Windows 99 VMware host autostart + guest VM autostart contract; boot order and readback for VM hosts 111/188/120/121/112 | Source verifier / parser / API readback / collection packet done; management probe reads back `host_reachable=true`, RDP open, SSH BatchMode `permission_denied`, WinRM timeout; snapshot active blockers=`windows99_remote_execution_channel_unavailable`, `windows99_vmware_autostart_readback_missing` | Restore a no-secret management channel or collect the local console Verify output, then confirm `VMRUN_PRESENT`, scheduled task, VMware services, VM power, and VMX present are all green |
+| 3 | CIR-P0-RBT-003 | P0 | "192.168.0.99 VMware must start automatically, and 111/188/120/121/112 inside it must start automatically too" | Windows 99 VMware host autostart + guest VM autostart contract; boot order and readback for VM hosts 111/188/120/121/112 | Source verifier / parser / API readback / collection packet done; management probe reads back `host_reachable=true`, RDP open, `2179` VMConnect / console channel open, SSH BatchMode `permission_denied`, WinRM timeout; snapshot active blockers=`windows99_remote_execution_channel_unavailable`, `windows99_vmware_autostart_readback_missing` | Collect the local console Verify output or restore a no-secret management channel, then confirm `VMRUN_PRESENT`, scheduled task, VMware services, VM power, and VMX present are all green |
| 4 | CIR-P0-RBT-004 | P0 | "192.168.0.99 must not reboot without warning because of Windows Update" | Windows Update reboot policy: active hours / no auto-restart / maintenance window / update notification audit | Source verifier now covers `WINDOWS_UPDATE_POLICY` and `WINDOWS_UPDATE_NO_AUTO_REBOOT_READY`; collection packet lists forbidden actions; the 99 management channel cannot yet collect the policy readback | Obtain the Verify output; if the policy is not green, go through a controlled apply; never request or record the Windows password |
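| 5 | CIR-P0-RBT-005 | P0 | "502s after a site restart badly hurt the experience; we need a maintenance page, using an external cloud or another professional approach" | Public maintenance fallback: Nginx / edge / external static maintenance page / status page / fail-open UX, so raw 502s are never served | Not fully implemented; currently a requirements gap | Produce a `public_maintenance_fallback` decision record comparing DNS / edge / external cloud / local Nginx fallback risks; start with a check-mode that does not shift traffic |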
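| 6 | CIR-P0-RBT-006 | P0 | "Send a Telegram alert immediately when any host shuts down, alert again after it reboots, and think through every other alert thoroughly" | Alert matrix for down / shutdown suspected / reboot detected / reboot recovered / SLO missed / backup failed / freshness stale / CPU pressure / Gitea queue | Some Alertmanager rules and Telegram receipt hardening are in place; full shutdown/up E2E receipts are still missing | Build a Telegram alert matrix + receipt verifier that reads back Alertmanager active/resolved state and the outbound receipt item by item, without sending test secrets |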
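The 10-minute recovery SLO in CIR-P0-RBT-001 can be scored per reboot event. This is a minimal Python sketch. It assumes each host's recovery time is already known as a Unix timestamp, and the function and field names are hypothetical. A host with no recovery evidence counts against the SLO, so the scorecard cannot report the SLA as proven while 111 is unreachable.

```python
from typing import Dict, List, Optional

SLO_SECONDS = 10 * 60  # "recover within 10 minutes" (CIR-P0-RBT-001)
HOSTS = ("99", "110", "111", "112", "120", "121", "188")


def score_reboot_slo(reboot_at: float, recovered_at: Dict[str, Optional[float]]) -> dict:
    """Hypothetical scorecard: which hosts recovered late or have no evidence."""
    late: List[str] = []
    missing: List[str] = []
    for host in HOSTS:
        t = recovered_at.get(host)
        if t is None:
            missing.append(host)        # no recovery evidence: SLO not proven
        elif t - reboot_at > SLO_SECONDS:
            late.append(host)           # recovered, but after the 10-minute budget
    return {"slo_met": not late and not missing, "late": late, "missing": missing}
```

For example, `score_reboot_slo(0.0, {"99": 700.0})` reports 99 as late and the other six hosts as missing, so `slo_met` is `False`.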
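CIR-P0-RBT-002 requires boot_id, uptime, and exporter readbacks to feed a single host boot event. The sketch below shows one way to merge them. It assumes each probe yields an optional boot ID and an optional uptime, and every name in it is hypothetical. It compares derived boot times rather than raw uptime. A raw-uptime comparison would miss a reboot whenever the host has already been up longer than it was at the previous sample.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tolerance that absorbs clock skew and probe latency.
BOOT_TIME_TOLERANCE_S = 60.0


@dataclass(frozen=True)
class BootSample:
    """One readback for a host; fields are None when the probe could not read them."""
    host: str
    observed_at: float               # Unix seconds when the probe ran
    boot_id: Optional[str]           # e.g. Linux /proc/sys/kernel/random/boot_id
    uptime_seconds: Optional[float]  # e.g. from node exporter / Windows exporter

    def boot_time(self) -> Optional[float]:
        """Derived boot timestamp, or None if uptime is unknown."""
        if self.uptime_seconds is None:
            return None
        return self.observed_at - self.uptime_seconds


def classify_reboot(prev: Optional[BootSample], cur: Optional[BootSample]) -> str:
    """Return 'unreachable', 'unknown', 'rebooted', or 'steady' (hypothetical labels)."""
    if cur is None:
        return "unreachable"         # no readback this round
    if prev is None:
        return "unknown"             # nothing to compare against yet
    if prev.boot_id and cur.boot_id:
        # A changed boot ID is the most direct reboot signal.
        return "rebooted" if prev.boot_id != cur.boot_id else "steady"
    prev_bt, cur_bt = prev.boot_time(), cur.boot_time()
    if prev_bt is not None and cur_bt is not None:
        # Boot time only moves forward across a reboot, even when the new
        # uptime has already grown past the previous sample's uptime.
        return "rebooted" if cur_bt - prev_bt > BOOT_TIME_TOLERANCE_S else "steady"
    return "unknown"                 # e.g. host reachable but uptime unknown
```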
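The "all green" checks in CIR-P0-RBT-003 and CIR-P0-RBT-004 could gate on the parsed Verify output. This sketch assumes the verifier prints one `KEY=VALUE` pair per line and that values such as `true` mean green. Both the output format and the set of accepted values are assumptions. Only the key names `VMRUN_PRESENT` and `WINDOWS_UPDATE_NO_AUTO_REBOOT_READY` come from the table. An absent key is treated the same as a red one.

```python
from typing import Dict, Iterable, List

# Hypothetical set of values that count as "green".
GREEN_VALUES = {"true", "ok", "pass", "green", "1"}


def parse_verify_output(text: str) -> Dict[str, str]:
    """Parse assumed KEY=VALUE lines; skip anything that does not look like one."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields


def missing_or_red(fields: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Required keys that are absent or not green, in the order given."""
    return [k for k in required if fields.get(k, "").lower() not in GREEN_VALUES]
```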
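For the receipt verifier in CIR-P0-RBT-006, one way to read the alert matrix back item by item is to record which states produced an outbound Telegram receipt and then list the gaps. The snake_case alert names are hypothetical. So is the assumption that every alert kind needs both an `active` and a `resolved` receipt. The sketch calls no Telegram or Alertmanager API; it only compares recorded receipts against the matrix.

```python
from typing import Dict, List, Set

# Alert kinds from CIR-P0-RBT-006; these identifiers are hypothetical.
ALERT_MATRIX = (
    "host_down", "shutdown_suspected", "reboot_detected", "reboot_recovered",
    "slo_missed", "backup_failed", "freshness_stale", "cpu_pressure", "gitea_queue",
)
# Assumed: each kind must prove delivery for both states.
REQUIRED_STATES = ("active", "resolved")


def missing_receipts(receipts: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """receipts maps alert kind -> states that produced an outbound receipt,
    e.g. {"host_down": {"active", "resolved"}}. Returns the gaps per kind."""
    gaps: Dict[str, List[str]] = {}
    for kind in ALERT_MATRIX:
        seen = receipts.get(kind, set())
        lacking = [s for s in REQUIRED_STATES if s not in seen]
        if lacking:
            gaps[kind] = lacking
    return gaps
```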