fix(recovery): bound post reboot summary readbacks
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / build-and-deploy (push) Has been cancelled
CD Pipeline / post-deploy-checks (push) Has been cancelled
CD Pipeline / tests (push) Has been cancelled
@@ -15,12 +15,12 @@
| Priority | Status | Work item | 2026-06-30 evidence | Next step / completion criteria |
|------|------|--------|------------------|-------------------|
-| P0-1 | BLOCKED | Full-host cold start / 10-minute automatic recovery SLO | The latest `full-stack-cold-start-check.sh --monitor-read-only --no-color` run returned `PASS=67 WARN=4 BLOCKED=5`; the 110 registry external `/v2`, the 110 SSH read-only check, K3s registry pull (refused), the AWOOOI internal API probe, and the SigNoz TLS/public route are still blocked. | Fix the first cold-start blocker, then rerun the same scorecard until `BLOCKED=0`; a route returning 200 alone must not be used to declare recovery. |
+| P0-1 | BLOCKED | Full-host cold start / 10-minute automatic recovery SLO | At 20:18, the `post-reboot-readiness-summary.sh --no-color` artifact `/tmp/awoooi-post-reboot-readiness-20260630-201642/summary.txt` returned `POST_START_PASS=33 WARN=6 BLOCKED=8`, `SERVICE_GREEN=0`, `OVERALL_DECLARATION=SERVICE_BLOCKED`; the 110 registry `/v2`, the 110 SSH / backup / CPU / runner readback, K3s registry pull (refused), SigNoz 502/TLS, Stock `postgres_not_ready`, and 188 hygiene are still blocked. The summary's SSH calls are now bounded and no longer hang indefinitely. | Fix the first runtime blocker: the 110 control path / Harbor registry `/v2`. Rerun the same summary until `SERVICE_GREEN=1` and `POST_START_BLOCKED=0`; a route returning 200 alone must not be used to declare recovery. |
| P0-2 | DONE_THIS_INCIDENT | User-visible 502: Tsenyang | `www.tsenyang.com` / `tsenyang.com` recovered from 502 to 200; the `tsenyang-website` container on 188 is running; local `127.0.0.1:3000` returns 200. | For the next 502 of this kind, check the release symlink / image / container first; do not start by touching Nginx, DNS, or the DB, or by rebooting the host. |
| P0-3 | BLOCKED | StockPlatform data freshness | Public `/healthz` and `/api/healthz` return 200; freshness / ingestion return `not_configured` and `postgres_not_ready`. | After the 110 control path is restored, inspect `/home/wooo/stockplatform-v2` compose / DB schema / migration status read-only; fake freshness, manual DB rows, and restore/prune are forbidden. |
| P0-4 | BLOCKED | AWOOOI production version currency | Gitea `main` has advanced several times, but the production runtime readback is still `7890778b83`, with `runtime_build_readback_status=runtime_build_diverges_from_committed_deploy_readback`. The publicly visible Gitea run `cd.yaml #4043` is Failure; the jobs API and the visible run disagree on `head_sha`, which has been flagged as `cd_jobs_stale_or_mismatched`. | Make the deploy marker / runtime SHA / endpoint readback consistent; until they agree, AWOOOI must not be declared up to date. |
| P0-5 | BLOCKED | 110 control path | `diagnose-110-ssh-publickey-auth.sh`: node-exporter / SSH banner are normal; `NODE_LOAD_CLASSIFIER=high_load`, `NODE_PROCS_BLOCKED=0`; `wooo` publickey gives `publickey_offer_timeout`, `root` publickey gives `permission_denied`, and `git` / `ollama` give `preauth_timeout`. | Focus the investigation on the 110 sshd publickey auth / authorized_keys / PAM / account lookup path, and treat load / runner pressure as a contributing cause of the same blocker; `repair-110-ssh-publickey-auth-local.sh --check` / `--apply` can be run from the 110 console/locally to repair metadata permissions. Stock DB, Gitea dump, and 110 backup completeness can be verified only after the SSH read-only command path is restored. |
-| P0-6 | BLOCKED_BACKUP_COMPLETENESS | Gitea repo visibility and complete backup | Gitea version API returns 200; public repo search lists only 4 public repos; the `stockplatform-v2` public page/API returns 404, but internal `git ls-remote` succeeds; 188 `/home/ollama/backup/110/gitea` was initially empty. A verified emergency bundle was created at `/home/ollama/backup/110/gitea/git-bundles/20260630-190931`: bundle verify + checksum succeeded for 4 public/internal repos, while `AwoooGo`, `stockplatform-v2`, and `vibework` failed closed because of private auth. | The 188 `gitea_repo_mirror_from_110` subtree metric / alert has been added; the next step is still to restore the 110 SSH command path and then run a formal `gitea dump`, a non-interactive private-repo backup, and repo count and restore drill readback. |
+| P0-6 | BLOCKED_BACKUP_COMPLETENESS | Gitea repo visibility and complete backup | Gitea version API returns 200; public repo search lists only 4 public repos; the `stockplatform-v2` public page/API returns 404, but internal `git ls-remote` succeeds; 188 `/home/ollama/backup/110/gitea` was initially empty. A verified emergency bundle was created at `/home/ollama/backup/110/gitea/git-bundles/20260630-190931`: bundle verify + checksum succeeded for 4 public/internal repos, while `AwoooGo`, `stockplatform-v2`, and `vibework` failed closed because of private auth. The 20:18 summary could not read back 110 `backup-status`, so `BACKUP_CORE_GREEN=0`, `DR_ESCROW_BLOCKED=1`, `DR_ESCROW_EVIDENCE_UNKNOWN=1`. | The 188 `gitea_repo_mirror_from_110` subtree metric / alert has been added; the next step is still to restore the 110 SSH command path and then run a formal `gitea dump`, a non-interactive private-repo backup, and repo count, backup-status, and restore drill readback. Unknown must not be treated as backup / DR green. |
| P0-7 | SOURCE_READY_RUNTIME_BLOCKED | 99 VMware / VM autostart | The repo already has `windows99-vmware-autostart.ps1`; latest read-only readback: 99 ping OK, RDP 3389 OK, SSH 22 OK, WinRM 5985 fail, `administrator@192.168.0.99` SSH publickey denied; VM host 111 is still unreachable. | Restore a controllable channel to 99 or apply the script from the console; afterwards read back boot evidence for 111/188/120/121/112. |
| P0-8 | SOURCE_READY_RUNTIME_BLOCKED | 502 maintenance fallback / Telegram / backup alert | The L0/L1 fallback runbook, Nginx snippet, and reboot / backup alert rules are already in source; runtime still needs deployment and an external L1 provider readback. | L0: verify `X-AWOOOI-Fallback` with a test vhost; L1: requires an external cloud/CDN probe; Telegram: verify with a redacted alert receipt. |
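
The P0-1 completion condition (rerun the summary until `SERVICE_GREEN=1` and `POST_START_BLOCKED=0`, with unknown never counted as green) can be sketched as a small gate. This is a minimal sketch, assuming the summary artifact is plain text containing `KEY=VALUE` tokens with those exact key names; `parse_summary` and `is_service_green` are hypothetical helpers, not functions from the repo.

```python
import re


def parse_summary(text: str) -> dict[str, str]:
    """Collect KEY=VALUE tokens from summary text (assumed format).

    If a key appears more than once, the last occurrence wins.
    """
    return dict(re.findall(r"\b([A-Z][A-Z0-9_]*)=([A-Za-z0-9_]+)", text))


def is_service_green(summary: dict[str, str]) -> bool:
    """Green only when both gate keys are present with the required values.

    A missing key is treated as unknown, and unknown is never green.
    """
    return (
        summary.get("SERVICE_GREEN") == "1"
        and summary.get("POST_START_BLOCKED") == "0"
    )
```

Comparing against the literal strings `"1"` and `"0"` rather than converting to integers keeps a missing or malformed value on the blocked side, so a route returning 200 with no summary evidence cannot pass the gate.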
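
The commit's theme, bounding readbacks so that an unreachable host cannot stall the summary, can be illustrated with a hard timeout around each read-only command. This is a sketch under assumptions, not the script's actual implementation (which is a shell script whose contents are not shown here); `bounded_readback` and its status strings are hypothetical.

```python
import subprocess


def bounded_readback(cmd: list[str], timeout_s: float = 10.0) -> tuple[str, str]:
    """Run a read-only command with a hard time limit.

    Returns (status, output), where status is "ok", "failed", or "timeout".
    A timeout is reported explicitly so callers can mark the check as
    blocked/unknown instead of waiting forever.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return "timeout", ""
    except OSError as exc:
        # For example, the executable does not exist on this machine.
        return "failed", str(exc)
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout.strip()
```

For an SSH readback, the command list might look like `["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", "<user>@<host>", "uptime"]`, where the user and host are placeholders. `BatchMode=yes` prevents an interactive password prompt, and the outer timeout covers stalls that occur after the TCP connection is established.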