diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 53e5be79..cd81a854 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,51 @@
+## 2026-06-27|MOMO daily-sales source absence readback and cold-start blocker
+
+**Background**: After the 110 runner / StockPlatform smoke load was contained, we reran the all-host cold-start scorecard and the data-freshness checks. The main AWOOOI / IwoooS / Stock / 188 public routes are up, but overall cold-start still cannot be declared full green; the main remaining business-data blocker is 188 MOMO daily-sales freshness.
+
+**Execution boundary**:
+- This round was limited to read-only preflight, log readback, source search at the filename / mtime / size level, and scorecard aggregation.
+- No DB write / truncate / restore / manual import; no Drive files moved; no restart of Docker / Nginx / K3s / scheduler; no token values, raw sessions, SQLite, `.env`, or secrets read.
+
+**Cold-start / scorecard results**:
+- `scripts/reboot-recovery/post-reboot-readiness-summary.sh` artifact: `/tmp/awoooi-post-reboot-readiness-20260627-codex-rerun/summary.txt`.
+- `POST_START_RESULT=BLOCKED`, `POST_START_PASS=37`, `POST_START_WARN=3`, `POST_START_BLOCKED=2`, `SERVICE_GREEN=0`.
+- `PRODUCT_DATA_GREEN=1`, Stock freshness `ok`, latest trading date `2026-06-26`, `STOCK_BLOCKERS=none`.
+- `BACKUP_CORE_GREEN=1`, but `DR_ESCROW_BLOCKED=1` and `ESCROW_MISSING_COUNT=5`.
+- The Wazuh route returns `200`, but `WAZUH_MANAGER_REGISTRY_ACCEPTED=0`, `WAZUH_RUNTIME_GATE=0`, `RUNTIME_ACTION_AUTHORIZED=0`.
+- Direct cold-start rerun: `PASS=88`, `WARN=0`, `BLOCKED=1`; the only blocker is `188 momo daily sales data stale beyond 3 days`.
+- The 20:48 next-gate dispatch, run against the same summary, returned `DISPATCH_RC=2`, `SERVICE_GREEN=0`, `NEXT_REQUIRED_GATES=credential_escrow_evidence,wazuh_manager_registry_export`, `DISPATCH_AUTHORIZED=0`, `REQUEST_SENT_COUNT=0`, `HOST_WRITE_AUTHORIZED=0`, `SECRET_VALUE_COLLECTION_ALLOWED=0`, and stopped at `NEXT_STEP=restore_service_before_boundary_dispatch`. The escrow / Wazuh gates therefore cannot yet be treated as owner packets ready to send.
+
+**MOMO readback results**:
+- `scripts/reboot-recovery/momo-drive-token-source-recovery-preflight.sh` result: `PASS=20`, `WARN=3`, `BLOCKED=2`.
+- MOMO health: local and public health both `200`, runtime version `V10.725`, app health `healthy`.
+- DB daily range: `109061|2025-07-01|2026-06-24`; freshness `3|2026-06-24`.
+- Current-month vs. sync-snapshot parity: `15383|15383|2026-06-01|2026-06-24|2026-06-01|2026-06-24`.
+- Latest import job `57`: `completed|即時業績_當日.xlsx|15383|15383|0`. The source imported on 2026-06-25 between 13:16 and 13:18 was processed cleanly, but the data still only reaches `2026-06-24`.
+- Drive pending intake: `LOCAL_EXACT_DAILY_SOURCE_COUNT=0`; the latest archive / global evidence is still `2026-06-25T04:21:47.000Z`.
+- The 36h `momo-scheduler` log shows successful Google Drive connections and periodic checks of `當日業績匯入` (the daily-sales import folder), but it repeatedly reports that no Excel file was found. The scheduler is healthy / registered and is not the root cause of the current freshness blocker.
+- A filename search within the safe scope on 188 and the local machine found only the old `即時業績_當日_20260112.xlsx` candidate. No legitimate daily-sales source usable for 2026-06-26 / 2026-06-27 was found.
+- The second preflight at 20:54 still showed `DRIVE_INTAKE_COUNT=0`, archive / global latest `2026-06-25T04:21:47.000Z`, DB daily freshness `3|2026-06-24`, and latest import job `57 completed`. The `momo-pro-system` / `momo-scheduler` containers remain healthy. Recent scheduler logs contain only schedule registration and routine warnings, with no evidence of a new Excel arrival or a successful import.
+
+**DR / Wazuh gate readback**:
+- On 110, `/backup/scripts/offsite-escrow-evidence-report.sh --no-color` shows rclone offsite configured, the full offsite marker fresh, and local backup repos checkable. However, all 5 credential escrow markers are missing: `restic_repository_password`, `offsite_provider_credentials`, `break_glass_admin_credentials`, `dns_registrar_recovery`, `oauth_ai_provider_recovery`.
+- `scripts/security/wazuh-manager-registry-reviewer-validation.py` passes repo contract validation, but the snapshot is still `received=0 accepted=0 runtime_gate=0`. A working route / transport / index pattern is no substitute for an accepted manager registry.
+- This round wrote no escrow marker, produced no owner response, and did not query the Wazuh live API / secrets. It performed no Wazuh active response, agent re-enroll, restart, host write, or Kali active scan.
+
+**2026-06-27 21:00 gate hardening**:
+- Added `scripts/reboot-recovery/momo-source-arrival-gate.py`. It only parses the log (or stdin) produced by `momo-drive-token-source-recovery-preflight.sh`. It opens no connections, reads no tokens, imports nothing, moves nothing in Drive, and writes nothing to the DB.
+- Validated against the real 20:54 preflight log: `MOMO_SOURCE_ARRIVAL_GATE status=blocked_source_absent_fail_closed source_intake=0 freshness=3|2026-06-24 safe_import_preflight_allowed=0 runtime_write_authorized=0 db_write_authorized=0 drive_move_authorized=0 next_step=wait_for_legitimate_daily_sales_source_then_rerun_gate`, exit code `2`.
+- Synthetic source-arrived case: with Drive intake count `1` and stale freshness, the gate returns only `source_arrived_ready_for_safe_import_preflight` and `safe_import_preflight_allowed=1`. `runtime_write_authorized=0`, `db_write_authorized=0`, and `drive_move_authorized=0` stay fixed.
+- Synthetic freshness-green case: with freshness `1|2026-06-26`, the gate returns `freshness_already_green_recheck_cold_start`. The next step is still to rerun the post-reboot summary, and full green must not be declared directly.
+
+**Conclusion**:
+- The current state is `SERVICE_BLOCKED_MOMO_SOURCE_ABSENCE` / `SOURCE_ABSENT_FAIL_CLOSED`. This is not a runner, Docker, Nginx, K3s, or scheduler incident.
+- Do not fake a green freshness result using old archives, old samples, old local files, hand-written DB rows, truncate / restore, or manual Drive movement.
+- Clearing the blocker requires one of two things: a new legitimate `即時業績_當日` source appearing in `當日業績匯入`, or an owner-approved safe source evidence ref. Only then may the import run on a maintenance-safe path, and the import must meet these conditions: `sync_success=true`, the source is moved only after success, daily snapshot / realtime monthly bounds are consistent, and freshness is `<=2`. After that, rerun the cold-start scorecard.
+
+**Next steps**:
+- Stay fail-closed. Once a legitimate source arrives, run a read-only preflight recheck.
+- If an owner-approved source evidence ref is available, open a separate maintenance window and use the safe import path. All-green still must not be declared without source evidence.
+
 ## 2026-06-27|110 Gitea runner load reduction, anti-rebound, and workflow label consolidation
 
 **Background**: The 110 CPU incident was confirmed to be caused mainly by the Gitea runner repeatedly launching StockPlatform headless Chrome smoke tests. The previous round stopped `gitea-act-runner-host.service`, cleared Actions / smoke runs, and consolidated the live runner labels to `awoooi-ubuntu` / `awoooi-host`. This round aims to keep the cold-start / startup flow from automatically bringing the runner back up, and to complete the AWOOI workflow labels and the post-deploy pressure gate.
diff --git a/docs/runbooks/FULL-STACK-COLD-START-SOP.md b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
index 4c2122fb..fc7896ec 100644
--- a/docs/runbooks/FULL-STACK-COLD-START-SOP.md
+++ b/docs/runbooks/FULL-STACK-COLD-START-SOP.md
@@ -296,6 +296,8 @@ NO-GO: truncate, whole-DB restore, manual Drive movement, or manual import witho
 UNBLOCK: new legitimate PChome daily-sales source appears in 當日業績匯入 or an owner-approved safe import path; import job succeeds with sync_success=true; source file moves only after success; daily_sales_snapshot and realtime_sales_monthly bounds match; MOMO_DAILY_FRESHNESS <= 2.
 ```
 
+From 2026-06-27 on, if a `momo-drive-token-source-recovery-preflight.sh` log is already available, first run `python3 scripts/reboot-recovery/momo-source-arrival-gate.py --preflight-log` on that log to get a machine verdict. `blocked_source_absent_fail_closed` means keep waiting for a legitimate source. `source_arrived_ready_for_safe_import_preflight` only means a separate safe import preflight may start; it does not authorize DB writes, Drive moves, manual import, or runtime writes. `freshness_already_green_recheck_cold_start` still requires rerunning the post-reboot summary on the same evidence chain before the recovery declaration can be updated.
+
 All reports must use this vocabulary, to avoid misreporting "service-level available" as "overall DR complete".
 
 ### 0.3 Codex workstation handoff criteria
diff --git a/scripts/reboot-recovery/momo-source-arrival-gate.py b/scripts/reboot-recovery/momo-source-arrival-gate.py
new file mode 100755
index 00000000..35496f55
--- /dev/null
+++ b/scripts/reboot-recovery/momo-source-arrival-gate.py
@@ -0,0 +1,253 @@
+#!/usr/bin/env python3
+"""Classify MOMO daily-sales source arrival from a read-only preflight log.
+
+This parser never connects to MOMO, never imports files, never moves Drive
+artifacts, and never authorizes DB / host / Drive writes. It turns the existing
+`momo-drive-token-source-recovery-preflight.sh` evidence into a compact gate so
+operators can tell whether they should keep waiting for a legitimate source or
+start a separate safe-import preflight.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import sys
+from pathlib import Path
+from typing import Any
+
+
+EXPECTED_IMPORT_CONFIG = "當日業績匯入|即時業績_當日"
+SUMMARY_RE = re.compile(
+    r"^MOMO_DRIVE_TOKEN_SOURCE_PREFLIGHT "
+    r"PASS=(?P<pass>\d+) WARN=(?P<warn>\d+) BLOCKED=(?P<blocked>\d+) "
+    r"HOST=(?P<host>\S+) FRESHNESS_MAX_DAYS=(?P<freshness_max_days>\d+)"
+)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Classify MOMO source-arrival readiness from preflight output.",
+    )
+    parser.add_argument(
+        "--preflight-log",
+        required=True,
+        help="Path to momo-drive-token-source-recovery-preflight output, or '-' for stdin.",
+    )
+    parser.add_argument("--json", action="store_true", help="Print JSON result.")
+    return parser.parse_args()
+
+
+def load_text(source: str) -> str:
+    if source == "-":
+        return sys.stdin.read()
+    return Path(source).read_text(encoding="utf-8")
+
+
+def parse_int(value: Any, default: int | None = None) -> int | None:
+    try:
+        return int(str(value).strip())
+    except (TypeError, ValueError):
+        return default
+
+
+def parse_pipe(value: str, expected_parts: int) -> list[str]:
+    parts = str(value or "").split("|")
+    if len(parts) < expected_parts:
+        parts.extend([""] * (expected_parts - len(parts)))
+    return parts[:expected_parts]
+
+
+def parse_preflight(text: str) -> dict[str, Any]:
+    values: dict[str, str] = {}
+    messages = {"ok": [], "warn": [], "blocked": []}
+    summary: dict[str, Any] = {}
+
+    for raw_line in text.splitlines():
+        line = raw_line.strip()
+        if not line:
+            continue
+        summary_match = SUMMARY_RE.match(line)
+        if summary_match:
+            summary = {
+                key: parse_int(value) if key != "host" else value
+                for key, value in summary_match.groupdict().items()
+            }
+            continue
+        if line.startswith("OK: "):
+            messages["ok"].append(line[4:])
+            continue
+        if line.startswith("WARN: "):
+            messages["warn"].append(line[6:])
+            continue
+        if line.startswith("BLOCKED: "):
+            messages["blocked"].append(line[9:])
+            continue
+        if re.match(r"^[A-Z][A-Z0-9_]+(?:\s|$)", line):
+            key, _, value = line.partition(" ")
+            values[key] = value.strip()
+
+    return {"values": values, "messages": messages, "summary": summary}
+
+
+def monthly_sync_ok(value: str) -> bool:
+    snapshot_count, monthly_count, dmin, dmax, mmin, mmax = parse_pipe(value, 6)
+    snapshot_n = parse_int(snapshot_count, 0) or 0
+    return (
+        snapshot_n > 0
+        and snapshot_count == monthly_count
+        and bool(dmin)
+        and bool(dmax)
+        and dmin == mmin
+        and dmax == mmax
+    )
+
+
+def latest_import_clean(value: str) -> bool:
+    job_id, status, _file_name, _created, _completed, total, success, errors = parse_pipe(
+        value, 8
+    )
+    return (
+        parse_int(job_id) is not None
+        and status == "completed"
+        and parse_int(total, -1) == parse_int(success, -2)
+        and parse_int(errors, -1) == 0
+    )
+
+
+def classify(parsed: dict[str, Any]) -> dict[str, Any]:
+    values = parsed["values"]
+    summary = parsed["summary"]
+    messages = parsed["messages"]
+
+    freshness_days_text, latest_daily_date = parse_pipe(values.get("DB_DAILY_FRESHNESS", ""), 2)
+    freshness_days = parse_int(freshness_days_text)
+    freshness_max_days = parse_int(summary.get("freshness_max_days"), 2) or 2
+    drive_intake_count = parse_int(values.get("DRIVE_INTAKE_COUNT"), 0) or 0
+    drive_failed_count = parse_int(values.get("DRIVE_FAILED_COUNT"), 0) or 0
+    drive_archive_latest = values.get("DRIVE_ARCHIVE_LATEST_MODIFIED", "none") or "none"
+    drive_global_latest = values.get("DRIVE_GLOBAL_LATEST_MODIFIED", "none") or "none"
+
+    service_ready = (
+        values.get("MOMO_PUBLIC_HEALTH_CODE") == "200"
+        and values.get("MOMO_HEALTH_CODE") == "200"
+        and values.get("MOMO_APP_HEALTH") == "healthy"
+        and values.get("SCHEDULER_RUNNING") == "true"
+        and values.get("SCHEDULER_HEALTH") == "healthy"
+    )
+    import_config_ok = EXPECTED_IMPORT_CONFIG in values.get("IMPORT_CONFIG", "")
+    sync_ok = monthly_sync_ok(values.get("DB_MONTHLY_SYNC", ""))
+    clean_import = latest_import_clean(values.get("DB_LATEST_DAILY_IMPORT_JOB", ""))
+    freshness_green = (
+        freshness_days is not None and 0 <= freshness_days <= freshness_max_days
+    )
+    freshness_stale = freshness_days is not None and freshness_days > freshness_max_days
+
+    blockers: list[str] = []
+    warnings: list[str] = []
+    status = "blocked_preflight_evidence_incomplete"
+    next_step = "rerun_momo_drive_token_source_recovery_preflight"
+    safe_import_preflight_allowed = False
+    exit_code = 2
+
+    if not summary:
+        blockers.append("preflight_summary_missing")
+    if not service_ready:
+        blockers.append("momo_service_or_scheduler_not_ready")
+    if not import_config_ok:
+        blockers.append("drive_import_config_not_expected_intake")
+    if not sync_ok:
+        blockers.append("current_month_snapshot_realtime_sync_not_proven")
+    if drive_failed_count > 0:
+        warnings.append("drive_failed_folder_has_matching_candidates")
+
+    if blockers:
+        status = "blocked_service_or_evidence_not_ready"
+        next_step = "repair_readonly_preflight_evidence_before_source_or_import_decision"
+    elif freshness_green:
+        status = "freshness_already_green_recheck_cold_start"
+        next_step = "rerun_post_reboot_readiness_summary_with_same_evidence_chain"
+        exit_code = 0
+    elif drive_intake_count > 0 and freshness_stale:
+        status = "source_arrived_ready_for_safe_import_preflight"
+        next_step = "run_owner_approved_safe_import_preflight_no_db_or_drive_write_yet"
+        safe_import_preflight_allowed = True
+        exit_code = 0
+    elif drive_intake_count > 0:
+        status = "source_arrived_freshness_unknown_recheck_before_import"
+        next_step = "rerun_momo_preflight_and_validate_freshness_before_import"
+        safe_import_preflight_allowed = True
+        exit_code = 1
+    elif freshness_stale:
+        status = "blocked_source_absent_fail_closed"
+        next_step = "wait_for_legitimate_daily_sales_source_then_rerun_gate"
+    else:
+        status = "blocked_freshness_unknown_fail_closed"
+        next_step = "rerun_preflight_or_repair_readonly_freshness_readback"
+
+    if not clean_import:
+        warnings.append("latest_daily_import_job_not_clean_completed")
+
+    return {
+        "schema_version": "momo_source_arrival_gate_v1",
+        "status": status,
+        "exit_code": exit_code,
+        "next_step": next_step,
+        "safe_import_preflight_allowed": safe_import_preflight_allowed,
+        "runtime_write_authorized": False,
+        "db_write_authorized": False,
+        "drive_move_authorized": False,
+        "manual_import_authorized": False,
+        "secret_value_collection_allowed": False,
+        "service_ready": service_ready,
+        "import_config_ok": import_config_ok,
+        "current_month_sync_ok": sync_ok,
+        "latest_import_clean": clean_import,
+        "freshness_days": freshness_days,
+        "freshness_latest_date": latest_daily_date or "unknown",
+        "freshness_max_days": freshness_max_days,
+        "drive_intake_count": drive_intake_count,
+        "drive_archive_latest_modified": drive_archive_latest,
+        "drive_global_latest_modified": drive_global_latest,
+        "drive_failed_count": drive_failed_count,
+        "preflight_pass": summary.get("pass", 0),
+        "preflight_warn": summary.get("warn", len(messages["warn"])),
+        "preflight_blocked": summary.get("blocked", len(messages["blocked"])),
+        "blockers": blockers,
+        "warnings": warnings,
+        "no_false_green_rules": [
+            "source_arrived_does_not_authorize_import",
+            "safe_import_preflight_allowed_does_not_authorize_db_write",
+            "freshness_green_requires_post_reboot_summary_recheck",
+            "archive_or_local_old_file_does_not_count_as_new_source",
+        ],
+    }
+
+
+def print_human(result: dict[str, Any]) -> None:
+    print(
+        "MOMO_SOURCE_ARRIVAL_GATE "
+        f"status={result['status']} "
+        f"source_intake={result['drive_intake_count']} "
+        f"freshness={result['freshness_days']}|{result['freshness_latest_date']} "
+        f"safe_import_preflight_allowed={int(result['safe_import_preflight_allowed'])} "
+        "runtime_write_authorized=0 "
+        "db_write_authorized=0 "
+        "drive_move_authorized=0 "
+        f"next_step={result['next_step']}"
+    )
+
+
+def main() -> int:
+    args = parse_args()
+    result = classify(parse_preflight(load_text(args.preflight_log)))
+    if args.json:
+        print(json.dumps(result, ensure_ascii=False, indent=2, sort_keys=True))
+    else:
+        print_human(result)
+    return int(result["exit_code"])
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
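The branch order in `classify` decides which status wins when several signals are present at once. The sketch below condenses that precedence into a few lines. It assumes the preflight values have already been parsed into three inputs: a freshness day count (or `None` when unreadable), a Drive intake count, and one boolean that stands in for the missing-summary / service / import-config / sync blockers. `gate_status` is a hypothetical helper written for illustration and is not part of the script. It reuses the script's status strings and exit codes but drops the warnings and the JSON payload.

```python
from typing import Optional, Tuple


def gate_status(
    freshness_days: Optional[int],
    drive_intake_count: int,
    freshness_max_days: int = 2,
    has_blockers: bool = False,
) -> Tuple[str, int]:
    """Hypothetical condensed restatement of the classify() branch order.

    Returns (status, exit_code). No branch grants a write: DB, Drive, and
    runtime writes stay unauthorized regardless of the status.
    """
    fresh_green = freshness_days is not None and 0 <= freshness_days <= freshness_max_days
    fresh_stale = freshness_days is not None and freshness_days > freshness_max_days

    # Service / import-config / sync blockers win over any freshness or source signal.
    if has_blockers:
        return "blocked_service_or_evidence_not_ready", 2
    # Fresh data still requires rerunning the post-reboot summary before "green".
    if fresh_green:
        return "freshness_already_green_recheck_cold_start", 0
    # New source plus stale data only unlocks a separate safe-import preflight.
    if drive_intake_count > 0 and fresh_stale:
        return "source_arrived_ready_for_safe_import_preflight", 0
    # Source present but freshness unreadable: recheck before any import.
    if drive_intake_count > 0:
        return "source_arrived_freshness_unknown_recheck_before_import", 1
    # No source and stale data: keep waiting, fail closed.
    if fresh_stale:
        return "blocked_source_absent_fail_closed", 2
    # Neither a source nor a readable freshness value.
    return "blocked_freshness_unknown_fail_closed", 2


# Inputs modeled on the three cases recorded in the 2026-06-27 logbook entry.
print(gate_status(3, 0))  # 20:54 preflight: no intake, freshness 3 > 2
print(gate_status(3, 1))  # synthetic: one intake file, still stale
print(gate_status(1, 0))  # synthetic: freshness 1, within the limit
```

The gate stays fail-closed because the blocker check comes first. A healthy-looking freshness value never overrides a missing summary or an unhealthy scheduler, and even the "ready" status only opens the next read-only step.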