fix(recovery): remove 188 deploy sudo secret
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Failing after 3m13s
CD Pipeline / build-and-deploy (push) Has been skipped
CD Pipeline / post-deploy-checks (push) Has been skipped

Your Name
2026-06-30 22:40:24 +08:00
parent 1c3c7279ac
commit b474f80c66
3 changed files with 30 additions and 11 deletions

@@ -50787,11 +50787,13 @@ production browser smoke:
- Read back 188 host hygiene: `awoooi-startup.service` failed on its 300-second timeout while stuck in ClawBot `docker compose build --no-cache`, yet the live `clawbot` / `clawbot-redis` containers were already healthy. This shows that the old startup path amplifies a single rebuild into a whole-host post-reboot failure.
- Fixed `scripts/reboot-recovery/awoooi-startup.sh`: ClawBot startup now uses a bounded `docker compose up -d`. Rebuild is opt-in (`CLAWBOT_STARTUP_REBUILD_ALLOWED=0` by default) and has a timeout, and the default `--no-cache` rebuild is removed.
- Fixed `scripts/reboot-recovery/awoooi-startup.service`: `TimeoutStartSec=600`, aligned with the 10-minute SLO, while no single service rebuild is allowed to consume the entire recovery window.
- Fixed `scripts/reboot-recovery/deploy-to-188.sh`: removed the hardcoded sudo credential and the `sudo -S` path. The deploy helper now requires passwordless sudo or explicit TTY sudo and otherwise fails closed; it does not store, read, or echo any password.
- Extended `scripts/reboot-recovery/tests/test_188_host_hygiene_checklist.py` to lock in the bounded / opt-in rebuild and the 600-second unit timeout.
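A minimal Python sketch of the bounded, opt-in ClawBot startup described above. The real logic lives in `awoooi-startup.sh` and is not shown in this commit, so the helper names (`plan_clawbot_startup`, `worst_case_seconds`, `run_steps`) and the per-step timeouts are hypothetical; only `CLAWBOT_STARTUP_REBUILD_ALLOWED`, the `docker compose` commands, and `TimeoutStartSec=600` come from the notes.

```python
import subprocess
from typing import Mapping, Sequence

# Hypothetical per-step budgets in seconds (the real script's values are not shown).
COMPOSE_UP_TIMEOUT_S = 120
REBUILD_TIMEOUT_S = 240
# From the notes: the systemd unit uses TimeoutStartSec=600.
UNIT_TIMEOUT_S = 600

Step = tuple[list[str], int]  # (command, timeout in seconds)


def plan_clawbot_startup(env: Mapping[str, str]) -> list[Step]:
    """Return the startup steps; rebuild runs only when explicitly opted in."""
    steps: list[Step] = []
    if env.get("CLAWBOT_STARTUP_REBUILD_ALLOWED", "0") == "1":
        # Opt-in rebuild, bounded, and without --no-cache.
        steps.append((["docker", "compose", "build"], REBUILD_TIMEOUT_S))
    # Default path: bring the stack up; already-healthy containers are left running.
    steps.append((["docker", "compose", "up", "-d"], COMPOSE_UP_TIMEOUT_S))
    return steps


def worst_case_seconds(steps: Sequence[Step]) -> int:
    """Upper bound on wall time if every step uses its full budget."""
    return sum(timeout for _, timeout in steps)


def run_steps(steps: Sequence[Step], cwd: str) -> None:
    for cmd, timeout in steps:
        # timeout= raises subprocess.TimeoutExpired instead of hanging indefinitely.
        subprocess.run(cmd, cwd=cwd, check=True, timeout=timeout)
```

Keeping the worst-case sum of step budgets below `TimeoutStartSec` means a slow rebuild is cut off by its own budget before systemd's start timeout can fire.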
**Local verification results**
- `bash -n scripts/reboot-recovery/awoooi-startup.sh scripts/reboot-recovery/awoooi-startup.service scripts/reboot-recovery/188-host-hygiene-maintenance-checklist.sh`: passed.
- `DATABASE_URL=sqlite+aiosqlite:////tmp/awoooi-codex-api-test.db PYTHONPATH=apps/api python3.11 -m pytest scripts/reboot-recovery/tests/test_188_host_hygiene_checklist.py scripts/reboot-recovery/tests/test_reboot_auto_recovery_slo_scorecard.py ops/runner/test_cd_controlled_runtime_profile.py -q`: `42 passed`.
- `bash -n scripts/reboot-recovery/deploy-to-188.sh` and `pytest scripts/reboot-recovery/tests/test_188_host_hygiene_checklist.py -q`: passed; the deploy helper itself no longer contains `sudo -S`, `PASS=`, or a hardcoded credential.
- `python3.11 ops/runner/guard-gitea-runner-pressure.py --root .`: passed, `auto_branch_events_on_110=0`, `generic_runner_labels=0`.
- `node scripts/ci/check-gitea-step-env-secrets.js .gitea/workflows/cd.yaml .gitea/workflows/harbor-110-local-repair.yaml`: passed.
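The deploy-helper check in the third item can be sketched as a line-level scan that reports where a forbidden token appears, which is easier to act on than a bare failing `assert`. It uses the same plain substring tokens as the test added in this commit; the function name `find_forbidden` is hypothetical.

```python
# Same substring tokens the commit's test asserts are absent from deploy-to-188.sh.
FORBIDDEN_TOKENS = ("sudo -S", "PASS=")


def find_forbidden(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, token) for every forbidden token found."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in FORBIDDEN_TOKENS:
            if token in line:
                hits.append((lineno, token))
    return hits
```

An empty result corresponds to the "deploy helper no longer contains `sudo -S` / `PASS=`" claim above.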
@@ -50799,6 +50801,7 @@ production browser smoke:
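- `https://registry.wooo.work/v2/`, `http://192.168.0.110:5000/v2/`, and `https://harbor.wooo.work/api/v2.0/health` still return 502.
- The public Gitea queue is still `blocked_harbor_110_repair_no_matching_runner`: CD `#4100` is running and Harbor repair `#4099` is Waiting, because no runner matches the label `awoooi-host`.
- The production delivery / SLO API still contains stale data from 2026-06-29 and cannot be used as evidence for this recovery round; this round relies on the live `/tmp/awoooi-reboot-slo-live-20260630-2231-scorecard.json` instead.
- On 188, the live `/usr/local/bin/awoooi-startup.sh` and unit are still the old versions. For `ollama@192.168.0.188`, `sudo -n` currently fails because a password is required, so the live files were not applied through the old credential path, to avoid widening the secret exposure.
**Still in effect**
- No secrets, tokens, `.env`, raw sessions, SQLite, or auth data were read, and `.runner` contents were not read.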
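The evidence rule above (ignore the 2026-06-29 data, trust only the live scorecard) amounts to a freshness check. A sketch, assuming a hypothetical `generated_at` ISO-8601 field, since the schema of the scorecard JSON is not shown; the round start time in the usage is also hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))  # the commit timestamps use +08:00


def is_fresh_evidence(doc: dict, round_start: datetime) -> bool:
    """Accept a scorecard only if it was generated during this recovery round."""
    raw = doc.get("generated_at")  # hypothetical field name
    if not isinstance(raw, str):
        return False  # fail closed: undated data is not evidence
    try:
        ts = datetime.fromisoformat(raw)
    except ValueError:
        return False
    if ts.tzinfo is None:
        return False  # refuse to guess a timezone
    return ts >= round_start
```

Usage: `is_fresh_evidence(doc, datetime(2026, 6, 30, 22, 0, tzinfo=UTC_PLUS_8))` rejects a 2026-06-29 record and accepts one generated at 22:31 that evening.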

@@ -3,8 +3,7 @@
# Usage: bash deploy-to-188.sh
set -euo pipefail
HOST="ollama@192.168.0.188"
PASS="0936223270"
HOST="${HOST:-ollama@192.168.0.188}"
echo "=== Deploying awoooi-startup to 192.168.0.188 ==="
@@ -18,16 +17,22 @@ scp awoooi-startup.service "$HOST:/tmp/awoooi-startup.service"
# 3. Install
echo "[3/4] Installing..."
ssh "$HOST" "
echo '$PASS' | sudo -S bash -c '
cp /tmp/awoooi-startup.sh /usr/local/bin/awoooi-startup.sh
chmod +x /usr/local/bin/awoooi-startup.sh
cp /tmp/awoooi-startup.service /etc/systemd/system/awoooi-startup.service
systemctl daemon-reload
systemctl enable awoooi-startup.service
echo done
ssh "$HOST" 'set -euo pipefail
if sudo -n true >/dev/null 2>&1; then
SUDO="sudo -n"
elif [ "${AWOOOI_ALLOW_INTERACTIVE_SUDO:-0}" = "1" ] && [ -t 0 ]; then
SUDO="sudo"
else
echo "BLOCKED sudo_password_required_passwordless_sudo_or_tty_required" >&2
exit 77
fi
$SUDO install -m 0755 /tmp/awoooi-startup.sh /usr/local/bin/awoooi-startup.sh
$SUDO install -m 0644 /tmp/awoooi-startup.service /etc/systemd/system/awoooi-startup.service
$SUDO systemctl daemon-reload
$SUDO systemctl enable awoooi-startup.service
echo done
'
"
# 4. Verify
echo "[4/4] Verifying installation..."
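The new remote block picks a sudo mode in a fixed order and fails closed with exit code 77. The same decision table can be sketched in Python; `choose_sudo`, `probe_passwordless_sudo`, and `SudoBlocked` are hypothetical names. One caveat about the shell version: the whole block is single-quoted and runs on 188, so `AWOOOI_ALLOW_INTERACTIVE_SUDO` and `[ -t 0 ]` are evaluated on the remote side. OpenSSH does not forward arbitrary environment variables by default, and `ssh` with a command but no `-t` allocates no TTY, so the interactive branch is reachable only if both are arranged explicitly.

```python
import subprocess

EXIT_BLOCKED = 77  # the exit code the remote script uses when it refuses to proceed


class SudoBlocked(RuntimeError):
    """Raised instead of prompting or guessing when no safe sudo mode exists."""

    def __init__(self) -> None:
        super().__init__("BLOCKED sudo_password_required_passwordless_sudo_or_tty_required")
        self.exit_code = EXIT_BLOCKED


def probe_passwordless_sudo() -> bool:
    """True if `sudo -n true` succeeds; -n makes sudo fail instead of prompting."""
    try:
        result = subprocess.run(
            ["sudo", "-n", "true"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:  # sudo is not installed
        return False
    return result.returncode == 0


def choose_sudo(passwordless_ok: bool, allow_interactive: bool, stdin_is_tty: bool) -> list[str]:
    """Mirror the shell order: passwordless first, then explicit TTY sudo, else block."""
    if passwordless_ok:
        return ["sudo", "-n"]
    if allow_interactive and stdin_is_tty:
        return ["sudo"]
    raise SudoBlocked()
```

A caller would pass `probe_passwordless_sudo()`, `os.environ.get("AWOOOI_ALLOW_INTERACTIVE_SUDO") == "1"`, and `sys.stdin.isatty()`, and exit with `exc.exit_code` on `SudoBlocked`. No password is ever read or held by the process.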

@@ -7,6 +7,7 @@ ROOT = Path(__file__).resolve().parents[3]
SCRIPT = ROOT / "scripts" / "reboot-recovery" / "188-host-hygiene-maintenance-checklist.sh"
STARTUP_188 = ROOT / "scripts" / "reboot-recovery" / "awoooi-startup.sh"
STARTUP_188_SERVICE = ROOT / "scripts" / "reboot-recovery" / "awoooi-startup.service"
DEPLOY_188 = ROOT / "scripts" / "reboot-recovery" / "deploy-to-188.sh"
def test_188_and_110_default_to_reachable_runtime_identities() -> None:
@@ -49,3 +50,13 @@ def test_188_startup_unit_timeout_matches_reboot_slo() -> None:
text = STARTUP_188_SERVICE.read_text(encoding="utf-8")
assert "TimeoutStartSec=600" in text
def test_188_deploy_helper_does_not_embed_sudo_password() -> None:
text = DEPLOY_188.read_text(encoding="utf-8")
assert "sudo -S" not in text
assert "PASS=" not in text
assert "PASSWORD" not in text
assert "sudo -n true" in text
assert "BLOCKED sudo_password_required_passwordless_sudo_or_tty_required" in text