From 00db624e5ff5a95510d24caf6577a2c07c05610b Mon Sep 17 00:00:00 2001
From: Your Name
Date: Sun, 28 Jun 2026 09:46:46 +0800
Subject: [PATCH] fix(reboot): fail closed direct cd lane pressure path [skip ci]

---
 AGENTS.md | 2 +-
 docs/HARD_RULES.md | 8 ++--
 docs/LOGBOOK.md | 4 ++
 ops/runner/README.md | 6 ++-
 scripts/reboot-recovery/awoooi-startup-110.sh | 38 ++++++++++++++++++-
 .../full-stack-cold-start-check.sh | 24 ++++++++----
 .../p3-controlled-release-gate.sh | 20 +++++++---
 .../reboot-recovery/post-start-quick-check.sh | 24 ++++++++----
 8 files changed, 95 insertions(+), 31 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 791feadf..2f831555 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -46,7 +46,7 @@

 The correct action is for the AI to automatically fill in the target selector, source-of-truth diff, check-mode / dry-run, rollback, post-apply verifier, and KM / PlayBook trust writeback, then advance a verifiable, rollback-capable, low-blast-radius implementation.

-**110 runner pressure incident exception**: When Gitea / act-runner / direct transient runner puts CPU / headless smoke pressure on 110, this is incident-level capacity protection. "Full authorization" must not be used to directly restart the runner, remove the mask, restore the runner binary, launch the `.real` binary directly via `systemd-run`, or change the host pressure gate to warn-only. The correct action is to first do runner migration / rate limiting / label isolation / smoke scheduling, then perform a controlled recovery with check-mode, rollback, and a post-apply verifier.
+**110 runner / direct CD lane pressure incident exception**: When Gitea / act-runner / direct transient runner / direct CD lane puts CPU / headless smoke / Docker build pressure on 110, this is incident-level capacity protection. "Full authorization" must not be used to directly restart the runner, remove the mask, restore the runner / cd-lane binary, launch the `.real` binary directly via `systemd-run`, or change the host pressure gate to warn-only. The correct action is to first do runner / CD lane migration, rate limiting, label isolation, and smoke scheduling, then perform a controlled recovery with check-mode, rollback, and a post-apply verifier.

 ---

diff --git a/docs/HARD_RULES.md b/docs/HARD_RULES.md
index 236fffc2..28a16774 100644
--- a/docs/HARD_RULES.md
+++ b/docs/HARD_RULES.md
@@ -287,17 +287,17 @@ OpenClaw core replacement, arbitration model upgrade, formal introduction of new SDK / runtime dependencies
 force push / delete repo / delete refs / change repo visibility / raw runtime secret volume read-write
 ```

-### 110 runner pressure incident exception
+### 110 runner / direct CD lane pressure incident exception

-After the 2026-06-28 incident, pressure on 110 from Gitea / act-runner / direct transient runner, StockPlatform headless smoke, host-side Next build, and Docker / BuildKit falls under the capacity-incident protection surface. Even after receiving "approve / continue / full authorization", do not directly restart the runner, lift the service mask, restore the live runner binary, launch the `.real` binary directly via `systemd-run`, restore the generic `ubuntu-latest` label, or make warn-only the default for the host pressure gate.
+After the 2026-06-28 incident, pressure on 110 from Gitea / act-runner / direct transient runner / direct CD lane, StockPlatform headless smoke, host-side Next build, and Docker / BuildKit falls under the capacity-incident protection surface. Even after receiving "approve / continue / full authorization", do not directly restart the runner, lift the service mask, restore the live runner / cd-lane binary, launch the `.real` binary directly via `systemd-run`, restore the generic `ubuntu-latest` label, or make warn-only the default for the host pressure gate.

-Permitted controlled apply covers pressure reduction and recurrence prevention: stop / disable / mask the runner, mask the direct transient unit, quarantine the runner binary, narrow labels, add the source fail-closed guard, migrate the runner, limit concurrency, move smoke to a schedule / non-110 runner, and run the read-only pressure / cold-start verifier.
+Permitted controlled apply covers pressure reduction and recurrence prevention: stop / disable / mask the runner, mask the direct transient / direct CD lane unit, quarantine the runner / cd-lane binary, narrow labels, add the source fail-closed guard, migrate the runner / CD lane, limit concurrency, move smoke to a schedule / non-110 runner, and run the read-only pressure / cold-start verifier.

 Restoring the runner requires all of the following:

 1. target selector: explicitly list the service, runner dir, label, and the repos served.
 2. source-of-truth diff: repo / unit / startup script / runner config all carry consistent changes.
-3. rate limiting or migration: the 110 production host no longer takes generic heavy build / smoke.
+3. rate limiting or migration: the 110 production host no longer takes generic or direct lane heavy build / smoke.
 4. rollback: can return to inactive / masked / fail-closed stub.
 5. post-apply verifier: read back runner tasks, host load, Actions queue, Stock smoke, AWOOI public route, and cold-start scorecard.

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index e3f4b286..60bc34a6 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -39,6 +39,10 @@

 **Boundary**: This segment did not restart Docker / Nginx / firewall / K3s / DB, did not read raw sessions / SQLite / auth / `.env` / runner token, and did not restore the 110 runner.

+**09:24 addendum**: Further confirmed that `awoooi-cd-lane.service` repeatedly launches Web Docker builds on 110 via `/home/wooo/awoooi-manual-deploy`, causing the pressure gate to block. Stopped and masked `awoooi-cd-lane.service`, quarantined the original ELF at `/home/wooo/awoooi-cd-lane/awoooi_cd_lane`, and replaced it with an immutable fail-closed stub; the source guard now includes `awoooi-cd-lane.service` and the cd-lane binary in the startup / cold-start / post-start / P3 release gate. This still does not mean the CD lane migration is complete; a non-110 build path or hard rate limiting must be in place before recovery.
+
+**09:44 addendum**: The 09:40 readback caught `awoooi-cd-lane.service` restored to `enabled / active / Restart=always` again, with `/home/wooo/awoooi-cd-lane/awoooi_cd_lane` back to ELF. Stopped / disabled / killed it again, removed the `multi-user.target.wants` symlink, changed the unit to an immutable regular fail-closed unit (`ConditionPathExists=/run/awoooi-cd-lane-enabled` + `ExecStart=/bin/false`), and changed the cd-lane binary back to an immutable shell stub. 09:43 delayed readback: cd-lane `load=loaded active=inactive unitfile=static ExecStart=/bin/false`, direct / Gitea runner units `masked / inactive`, runner/CD lane process `0`, all five binary paths are shell stubs, pressure gate `0`. runner-only quick-check `PASS=13 WARN=0 BLOCKED=0 RESULT=GREEN`; a single cold-start run is still `PASS=90 WARN=1 BLOCKED=1`, with the only blocker being `188 momo daily sales data stale beyond 3 days`; P3 gate `BAD_RUNNER_GUARDRAILS 0`, overall still HOLD, with remaining blockers cold-start / 188 backup stale / 188 litellm not running.
+
 ## 2026-06-28 - 08:45 110 runner pressure incident source / live fail-closed convergence

 **Background**: The commander granted full authorization to open non-incident-level gates, but the 110 Gitea runner repeatedly launched StockPlatform headless Chrome smoke, which has already caused a production host CPU / CI pressure incident; the runner must not be restarted directly before it is migrated / rate-limited.
diff --git a/ops/runner/README.md b/ops/runner/README.md
index 2a85bf9b..4ac2f8d1 100644
--- a/ops/runner/README.md
+++ b/ops/runner/README.md
@@ -397,8 +397,10 @@ Gitea service names. The four live runner entry points are now immutable fail-closed
 - `/home/wooo/act-runner-controlled/act_runner`
 - `/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner`

-Unit names that must also stay masked:
+Unit names that must also stay fail-closed; Gitea / direct runner units stay masked,
+and `awoooi-cd-lane.service` stays a static `/bin/false` unit:

+- `awoooi-cd-lane.service`
 - `awoooi-direct-runner-open.service`
 - `awoooi-direct-runner.service`
 - `gitea-act-runner-host.service`
@@ -406,7 +408,7 @@ Gitea service names. The four live runner entry points are now immutable fail-closed
 - `gitea-awoooi-controlled-runner.service`
 - `gitea-act-runner-awoooi-open.service`

-Until runner migration / rate limiting / smoke scheduling is complete, do not lift the mask, restore the ELF, restore
+Until runner / CD lane migration, rate limiting, and smoke scheduling are complete, do not lift the mask, restore the ELF, restore
 the generic runner label, or change the host pressure gate default to warn-only.

 ---
diff --git a/scripts/reboot-recovery/awoooi-startup-110.sh b/scripts/reboot-recovery/awoooi-startup-110.sh
index 59653cf2..ead3684c 100644
--- a/scripts/reboot-recovery/awoooi-startup-110.sh
+++ b/scripts/reboot-recovery/awoooi-startup-110.sh
@@ -195,6 +195,7 @@ RUNNER_ENABLE_SENTINEL="/run/awoooi-runner-host-enabled"
 START_GITEA_RUNNER_ON_BOOT="${AWOOOI_START_GITEA_RUNNER_ON_BOOT:-0}"
 START_GITEA_RUNNER_ALLOWED=0
 RUNNER_FAIL_CLOSED_SERVICES=(
+  "awoooi-cd-lane.service"
   "awoooi-direct-runner-open.service"
   "awoooi-direct-runner.service"
   "gitea-act-runner-host.service"
@@ -203,6 +204,7 @@ RUNNER_FAIL_CLOSED_SERVICES=(
   "gitea-act-runner-awoooi-open.service"
 )
 RUNNER_FAIL_CLOSED_BINARY_PATHS=(
+  "/home/wooo/awoooi-cd-lane/awoooi_cd_lane"
   "/home/wooo/act-runner/act_runner"
   "/home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard"
   "/home/wooo/act-runner-controlled/act_runner"
@@ -264,6 +266,33 @@ EOF
   chattr +i "$path" >/dev/null 2>&1 || true
 }

+install_cd_lane_fail_closed_unit() {
+  local unit_file="/etc/systemd/system/awoooi-cd-lane.service"
+  local tmp
+  local quarantine_stamp
+  quarantine_stamp="$(date +%Y%m%d%H%M%S)"
+
+  if [ -e "$unit_file" ] || [ -L "$unit_file" ]; then
+    chattr -i "$unit_file" >/dev/null 2>&1 || true
+    if ! grep -q "AWOOOI direct CD lane fail-closed" "$unit_file" 2>/dev/null; then
+      mv "$unit_file" "${unit_file}.quarantined-runner-incident-${quarantine_stamp}" >/dev/null 2>&1 || true
+    fi
+  fi
+  tmp="$(mktemp)"
+  cat >"$tmp" <<'EOF'
+[Unit]
+Description=AWOOOI direct CD lane fail-closed after 2026-06-28 pressure incident
+ConditionPathExists=/run/awoooi-cd-lane-enabled
+
+[Service]
+Type=oneshot
+ExecStart=/bin/false
+EOF
+  install -o root -g root -m 0444 "$tmp" "$unit_file" >/dev/null 2>&1 || true
+  rm -f "$tmp"
+  chattr +i "$unit_file" >/dev/null 2>&1 || true
+}
+
 ensure_host_runner_fail_closed() {
   local unit
   local binary
@@ -273,8 +302,12 @@ ensure_host_runner_fail_closed() {
     systemctl kill --signal=SIGKILL "$unit" >/dev/null 2>&1 || true
     systemctl reset-failed "$unit" >/dev/null 2>&1 || true
     systemctl disable "$unit" >/dev/null 2>&1 || true
-    systemctl mask "$unit" >/dev/null 2>&1 || mask_runner_unit_file "$unit" "/etc/systemd/system"
-    mask_runner_unit_file "$unit" "/etc/systemd/system"
+    if [ "$unit" = "awoooi-cd-lane.service" ]; then
+      install_cd_lane_fail_closed_unit
+    else
+      systemctl mask "$unit" >/dev/null 2>&1 || mask_runner_unit_file "$unit" "/etc/systemd/system"
+      mask_runner_unit_file "$unit" "/etc/systemd/system"
+    fi
   done

   systemctl daemon-reload >/dev/null 2>&1 || true
@@ -289,6 +322,7 @@ ensure_host_runner_fail_closed() {
   fi

   pkill -KILL -f "^${RUNNER_DIR}/act_runner(\\.real-[^ ]*)? daemon" >/dev/null 2>&1 || true
+  pkill -KILL -f "^/home/wooo/awoooi-cd-lane/awoooi_cd_lane daemon" >/dev/null 2>&1 || true
   for binary in "${RUNNER_FAIL_CLOSED_BINARY_PATHS[@]}"; do
     guard_runner_binary_fail_closed "$binary"
   done
diff --git a/scripts/reboot-recovery/full-stack-cold-start-check.sh b/scripts/reboot-recovery/full-stack-cold-start-check.sh
index b7d64bd2..669318ab 100755
--- a/scripts/reboot-recovery/full-stack-cold-start-check.sh
+++ b/scripts/reboot-recovery/full-stack-cold-start-check.sh
@@ -286,16 +286,24 @@ echo "ACTION_RUNNER_ENABLED_COUNT $(systemctl list-unit-files "actions.runner.*"
 for u in $(systemctl list-units "actions.runner.*" --all --no-legend --plain 2>/dev/null | awk "{print \$1}"); do
   systemctl show "$u" -p ActiveState -p SubState -p CPUQuotaPerSecUSec -p MemoryMax -p WatchdogUSec -p NRestarts | sed "s/^/RUNNER $u /"
 done
-for u in awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
+for u in awoooi-cd-lane.service awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
   load=$(systemctl show "$u" -p LoadState --value 2>/dev/null || true)
   unitfile=$(systemctl show "$u" -p UnitFileState --value 2>/dev/null || true)
   active=$(systemctl show "$u" -p ActiveState --value 2>/dev/null || true)
   mainpid=$(systemctl show "$u" -p MainPID --value 2>/dev/null || true)
-  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active mainpid=$mainpid"
+  execstart=$(systemctl show "$u" -p ExecStart --value 2>/dev/null || true)
+  unit_ok=0
+  if [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ]; then
+    unit_ok=1
+  fi
+  if [ "$u" = "awoooi-cd-lane.service" ] && [ "$active" = "inactive" ] && echo "$execstart" | grep -q "/bin/false"; then
+    unit_ok=1
+  fi
+  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active mainpid=$mainpid ok=$unit_ok"
 done
-direct_runner_count=$(pgrep -f "^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
+direct_runner_count=$(pgrep -f "^/home/wooo/awoooi-cd-lane/awoooi_cd_lane|^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
 echo "RUNNER_DIRECT_PROCESS_COUNT $direct_runner_count"
-for p in /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
+for p in /home/wooo/awoooi-cd-lane/awoooi_cd_lane /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
   kind=$(file -b "$p" 2>/dev/null || echo missing)
   echo "RUNNER_FAILCLOSED_BINARY $p kind=$kind"
   echo "$kind" | grep -qi "ELF" && echo "RUNNER_FAILCLOSED_BINARY_ELF $p"
@@ -323,12 +331,12 @@ docker ps --format "DOCKER {{.Names}}\t{{.Status}}" | head -120
   else
     warn "runner watchdog state not confirmed"
   fi
-  if awk '$1 == "RUNNER_FAILCLOSED_UNIT" && ($3 != "load=masked" || $4 != "unitfile=masked") {bad=1} END {exit bad}' <<<"$out"; then
-    ok "110 direct/Gitea runner fail-closed units are masked"
+  if awk '$1 == "RUNNER_FAILCLOSED_UNIT" && $NF != "ok=1" {bad=1} END {exit bad}' <<<"$out"; then
+    ok "110 direct runner/CD lane units are fail-closed"
   else
-    fail "110 direct/Gitea runner fail-closed units are not all masked"
+    fail "110 direct runner/CD lane units are not fail-closed"
   fi
-  grep -q "RUNNER_DIRECT_PROCESS_COUNT 0" <<<"$out" && ok "110 direct runner process count is zero" || fail "110 direct runner process detected"
+  grep -q "RUNNER_DIRECT_PROCESS_COUNT 0" <<<"$out" && ok "110 direct runner/CD lane process count is zero" || fail "110 direct runner/CD lane process detected"
   grep -q "RUNNER_FAILCLOSED_BINARY_ELF" <<<"$out" && fail "110 runner fail-closed binary path restored to ELF" || ok "110 runner binary paths are fail-closed stubs or missing"
   grep -q "sentry-self-hosted-clickhouse-1.*Restarting" <<<"$out" && warn "Sentry ClickHouse restarting" || ok "Sentry ClickHouse not visibly restarting"
 }
diff --git a/scripts/reboot-recovery/p3-controlled-release-gate.sh b/scripts/reboot-recovery/p3-controlled-release-gate.sh
index 1b42e4f6..14f6a94f 100755
--- a/scripts/reboot-recovery/p3-controlled-release-gate.sh
+++ b/scripts/reboot-recovery/p3-controlled-release-gate.sh
@@ -306,17 +306,25 @@ check_runner_guardrails() {
   local out bad
   if ! out=$(ssh_cmd "wooo@192.168.0.110" '
 bad=0
-for u in awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
+for u in awoooi-cd-lane.service awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
   load=$(systemctl show "$u" -p LoadState --value 2>/dev/null || true)
   unitfile=$(systemctl show "$u" -p UnitFileState --value 2>/dev/null || true)
   active=$(systemctl show "$u" -p ActiveState --value 2>/dev/null || true)
-  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active"
-  [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ] || bad=1
+  execstart=$(systemctl show "$u" -p ExecStart --value 2>/dev/null || true)
+  unit_ok=0
+  if [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ]; then
+    unit_ok=1
+  fi
+  if [ "$u" = "awoooi-cd-lane.service" ] && [ "$active" = "inactive" ] && echo "$execstart" | grep -q "/bin/false"; then
+    unit_ok=1
+  fi
+  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active ok=$unit_ok"
+  [ "$unit_ok" = "1" ] || bad=1
 done
-direct_runner_count=$(pgrep -f "^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
+direct_runner_count=$(pgrep -f "^/home/wooo/awoooi-cd-lane/awoooi_cd_lane|^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
 echo "RUNNER_DIRECT_PROCESS_COUNT $direct_runner_count"
 [ "$direct_runner_count" = "0" ] || bad=1
-for p in /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
+for p in /home/wooo/awoooi-cd-lane/awoooi_cd_lane /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
   kind=$(file -b "$p" 2>/dev/null || echo missing)
   echo "RUNNER_FAILCLOSED_BINARY $p kind=$kind"
   echo "$kind" | grep -qi "ELF" && bad=1
@@ -338,7 +346,7 @@ echo "BAD_RUNNER_GUARDRAILS $bad"
     return
   fi
   echo "$out"
-  grep -q "BAD_RUNNER_GUARDRAILS 0" <<<"$out" && ok "all discovered runner units have watchdog disabled and CPU/memory limits" || blocked "runner guardrails incomplete"
+  grep -q "BAD_RUNNER_GUARDRAILS 0" <<<"$out" && ok "runner/CD lane fail-closed guardrails complete" || blocked "runner/CD lane guardrails incomplete"
 }

 check_job_containers() {
diff --git a/scripts/reboot-recovery/post-start-quick-check.sh b/scripts/reboot-recovery/post-start-quick-check.sh
index 24dad37b..b9b560b8 100755
--- a/scripts/reboot-recovery/post-start-quick-check.sh
+++ b/scripts/reboot-recovery/post-start-quick-check.sh
@@ -538,16 +538,24 @@ fi
 section "110 runner fail-closed guard"
 runner_tmp="$(mktemp -t post-start-runner.XXXXXX)"
 if ssh_read "wooo@192.168.0.110" '
-for u in awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
+for u in awoooi-cd-lane.service awoooi-direct-runner-open.service awoooi-direct-runner.service gitea-act-runner-host.service gitea-act-runner-awoooi-controlled.service gitea-awoooi-controlled-runner.service gitea-act-runner-awoooi-open.service; do
   load=$(systemctl show "$u" -p LoadState --value 2>/dev/null || true)
   unitfile=$(systemctl show "$u" -p UnitFileState --value 2>/dev/null || true)
   active=$(systemctl show "$u" -p ActiveState --value 2>/dev/null || true)
   mainpid=$(systemctl show "$u" -p MainPID --value 2>/dev/null || true)
-  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active mainpid=$mainpid"
+  execstart=$(systemctl show "$u" -p ExecStart --value 2>/dev/null || true)
+  unit_ok=0
+  if [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ]; then
+    unit_ok=1
+  fi
+  if [ "$u" = "awoooi-cd-lane.service" ] && [ "$active" = "inactive" ] && echo "$execstart" | grep -q "/bin/false"; then
+    unit_ok=1
+  fi
+  echo "RUNNER_FAILCLOSED_UNIT $u load=$load unitfile=$unitfile active=$active mainpid=$mainpid ok=$unit_ok"
 done
-direct_runner_count=$(pgrep -f "^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
+direct_runner_count=$(pgrep -f "^/home/wooo/awoooi-cd-lane/awoooi_cd_lane|^/home/wooo/act-runner/act_runner|^/home/wooo/act-runner-controlled/act_runner|^/home/wooo/awoooi-controlled-runner/awoooi_controlled_runner" 2>/dev/null | wc -l | tr -d " ")
 echo "RUNNER_DIRECT_PROCESS_COUNT $direct_runner_count"
-for p in /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
+for p in /home/wooo/awoooi-cd-lane/awoooi_cd_lane /home/wooo/act-runner/act_runner /home/wooo/act-runner/act_runner.real-20260628-runner-pressure-guard /home/wooo/act-runner-controlled/act_runner /home/wooo/awoooi-controlled-runner/awoooi_controlled_runner; do
   kind=$(file -b "$p" 2>/dev/null || echo missing)
   echo "RUNNER_FAILCLOSED_BINARY $p kind=$kind"
   echo "$kind" | grep -qi "ELF" && echo "RUNNER_FAILCLOSED_BINARY_ELF $p"
@@ -560,12 +568,12 @@ else
   blocked "110 runner fail-closed readback failed"
 fi
 cat "$runner_tmp"
-if awk '$1 == "RUNNER_FAILCLOSED_UNIT" && ($3 != "load=masked" || $4 != "unitfile=masked") {bad=1} END {exit bad}' "$runner_tmp"; then
-  ok "110 direct/Gitea runner fail-closed units are masked"
+if awk '$1 == "RUNNER_FAILCLOSED_UNIT" && $NF != "ok=1" {bad=1} END {exit bad}' "$runner_tmp"; then
+  ok "110 direct runner/CD lane units are fail-closed"
 else
-  blocked "110 direct/Gitea runner fail-closed units are not all masked"
+  blocked "110 direct runner/CD lane units are not fail-closed"
 fi
-grep -q "RUNNER_DIRECT_PROCESS_COUNT 0" "$runner_tmp" && ok "110 direct runner process count is zero" || blocked "110 direct runner process detected"
+grep -q "RUNNER_DIRECT_PROCESS_COUNT 0" "$runner_tmp" && ok "110 direct runner/CD lane process count is zero" || blocked "110 direct runner/CD lane process detected"
 grep -q "RUNNER_FAILCLOSED_BINARY_ELF" "$runner_tmp" && blocked "110 runner fail-closed binary path restored to ELF" || ok "110 runner binary paths are fail-closed stubs or missing"
 grep -q "RUNNER_PRESSURE_GATE_RC 0" "$runner_tmp" && ok "110 host pressure gate returned 0" || blocked "110 host pressure gate is blocking"
 rm -f "$runner_tmp"
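
Reviewer note: the same fail-closed decision is repeated inline in three scripts (cold-start check, P3 gate, post-start quick check). A minimal sketch of that decision as a single pure function might look like the following. The function name `unit_is_fail_closed` and its argument order are hypothetical and not part of this patch; the inputs are assumed to be the values already read back with `systemctl show`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the patch: decide whether a unit counts as
# fail-closed from values already read back via `systemctl show`.
# Prints 1 when the unit is fail-closed, 0 otherwise.
unit_is_fail_closed() {
  local unit="$1" load="$2" unitfile="$3" active="$4" execstart="$5"

  # Masked runner units: LoadState and UnitFileState are both "masked"
  # and the unit is inactive.
  if [ "$load" = "masked" ] && [ "$unitfile" = "masked" ] && [ "$active" = "inactive" ]; then
    echo 1
    return 0
  fi

  # CD lane only: a regular unit is accepted while it is inactive and its
  # ExecStart readback still mentions /bin/false.
  if [ "$unit" = "awoooi-cd-lane.service" ] && [ "$active" = "inactive" ] \
    && printf '%s\n' "$execstart" | grep -q "/bin/false"; then
    echo 1
    return 0
  fi

  echo 0
}

# Example call with illustrative readback values (assumed, not captured from 110).
unit_is_fail_closed awoooi-cd-lane.service loaded static inactive "path=/bin/false"
```

Because the function takes only strings, it can be exercised without touching systemd, which would make it easier to keep the three readback sites consistent if they were ever refactored to share it.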