fix(runner): preserve harbor ssh probe failure rc
Some checks failed
CD Pipeline / workflow-shape (push) Successful in 0s
CD Pipeline / cancel-stale-cd (push) Has been skipped
CD Pipeline / tests (push) Successful in 37s
CD Pipeline / build-and-deploy (push) Failing after 28s
CD Pipeline / post-deploy-checks (push) Has been skipped

This commit is contained in:
Your Name
2026-07-01 09:54:36 +08:00
parent f50e363a59
commit f1149e560e
3 changed files with 4 additions and 1 deletions

View File

@@ -76,8 +76,9 @@ jobs:
if timeout 30 "${ssh_base[@]}" "$@"; then
echo "harbor_110_remote_ssh_probe_attempt=${attempt} result=success"
return 0
else
rc=$?
fi
rc=$?
echo "harbor_110_remote_ssh_probe_attempt=${attempt} result=failure rc=${rc}"
if [ "${attempt}" -lt "${SSH_PROBE_ATTEMPTS}" ]; then
sleep "${SSH_PROBE_SLEEP_SECONDS}"

View File

@@ -4,6 +4,7 @@
- 最新 live truthCD `#4215` 仍因 Harbor public `/v2/` = `502` 失敗Harbor repair `#4212` 的具體 blocker 是 `harbor_110_remote_control_channel_unavailable`
- 188 non-110 runner lane 讀回 ready、host pressure 正常;但 188 → 110 bounded SSH probe 呈現間歇性,一次 `true` 可成功,下一次 `recover-110-control-path-and-harbor-local.sh --check` 又 timeout。
- `.gitea/workflows/harbor-110-local-repair.yaml` 對非寫入的 SSH probe / verifier 加 bounded retry預設 `6` 次、每次仍受 `ConnectTimeout=8``ServerAlive*` 與外層 `timeout 30` 限制,並輸出 `harbor_110_remote_ssh_probe_attempt=...` receipt。`run_recovery --apply-all` 不自動 retry避免半套用被重跑。
- follow-up 修正retry failure branch 必須在 `else` 內保存原始 `rc`,避免 shell `if` compound status 把連續 timeout 誤記為 `rc=0` / success。
**驗證**
- `python3.11 -m pytest ops/runner/test_cd_controlled_runtime_profile.py -q``35 passed`

View File

@@ -141,6 +141,7 @@ def test_harbor_110_local_repair_workflow_is_dispatch_only_and_bounded() -> None
'SSH_PROBE_SLEEP_SECONDS="${AWOOOI_110_SSH_PROBE_SLEEP_SECONDS:-10}"'
in text
)
assert "else\n rc=$?" in text
assert "harbor_110_remote_ssh_probe_attempt=" in text
assert "operation_boundary_remote_ssh_bounded=true" in text
assert "harbor_110_remote_control_channel_unavailable" in text