diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 3dfada63..ec0210aa 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -1,3 +1,56 @@
+## 2026-05-12 | RLS Preflight and 188 Registry Certbot Remediation Package
+
+**Background**: Wave 1 confirmed that production RLS is a P0 but must not be hot-enabled directly. The `registry.wooo.work` certbot on 188 is also confirmed broken, but the `ollama` SSH account currently has no passwordless sudo. This round turns both red lights into a remediation prerequisite package that can be rerun, handed over, and reviewed for approval.
+
+**Added RLS preflight**:
+- `scripts/ops/awooop_rls_preflight.py`:
+  - Designed to run inside the production API pod using the pod-local `DATABASE_URL`; it does not print the DB URL or password.
+  - Read-only checks of the DB role, `set_config('app.project_id')`, the `project_id` column on target tables, RLS enabled/forced/policy state, and fail-open policy expressions.
+  - Exact `COUNT(*)` / `NULL project_id` scans run only with `--exact-counts`.
+- `scripts/ops/awooop-rls-preflight.sh`:
+  - By default, runs `sudo kubectl -n awoooi-prod exec deployment/awoooi-api -c api -- python -` through `wooo@192.168.0.120`.
+  - Supports `--local`, `--json`, and `--exact-counts`.
+  - Exit `2` means the RLS gate is blocked and RLS must not be enabled.
+- `docs/runbooks/AWOOOP-RLS-PREFLIGHT.md`:
+  - Records the 2026-05-12 production preflight result and the remediation order.
+
+**RLS live preflight result**:
+- `bash scripts/ops/awooop-rls-preflight.sh --exact-counts` → exit `2`, consistent with a blocked gate.
+- `PASS=5 WARN=0 BLOCKED=2`.
+- PASS:
+  - Current DB user `awoooi` is neither superuser nor bypassrls.
+  - `set_config('app.project_id', 'awoooi', TRUE)` works.
+  - All existing target tables have `project_id`.
+  - The production DB currently has no fail-open policy expressions.
+  - Exact counts show `NULL project_id = 0` for existing target tables.
+- BLOCKED:
+  - The `awooop_app`, `awooop_platform_admin`, and `awooop_migration` roles do not exist.
+  - Target tables are not yet RLS enabled / forced / policied.
+- Interpretation: the next step is not a data backfill but role bootstrap + DB access path audit + staged policy enablement. The production app user is currently `awoooi`, so the policy design must first decide whether to grant it `awooop_app` membership or switch the connection role.
+
+**Added 188 registry certbot remediation package**:
+- `scripts/ops/188-registry-certbot-fix.sh`:
+  - Root-only helper; dry-run by default, and changes 188 only with `--apply`.
+  - Creates `/var/www/certbot`.
+  - Installs `/etc/nginx/conf.d/registry-acme-http.conf` so that `registry.wooo.work` HTTP-01 no longer falls through to the `aiops.wooo.work` default vhost.
+  - Reloads Nginx after `nginx -t`.
+  - Renews with `/snap/bin/certbot renew --cert-name registry.wooo.work`.
+  - When snap certbot is present, disables the broken apt `certbot.timer` and resets the failed apt certbot service.
+- `docs/runbooks/REGISTRY-CERTBOT-188.md`:
+  - Records the expired cert, the wrong route, the apt/snap certbot owner split, and the post-fix verification commands.
+
+**Verification**:
+- `python3 -m py_compile scripts/ops/awooop_rls_preflight.py` → passed.
+- `bash -n scripts/ops/awooop-rls-preflight.sh scripts/ops/188-registry-certbot-fix.sh` → passed.
+- `scripts/ops/188-registry-certbot-fix.sh` dry-run → printed the expected actions without modifying the local host or 188.
+- RLS preflight ran end to end against the production API pod; the blocked result is as expected and the DB was not changed.
+- Synced the helper to 188 at `/home/ollama/awoooi-ops/188-registry-certbot-fix.sh`.
+- Remote `bash -n` on 188 passed; the remote dry-run printed the expected root actions without changing Nginx / certbot.
+
+**Next steps**:
+- An operator with sudo privileges runs `sudo /home/ollama/awoooi-ops/188-registry-certbot-fix.sh --apply` on 188.
+- For RLS, do the role bootstrap design review first, then produce batch migrations; do not apply the existing RLS migration directly.
+
 ## 2026-05-12 | Wave 1 Claude P0 Red-Light Verification and GitHub CD Blocking
 
 **Background**: The Claude Code inventory can only serve as a candidate list; every item must be verified against the production DB, host state, provider logs, and repo artifacts. This round first handles the high-risk red lights that can be confirmed quickly.
diff --git a/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
new file mode 100644
index 00000000..859de509
--- /dev/null
+++ b/docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
@@ -0,0 +1,88 @@
+# AwoooP RLS Preflight Runbook
+
+> Purpose: verify whether production is ready for PostgreSQL Row-Level Security
+> without enabling RLS or changing data.
+
+## Command
+
+Default path runs the probe inside the production API pod through the 120
+control-plane host. `DATABASE_URL` stays inside Kubernetes and is not printed.
+
+```bash
+bash scripts/ops/awooop-rls-preflight.sh
+```
+
+Before enabling RLS, run exact backfill counts:
+
+```bash
+bash scripts/ops/awooop-rls-preflight.sh --exact-counts
+```
+
+Useful variants:
+
+```bash
+bash scripts/ops/awooop-rls-preflight.sh --json
+bash scripts/ops/awooop-rls-preflight.sh --local
+AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.sh
+```
+
+Exit code `2` means the gate is blocked and RLS must not be enabled yet.
+
+## 2026-05-12 Production Result
+
+`--exact-counts` returned:
+
+- `PASS current_role_rls_enforced`: current DB user is `awoooi`, not superuser and not `BYPASSRLS`.
+- `PASS project_context_set_config`: `set_config('app.project_id', 'awoooi', TRUE)` works in the API pod.
+- `BLOCKED required_roles`: `awooop_app`, `awooop_platform_admin`, and `awooop_migration` do not exist.
+- `PASS project_id_columns`: every existing target table has `project_id`.
+- `BLOCKED rls_enabled_forced_policy`: existing target tables are not yet RLS enabled, forced, or policied.
+- `PASS fail_open_policies`: production DB currently has no fail-open policy expressions.
+- `PASS project_id_backfill`: exact counts found zero `NULL project_id` rows in counted target tables.
+
+Current blocker summary:
+
+```text
+PASS=5 WARN=0 BLOCKED=2
+```
+
+Important exact counts from the same run:
+
+| Table | Rows | NULL project_id |
+| --- | ---: | ---: |
+| `audit_logs` | 686 | 0 |
+| `awooop_mcp_tool_registry` | 4 | 0 |
+| `awooop_outbound_message` | 228 | 0 |
+| `awooop_projects` | 2 | 0 |
+| `awooop_run_state` | 106 | 0 |
+| `incidents` | 1518 | 0 |
+| `knowledge_entries` | 2099 | 0 |
+| `playbooks` | 220 | 0 |
+
+## Remediation Order
+
+1. Create or reconcile RLS roles.
+   - Current production app user is `awoooi`; policy design must either grant it
+     membership in `awooop_app` or update the application connection role before
+     policies are enforced.
+   - Do not create passworded LOGIN roles in a migration unless the K8s Secret
+     rotation path is ready.
+2. Verify all DB access paths use `get_db()` / `get_db_context()` or otherwise set
+   `app.project_id` before queries.
+3. Apply policies first in staging or a canary DB.
+4. In production, enable one batch at a time.
+5. After each batch, run:
+
+```bash
+bash scripts/ops/awooop-rls-preflight.sh --exact-counts
+```
+
+6. Validate AwoooP Runs, Approvals, Monitoring, Tickets, Cost, alert ingestion,
+   background workers, and TelegramGateway mirror paths.
+
+## Do Not
+
+- Do not enable all policies in production before the role path is decided.
+- Do not rely on fail-open `IS NULL` or empty-string policies as the target state.
+- Do not run destructive rollback SQL unless the incident commander explicitly
+  approves it.
diff --git a/docs/runbooks/REGISTRY-CERTBOT-188.md b/docs/runbooks/REGISTRY-CERTBOT-188.md
new file mode 100644
index 00000000..2d1e7f52
--- /dev/null
+++ b/docs/runbooks/REGISTRY-CERTBOT-188.md
@@ -0,0 +1,62 @@
+# 188 Registry Certbot Recovery
+
+> Scope: `registry.wooo.work` on host `192.168.0.188`.
+
+## Verified State On 2026-05-12
+
+- `registry.wooo.work` certificate expired at `May 8 04:16:08 2026 GMT`.
+- HTTP-01 route check:
+
+```text
+http://registry.wooo.work/.well-known/acme-challenge/codex-route-check
+-> 301 https://aiops.wooo.work/.well-known/acme-challenge/codex-route-check
+-> 404
+```
+
+- `/usr/bin/certbot` is broken by Python/OpenSSL mismatch.
+- `/snap/bin/certbot` exists and should be the renewal owner.
+- Both apt `certbot.timer` and snap `snap.certbot.renew.timer` were enabled.
+- The `ollama` SSH user is in sudo group but has no passwordless sudo in this
+  session, so Codex could not apply the root-level fix directly.
+
+## Fix Script
+
+The repo includes a root-only helper. It is dry-run by default:
+
+```bash
+bash scripts/ops/188-registry-certbot-fix.sh
+```
+
+To apply on 188:
+
+```bash
+sudo bash /home/ollama/awoooi-ops/188-registry-certbot-fix.sh --apply
+```
+
+The script:
+
+- creates `/var/www/certbot`;
+- installs `/etc/nginx/conf.d/registry-acme-http.conf`;
+- routes `registry.wooo.work` HTTP-01 to `/var/www/certbot`;
+- reloads Nginx after `nginx -t`;
+- renews `registry.wooo.work` via `/snap/bin/certbot`;
+- disables the broken apt `certbot.timer` when snap certbot is present;
+- prints the renewed certificate dates.
+
+## Post-Fix Verification
+
+Run from any host with network access:
+
+```bash
+curl -sI --max-redirs 0 http://registry.wooo.work/.well-known/acme-challenge/codex-route-check
+openssl s_client -servername registry.wooo.work -connect registry.wooo.work:443 </dev/null 2>/dev/null \
+  | openssl x509 -noout -subject -issuer -dates
+```
+
+Expected:
+
+- HTTP challenge path returns `404` from the `registry.wooo.work` vhost, not a
+  redirect to `aiops.wooo.work`.
+- `notAfter` is renewed to a future date.
+- `systemctl --failed` no longer lists apt `certbot.service` after failed state
+  reset.
diff --git a/scripts/ops/188-registry-certbot-fix.sh b/scripts/ops/188-registry-certbot-fix.sh
new file mode 100755
index 00000000..eeadbfd0
--- /dev/null
+++ b/scripts/ops/188-registry-certbot-fix.sh
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# Repair helper for 188 registry.wooo.work HTTP-01 renewal.
+# Default is dry-run. Use --apply on 188 as root after reviewing the plan.
+set -euo pipefail
+
+APPLY=0
+DOMAIN="${REGISTRY_CERTBOT_DOMAIN:-registry.wooo.work}"
+WEBROOT="${REGISTRY_CERTBOT_WEBROOT:-/var/www/certbot}"
+NGINX_SNIPPET="${REGISTRY_CERTBOT_NGINX_SNIPPET:-/etc/nginx/conf.d/registry-acme-http.conf}"
+CERTBOT_BIN="${REGISTRY_CERTBOT_BIN:-/snap/bin/certbot}"
+
+usage() {
+  cat <<'USAGE'
+Usage: sudo bash scripts/ops/188-registry-certbot-fix.sh [--apply]
+
+Fixes the known 188 drift where registry.wooo.work HTTP-01 traffic falls through
+to the aiops.wooo.work default server and certbot cannot renew the registry cert.
+
+Default mode is dry-run and prints the exact actions. --apply requires root.
+
+Environment:
+  REGISTRY_CERTBOT_DOMAIN         Default: registry.wooo.work
+  REGISTRY_CERTBOT_WEBROOT        Default: /var/www/certbot
+  REGISTRY_CERTBOT_NGINX_SNIPPET  Default: /etc/nginx/conf.d/registry-acme-http.conf
+  REGISTRY_CERTBOT_BIN            Default: /snap/bin/certbot
+USAGE
+}
+
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --apply)
+      APPLY=1
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      echo "Unknown argument: $1" >&2
+      usage >&2
+      exit 64
+      ;;
+  esac
+  shift
+done
+
+run() {
+  if [ "$APPLY" -eq 1 ]; then
+    "$@"
+  else
+    printf 'DRY-RUN:'
+    printf ' %q' "$@"
+    printf '\n'
+  fi
+}
+
+write_snippet() {
+  local tmp
+  tmp="$(mktemp)"
+  cat > "$tmp" <&2
+  exit 77
+fi
+
+if [ "$APPLY" -eq 1 ] && [ ! -x "$CERTBOT_BIN" ]; then
+  echo "certbot binary not executable: $CERTBOT_BIN" >&2
+  exit 69
+fi
+
+echo "Plan: repair HTTP-01 route for ${DOMAIN}, renew via ${CERTBOT_BIN}, reload nginx."
+run install -d -m 0755 "$WEBROOT"
+write_snippet
+run nginx -t
+run systemctl reload nginx
+
+if [ "$APPLY" -eq 1 ]; then
+  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 8 "http://${DOMAIN}/.well-known/acme-challenge/codex-route-check" || true)"
+  if [ "$code" != "404" ]; then
+    echo "Unexpected ACME route status after nginx reload: ${code}; expected 404 from ${DOMAIN}, not redirect/default vhost" >&2
+    exit 1
+  fi
+fi
+
+run "$CERTBOT_BIN" renew --cert-name "$DOMAIN" --deploy-hook "systemctl reload nginx"
+
+if [ -x /snap/bin/certbot ]; then
+  run systemctl disable --now certbot.timer
+  run systemctl reset-failed certbot.service
+fi
+
+if [ "$APPLY" -eq 1 ]; then
+  openssl x509 -noout -subject -issuer -dates -in "/etc/letsencrypt/live/${DOMAIN}/fullchain.pem"
+  systemctl status snap.certbot.renew.timer --no-pager -l | sed -n '1,25p' || true
+else
+  echo "Dry-run only. Re-run with --apply on 188 as root to execute."
+fi
diff --git a/scripts/ops/awooop-rls-preflight.sh b/scripts/ops/awooop-rls-preflight.sh
new file mode 100755
index 00000000..09bf5010
--- /dev/null
+++ b/scripts/ops/awooop-rls-preflight.sh
@@ -0,0 +1,100 @@
+#!/usr/bin/env bash
+# Read-only AwoooP RLS preflight runner.
+#
+# Default path runs inside the production API pod through the 120 control-plane
+# host, so DATABASE_URL stays inside Kubernetes and is never printed locally.
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PY_SCRIPT="${SCRIPT_DIR}/awooop_rls_preflight.py"
+
+NAMESPACE="${AWOOOP_RLS_NAMESPACE:-awoooi-prod}"
+DEPLOYMENT="${AWOOOP_RLS_DEPLOYMENT:-deployment/awoooi-api}"
+CONTAINER="${AWOOOP_RLS_CONTAINER:-api}"
+SSH_TARGET="${AWOOOP_RLS_SSH_TARGET:-wooo@192.168.0.120}"
+REMOTE_KUBECTL="${AWOOOP_RLS_REMOTE_KUBECTL:-sudo kubectl}"
+KUBECTL="${AWOOOP_RLS_KUBECTL:-kubectl}"
+USE_SSH=1
+PY_ARGS=()
+SSH_OPTS=(-o BatchMode=yes -o ConnectTimeout=8)
+
+usage() {
+  cat <<'USAGE'
+Usage: bash scripts/ops/awooop-rls-preflight.sh [options]
+
+Read-only checks for AwoooP PostgreSQL RLS readiness. The script runs the Python
+probe inside the API pod and exits 2 when RLS is not ready to enable.
+
+Options:
+  --exact-counts   Run exact COUNT(*) project_id backfill checks.
+  --json           Print JSON output from the pod.
+  --local          Use local kubectl instead of SSH to 120.
+  --ssh USER@HOST  Override SSH target. Default: wooo@192.168.0.120.
+  -h, --help       Show this help.
+
+Environment:
+  AWOOOP_RLS_NAMESPACE       Default: awoooi-prod
+  AWOOOP_RLS_DEPLOYMENT      Default: deployment/awoooi-api
+  AWOOOP_RLS_CONTAINER       Default: api
+  AWOOOP_RLS_REMOTE_KUBECTL  Default: sudo kubectl
+  AWOOOP_RLS_KUBECTL         Default: kubectl
+USAGE
+}
+
+while [ "$#" -gt 0 ]; do
+  case "$1" in
+    --exact-counts)
+      PY_ARGS+=(--exact-counts)
+      ;;
+    --json)
+      PY_ARGS+=(--json)
+      ;;
+    --local)
+      USE_SSH=0
+      ;;
+    --ssh)
+      shift
+      SSH_TARGET="${1:-}"
+      if [ -z "$SSH_TARGET" ]; then
+        echo "--ssh requires USER@HOST" >&2
+        exit 64
+      fi
+      USE_SSH=1
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      echo "Unknown argument: $1" >&2
+      usage >&2
+      exit 64
+      ;;
+  esac
+  shift
+done
+
+if [ ! -f "$PY_SCRIPT" ]; then
+  echo "Missing Python probe: $PY_SCRIPT" >&2
+  exit 66
+fi
+
+if [ "$USE_SSH" -eq 1 ]; then
+  printf -v namespace_q "%q" "$NAMESPACE"
+  printf -v deployment_q "%q" "$DEPLOYMENT"
+  printf -v container_q "%q" "$CONTAINER"
+  remote_cmd="${REMOTE_KUBECTL} -n ${namespace_q} exec -i ${deployment_q} -c ${container_q} -- python -"
+  if [ "${#PY_ARGS[@]}" -gt 0 ]; then
+    for arg in "${PY_ARGS[@]}"; do
+      printf -v arg_q "%q" "$arg"
+      remote_cmd="${remote_cmd} ${arg_q}"
+    done
+  fi
+  ssh "${SSH_OPTS[@]}" "$SSH_TARGET" "$remote_cmd" < "$PY_SCRIPT"
+else
+  if [ "${#PY_ARGS[@]}" -gt 0 ]; then
+    "$KUBECTL" -n "$NAMESPACE" exec -i "$DEPLOYMENT" -c "$CONTAINER" -- python - "${PY_ARGS[@]}" < "$PY_SCRIPT"
+  else
+    "$KUBECTL" -n "$NAMESPACE" exec -i "$DEPLOYMENT" -c "$CONTAINER" -- python - < "$PY_SCRIPT"
+  fi
+fi
diff --git a/scripts/ops/awooop_rls_preflight.py b/scripts/ops/awooop_rls_preflight.py
new file mode 100755
index 00000000..440c4bd8
--- /dev/null
+++ b/scripts/ops/awooop_rls_preflight.py
@@ -0,0 +1,332 @@
+#!/usr/bin/env python3
+"""
+Read-only AwoooP RLS preflight.
+
+This script is designed to run inside the production API pod. It uses the
+pod-local DATABASE_URL and never prints the URL or credentials.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import os
+import sys
+from dataclasses import asdict, dataclass
+from typing import Any
+
+from sqlalchemy import text
+from sqlalchemy.ext.asyncio import create_async_engine
+
+
+TARGET_TABLES = [
+    "incidents",
+    "knowledge_entries",
+    "playbooks",
+    "audit_logs",
+    "budget_ledger",
+    "awooop_projects",
+    "awooop_contracts",
+    "awooop_contract_revisions",
+    "awooop_published_contracts",
+    "awooop_run_state",
+    "awooop_run_event",
+    "awooop_cost_ledger",
+    "awooop_mcp_tool_registry",
+    "awooop_mcp_grants",
+    "awooop_mcp_credential_refs",
+    "awooop_mcp_gateway_audit",
+    "awooop_conversation_event",
+    "awooop_outbound_message",
+]
+
+REQUIRED_ROLES = [
+    "awooop_app",
+    "awooop_platform_admin",
+    "awooop_migration",
+]
+
+
+@dataclass
+class Check:
+    name: str
+    status: str
+    detail: str
+
+
+def add(checks: list[Check], name: str, status: str, detail: str) -> None:
+    checks.append(Check(name=name, status=status, detail=detail))
+
+
+async def scalar(conn: Any, sql: str, params: dict[str, Any] | None = None) -> Any:
+    return await conn.scalar(text(sql), params or {})
+
+
+async def rows(conn: Any, sql: str, params: dict[str, Any] | None = None) -> list[dict[str, Any]]:
+    result = await conn.execute(text(sql), params or {})
+    return [dict(row._mapping) for row in result.fetchall()]
+
+
+async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
+    database_url = os.environ.get("DATABASE_URL")
+    if not database_url:
+        return [Check("database_url", "BLOCKED", "DATABASE_URL is not set in this environment")], {}
+
+    engine = create_async_engine(database_url, pool_pre_ping=True)
+    checks: list[Check] = []
+    evidence: dict[str, Any] = {}
+
+    async with engine.connect() as conn:
+        current_role = await rows(
+            conn,
+            """
+            SELECT
+                current_user AS current_user,
+                session_user AS session_user,
+                r.rolsuper AS current_user_superuser,
+                r.rolbypassrls AS current_user_bypassrls
+            FROM pg_roles r
+            WHERE r.rolname = current_user
+            """,
+        )
+        evidence["current_role"] = current_role[0] if current_role else {}
+        role = evidence["current_role"]
+        if role.get("current_user_superuser") or role.get("current_user_bypassrls"):
+            add(
+                checks,
+                "current_role_rls_enforced",
+                "BLOCKED",
+                f"current_user={role.get('current_user')} can bypass RLS",
+            )
+        else:
+            add(
+                checks,
+                "current_role_rls_enforced",
+                "PASS",
+                f"current_user={role.get('current_user')} is subject to RLS",
+            )
+
+        before = await scalar(conn, "SELECT current_setting('app.project_id', TRUE)")
+        await scalar(conn, "SELECT set_config('app.project_id', :pid, TRUE)", {"pid": "awoooi"})
+        after = await scalar(conn, "SELECT current_setting('app.project_id', TRUE)")
+        evidence["project_context_probe"] = {"before": before, "after": after}
+        if after == "awoooi":
+            add(checks, "project_context_set_config", "PASS", "set_config app.project_id works")
+        else:
+            add(checks, "project_context_set_config", "BLOCKED", f"expected awoooi, got {after!r}")
+
+        roles = await rows(
+            conn,
+            """
+            WITH required_roles(rolname) AS (
+                SELECT jsonb_array_elements_text(CAST(:roles_json AS jsonb))
+            )
+            SELECT
+                rr.rolname,
+                r.rolsuper,
+                r.rolbypassrls,
+                r.oid IS NOT NULL AS exists
+            FROM required_roles rr
+            LEFT JOIN pg_roles r ON r.rolname = rr.rolname
+            ORDER BY rr.rolname
+            """,
+            {"roles_json": json.dumps(REQUIRED_ROLES)},
+        )
+        evidence["required_roles"] = roles
+        present_roles = {row["rolname"] for row in roles if row["exists"]}
+        missing_roles = [role_name for role_name in REQUIRED_ROLES if role_name not in present_roles]
+        if missing_roles:
+            add(checks, "required_roles", "BLOCKED", f"missing roles: {', '.join(missing_roles)}")
+        else:
+            add(checks, "required_roles", "PASS", "all required RLS roles exist")
+
+        table_rows = await rows(
+            conn,
+            """
+            WITH target(relname) AS (
+                SELECT jsonb_array_elements_text(CAST(:tables_json AS jsonb))
+            ),
+            rels AS (
+                SELECT
+                    t.relname,
+                    c.oid,
+                    c.relrowsecurity,
+                    c.relforcerowsecurity,
+                    COALESCE(c.reltuples, 0)::bigint AS estimated_rows
+                FROM target t
+                LEFT JOIN pg_class c
+                    ON c.relname = t.relname
+                    AND c.relkind IN ('r', 'p')
+                    AND c.relnamespace = 'public'::regnamespace
+            ),
+            project_columns AS (
+                SELECT table_name, TRUE AS has_project_id
+                FROM information_schema.columns
+                WHERE table_schema = 'public'
+                    AND column_name = 'project_id'
+                    AND table_name IN (SELECT relname FROM target)
+            ),
+            policy_stats AS (
+                SELECT
+                    p.polrelid,
+                    COUNT(*) AS policy_count,
+                    BOOL_OR(
+                        COALESCE(pg_get_expr(p.polqual, p.polrelid), '') ILIKE '%current_setting(''app.project_id'', true) IS NULL%'
+                        OR COALESCE(pg_get_expr(p.polwithcheck, p.polrelid), '') ILIKE '%current_setting(''app.project_id'', true) IS NULL%'
+                    ) AS has_null_fail_open_policy,
+                    BOOL_OR(
+                        COALESCE(pg_get_expr(p.polqual, p.polrelid), '') ILIKE '%current_setting(''app.project_id'', true) = ''''%'
+                        OR COALESCE(pg_get_expr(p.polwithcheck, p.polrelid), '') ILIKE '%current_setting(''app.project_id'', true) = ''''%'
+                    ) AS has_empty_string_fail_open_policy
+                FROM pg_policy p
+                GROUP BY p.polrelid
+            )
+            SELECT
+                r.relname AS table_name,
+                r.oid IS NOT NULL AS exists,
+                COALESCE(pc.has_project_id, FALSE) AS has_project_id,
+                COALESCE(r.relrowsecurity, FALSE) AS rls_enabled,
+                COALESCE(r.relforcerowsecurity, FALSE) AS rls_forced,
+                COALESCE(ps.policy_count, 0) AS policy_count,
+                COALESCE(ps.has_null_fail_open_policy, FALSE) AS has_null_fail_open_policy,
+                COALESCE(ps.has_empty_string_fail_open_policy, FALSE) AS has_empty_string_fail_open_policy,
+                r.estimated_rows
+            FROM rels r
+            LEFT JOIN project_columns pc ON pc.table_name = r.relname
+            LEFT JOIN policy_stats ps ON ps.polrelid = r.oid
+            ORDER BY r.relname
+            """,
+            {"tables_json": json.dumps(TARGET_TABLES)},
+        )
+        evidence["tables"] = table_rows
+
+        existing = [row for row in table_rows if row["exists"]]
+        missing_project_id = [row["table_name"] for row in existing if not row["has_project_id"]]
+        if missing_project_id:
+            add(checks, "project_id_columns", "BLOCKED", f"missing project_id: {', '.join(missing_project_id)}")
+        else:
+            add(checks, "project_id_columns", "PASS", "all existing target tables have project_id")
+
+        rls_missing = [
+            row["table_name"]
+            for row in existing
+            if not row["rls_enabled"] or not row["rls_forced"] or row["policy_count"] == 0
+        ]
+        if rls_missing:
+            add(
+                checks,
+                "rls_enabled_forced_policy",
+                "BLOCKED",
+                f"RLS not fully enabled/forced/policied: {', '.join(rls_missing)}",
+            )
+        else:
+            add(checks, "rls_enabled_forced_policy", "PASS", "all existing target tables have forced RLS policy")
+
+        fail_open = [
+            row["table_name"]
+            for row in existing
+            if row["has_null_fail_open_policy"] or row["has_empty_string_fail_open_policy"]
+        ]
+        if fail_open:
+            add(checks, "fail_open_policies", "BLOCKED", f"fail-open policy expressions: {', '.join(fail_open)}")
+        else:
+            add(checks, "fail_open_policies", "PASS", "no fail-open policy expressions detected")
+
+        if exact_counts:
+            exact_rows: list[dict[str, Any]] = []
+            for row in existing:
+                if not row["has_project_id"]:
+                    continue
+                quoted = '"' + row["table_name"].replace('"', '""') + '"'
+                count_row = await rows(
+                    conn,
+                    f"SELECT :table_name AS table_name, COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows FROM {quoted}",
+                    {"table_name": row["table_name"]},
+                )
+                exact_rows.extend(count_row)
+            evidence["exact_counts"] = exact_rows
+            null_tables = [row["table_name"] for row in exact_rows if int(row["null_project_id_rows"]) > 0]
+            if null_tables:
+                add(checks, "project_id_backfill", "BLOCKED", f"NULL project_id remains: {', '.join(null_tables)}")
+            else:
+                add(checks, "project_id_backfill", "PASS", "no NULL project_id rows in counted tables")
+        else:
+            add(checks, "project_id_backfill", "WARN", "exact counts skipped; rerun with --exact-counts before enabling RLS")
+
+    await engine.dispose()
+    return checks, evidence
+
+
+def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
+    blocked = sum(1 for check in checks if check.status == "BLOCKED")
+    warn = sum(1 for check in checks if check.status == "WARN")
+    passed = sum(1 for check in checks if check.status == "PASS")
+    print(f"AwoooP RLS preflight: PASS={passed} WARN={warn} BLOCKED={blocked}")
+    for check in checks:
+        print(f"{check.status:<7} {check.name}: {check.detail}")
+
+    role = evidence.get("current_role") or {}
+    if role:
+        print(
+            "role "
+            f"current_user={role.get('current_user')} "
+            f"session_user={role.get('session_user')} "
+            f"superuser={role.get('current_user_superuser')} "
+            f"bypassrls={role.get('current_user_bypassrls')}"
+        )
+
+    for row in evidence.get("tables", []):
+        print(
+            "table "
+            f"{row['table_name']} "
+            f"exists={row['exists']} "
+            f"project_id={row['has_project_id']} "
+            f"rls={row['rls_enabled']} "
+            f"force={row['rls_forced']} "
+            f"policies={row['policy_count']} "
+            f"fail_open_null={row['has_null_fail_open_policy']} "
+            f"fail_open_empty={row['has_empty_string_fail_open_policy']} "
+            f"estimated_rows={row['estimated_rows']}"
+        )
+
+    for row in evidence.get("exact_counts", []):
+        print(
+            "count "
+            f"{row['table_name']} "
+            f"total_rows={row['total_rows']} "
+            f"null_project_id_rows={row['null_project_id_rows']}"
+        )
+
+
+async def main() -> int:
+    parser = argparse.ArgumentParser(description="Run read-only AwoooP RLS preflight checks.")
+    parser.add_argument("--exact-counts", action="store_true", help="Run exact COUNT(*) checks for project_id backfill.")
+    parser.add_argument("--json", action="store_true", help="Print JSON instead of human-readable output.")
+    args = parser.parse_args()
+
+    checks, evidence = await collect(exact_counts=args.exact_counts)
+    blocked = any(check.status == "BLOCKED" for check in checks)
+
+    if args.json:
+        print(
+            json.dumps(
+                {"checks": [asdict(check) for check in checks], "evidence": evidence},
+                ensure_ascii=False,
+                default=str,
+            )
+        )
+    else:
+        print_human(checks, evidence)
+
+    return 2 if blocked else 0
+
+
+if __name__ == "__main__":
+    try:
+        raise SystemExit(asyncio.run(main()))
+    except KeyboardInterrupt:
+        raise SystemExit(130)
+    except Exception as exc:
+        print(f"BLOCKED preflight_exception: {exc}", file=sys.stderr)
+        raise SystemExit(2)
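
Reviewer note: a caller that consumes the `--json` output (for example a CI gate) could reduce it to an exit code and a list of blocked checks, roughly as sketched below. The `checks`, `name`, and `status` keys and the exit-code-2 convention come from the probe in this diff; `summarize_preflight` and the `SAMPLE` payload are hypothetical names used only for illustration and are not part of the change.

```python
from __future__ import annotations

import json


def summarize_preflight(raw: str) -> tuple[int, list[str]]:
    """Return (exit_code, blocked_check_names) for a --json preflight payload."""
    payload = json.loads(raw)
    blocked = [
        check["name"]
        for check in payload.get("checks", [])
        if check.get("status") == "BLOCKED"
    ]
    # Mirror the probe's convention: exit code 2 when any check is BLOCKED.
    return (2 if blocked else 0), blocked


# Hypothetical sample shaped like the probe's JSON output (details elided).
SAMPLE = json.dumps(
    {
        "checks": [
            {"name": "current_role_rls_enforced", "status": "PASS", "detail": "..."},
            {"name": "required_roles", "status": "BLOCKED", "detail": "..."},
            {"name": "rls_enabled_forced_policy", "status": "BLOCKED", "detail": "..."},
        ],
        "evidence": {},
    }
)

if __name__ == "__main__":
    code, blocked = summarize_preflight(SAMPLE)
    print(code, blocked)
```

Keeping the summary a pure function of the JSON text means it can be unit-tested without a database or cluster access. Note that when the probe itself raises, it writes to stderr and emits no JSON, so a caller would still need to treat a non-zero exit with empty stdout as blocked.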