chore(ops): add RLS preflight and registry certbot repair package
All checks were successful
Code Review / ai-code-review (push) Successful in 13s
@@ -1,3 +1,56 @@
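## 2026-05-12 | RLS Preflight and 188 Registry Certbot Repair Package

**Background**: Wave 1 confirmed that production RLS is a P0 issue but cannot be hot-enabled directly. Certbot renewal for `registry.wooo.work` on 188 is also confirmed broken, and the `ollama` SSH account currently has no passwordless sudo. This round turns both red lights into a remediation pre-package that can be re-run, handed over, and approved.
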
**New RLS preflight**:

- `scripts/ops/awooop_rls_preflight.py`:
  - Designed to run inside the production API pod using the pod-local `DATABASE_URL`; it never prints the DB URL or password.
  - Read-only checks of the DB role, `set_config('app.project_id')`, the `project_id` column on each target table, RLS enabled/forced/policy state, and fail-open policy expressions.
  - Exact `COUNT(*)` / `NULL project_id` scans run only with `--exact-counts`.
- `scripts/ops/awooop-rls-preflight.sh`:
  - By default runs `sudo kubectl -n awoooi-prod exec -i deployment/awoooi-api -c api -- python -` through `wooo@192.168.0.120`.
  - Supports `--local`, `--json`, `--exact-counts`, and `--ssh USER@HOST`.
  - Exit `2` means the RLS gate is blocked and RLS must not be enabled.
- `docs/runbooks/AWOOOP-RLS-PREFLIGHT.md`:
  - Records the 2026-05-12 production preflight result and the remediation order.

**RLS live preflight results**:

- `bash scripts/ops/awooop-rls-preflight.sh --exact-counts` → exit `2`, as expected for a blocked gate.
- `PASS=5 WARN=0 BLOCKED=2`.
- PASS:
  - The current DB user `awoooi` is neither superuser nor bypassrls.
  - `set_config('app.project_id', 'awoooi', TRUE)` works.
  - Every existing target table has `project_id`.
  - The production DB currently has no fail-open policy expressions.
  - Exact counts show `NULL project_id = 0` for the existing target tables.
- BLOCKED:
  - The roles `awooop_app`, `awooop_platform_admin`, and `awooop_migration` do not exist.
  - Target tables are not yet RLS enabled / forced / policied.
- Assessment: the next step is not a data backfill but role bootstrap + DB access path audit + staged policy enablement. The production app user is currently `awoooi`, so the policy design must first decide whether to grant it membership in `awooop_app` or to switch the connection role.

**New 188 registry certbot repair package**:

- `scripts/ops/188-registry-certbot-fix.sh`:
  - Root-only helper; dry-run by default, and only `--apply` changes anything on 188.
  - Creates `/var/www/certbot`.
  - Installs `/etc/nginx/conf.d/registry-acme-http.conf` so that HTTP-01 requests for `registry.wooo.work` no longer fall through to the `aiops.wooo.work` default vhost.
  - Reloads Nginx after `nginx -t`.
  - Renews with `/snap/bin/certbot renew --cert-name registry.wooo.work`.
  - When snap certbot is present, disables the broken apt `certbot.timer` and resets the failed apt certbot service.
- `docs/runbooks/REGISTRY-CERTBOT-188.md`:
  - Records the expired cert, the wrong route, the apt/snap certbot ownership split, and the post-fix verification commands.

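**Verification**:

- `python3 -m py_compile scripts/ops/awooop_rls_preflight.py` → passed.
- `bash -n scripts/ops/awooop-rls-preflight.sh scripts/ops/188-registry-certbot-fix.sh` → passed (`bash -n` parses only the first script and treats the second path as an argument; the certbot helper was syntax-checked on 188, see the last item).
- `scripts/ops/188-registry-certbot-fix.sh` dry-run → printed the expected actions without modifying the local machine or 188.
- The RLS preflight ran end to end against the production API pod; the blocked result is as expected and the DB was not changed.
- The helper has been synced to 188 at `/home/ollama/awoooi-ops/188-registry-certbot-fix.sh`.
- On 188, remote `bash -n` passed and the remote dry-run printed the expected root actions without touching Nginx / certbot.
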
**Next steps**:

- An operator with sudo privileges runs `sudo /home/ollama/awoooi-ops/188-registry-certbot-fix.sh --apply` on 188.
- For RLS, review the role bootstrap design first, then produce batch migrations; do not apply the existing RLS migration directly.

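## 2026-05-12 | Wave 1 Claude P0 Red-Light Verification and GitHub CD Lockdown

**Background**: The Claude Code inventory can only serve as a candidate list; each item must be verified against the production DB, host state, provider logs, and repo artifacts. This round first handles the high-risk red lights that can be confirmed quickly.
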
88
docs/runbooks/AWOOOP-RLS-PREFLIGHT.md
Normal file
@@ -0,0 +1,88 @@
# AwoooP RLS Preflight Runbook

> Purpose: verify whether production is ready for PostgreSQL Row-Level Security
> without enabling RLS or changing data.

## Command

Default path runs the probe inside the production API pod through the 120
control-plane host. `DATABASE_URL` stays inside Kubernetes and is not printed.

```bash
bash scripts/ops/awooop-rls-preflight.sh
```

Before enabling RLS, run exact backfill counts:

```bash
bash scripts/ops/awooop-rls-preflight.sh --exact-counts
```

Useful variants:

```bash
bash scripts/ops/awooop-rls-preflight.sh --json
bash scripts/ops/awooop-rls-preflight.sh --local
AWOOOP_RLS_SSH_TARGET=wooo@192.168.0.120 bash scripts/ops/awooop-rls-preflight.sh
```

Exit code `2` means the gate is blocked and RLS must not be enabled yet.
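
For automation, the exit code and the `--json` output can be combined into a single gate decision. The following is a minimal sketch, not part of the repo: the function names and the decision labels (`ready`, `needs-exact-counts`, `error`) are assumptions made for illustration, while the JSON shape (`checks` entries with a `status` field) and the meaning of exit codes `0` and `2` follow the scripts in this commit.

```python
import json
import subprocess
from collections import Counter

# Exit codes used by the preflight scripts; anything else is treated as an error
# (for example 64/66 from the shell runner, or an SSH failure).
GATE_OPEN, GATE_BLOCKED = 0, 2


def summarize(report_json: str) -> Counter:
    """Count checks per status (PASS / WARN / BLOCKED) from the --json output."""
    report = json.loads(report_json)
    return Counter(check["status"] for check in report["checks"])


def gate_decision(exit_code: int, counts: Counter) -> str:
    """Map the exit code and status counts to a decision label (labels are hypothetical)."""
    if exit_code == GATE_BLOCKED or counts.get("BLOCKED", 0) > 0:
        return "blocked"
    if exit_code != GATE_OPEN:
        return "error"
    if counts.get("WARN", 0) > 0:
        # The probe emits WARN when exact counts were skipped.
        return "needs-exact-counts"
    return "ready"


def run_preflight(extra_args=()):
    """Hypothetical wrapper: run the shell runner with --json, return (decision, counts)."""
    proc = subprocess.run(
        ["bash", "scripts/ops/awooop-rls-preflight.sh", "--json", *extra_args],
        capture_output=True,
        text=True,
        check=False,  # exit 2 is an expected outcome, not an exception
    )
    counts = summarize(proc.stdout) if proc.stdout.strip() else Counter()
    return gate_decision(proc.returncode, counts), counts
```

Calling `run_preflight(["--exact-counts"])` from the repo root would return the decision together with the per-status counts; an empty stdout with exit `2` (the probe's exception path) still maps to `blocked`.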

## 2026-05-12 Production Result

`--exact-counts` returned:

- `PASS current_role_rls_enforced`: current DB user is `awoooi`, not superuser and not `BYPASSRLS`.
- `PASS project_context_set_config`: `set_config('app.project_id', 'awoooi', TRUE)` works in the API pod.
- `BLOCKED required_roles`: `awooop_app`, `awooop_platform_admin`, and `awooop_migration` do not exist.
- `PASS project_id_columns`: every existing target table has `project_id`.
- `BLOCKED rls_enabled_forced_policy`: existing target tables are not yet RLS enabled, forced, or policied.
- `PASS fail_open_policies`: production DB currently has no fail-open policy expressions.
- `PASS project_id_backfill`: exact counts found zero `NULL project_id` rows in counted target tables.

Current blocker summary:

```text
PASS=5 WARN=0 BLOCKED=2
```

Important exact counts from the same run:

| Table | Rows | NULL project_id |
| --- | ---: | ---: |
| `audit_logs` | 686 | 0 |
| `awooop_mcp_tool_registry` | 4 | 0 |
| `awooop_outbound_message` | 228 | 0 |
| `awooop_projects` | 2 | 0 |
| `awooop_run_state` | 106 | 0 |
| `incidents` | 1518 | 0 |
| `knowledge_entries` | 2099 | 0 |
| `playbooks` | 220 | 0 |

## Remediation Order

1. Create or reconcile RLS roles.
   - Current production app user is `awoooi`; policy design must either grant it
     membership in `awooop_app` or update the application connection role before
     policies are enforced.
   - Do not create passworded LOGIN roles in a migration unless the K8s Secret
     rotation path is ready.
2. Verify all DB access paths use `get_db()` / `get_db_context()` or otherwise set
   `app.project_id` before queries.
3. Apply policies first in staging or a canary DB.
4. In production, enable one batch at a time.
5. After each batch, run:

   ```bash
   bash scripts/ops/awooop-rls-preflight.sh --exact-counts
   ```

6. Validate AwoooP Runs, Approvals, Monitoring, Tickets, Cost, alert ingestion,
   background workers, and TelegramGateway mirror paths.
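
Step 2 asks that every DB access path set `app.project_id` before it queries. How `get_db()` / `get_db_context()` do this is not shown in this commit, so the following is only an illustrative sketch of the pattern: `project_scope` is a hypothetical helper, and it assumes a DB-API connection with the `%s` parameter style (as in psycopg) that opens a transaction implicitly. Passing `TRUE` as the third argument of `set_config` makes the setting transaction-local, so it cannot leak to the next user of a pooled connection.

```python
from contextlib import contextmanager


@contextmanager
def project_scope(conn, project_id: str):
    """Run queries with app.project_id set for the current transaction only.

    `conn` is assumed to be a DB-API connection (hypothetical usage; the real
    application goes through get_db() / get_db_context()).
    """
    cur = conn.cursor()
    try:
        # is_local=TRUE: PostgreSQL drops the setting at commit or rollback.
        cur.execute("SELECT set_config('app.project_id', %s, TRUE)", (project_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

A code path that queries a target table outside such a scope would see no rows (or fail) once strict policies are enforced, which is why the access path audit comes before policy enablement.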

## Do Not

- Do not enable all policies in production before the role path is decided.
- Do not rely on fail-open `IS NULL` or empty-string policies as the target state.
- Do not run destructive rollback SQL unless the incident commander explicitly
  approves it.
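
To make the fail-open rule concrete, the sketch below classifies a policy expression as fail-open when it lets rows through while `app.project_id` is unset or empty. It is an illustration rather than the probe's implementation: `fail_open_reasons` and the reason labels are hypothetical names. The optional `::text` in the pattern is there because PostgreSQL's expression deparser typically prints string literals with an explicit cast, for example `current_setting('app.project_id'::text, true)`.

```python
import re

# Matches current_setting('app.project_id', true) with or without a ::text cast.
_SETTING = r"current_setting\(\s*'app\.project_id'(?:::text)?\s*,\s*true\s*\)"
_NULL_FAIL_OPEN = re.compile(_SETTING + r"\s+IS\s+NULL", re.IGNORECASE)
_EMPTY_FAIL_OPEN = re.compile(_SETTING + r"\s*=\s*''", re.IGNORECASE)


def fail_open_reasons(policy_expr: str) -> list[str]:
    """Return the fail-open patterns found in a USING / WITH CHECK expression."""
    reasons = []
    if _NULL_FAIL_OPEN.search(policy_expr):
        reasons.append("null")
    if _EMPTY_FAIL_OPEN.search(policy_expr):
        reasons.append("empty-string")
    return reasons
```

A strict policy such as `project_id = current_setting('app.project_id', true)` yields an empty list, because a missing setting evaluates to `NULL` and the comparison rejects every row.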
62
docs/runbooks/REGISTRY-CERTBOT-188.md
Normal file
@@ -0,0 +1,62 @@
# 188 Registry Certbot Recovery

> Scope: `registry.wooo.work` on host `192.168.0.188`.

## Verified State On 2026-05-12

- `registry.wooo.work` certificate expired at `May 8 04:16:08 2026 GMT`.
- HTTP-01 route check:

  ```text
  http://registry.wooo.work/.well-known/acme-challenge/codex-route-check
    -> 301 https://aiops.wooo.work/.well-known/acme-challenge/codex-route-check
    -> 404
  ```

- `/usr/bin/certbot` is broken by Python/OpenSSL mismatch.
- `/snap/bin/certbot` exists and should be the renewal owner.
- Both apt `certbot.timer` and snap `snap.certbot.renew.timer` were enabled.
- The `ollama` SSH user is in sudo group but has no passwordless sudo in this
  session, so Codex could not apply the root-level fix directly.

## Fix Script

The repo includes a root-only helper. It is dry-run by default:

```bash
bash scripts/ops/188-registry-certbot-fix.sh
```

To apply on 188:

```bash
sudo bash /home/ollama/awoooi-ops/188-registry-certbot-fix.sh --apply
```

The script:

- creates `/var/www/certbot`;
- installs `/etc/nginx/conf.d/registry-acme-http.conf`;
- routes `registry.wooo.work` HTTP-01 to `/var/www/certbot`;
- reloads Nginx after `nginx -t`;
- renews `registry.wooo.work` via `/snap/bin/certbot`;
- disables the broken apt `certbot.timer` when snap certbot is present;
- prints the renewed certificate dates.

## Post-Fix Verification

Run from any host with network access:

```bash
curl -sI --max-redirs 0 http://registry.wooo.work/.well-known/acme-challenge/codex-route-check
openssl s_client -servername registry.wooo.work -connect registry.wooo.work:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```

Expected:

- HTTP challenge path returns `404` from the `registry.wooo.work` vhost, not a
  redirect to `aiops.wooo.work`.
- `notAfter` is renewed to a future date.
- `systemctl --failed` no longer lists apt `certbot.service` after failed state
  reset.
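
The first two expectations can also be checked programmatically. The sketch below is a hedged illustration, not part of the repo: `route_ok`, `parse_not_after`, and `cert_is_valid` are hypothetical helpers that take the HTTP status, the `Location` header, and the `notAfter=` line printed by `openssl x509 -dates`. It assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, timezone
from typing import Optional


def route_ok(status: int, location: Optional[str]) -> bool:
    """True when the challenge probe was answered by the registry vhost itself."""
    # A missing challenge file should give a plain 404; a redirect means the
    # request fell through to another server block.
    return status == 404 and not location


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=May 8 04:16:08 2026 GMT' into UTC."""
    value = line.split("=", 1)[1].strip()
    if value.endswith(" GMT"):
        value = value[: -len(" GMT")]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def cert_is_valid(not_after_line: str, now: Optional[datetime] = None) -> bool:
    """True when notAfter lies in the future relative to `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return parse_not_after(not_after_line) > now
```

With the state recorded above, `route_ok(301, "https://aiops.wooo.work/...")` is false and the expired `notAfter` line fails `cert_is_valid`; both should flip after a successful `--apply`.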
117
scripts/ops/188-registry-certbot-fix.sh
Executable file
@@ -0,0 +1,117 @@
#!/usr/bin/env bash
# Repair helper for 188 registry.wooo.work HTTP-01 renewal.
# Default is dry-run. Use --apply on 188 as root after reviewing the plan.
set -euo pipefail

APPLY=0
DOMAIN="${REGISTRY_CERTBOT_DOMAIN:-registry.wooo.work}"
WEBROOT="${REGISTRY_CERTBOT_WEBROOT:-/var/www/certbot}"
NGINX_SNIPPET="${REGISTRY_CERTBOT_NGINX_SNIPPET:-/etc/nginx/conf.d/registry-acme-http.conf}"
CERTBOT_BIN="${REGISTRY_CERTBOT_BIN:-/snap/bin/certbot}"

usage() {
  cat <<'USAGE'
Usage: sudo bash scripts/ops/188-registry-certbot-fix.sh [--apply]

Fixes the known 188 drift where registry.wooo.work HTTP-01 traffic falls through
to the aiops.wooo.work default server and certbot cannot renew the registry cert.

Default mode is dry-run and prints the exact actions. --apply requires root.

Environment:
  REGISTRY_CERTBOT_DOMAIN         Default: registry.wooo.work
  REGISTRY_CERTBOT_WEBROOT        Default: /var/www/certbot
  REGISTRY_CERTBOT_NGINX_SNIPPET  Default: /etc/nginx/conf.d/registry-acme-http.conf
  REGISTRY_CERTBOT_BIN            Default: /snap/bin/certbot
USAGE
}

while [ "$#" -gt 0 ]; do
  case "$1" in
    --apply)
      APPLY=1
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 64
      ;;
  esac
  shift
done

run() {
  if [ "$APPLY" -eq 1 ]; then
    "$@"
  else
    printf 'DRY-RUN:'
    printf ' %q' "$@"
    printf '\n'
  fi
}

write_snippet() {
  local tmp
  tmp="$(mktemp)"
  cat > "$tmp" <<EOF
# Managed by AWOOOI registry certbot repair.
# LetsEncrypt HTTP-01 must not fall through to aiops.wooo.work.
server {
  listen 80;
  server_name ${DOMAIN};

  location /.well-known/acme-challenge/ {
    root ${WEBROOT};
    default_type "text/plain";
  }

  location / {
    return 301 https://\$host\$request_uri;
  }
}
EOF
  run install -m 0644 "$tmp" "$NGINX_SNIPPET"
  rm -f "$tmp"
}

if [ "$APPLY" -eq 1 ] && [ "$(id -u)" -ne 0 ]; then
  echo "--apply must be run as root on 188" >&2
  exit 77
fi

if [ "$APPLY" -eq 1 ] && [ ! -x "$CERTBOT_BIN" ]; then
  echo "certbot binary not executable: $CERTBOT_BIN" >&2
  exit 69
fi

echo "Plan: repair HTTP-01 route for ${DOMAIN}, renew via ${CERTBOT_BIN}, reload nginx."
run install -d -m 0755 "$WEBROOT"
write_snippet
run nginx -t
run systemctl reload nginx

if [ "$APPLY" -eq 1 ]; then
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 8 "http://${DOMAIN}/.well-known/acme-challenge/codex-route-check" || true)"
  if [ "$code" != "404" ]; then
    echo "Unexpected ACME route status after nginx reload: ${code}; expected 404 from ${DOMAIN}, not redirect/default vhost" >&2
    exit 1
  fi
fi

run "$CERTBOT_BIN" renew --cert-name "$DOMAIN" --deploy-hook "systemctl reload nginx"

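if [ -x /snap/bin/certbot ]; then
  run systemctl disable --now certbot.timer
  # Best-effort cleanup: reset-failed can exit non-zero when the apt unit is
  # not loaded (for example on a re-run), which would abort under `set -e`
  # before the certificate dates are printed.
  run systemctl reset-failed certbot.service || true
fi
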
if [ "$APPLY" -eq 1 ]; then
  openssl x509 -noout -subject -issuer -dates -in "/etc/letsencrypt/live/${DOMAIN}/fullchain.pem"
  systemctl status snap.certbot.renew.timer --no-pager -l | sed -n '1,25p' || true
else
  echo "Dry-run only. Re-run with --apply on 188 as root to execute."
fi
100
scripts/ops/awooop-rls-preflight.sh
Executable file
@@ -0,0 +1,100 @@
#!/usr/bin/env bash
# Read-only AwoooP RLS preflight runner.
#
# Default path runs inside the production API pod through the 120 control-plane
# host, so DATABASE_URL stays inside Kubernetes and is never printed locally.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PY_SCRIPT="${SCRIPT_DIR}/awooop_rls_preflight.py"

NAMESPACE="${AWOOOP_RLS_NAMESPACE:-awoooi-prod}"
DEPLOYMENT="${AWOOOP_RLS_DEPLOYMENT:-deployment/awoooi-api}"
CONTAINER="${AWOOOP_RLS_CONTAINER:-api}"
SSH_TARGET="${AWOOOP_RLS_SSH_TARGET:-wooo@192.168.0.120}"
REMOTE_KUBECTL="${AWOOOP_RLS_REMOTE_KUBECTL:-sudo kubectl}"
KUBECTL="${AWOOOP_RLS_KUBECTL:-kubectl}"
USE_SSH=1
PY_ARGS=()
SSH_OPTS=(-o BatchMode=yes -o ConnectTimeout=8)

usage() {
  cat <<'USAGE'
Usage: bash scripts/ops/awooop-rls-preflight.sh [options]

Read-only checks for AwoooP PostgreSQL RLS readiness. The script runs the Python
probe inside the API pod and exits 2 when RLS is not ready to enable.

Options:
  --exact-counts      Run exact COUNT(*) project_id backfill checks.
  --json              Print JSON output from the pod.
  --local             Use local kubectl instead of SSH to 120.
  --ssh USER@HOST     Override SSH target. Default: wooo@192.168.0.120.
  -h, --help          Show this help.

Environment:
  AWOOOP_RLS_NAMESPACE       Default: awoooi-prod
  AWOOOP_RLS_DEPLOYMENT      Default: deployment/awoooi-api
  AWOOOP_RLS_CONTAINER       Default: api
  AWOOOP_RLS_SSH_TARGET      Default: wooo@192.168.0.120
  AWOOOP_RLS_REMOTE_KUBECTL  Default: sudo kubectl
  AWOOOP_RLS_KUBECTL         Default: kubectl
USAGE
}

while [ "$#" -gt 0 ]; do
  case "$1" in
    --exact-counts)
      PY_ARGS+=(--exact-counts)
      ;;
    --json)
      PY_ARGS+=(--json)
      ;;
    --local)
      USE_SSH=0
      ;;
    --ssh)
      shift
      SSH_TARGET="${1:-}"
      if [ -z "$SSH_TARGET" ]; then
        echo "--ssh requires USER@HOST" >&2
        exit 64
      fi
      USE_SSH=1
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 64
      ;;
  esac
  shift
done

if [ ! -f "$PY_SCRIPT" ]; then
  echo "Missing Python probe: $PY_SCRIPT" >&2
  exit 66
fi

if [ "$USE_SSH" -eq 1 ]; then
  printf -v namespace_q "%q" "$NAMESPACE"
  printf -v deployment_q "%q" "$DEPLOYMENT"
  printf -v container_q "%q" "$CONTAINER"
  remote_cmd="${REMOTE_KUBECTL} -n ${namespace_q} exec -i ${deployment_q} -c ${container_q} -- python -"
  if [ "${#PY_ARGS[@]}" -gt 0 ]; then
    for arg in "${PY_ARGS[@]}"; do
      printf -v arg_q "%q" "$arg"
      remote_cmd="${remote_cmd} ${arg_q}"
    done
  fi
  ssh "${SSH_OPTS[@]}" "$SSH_TARGET" "$remote_cmd" < "$PY_SCRIPT"
else
  if [ "${#PY_ARGS[@]}" -gt 0 ]; then
    "$KUBECTL" -n "$NAMESPACE" exec -i "$DEPLOYMENT" -c "$CONTAINER" -- python - "${PY_ARGS[@]}" < "$PY_SCRIPT"
  else
    "$KUBECTL" -n "$NAMESPACE" exec -i "$DEPLOYMENT" -c "$CONTAINER" -- python - < "$PY_SCRIPT"
  fi
fi
332
scripts/ops/awooop_rls_preflight.py
Executable file
@@ -0,0 +1,332 @@
#!/usr/bin/env python3
"""
Read-only AwoooP RLS preflight.

This script is designed to run inside the production API pod. It uses the
pod-local DATABASE_URL and never prints the URL or credentials.
"""

from __future__ import annotations

import argparse
import asyncio
import json
import os
import sys
from dataclasses import asdict, dataclass
from typing import Any

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine


TARGET_TABLES = [
    "incidents",
    "knowledge_entries",
    "playbooks",
    "audit_logs",
    "budget_ledger",
    "awooop_projects",
    "awooop_contracts",
    "awooop_contract_revisions",
    "awooop_published_contracts",
    "awooop_run_state",
    "awooop_run_event",
    "awooop_cost_ledger",
    "awooop_mcp_tool_registry",
    "awooop_mcp_grants",
    "awooop_mcp_credential_refs",
    "awooop_mcp_gateway_audit",
    "awooop_conversation_event",
    "awooop_outbound_message",
]

REQUIRED_ROLES = [
    "awooop_app",
    "awooop_platform_admin",
    "awooop_migration",
]


@dataclass
class Check:
    name: str
    status: str
    detail: str


def add(checks: list[Check], name: str, status: str, detail: str) -> None:
    checks.append(Check(name=name, status=status, detail=detail))


async def scalar(conn: Any, sql: str, params: dict[str, Any] | None = None) -> Any:
    return await conn.scalar(text(sql), params or {})


async def rows(conn: Any, sql: str, params: dict[str, Any] | None = None) -> list[dict[str, Any]]:
    result = await conn.execute(text(sql), params or {})
    return [dict(row._mapping) for row in result.fetchall()]


async def collect(exact_counts: bool) -> tuple[list[Check], dict[str, Any]]:
    database_url = os.environ.get("DATABASE_URL")
    if not database_url:
        return [Check("database_url", "BLOCKED", "DATABASE_URL is not set in this environment")], {}

    engine = create_async_engine(database_url, pool_pre_ping=True)
    checks: list[Check] = []
    evidence: dict[str, Any] = {}

    async with engine.connect() as conn:
        current_role = await rows(
            conn,
            """
            SELECT
                current_user AS current_user,
                session_user AS session_user,
                r.rolsuper AS current_user_superuser,
                r.rolbypassrls AS current_user_bypassrls
            FROM pg_roles r
            WHERE r.rolname = current_user
            """,
        )
        evidence["current_role"] = current_role[0] if current_role else {}
        role = evidence["current_role"]
        if role.get("current_user_superuser") or role.get("current_user_bypassrls"):
            add(
                checks,
                "current_role_rls_enforced",
                "BLOCKED",
                f"current_user={role.get('current_user')} can bypass RLS",
            )
        else:
            add(
                checks,
                "current_role_rls_enforced",
                "PASS",
                f"current_user={role.get('current_user')} is subject to RLS",
            )

        before = await scalar(conn, "SELECT current_setting('app.project_id', TRUE)")
        await scalar(conn, "SELECT set_config('app.project_id', :pid, TRUE)", {"pid": "awoooi"})
        after = await scalar(conn, "SELECT current_setting('app.project_id', TRUE)")
        evidence["project_context_probe"] = {"before": before, "after": after}
        if after == "awoooi":
            add(checks, "project_context_set_config", "PASS", "set_config app.project_id works")
        else:
            add(checks, "project_context_set_config", "BLOCKED", f"expected awoooi, got {after!r}")

        roles = await rows(
            conn,
            """
            WITH required_roles(rolname) AS (
                SELECT jsonb_array_elements_text(CAST(:roles_json AS jsonb))
            )
            SELECT
                rr.rolname,
                r.rolsuper,
                r.rolbypassrls,
                r.oid IS NOT NULL AS exists
            FROM required_roles rr
            LEFT JOIN pg_roles r ON r.rolname = rr.rolname
            ORDER BY rr.rolname
            """,
            {"roles_json": json.dumps(REQUIRED_ROLES)},
        )
        evidence["required_roles"] = roles
        present_roles = {row["rolname"] for row in roles if row["exists"]}
        missing_roles = [role_name for role_name in REQUIRED_ROLES if role_name not in present_roles]
        if missing_roles:
            add(checks, "required_roles", "BLOCKED", f"missing roles: {', '.join(missing_roles)}")
        else:
            add(checks, "required_roles", "PASS", "all required RLS roles exist")

        table_rows = await rows(
            conn,
            """
            WITH target(relname) AS (
                SELECT jsonb_array_elements_text(CAST(:tables_json AS jsonb))
            ),
            rels AS (
                SELECT
                    t.relname,
                    c.oid,
                    c.relrowsecurity,
                    c.relforcerowsecurity,
                    COALESCE(c.reltuples, 0)::bigint AS estimated_rows
                FROM target t
                LEFT JOIN pg_class c
                  ON c.relname = t.relname
                 AND c.relkind IN ('r', 'p')
                 AND c.relnamespace = 'public'::regnamespace
            ),
            project_columns AS (
                SELECT table_name, TRUE AS has_project_id
                FROM information_schema.columns
                WHERE table_schema = 'public'
                  AND column_name = 'project_id'
                  AND table_name IN (SELECT relname FROM target)
            ),
            policy_stats AS (
                -- The wildcard after the setting name tolerates the text cast that
                -- pg_get_expr usually prints on string literals.
                SELECT
                    p.polrelid,
                    COUNT(*) AS policy_count,
                    BOOL_OR(
                        COALESCE(pg_get_expr(p.polqual, p.polrelid), '') ILIKE '%current_setting(''app.project_id''%, true) IS NULL%'
                        OR COALESCE(pg_get_expr(p.polwithcheck, p.polrelid), '') ILIKE '%current_setting(''app.project_id''%, true) IS NULL%'
                    ) AS has_null_fail_open_policy,
                    BOOL_OR(
                        COALESCE(pg_get_expr(p.polqual, p.polrelid), '') ILIKE '%current_setting(''app.project_id''%, true) = ''''%'
                        OR COALESCE(pg_get_expr(p.polwithcheck, p.polrelid), '') ILIKE '%current_setting(''app.project_id''%, true) = ''''%'
                    ) AS has_empty_string_fail_open_policy
                FROM pg_policy p
                GROUP BY p.polrelid
            )
            SELECT
                r.relname AS table_name,
                r.oid IS NOT NULL AS exists,
                COALESCE(pc.has_project_id, FALSE) AS has_project_id,
                COALESCE(r.relrowsecurity, FALSE) AS rls_enabled,
                COALESCE(r.relforcerowsecurity, FALSE) AS rls_forced,
                COALESCE(ps.policy_count, 0) AS policy_count,
                COALESCE(ps.has_null_fail_open_policy, FALSE) AS has_null_fail_open_policy,
                COALESCE(ps.has_empty_string_fail_open_policy, FALSE) AS has_empty_string_fail_open_policy,
                r.estimated_rows
            FROM rels r
            LEFT JOIN project_columns pc ON pc.table_name = r.relname
            LEFT JOIN policy_stats ps ON ps.polrelid = r.oid
            ORDER BY r.relname
            """,
            {"tables_json": json.dumps(TARGET_TABLES)},
        )
        evidence["tables"] = table_rows

        existing = [row for row in table_rows if row["exists"]]
        missing_project_id = [row["table_name"] for row in existing if not row["has_project_id"]]
        if missing_project_id:
            add(checks, "project_id_columns", "BLOCKED", f"missing project_id: {', '.join(missing_project_id)}")
        else:
            add(checks, "project_id_columns", "PASS", "all existing target tables have project_id")

        rls_missing = [
            row["table_name"]
            for row in existing
            if not row["rls_enabled"] or not row["rls_forced"] or row["policy_count"] == 0
        ]
        if rls_missing:
            add(
                checks,
                "rls_enabled_forced_policy",
                "BLOCKED",
                f"RLS not fully enabled/forced/policied: {', '.join(rls_missing)}",
            )
        else:
            add(checks, "rls_enabled_forced_policy", "PASS", "all existing target tables have forced RLS policy")

        fail_open = [
            row["table_name"]
            for row in existing
            if row["has_null_fail_open_policy"] or row["has_empty_string_fail_open_policy"]
        ]
        if fail_open:
            add(checks, "fail_open_policies", "BLOCKED", f"fail-open policy expressions: {', '.join(fail_open)}")
        else:
            add(checks, "fail_open_policies", "PASS", "no fail-open policy expressions detected")

        if exact_counts:
            exact_rows: list[dict[str, Any]] = []
            for row in existing:
                if not row["has_project_id"]:
                    continue
                quoted = '"' + row["table_name"].replace('"', '""') + '"'
                count_row = await rows(
                    conn,
                    f"SELECT :table_name AS table_name, COUNT(*) AS total_rows, COUNT(*) FILTER (WHERE project_id IS NULL) AS null_project_id_rows FROM {quoted}",
                    {"table_name": row["table_name"]},
                )
                exact_rows.extend(count_row)
            evidence["exact_counts"] = exact_rows
            null_tables = [row["table_name"] for row in exact_rows if int(row["null_project_id_rows"]) > 0]
            if null_tables:
                add(checks, "project_id_backfill", "BLOCKED", f"NULL project_id remains: {', '.join(null_tables)}")
            else:
                add(checks, "project_id_backfill", "PASS", "no NULL project_id rows in counted tables")
        else:
            add(checks, "project_id_backfill", "WARN", "exact counts skipped; rerun with --exact-counts before enabling RLS")

    await engine.dispose()
    return checks, evidence


def print_human(checks: list[Check], evidence: dict[str, Any]) -> None:
    blocked = sum(1 for check in checks if check.status == "BLOCKED")
    warn = sum(1 for check in checks if check.status == "WARN")
    passed = sum(1 for check in checks if check.status == "PASS")
    print(f"AwoooP RLS preflight: PASS={passed} WARN={warn} BLOCKED={blocked}")
    for check in checks:
        print(f"{check.status:<7} {check.name}: {check.detail}")

    role = evidence.get("current_role") or {}
    if role:
        print(
            "role "
            f"current_user={role.get('current_user')} "
            f"session_user={role.get('session_user')} "
            f"superuser={role.get('current_user_superuser')} "
            f"bypassrls={role.get('current_user_bypassrls')}"
        )

    for row in evidence.get("tables", []):
        print(
            "table "
            f"{row['table_name']} "
            f"exists={row['exists']} "
            f"project_id={row['has_project_id']} "
            f"rls={row['rls_enabled']} "
            f"force={row['rls_forced']} "
            f"policies={row['policy_count']} "
            f"fail_open_null={row['has_null_fail_open_policy']} "
            f"fail_open_empty={row['has_empty_string_fail_open_policy']} "
            f"estimated_rows={row['estimated_rows']}"
        )

    for row in evidence.get("exact_counts", []):
        print(
            "count "
            f"{row['table_name']} "
            f"total_rows={row['total_rows']} "
            f"null_project_id_rows={row['null_project_id_rows']}"
        )


async def main() -> int:
    parser = argparse.ArgumentParser(description="Run read-only AwoooP RLS preflight checks.")
    parser.add_argument("--exact-counts", action="store_true", help="Run exact COUNT(*) checks for project_id backfill.")
    parser.add_argument("--json", action="store_true", help="Print JSON instead of human-readable output.")
    args = parser.parse_args()

    checks, evidence = await collect(exact_counts=args.exact_counts)
    blocked = any(check.status == "BLOCKED" for check in checks)

    if args.json:
        print(
            json.dumps(
                {"checks": [asdict(check) for check in checks], "evidence": evidence},
                ensure_ascii=False,
                default=str,
            )
        )
    else:
        print_human(checks, evidence)

    return 2 if blocked else 0


if __name__ == "__main__":
    try:
        raise SystemExit(asyncio.run(main()))
    except KeyboardInterrupt:
        raise SystemExit(130)
    except Exception as exc:
        print(f"BLOCKED preflight_exception: {exc}", file=sys.stderr)
        raise SystemExit(2)