From 84cfdb61956284550ae5445f59b14fba6bdee622 Mon Sep 17 00:00:00 2001
From: OG T
Date: Sun, 5 Apr 2026 01:01:58 +0800
Subject: [PATCH] docs(backup): complete backup audit inventory + add AWOOOI DB and Gitea DB backup scripts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Chief architect backup audit findings:
- awoooi_prod PostgreSQL: ❌ no backup (P0 gap)
- Gitea SQLite DB: ❌ no backup (corrupted today; manual repair took 2h+)

Added:
- scripts/backup/backup-awoooi-db.sh (deploy on 188, 02:00 daily)
- scripts/backup/backup-gitea-db.sh (deploy on 110, 01:00 daily)
- docs/runbooks/BACKUP-STATUS.md (overview table + deployment steps + SOP)
- LOGBOOK.md backup audit section

Pending manual deployment: the commander must scp the scripts to 188/110 and set up crontab

Co-Authored-By: Claude Sonnet 4.6
---
 docs/LOGBOOK.md                    |  19 ++++
 docs/runbooks/BACKUP-STATUS.md     | 150 +++++++++++++++++++++++++++++
 scripts/backup/backup-awoooi-db.sh |  39 ++++++++
 scripts/backup/backup-gitea-db.sh  |  42 ++++++++
 4 files changed, 250 insertions(+)
 create mode 100644 docs/runbooks/BACKUP-STATUS.md
 create mode 100644 scripts/backup/backup-awoooi-db.sh
 create mode 100644 scripts/backup/backup-gitea-db.sh

diff --git a/docs/LOGBOOK.md b/docs/LOGBOOK.md
index 3161135b..759c60a6 100644
--- a/docs/LOGBOOK.md
+++ b/docs/LOGBOOK.md
@@ -5,6 +5,25 @@
 
 ---
 
+## 📍 Current Status (2026-04-05 backup audit complete + ADR-057 implementation + MinIO fix)
+
+| Item | Status | Notes |
+|------|------|------|
+| ADR-057 adopt() Gitea PR API implementation | ✅ a81bf50 | Created drift_adopt_service.py |
+| GITEA_API_TOKEN injected into K8s Secret | ✅ | kubectl patch awoooi-secrets |
+| MinIO started (Velero BSL Available) | ✅ | docker compose up -d + added to startup.sh |
+| Gitea SQLite DB corruption repaired | ✅ | Recovered via sqlite3 .recover (454 Actions runs) |
+| Backup audit: full inventory | ✅ | Created BACKUP-STATUS.md |
+| AWOOOI PostgreSQL backup script created | ✅ | scripts/backup/backup-awoooi-db.sh |
+| Gitea SQLite backup script created | ✅ | scripts/backup/backup-gitea-db.sh |
+| **AWOOOI PostgreSQL backup deployment (188)** | ⚠️ Pending (manual) | See deployment steps in BACKUP-STATUS.md |
+| **Gitea DB backup deployment (110)** | ⚠️ Pending (manual) | See deployment steps in BACKUP-STATUS.md |
+
+**Backup gap**: neither awoooi_prod nor the Gitea DB has automated backups → P0, pending manual deployment by the commander
+**Docs**: `docs/runbooks/BACKUP-STATUS.md`
+
+---
+
 ## 📍 Current Status (2026-04-05 Phase 25 chief architect Review R2 passed + ADR-054~057 complete)
 
 | Item | Status | Commit |
diff --git a/docs/runbooks/BACKUP-STATUS.md b/docs/runbooks/BACKUP-STATUS.md
new file mode 100644
index 00000000..6eb434dd
--- /dev/null
+++ b/docs/runbooks/BACKUP-STATUS.md
@@ -0,0 +1,150 @@
+# BACKUP-STATUS.md: Backup Status Overview
+
+> 2026-04-05 Claude Code: created after the chief architect's full inventory
+> Trigger: Gitea DB corruption incident (manual repair took 2+ hours) + Velero BSL Unavailable (MinIO not running)
+
+---
+
+## Backup Overview
+
+| Data type | Host | Tool | Schedule | Status | Retention |
+|---------|------|------|------|------|------|
+| K8s resources (all namespaces) | 188 | Velero + MinIO | Daily 02:00 | ✅ OK | 7 days |
+| MOMO PostgreSQL | 188 | backup-momo-db.sh | Daily 04:00 | ✅ OK | 30 days |
+| ClawBot/SignOz env | 188 | backup-env.sh | Daily 03:00 | ✅ OK | - |
+| **AWOOOI PostgreSQL** | **188** | **backup-awoooi-db.sh** | **Daily 02:00** | **⚠️ Pending deployment** | **30 days** |
+| **Gitea SQLite DB** | **110** | **backup-gitea-db.sh** | **Daily 01:00** | **⚠️ Pending deployment** | **30 days** |
+| K3s Kine (k3s_datastore) | 188 | backup-awoooi-db.sh | Daily 02:00 | ⚠️ Pending deployment | 30 days |
+
+---
+
+## Backup Details
+
+### ✅ Velero K8s Resource Backup
+
+```
+Schedule: daily-backup (0 2 * * *)
+Storage: MinIO @ 192.168.0.188:9000 (bucket: velero)
+Status: 8 backups on record, BSL Available (confirmed 2026-04-05)
+```
+
+**Note**: MinIO must start automatically after 188 boots. It has been added to `awoooi-startup.sh` (commit c0c903d, 2026-04-05).
+
+**MinIO check**:
+```bash
+ssh ollama@192.168.0.188 "docker ps | grep minio"
+kubectl get backupstoragelocation -n velero
+```
+
+---
+
+### ✅ MOMO PostgreSQL
+
+```
+Script: /home/ollama/scripts/backup-momo-db.sh
+Cron: 0 4 * * *
+Output: /home/ollama/backups/momo/momo_db_YYYY-MM-DD_HH-MM.sql.gz
+Retention: 30 days
+```
+
+---
+
+### ⚠️ AWOOOI PostgreSQL (pending deployment)
+
+**Databases (in order of importance)**:
+- `awoooi_prod`: main production DB (KB knowledge base, incident records, AutoRepair decisions, Drift reports)
+- `awoooi_dev`: development DB
+- `k3s_datastore`: K3s Kine backend
+
+**Script created**: `scripts/backup/backup-awoooi-db.sh`
+
+**Deployment steps** (run from the workstation; they target 192.168.0.188):
+```bash
+# Upload the script
+scp /Users/ogt/awoooi/scripts/backup/backup-awoooi-db.sh ollama@192.168.0.188:/home/ollama/scripts/
+
+# Make it executable
+ssh ollama@192.168.0.188 "chmod +x /home/ollama/scripts/backup-awoooi-db.sh"
+
+# Create the log directory
+ssh ollama@192.168.0.188 "mkdir -p /home/ollama/logs"
+
+# Edit the crontab (-t allocates a TTY so the editor works over ssh)
+ssh -t ollama@192.168.0.188 "crontab -e"
+# Add: 0 2 * * * /home/ollama/scripts/backup-awoooi-db.sh >> /home/ollama/logs/backup-awoooi.log 2>&1
+
+# Test run
+ssh ollama@192.168.0.188 "/home/ollama/scripts/backup-awoooi-db.sh"
+```
+
+**Note**: The script runs `sudo -u postgres pg_dump`, so the ollama user needs a matching sudoers entry that does not prompt for a password (cron cannot answer a prompt); alternatively, switch to peer authentication.
+
+---
+
+### ⚠️ Gitea SQLite DB (pending deployment)
+
+**Incident lesson**: On 2026-04-05 the Gitea DB was corrupted and could only be salvaged with `sqlite3 .recover`; manual repair took 2+ hours.
+
+**DB path**: `/home/wooo/gitea/gitea_data/gitea/gitea.db` (on 192.168.0.110)
+
+**Script created**: `scripts/backup/backup-gitea-db.sh`
+
+**Deployment steps** (run from the workstation; they target 192.168.0.110):
+```bash
+# Upload the script
+scp /Users/ogt/awoooi/scripts/backup/backup-gitea-db.sh wooo@192.168.0.110:/home/wooo/scripts/
+
+# Make it executable and create the log and backup directories
+ssh wooo@192.168.0.110 "chmod +x /home/wooo/scripts/backup-gitea-db.sh && mkdir -p /home/wooo/logs /home/wooo/backups/gitea"
+
+# Edit the crontab for wooo@110 (-t allocates a TTY for the editor)
+ssh -t wooo@192.168.0.110 "crontab -e"
+# Add: 0 1 * * * /home/wooo/scripts/backup-gitea-db.sh >> /home/wooo/logs/backup-gitea.log 2>&1
+
+# Test run (requires sqlite3 to be installed)
+ssh wooo@192.168.0.110 "sqlite3 --version && /home/wooo/scripts/backup-gitea-db.sh"
+```
+
+**Installing sqlite3 (if missing)**:
+```bash
+ssh -t wooo@192.168.0.110 "sudo apt install -y sqlite3"
+```
+
+---
+
+## Backup Verification SOP
+
+Verify on the first Monday of every month:
+
+```bash
+# 1. Velero
+kubectl get backup -n velero --sort-by=.metadata.creationTimestamp | tail -3
+
+# 2. MOMO DB
+ssh ollama@192.168.0.188 "ls -lh /home/ollama/backups/momo/ | tail -3"
+
+# 3. AWOOOI DB (sort by time: this directory holds three DB prefixes)
+ssh ollama@192.168.0.188 "ls -lht /home/ollama/backups/awoooi/ | head -4"
+
+# 4. Gitea DB
+ssh wooo@192.168.0.110 "ls -lh /home/wooo/backups/gitea/ | tail -3"
+```
+
+---
+
+## Gap Risk Assessment
+
+| Gap | Data loss risk | Priority |
+|------|------------|--------|
+| No backup for AWOOOI PostgreSQL | 🔴 Critical: knowledge base, incidents, and AutoRepair data would be lost entirely | P0 |
+| No automated backup for Gitea DB | 🔴 High: corruption already occurred today and required manual repair | P0 |
+
+---
+
+## Related Documents
+
+- [REBOOT-RECOVERY-SOP.md](REBOOT-RECOVERY-SOP.md) - Reboot recovery SOP
+- [SECRETS-MANAGEMENT.md](SECRETS-MANAGEMENT.md) - Secrets management
+- `scripts/backup/backup-awoooi-db.sh` - AWOOOI DB backup script
+- `scripts/backup/backup-gitea-db.sh` - Gitea DB backup script
+- `scripts/reboot-recovery/awoooi-startup.sh` - Boot-time auto-recovery (includes MinIO)
diff --git a/scripts/backup/backup-awoooi-db.sh b/scripts/backup/backup-awoooi-db.sh
new file mode 100644
index 00000000..11972414
--- /dev/null
+++ b/scripts/backup/backup-awoooi-db.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+# =============================================================================
+# AWOOOI PostgreSQL daily backup script
+# 2026-04-05 Claude Code: the inventory found no backup for the awoooi DB; modeled on backup-momo-db.sh
+# Deployed at: /home/ollama/scripts/backup-awoooi-db.sh (on 192.168.0.188)
+# cron: 0 2 * * * /home/ollama/scripts/backup-awoooi-db.sh >> /home/ollama/logs/backup-awoooi.log 2>&1
+# =============================================================================
+
+set -eo pipefail  # pipefail: a failing pg_dump fails the pipeline instead of being masked by gzip
+
+BACKUP_DIR="/home/ollama/backups/awoooi"
+DATE=$(date +%Y-%m-%d_%H-%M)
+RETENTION_DAYS=30
+
+mkdir -p "$BACKUP_DIR"
+
+echo "[$(date)] Starting AWOOOI DB backup..."
+
+# awoooi_prod: main production database (knowledge base, incidents, AutoRepair decisions, etc.)
+sudo -u postgres pg_dump awoooi_prod | gzip > "$BACKUP_DIR/awoooi_prod_${DATE}.sql.gz"
+echo "[$(date)] ✅ awoooi_prod backup complete"
+
+# awoooi_dev: development database
+sudo -u postgres pg_dump awoooi_dev 2>/dev/null | gzip > "$BACKUP_DIR/awoooi_dev_${DATE}.sql.gz" || \
+    echo "[$(date)] ⚠️ awoooi_dev backup skipped (database may not exist)"
+
+# k3s_datastore: K3s Kine database
+sudo -u postgres pg_dump k3s_datastore 2>/dev/null | gzip > "$BACKUP_DIR/k3s_datastore_${DATE}.sql.gz" || \
+    echo "[$(date)] ⚠️ k3s_datastore backup skipped"
+
+# Prune old backups (keep 30 days)
+find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
+echo "[$(date)] Pruned backups older than ${RETENTION_DAYS} days"
+
+# List the latest backups
+echo "[$(date)] Current backups:"
+ls -lh "$BACKUP_DIR"/*.sql.gz 2>/dev/null | tail -10
+
+echo "[$(date)] AWOOOI DB backup finished!"
diff --git a/scripts/backup/backup-gitea-db.sh b/scripts/backup/backup-gitea-db.sh
new file mode 100644
index 00000000..551de221
--- /dev/null
+++ b/scripts/backup/backup-gitea-db.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# =============================================================================
+# Gitea SQLite daily backup script
+# 2026-04-05 Claude Code: the inventory found no backup for the Gitea DB, and it was corrupted today
+# Deployed at: /home/wooo/scripts/backup-gitea-db.sh (on 192.168.0.110)
+# cron (wooo@110): 0 1 * * * /home/wooo/scripts/backup-gitea-db.sh >> /home/wooo/logs/backup-gitea.log 2>&1
+# Lesson: on 2026-04-05 the Gitea DB was corrupted and was only salvaged with sqlite3 .recover; manual repair took 2+ hours
+# =============================================================================
+
+set -e
+
+GITEA_DB="/home/wooo/gitea/gitea_data/gitea/gitea.db"
+BACKUP_DIR="/home/wooo/backups/gitea"
+DATE=$(date +%Y-%m-%d_%H-%M)
+RETENTION_DAYS=30
+
+mkdir -p "$BACKUP_DIR"
+
+echo "[$(date)] Starting Gitea DB backup..."
+
+if [ ! -f "$GITEA_DB" ]; then
+    echo "[$(date)] ❌ Gitea DB not found: $GITEA_DB"
+    exit 1
+fi
+
+# SQLite online backup (Gitea does not need to be stopped)
+sqlite3 "$GITEA_DB" ".backup '$BACKUP_DIR/gitea_${DATE}.db'"
+echo "[$(date)] ✅ Gitea DB backup complete"
+
+# Compress the backup
+gzip "$BACKUP_DIR/gitea_${DATE}.db"
+echo "[$(date)] ✅ Compressed: gitea_${DATE}.db.gz"
+
+# Prune old backups (keep 30 days)
+find "$BACKUP_DIR" -name "gitea_*.db.gz" -mtime +"$RETENTION_DAYS" -delete
+echo "[$(date)] Pruned backups older than ${RETENTION_DAYS} days"
+
+# List the latest backups
+echo "[$(date)] Current backups:"
+ls -lh "$BACKUP_DIR"/gitea_*.db.gz 2>/dev/null | tail -5
+
+echo "[$(date)] Gitea DB backup finished!"
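The monthly verification SOP in BACKUP-STATUS.md checks backups with `ls`. That shows a file exists, but it does not show that the file is recent or readable. The sketch below adds a stronger check under stated assumptions. `check_latest_backup` is a hypothetical helper that is not part of this patch. It assumes a `find` that supports `-mmin` (GNU and BSD both do) and `gzip` on the host. It picks the newest matching file, flags it as stale if it is older than the given window, and runs `gzip -t` to test the compressed stream.

```bash
#!/bin/bash
# Hypothetical helper (not part of this patch): check that the newest backup
# matching GLOB in DIR exists, is younger than MAX_AGE_HOURS, and is a valid
# gzip stream. Prints one status line; returns 0 on OK, 1 otherwise.
check_latest_backup() {
    local dir="$1" glob="$2" max_age_hours="$3"
    local f latest=""

    # Pick the most recently modified match. $glob is left unquoted on
    # purpose so the shell expands it; an unmatched glob stays literal and
    # is skipped by the -e test.
    for f in "$dir"/$glob; do
        [ -e "$f" ] || continue
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest="$f"
        fi
    done

    if [ -z "$latest" ]; then
        echo "MISSING: no $glob in $dir"
        return 1
    fi

    # find prints the path only if it was modified within the window.
    if [ -z "$(find "$latest" -mmin -"$((max_age_hours * 60))")" ]; then
        echo "STALE: $latest"
        return 1
    fi

    # gzip -t checks the compressed stream without writing anything out.
    if ! gzip -t "$latest" 2>/dev/null; then
        echo "CORRUPT: $latest"
        return 1
    fi

    echo "OK: $latest"
}
```

On 188, a call might look like `check_latest_backup /home/ollama/backups/awoooi 'awoooi_prod_*.sql.gz' 26`. On 110 it might look like `check_latest_backup /home/wooo/backups/gitea 'gitea_*.db.gz' 26`. The 26-hour window is an assumed margin over the daily schedule. `gzip -t` only confirms that the compressed stream is intact. It cannot tell whether the dump inside is complete, so an occasional test restore is still worthwhile. For the Gitea copy, that could mean decompressing it and running `PRAGMA integrity_check` with `sqlite3`.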