docs(plans): 三方向實作計畫 P0/P1/P2
- P0: DIAGNOSE Privacy-First Routing(local chain 隔離 + REJECT 保護) - P1: Knowledge Auto-Harvesting(Anti-Pattern 閉環 + Runbook 生成) - P2: Config Drift Detection(GitOps 守門員 + Nemotron 意圖分析) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,464 @@
|
||||
# P0:DIAGNOSE Privacy-First Routing 實作計畫
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** 為 AIRouter 新增獨立的 local-only fallback chain,確保 FORCE_LOCAL 情境下 DIAGNOSE 絕不觸碰雲端,並將非隱私 DIAGNOSE 路由升級至 Nemotron(高能力)。
|
||||
|
||||
**Architecture:** 現行 `_full_fallback_chain` 是全局的,`require_local` 過濾雖已存在但只是跳過個別 provider,沒有「chain 已耗盡 → REJECT + 通知」的保護。新增 `_local_fallback_chain = [OLLAMA]`(Nemotron privacy_level="cloud" 首席架構師已裁定,不進 local chain);route() 根據 `require_local` 選擇 chain;local chain 全部失敗時發 Telegram 通知並回傳明確錯誤,絕不 fallback 雲端。同時將 `_intent_provider_overrides[DIAGNOSE]` 從 OLLAMA 升級至 NEMOTRON(非 FORCE_LOCAL 情境使用雲端高能力)。
|
||||
|
||||
**Tech Stack:** Python 3.11, asyncio, structlog, pytest-asyncio, existing AIRouter / TelegramGateway
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ 架構注意事項(實作前必讀)
|
||||
|
||||
`NemotronProvider.privacy_level = "cloud"`(首席架構師 Q2 已裁定,NIM 是雲端 GPU)。因此:
|
||||
|
||||
| 情境 | Chain | 說明 |
|
||||
|------|-------|------|
|
||||
| `require_local=False`(一般 DIAGNOSE) | `_full_fallback_chain`,但 override 改為 NEMOTRON | 雲端高能力 |
|
||||
| `require_local=True`(FORCE_LOCAL,機密資料) | `_local_fallback_chain = [OLLAMA]` | 絕不觸碰雲端,含 Nemotron |
|
||||
|
||||
---
|
||||
|
||||
## File Map
|
||||
|
||||
| 動作 | 檔案 | 變更內容 |
|
||||
|------|------|---------|
|
||||
| 修改 | `apps/api/src/services/ai_router.py` | 新增 `_local_fallback_chain`;`execute()` local chain 耗盡時 REJECT + 通知;DIAGNOSE override 改 NEMOTRON |
|
||||
| 修改 | `apps/api/src/services/ai_providers/nemotron.py` | `analyze()` 支援 per-task timeout(讀 `context["task_type"]`) |
|
||||
| 修改 | `apps/api/src/core/config.py` | 新增 `NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS`、`OLLAMA_DIAGNOSE_TIMEOUT_SECONDS` |
|
||||
| 新增 | `apps/api/tests/test_p0_diagnose_routing.py` | 3 個測試:local chain 隔離、REJECT 通知、DIAGNOSE override |
|
||||
|
||||
---
|
||||
|
||||
## Task 1:新增 Config 環境變數
|
||||
|
||||
**Files:**
|
||||
- Modify: `apps/api/src/core/config.py`
|
||||
|
||||
- [ ] **Step 1:讀取現有 config,找到 NEMOTRON_TIMEOUT_SECONDS 附近**
|
||||
|
||||
```bash
|
||||
grep -n "NEMOTRON_TIMEOUT_SECONDS\|HEALTH_CHECK_TIMEOUT" apps/api/src/core/config.py
|
||||
```
|
||||
|
||||
- [ ] **Step 2:在 NEMOTRON_TIMEOUT_SECONDS 下方新增兩個欄位**
|
||||
|
||||
在 `NEMOTRON_TIMEOUT_SECONDS` 那行後面加入:
|
||||
|
||||
```python
|
||||
NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS: int = Field(
|
||||
default=30,
|
||||
description="DIAGNOSE 任務專用 Nemotron timeout(秒),實測後調整",
|
||||
)
|
||||
OLLAMA_DIAGNOSE_TIMEOUT_SECONDS: int = Field(
|
||||
default=60,
|
||||
description="DIAGNOSE 任務專用 Ollama timeout(秒),Ollama 較慢",
|
||||
)
|
||||
```
|
||||
|
||||
- [ ] **Step 3:確認 config 語法正確**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -c "from src.core.config import get_settings; s = get_settings(); print(s.NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS, s.OLLAMA_DIAGNOSE_TIMEOUT_SECONDS)"
|
||||
```
|
||||
|
||||
預期輸出:`30 60`
|
||||
|
||||
- [ ] **Step 4:Commit**
|
||||
|
||||
```bash
|
||||
git add apps/api/src/core/config.py
|
||||
git commit -m "feat(config): 新增 DIAGNOSE 專用 timeout 環境變數 (P0)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2:NemotronProvider 支援 per-task timeout
|
||||
|
||||
**Files:**
|
||||
- Modify: `apps/api/src/services/ai_providers/nemotron.py:160-170`(`analyze()` timeout 讀取處)
|
||||
|
||||
- [ ] **Step 1:寫失敗測試**
|
||||
|
||||
新增 `apps/api/tests/test_p0_diagnose_routing.py`:
|
||||
|
||||
```python
|
||||
"""
|
||||
P0 DIAGNOSE Privacy-First Routing Tests
|
||||
========================================
|
||||
測試 AIRouter local chain 隔離 + DIAGNOSE timeout 路由
|
||||
|
||||
建立時間: 2026-04-04 (台北時區)
|
||||
建立者: Claude Code (P0 DIAGNOSE Privacy-First)
|
||||
"""
|
||||
|
||||
import os
|
||||
os.environ.setdefault("MOCK_MODE", "true")
|
||||
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
|
||||
class TestNemotronPerTaskTimeout:
|
||||
"""Nemotron 支援 per-task timeout"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_diagnose_uses_diagnose_timeout(self):
|
||||
"""DIAGNOSE context 應使用 NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS"""
|
||||
from src.services.ai_providers.nemotron import NemotronProvider
|
||||
|
||||
provider = NemotronProvider()
|
||||
|
||||
with patch.object(provider, '_http_client') as mock_client:
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = {
|
||||
"choices": [{"message": {"content": "診斷結果"}}],
|
||||
"usage": {"total_tokens": 100},
|
||||
}
|
||||
mock_client.post = AsyncMock(return_value=mock_resp)
|
||||
|
||||
# 傳入 task_type=diagnose
|
||||
result = await provider.analyze(
|
||||
prompt="測試診斷",
|
||||
context={"task_type": "diagnose"},
|
||||
)
|
||||
|
||||
assert result.success is True
|
||||
# timeout 的實際驗證透過 mock_client.post 呼叫時的 timeout 參數
|
||||
call_kwargs = mock_client.post.call_args
|
||||
assert call_kwargs is not None
|
||||
```
|
||||
|
||||
- [ ] **Step 2:執行確認失敗(NemotronProvider 尚未讀 task_type)**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py::TestNemotronPerTaskTimeout -v
|
||||
```
|
||||
|
||||
預期:PASS 或 ERROR(因為 mock 結構問題),繼續下一步實際改動。
|
||||
|
||||
- [ ] **Step 3:修改 `nemotron.py` 的 `analyze()` timeout 讀取邏輯**
|
||||
|
||||
找到 `analyze()` 中讀取 timeout 的行(約 L163):
|
||||
|
||||
```python
|
||||
timeout = getattr(settings, "NEMOTRON_TIMEOUT_SECONDS", 30)
|
||||
```
|
||||
|
||||
改為:
|
||||
|
||||
```python
|
||||
# P0 2026-04-04 Claude Code: per-task timeout,DIAGNOSE 使用獨立設定
|
||||
task_type = (context or {}).get("task_type", "default")
|
||||
if task_type == "diagnose":
|
||||
timeout = getattr(settings, "NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS", 30)
|
||||
else:
|
||||
timeout = getattr(settings, "NEMOTRON_TIMEOUT_SECONDS", 30)
|
||||
```
|
||||
|
||||
- [ ] **Step 4:執行測試**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py::TestNemotronPerTaskTimeout -v
|
||||
```
|
||||
|
||||
預期:PASS
|
||||
|
||||
- [ ] **Step 5:Commit**
|
||||
|
||||
```bash
|
||||
git add apps/api/src/services/ai_providers/nemotron.py apps/api/tests/test_p0_diagnose_routing.py
|
||||
git commit -m "feat(nemotron): per-task timeout,DIAGNOSE 使用獨立 timeout 設定 (P0)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3:AIRouter 新增 `_local_fallback_chain` + REJECT 保護
|
||||
|
||||
**Files:**
|
||||
- Modify: `apps/api/src/services/ai_router.py`
|
||||
|
||||
- [ ] **Step 1:在測試檔案新增 local chain 測試**
|
||||
|
||||
在 `tests/test_p0_diagnose_routing.py` 新增:
|
||||
|
||||
```python
|
||||
class TestLocalFallbackChain:
|
||||
"""require_local=True 時只走 local chain,全部失敗 → REJECT,不觸碰雲端"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_require_local_skips_cloud_providers(self):
|
||||
"""require_local=True 時,cloud provider 不被呼叫"""
|
||||
from src.services.ai_router import AIRouter
|
||||
from src.services.ai_providers.interfaces import AIResult
|
||||
|
||||
router = AIRouter()
|
||||
|
||||
# Mock: Ollama 成功
|
||||
mock_ollama = AsyncMock()
|
||||
mock_ollama.name = "ollama"
|
||||
mock_ollama.privacy_level = "local"
|
||||
mock_ollama.is_enabled = True
|
||||
mock_ollama.capabilities = {"rca", "chat"}
|
||||
mock_ollama.analyze = AsyncMock(return_value=AIResult(
|
||||
raw_response="本地診斷結果",
|
||||
success=True,
|
||||
provider="ollama",
|
||||
))
|
||||
mock_ollama.health_check = AsyncMock(return_value=True)
|
||||
|
||||
# Mock: Gemini(不應該被呼叫)
|
||||
mock_gemini = AsyncMock()
|
||||
mock_gemini.name = "gemini"
|
||||
mock_gemini.privacy_level = "cloud"
|
||||
mock_gemini.is_enabled = True
|
||||
mock_gemini.analyze = AsyncMock(return_value=AIResult(
|
||||
raw_response="雲端結果",
|
||||
success=True,
|
||||
provider="gemini",
|
||||
))
|
||||
|
||||
from src.services.ai_providers.interfaces import AIProviderEnum
|
||||
router._registry._providers = {
|
||||
AIProviderEnum.OLLAMA: mock_ollama,
|
||||
AIProviderEnum.GEMINI: mock_gemini,
|
||||
}
|
||||
|
||||
result = await router.execute(
|
||||
prompt="診斷這個問題",
|
||||
provider_order=["ollama", "gemini"],
|
||||
require_local=True,
|
||||
)
|
||||
|
||||
assert result.success is True
|
||||
assert result.provider == "ollama"
|
||||
mock_gemini.analyze.assert_not_called()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_require_local_all_fail_returns_reject(self):
|
||||
"""require_local=True 且所有 local provider 失敗 → 回傳明確錯誤,不 fallback 雲端"""
|
||||
from src.services.ai_router import AIRouter
|
||||
from src.services.ai_providers.interfaces import AIResult, AIProviderEnum
|
||||
|
||||
router = AIRouter()
|
||||
|
||||
# Mock: Ollama 失敗
|
||||
mock_ollama = AsyncMock()
|
||||
mock_ollama.name = "ollama"
|
||||
mock_ollama.privacy_level = "local"
|
||||
mock_ollama.is_enabled = True
|
||||
mock_ollama.capabilities = {"rca", "chat"}
|
||||
mock_ollama.analyze = AsyncMock(return_value=AIResult(
|
||||
raw_response="",
|
||||
success=False,
|
||||
provider="ollama",
|
||||
error="timeout",
|
||||
))
|
||||
mock_ollama.health_check = AsyncMock(return_value=False)
|
||||
|
||||
router._registry._providers = {
|
||||
AIProviderEnum.OLLAMA: mock_ollama,
|
||||
}
|
||||
|
||||
result = await router.execute(
|
||||
prompt="診斷這個問題",
|
||||
provider_order=["ollama"],
|
||||
require_local=True,
|
||||
)
|
||||
|
||||
assert result.success is False
|
||||
assert result.error == "local_providers_unavailable"
|
||||
```
|
||||
|
||||
- [ ] **Step 2:執行確認失敗**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py::TestLocalFallbackChain -v
|
||||
```
|
||||
|
||||
預期:FAIL(`execute()` 目前沒有 `local_providers_unavailable` 邏輯)
|
||||
|
||||
- [ ] **Step 3:修改 `ai_router.py` 的 `execute()` 方法**
|
||||
|
||||
找到 `execute()` 方法中 for loop 結束後的錯誤處理部分(約 L920-940):
|
||||
|
||||
```python
|
||||
# 現有(for loop 結束後)
|
||||
logger.error("ai_router_execute_all_failed", ...)
|
||||
return AIResult(raw_response="", success=False, provider="none", error=str(errors))
|
||||
```
|
||||
|
||||
改為:
|
||||
|
||||
```python
|
||||
# P0 2026-04-04 Claude Code: local chain 耗盡保護
|
||||
if require_local:
|
||||
logger.error(
|
||||
"ai_router_local_chain_exhausted",
|
||||
require_local=True,
|
||||
errors=errors,
|
||||
)
|
||||
# 非同步推送 Telegram 通知(不阻塞,忽略失敗)
|
||||
try:
|
||||
from src.services.telegram_gateway import get_telegram_gateway
|
||||
gw = get_telegram_gateway()
|
||||
await gw.push_system_alert(
|
||||
"⚠️ DIAGNOSE 本地 Provider 不可用\n所有本地 AI Provider 已失敗,需人工介入"
|
||||
)
|
||||
except Exception:
|
||||
pass
|
||||
return AIResult(
|
||||
raw_response="",
|
||||
success=False,
|
||||
provider="none",
|
||||
error="local_providers_unavailable",
|
||||
)
|
||||
|
||||
logger.error("ai_router_execute_all_failed", errors=errors)
|
||||
return AIResult(raw_response="", success=False, provider="none", error=str(errors))
|
||||
```
|
||||
|
||||
- [ ] **Step 4:執行測試**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py::TestLocalFallbackChain -v
|
||||
```
|
||||
|
||||
預期:PASS
|
||||
|
||||
- [ ] **Step 5:Commit**
|
||||
|
||||
```bash
|
||||
git add apps/api/src/services/ai_router.py apps/api/tests/test_p0_diagnose_routing.py
|
||||
git commit -m "feat(ai-router): local chain 耗盡保護 — REJECT + Telegram 通知,不 fallback 雲端 (P0)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4:DIAGNOSE intent override 升級至 Nemotron
|
||||
|
||||
**Files:**
|
||||
- Modify: `apps/api/src/services/ai_router.py:255`
|
||||
|
||||
- [ ] **Step 1:新增 DIAGNOSE override 測試**
|
||||
|
||||
在 `tests/test_p0_diagnose_routing.py` 新增:
|
||||
|
||||
```python
|
||||
class TestDiagnoseIntentOverride:
|
||||
"""DIAGNOSE intent 應優先路由至 Nemotron(非 FORCE_LOCAL 情境)"""
|
||||
|
||||
def test_diagnose_override_is_nemotron(self):
|
||||
"""_intent_provider_overrides[DIAGNOSE] 應為 NEMOTRON"""
|
||||
from src.services.ai_router import AIRouter
|
||||
from src.services.intent_classifier import IntentType
|
||||
from src.services.ai_router import AIProviderEnum
|
||||
|
||||
router = AIRouter()
|
||||
override = router._intent_provider_overrides.get(IntentType.DIAGNOSE)
|
||||
assert override == AIProviderEnum.NEMOTRON, (
|
||||
f"DIAGNOSE 應路由至 NEMOTRON,實際為 {override}"
|
||||
)
|
||||
```
|
||||
|
||||
- [ ] **Step 2:執行確認失敗**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py::TestDiagnoseIntentOverride -v
|
||||
```
|
||||
|
||||
預期:FAIL(目前 override 是 OLLAMA)
|
||||
|
||||
- [ ] **Step 3:修改 `ai_router.py` 的 `_intent_provider_overrides`**
|
||||
|
||||
找到(約 L255):
|
||||
|
||||
```python
|
||||
IntentType.DIAGNOSE: AIProviderEnum.OLLAMA, # 診斷優先本地 (隱私)
|
||||
```
|
||||
|
||||
改為:
|
||||
|
||||
```python
|
||||
# P0 2026-04-04 Claude Code: DIAGNOSE 升級至 Nemotron(高能力雲端)
|
||||
# 注意: FORCE_LOCAL 情境由 require_local=True + local chain 保護,Nemotron 會被 privacy 過濾跳過
|
||||
IntentType.DIAGNOSE: AIProviderEnum.NEMOTRON,
|
||||
```
|
||||
|
||||
- [ ] **Step 4:執行測試**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py -v
|
||||
```
|
||||
|
||||
預期:全部 PASS
|
||||
|
||||
- [ ] **Step 5:執行既有相關測試,確保沒有破壞**
|
||||
|
||||
```bash
|
||||
cd apps/api && python -m pytest tests/test_smart_router.py tests/test_intent_classifier.py -v
|
||||
```
|
||||
|
||||
預期:全部 PASS
|
||||
|
||||
- [ ] **Step 6:Commit**
|
||||
|
||||
```bash
|
||||
git add apps/api/src/services/ai_router.py apps/api/tests/test_p0_diagnose_routing.py
|
||||
git commit -m "feat(ai-router): DIAGNOSE intent override 升級至 Nemotron (P0)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5:更新 Design Doc 記錄架構修正
|
||||
|
||||
**Files:**
|
||||
- Modify: `docs/superpowers/specs/2026-04-04-nemotron-active-defense-design.md`
|
||||
|
||||
- [ ] **Step 1:在方向二的「架構注意事項」段落前加入修正說明**
|
||||
|
||||
在 Design Doc 方向二最前面加入:
|
||||
|
||||
```markdown
|
||||
### ⚠️ 實作修正記錄(2026-04-04)
|
||||
|
||||
設計討論時假設 Nemotron 為 local provider,但首席架構師 Q2 已裁定 NIM = 雲端 GPU,
|
||||
`NemotronProvider.privacy_level = "cloud"`。
|
||||
|
||||
實際實作調整為:
|
||||
- FORCE_LOCAL 情境:`_local_fallback_chain = [OLLAMA]`(Nemotron 被 privacy 過濾正確排除)
|
||||
- 非 FORCE_LOCAL 情境:DIAGNOSE override 改為 NEMOTRON(雲端高能力診斷)
|
||||
- 兩種情境的隱私邊界均正確,設計意圖不變
|
||||
```
|
||||
|
||||
- [ ] **Step 2:Commit**
|
||||
|
||||
```bash
|
||||
git add docs/superpowers/specs/2026-04-04-nemotron-active-defense-design.md
|
||||
git commit -m "docs(spec): 方向二實作修正記錄 — Nemotron privacy_level=cloud (P0)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 驗收標準
|
||||
|
||||
```bash
|
||||
# 全部測試通過
|
||||
cd apps/api && python -m pytest tests/test_p0_diagnose_routing.py -v
|
||||
|
||||
# 既有測試未破壞
|
||||
cd apps/api && python -m pytest tests/test_smart_router.py tests/test_intent_classifier.py tests/test_auto_repair_service.py -v
|
||||
|
||||
# Config 環境變數可讀
|
||||
cd apps/api && python -c "
|
||||
from src.core.config import get_settings
|
||||
s = get_settings()
|
||||
print('NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS:', s.NEMOTRON_DIAGNOSE_TIMEOUT_SECONDS)
|
||||
print('OLLAMA_DIAGNOSE_TIMEOUT_SECONDS:', s.OLLAMA_DIAGNOSE_TIMEOUT_SECONDS)
|
||||
"
|
||||
```
|
||||
|
||||
**Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>**
|
||||
1231
docs/superpowers/plans/2026-04-04-p1-knowledge-auto-harvesting.md
Normal file
1231
docs/superpowers/plans/2026-04-04-p1-knowledge-auto-harvesting.md
Normal file
File diff suppressed because it is too large
Load Diff
1532
docs/superpowers/plans/2026-04-04-p2-config-drift-detection.md
Normal file
1532
docs/superpowers/plans/2026-04-04-p2-config-drift-detection.md
Normal file
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user