ewoooc/docs/guides/external_professional_benchmark.md

# External Professional Practice Benchmark

> Purpose: periodically turn external e-commerce, product data, and UX professional practices into actionable product guidelines for EwoooC / MOMO Pro.

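## Fixed cadence

- Run the external benchmark every Monday at 09:30 and automatically output actionable recommendations.
- Adopt only practices that improve core value: product identity matching accuracy, usable price-comparison coverage, price freshness, manual review efficiency, and the quality of competitive-intelligence decisions.
- External data must retain its source, read date, observed conclusion, and, where a practice is rejected, the reason for not adopting it.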
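The retention rule above can be sketched as a small record type. This is only an illustration: the class name `BenchmarkObservation` and all of its field names are assumptions, not an existing schema in this project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkObservation:
    """Hypothetical record for one externally sourced practice."""
    source: str                             # URL or title of the external reference
    read_on: date                           # date the source was read
    conclusion: str                         # what was observed
    adopted: bool                           # whether the practice is being adopted
    rejection_reason: Optional[str] = None  # expected when adopted is False

    def problems(self) -> List[str]:
        """Return the retention rules this record violates (empty if valid)."""
        issues = []
        if not self.source.strip():
            issues.append("missing source")
        if not self.conclusion.strip():
            issues.append("missing conclusion")
        if not self.adopted and not (self.rejection_reason or "").strip():
            issues.append("missing rejection reason")
        return issues
```

A weekly run could reject any observation whose `problems()` list is non-empty before it reaches this guide.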

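## 2026-06-02 Initial observations

### 1. Product identity must prioritize structured identifiers

The Google Merchant Center product data specification treats `id`, `brand`, `gtin`, `mpn`, `price`, and `availability` as core product data; Schema.org / Google Product structured data likewise place `Product`, `Offer`, `AggregateOffer`, `sku`, `gtin`, `brand`, `price`, and `availability` at the center of product and offer semantics.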

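Applied to this product:

- The matching engine must not rely on product-name tokens alone; it should gradually build an `identity_evidence` field that stores brand, SKU, GTIN/barcode, MPN/model number, volume, pack count, shade, scent, and style in separate layers.
- If both sides have a GTIN / MPN / explicit model number, that should be used first as strong evidence.
- If GTIN / MPN is missing, the items must not be automatically assumed to be the same product; mark the pair clearly as `identifier_missing` or `identifier_weak`.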
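A minimal sketch of the identifier rule above, under stated assumptions. Only the labels `identifier_missing` and `identifier_weak` come from this guide; the `IdentityEvidence` class, the `identifier_strong` and `identifier_conflict` labels, and the exact boundary between "weak" and "missing" are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityEvidence:
    """Hypothetical subset of the identity_evidence fields."""
    brand: Optional[str] = None
    gtin: Optional[str] = None
    mpn: Optional[str] = None


def _norm(value: Optional[str]) -> Optional[str]:
    """Trim and upper-case an identifier; treat empty strings as absent."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    return cleaned or None


def identifier_status(left: IdentityEvidence, right: IdentityEvidence) -> str:
    """Classify a candidate pair by its structured identifiers only."""
    for attr in ("gtin", "mpn"):
        a, b = _norm(getattr(left, attr)), _norm(getattr(right, attr))
        if a and b:
            # Both sides carry the same kind of identifier: compare directly.
            return "identifier_strong" if a == b else "identifier_conflict"
    # No identifier is present on both sides, so nothing can be compared.
    has_any = any(
        _norm(getattr(side, attr))
        for side in (left, right)
        for attr in ("gtin", "mpn")
    )
    return "identifier_weak" if has_any else "identifier_missing"
```

In this sketch a pair without comparable identifiers is never promoted to a match; it only receives a label that the review flow can act on.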

### 2. Price availability must be tied to freshness

Google Merchant Center requires price and stock status to stay consistent with the landing page / checkout; Schema.org Offer also has offer fields such as `price`, `priceCurrency`, and `availability`.

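Applied to this product:

- `decision_ready` may count only prices that are explicitly unexpired; unknown freshness must not be treated as decision-ready.
- The dashboard must separate identity coverage, fresh price coverage, pending identity, and stale identity.
- The current direction of V10.549-V10.565 is correct: unknown freshness must not inflate coverage, and such rows must enter the refresh / rescue flow.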
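The freshness rule can be sketched as follows. The 24-hour threshold, the function names, and the `unknown` / `fresh` / `stale` state names are assumptions for illustration; only `decision_ready` is a name used in this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PRICE_AGE = timedelta(hours=24)  # assumed threshold, not a project setting


def freshness_state(crawled_at: Optional[datetime], now: datetime,
                    max_age: timedelta = MAX_PRICE_AGE) -> str:
    """Classify a price by the age of its crawl timestamp."""
    if crawled_at is None:
        return "unknown"  # never counted as fresh
    return "fresh" if now - crawled_at <= max_age else "stale"


def is_decision_ready(identity_confirmed: bool,
                      crawled_at: Optional[datetime],
                      now: datetime) -> bool:
    """A row is decision-ready only with confirmed identity and a fresh price."""
    return identity_confirmed and freshness_state(crawled_at, now) == "fresh"


if __name__ == "__main__":
    now = datetime(2026, 6, 2, 12, 0, tzinfo=timezone.utc)
    print(is_decision_ready(True, None, now))                      # unknown freshness
    print(is_decision_ready(True, now - timedelta(hours=1), now))  # recent crawl
```

Rows whose state is `unknown` or `stale` would be routed to the refresh / rescue flow instead of being added to coverage.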

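### 3. Multi-offer / multi-platform price comparison should be presented as offer evidence, not just a single lowest price

Schema.org `AggregateOffer` is used when one product maps to offers from multiple merchants. The concept fits our need to store MOMO / PChome same-item evidence separately from price evidence.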

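Applied to this product:

- `competitor_prices` should evolve step by step from a single match row into two layers: identity pair + offer snapshot.
- PPT / AI decisions should show not only the price gap but also identity confidence, freshness, offer source, last crawled, and manual review state.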
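One way to picture the two layers is an identity pair that owns a list of offer snapshots. All class and field names below are hypothetical; the actual `competitor_prices` schema is not described in this guide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class OfferSnapshot:
    """One observed price for one source at one point in time."""
    source: str            # e.g. "momo" or "pchome"
    price: Decimal
    currency: str
    availability: str
    crawled_at: datetime


@dataclass
class IdentityPair:
    """Same-item evidence, kept separate from any price observation."""
    momo_product_id: str
    pchome_product_id: str
    identity_confidence: float
    review_state: str
    offers: List[OfferSnapshot] = field(default_factory=list)

    def latest_offer(self, source: str) -> Optional[OfferSnapshot]:
        """Return the most recently crawled snapshot for a source, if any."""
        candidates = [o for o in self.offers if o.source == source]
        return max(candidates, key=lambda o: o.crawled_at, default=None)
```

With this split, re-crawling a price appends a snapshot without touching the identity decision, and a changed identity decision does not discard price history.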

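### 4. Product comparison UX must let users compare spec differences

Baymard's product page and comparison UX research emphasizes that users need clear product comparison, especially in spec-driven categories.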

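Applied to this product:

- The manual review page must not list only the MOMO/PChome names and prices; it must highlight the mismatched fields: shade, scent, volume, pack count, bundle, mix-and-match, expiry date, and airline edition.
- For `identity_veto` / `true_low_confidence`, show a reason a person can understand, not just `待審` (pending review).
- The dashboard's suggested next step should link directly to the corresponding action: refresh, re-crawl, re-score, unit-price review, or manual review.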
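The mismatch highlighting could start from a plain field-by-field diff. The field keys and the dictionary shape below are assumptions chosen for illustration; they are not the project's actual spec schema.

```python
from typing import Any, Dict, List

# Hypothetical keys for the review fields listed above.
REVIEW_FIELDS = (
    "shade", "scent", "volume", "pack_count",
    "bundle", "mix_and_match", "expiry", "airline_edition",
)


def mismatched_fields(momo_spec: Dict[str, Any],
                      pchome_spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List the review fields whose values differ between the two listings."""
    diffs = []
    for name in REVIEW_FIELDS:
        left, right = momo_spec.get(name), pchome_spec.get(name)
        if left is None and right is None:
            continue  # neither listing states this field
        if left != right:
            diffs.append({"field": name, "momo": left, "pchome": right})
    return diffs
```

A field present on only one side is reported as a difference here, on the assumption that a missing value is something the reviewer should see rather than something to hide.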

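## Currently not adopted

- Not adopting "automatic matching based only on low price / high similarity": a similar price is not identity evidence.
- Not adopting "loosening thresholds broadly to raise coverage": it would contaminate the core price-comparison data.
- Not adopting "copying the UI style of external sites directly": only information architecture, evidence presentation, and workflow practices are absorbed.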

## 2026-07-02 AI automation dashboard benchmark

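### Source observations

- Grafana dashboard best practices emphasize methodical dashboards, layered drill-down, alerts that lead to dashboards, dashboard/panel descriptions, and version-controlled dashboard JSON.
- Datadog dashboards emphasize real-time visibility into system health, KPIs, trends, and anomalies, plus prioritization and root-cause diagnosis.
- The New Relic golden signals dashboard emphasizes using a small set of core signals to grasp service health quickly, with template variables for dynamic filtering.
- Atlassian Statuspage / incident communication emphasizes status communication, incident automation, and status updates that users can understand.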

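### Applied to the PChome AI automation dashboard

- Status tiers: the first viewport must use `success / warning / danger / neutral` to present healthy, waiting, needs-action, and completed states, rather than blending every state into one kind of card.
- Next step first: the first-viewport summary must directly show the next machine action; raw package, endpoint, and artifact hash belong in the API / evidence layer.
- Evidence on demand: product screens use operational language such as "readback, changes, retention, data writes"; receipt, hash, artifact, and DB table names stay only in the detailed readback and tests.
- Golden signals: the AI automation first viewport needs at least four core signals: auto-landed, verified, change status, and next step.
- Dashboard-as-code: benchmark conclusions must go into tests; `tests/test_pchome_dashboard_benchmark_guardrails.py` is the PChome AI dashboard benchmark guard.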
- Surface rollout: `/ai_intelligence` and `/observability/overview` now apply the same golden-signal guardrails; `tests/test_ai_surface_benchmark_guardrails.py` locks the status tiers, next-step-first ordering, and evidence-on-demand language on the first screen of both pages.
- AI Agent surface rollout: `/observability/agent_orchestration` now applies the compact Agent workbench guardrails; `tests/test_agent_orchestration_text_density_guardrails.py` locks the first-screen short labels, core signals, and hidden explanatory copy.
- AI traffic surface rollout: `/observability/ai_calls` now applies the compact AI traffic workbench guardrails; `tests/test_ai_calls_text_density_guardrails.py` locks the first-screen short labels, the cost / error / knowledge core signals, and hidden explanatory copy.
- AI quality surface rollout: `/observability/quality_trend` now applies the compact AI quality workbench guardrails; `tests/test_quality_trend_text_density_guardrails.py` locks the first-screen short labels, the quality / knowledge / action-effectiveness core signals, and hidden explanatory copy.
- AI cost surface rollout: `/observability/budget` now applies the compact AI cost workbench guardrails; `tests/test_budget_text_density_guardrails.py` locks the first-screen short labels, the cost / early-warning / throttling core signals, and hidden explanatory copy.
- AI business surface rollout: `/observability/business_intel` now applies the compact AI business workbench guardrails; `tests/test_business_intel_text_density_guardrails.py` locks the first-screen short labels, the business-results / closed-loop / competitor core signals, and hidden explanatory copy.
- AI runtime surface rollout: `/observability/host_health` now applies the compact AI runtime workbench guardrails; `tests/test_host_health_text_density_guardrails.py` locks the first-screen short labels, the host-cascade / self-healing / throttling core signals, and hidden explanatory copy.
- AI knowledge surface rollout: `/observability/rag_queries` now applies the compact AI knowledge workbench guardrails; `tests/test_rag_queries_text_density_guardrails.py` locks the first-screen short labels, the knowledge-hit / model-saving-loop / feedback-learning core signals, and hidden explanatory copy.
- AI promotion exception surface rollout: `/observability/promotion_review` now applies the compact AI promotion workbench guardrails; `tests/test_promotion_review_text_density_guardrails.py` locks the first-screen short labels, the AI-exception / dedup-gatekeeping / contamination-prevention core signals, and non-manual wording.
- AI visual QA surface rollout: `/observability/ppt_audit_history` now applies the compact AI visual QA workbench guardrails; `tests/test_ppt_audit_text_density_guardrails.py` locks the first-screen short labels, the preview / audit / repair core signals, and non-raw env/table wording.
- Route HTML readback rollout: the high-visibility AI observability surfaces now have a Flask route-level rendered HTML guard; `tests/test_admin_observability_routes.py::test_high_visibility_ai_surfaces_route_html_readback_keeps_compact_contract` locks the compact density marker and non-raw engineering wording for 10 routes.
- Runtime HTML readback rollout: `/api/ai-automation/surface-html-readback` and the AI smoke are now wired into the high-visibility AI surface contract readback; `tests/test_ai_automation_smoke_service.py` locks the 10 surfaces, regression detection, the smoke check, and the scheduled health family.
- Sitewide UI/UX Agent rollout: `/api/ai-automation/sitewide-ui-ux-agent` scans the actual page templates and produces a sitewide professionalization inventory; `/api/ai-automation/sitewide-ui-ux-repair-package` outputs a no-write controlled repair package that turns mainstream product-page principles into sortable, verifiable repair items. Four batches of controlled repair have brought the AI recommendations, PPT preview, service update monitoring, daily sales, legacy product/EDM entry points, growth reports, PChome product monitoring, price-comparison decision desk, sales battle analysis, and supply risk page groups under the compact workbench guardrails, converging the Agent baseline to 29 compact guardrails / 0 surfaces awaiting professionalization.
- Sitewide visual QA rollout: `scripts/check_responsive_overflow.js --artifact-output data/ai_automation/sitewide_visual_qa_latest.json` writes the desktop / tablet / mobile overflow QA results as an artifact; `/api/ai-automation/sitewide-visual-qa-readback`, `Sitewide visual QA readback`, and the `sitewide_visual_qa` scheduled family bring the latest visual QA status into AI automation monitoring.
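To make the dashboard-as-code idea concrete, here is a minimal sketch of the kind of check a guardrail test might run against rendered first-viewport HTML. It is not the content of any test file named above: the `data-signal` / `data-state` attributes, the signal keys, and the list of raw terms are all assumptions.

```python
import re
from typing import List

# Hypothetical keys for the four golden signals.
REQUIRED_SIGNALS = ("auto_landed", "verified", "change_state", "next_step")
ALLOWED_STATES = {"success", "warning", "danger", "neutral"}
# Hypothetical raw engineering terms that should stay out of the first viewport.
RAW_TERMS = ("receipt", "artifact hash", "db table")


def first_viewport_violations(html: str) -> List[str]:
    """Return guardrail violations found in a rendered first-viewport fragment."""
    violations = []
    found = set(re.findall(r'data-signal="([^"]+)"', html))
    for signal in REQUIRED_SIGNALS:
        if signal not in found:
            violations.append(f"missing signal: {signal}")
    for state in re.findall(r'data-state="([^"]+)"', html):
        if state not in ALLOWED_STATES:
            violations.append(f"unknown state: {state}")
    lowered = html.lower()
    for term in RAW_TERMS:
        if term in lowered:
            violations.append(f"raw term in first viewport: {term}")
    return violations
```

A pytest-style guard would then assert that the list is empty for each surface, so a regression in status tiers or evidence-on-demand wording fails the build instead of reaching the page.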

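## Next-step TODO candidates

1. Build a normalized `identity_evidence` payload so the matcher returns identifier/spec/variant evidence.
2. Add difference highlighting to the review page: shade, scent, volume, pack count, mix-and-match, expiry date, and source freshness.
3. Split the price-comparison items in the PPT / AI payload into identity evidence and offer evidence.
4. If the weekly benchmark results hit any of the TODOs above, write them back to `TODO_NEXT_STEPS.txt` or add an ADR / memory.
5. Apply the PChome AI automation benchmark guardrails to subsequent AI Agent surfaces and to the first-viewport summary of every safe automation lane.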
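   - Done: `/api/ai/pchome-growth/ai-automation-surface-summary` uses `golden_signals` to pin down "auto-landed, verified, change status, next step".
   - Done: the `/ai_intelligence` first screen consumes the surface summary directly; raw receipt / hash / DB table / endpoint details stay in the evidence-on-demand layer.
   - Done: the `/observability/agent_orchestration` first screen presents Agent orchestration status with the short labels "AI division of labor / cost gatekeeping / knowledge hits" and four core numbers.
   - Done: the `/observability/ai_calls` first screen presents AI call, cost, error, and knowledge status with the short labels "traffic monitoring / cost gatekeeping / knowledge hits" and six core numbers.
   - Done: the `/observability/quality_trend` first screen presents AI recommendation reliability with the short labels "quality feedback / knowledge reliability / action effectiveness" and four core numbers.
   - Done: the `/observability/budget` first screen presents AI cost governance status with the short labels "cost gatekeeping / throttling status / knowledge strategy" and four core numbers.
   - Done: the `/observability/business_intel` first screen presents AI business conversion status with the short labels "results tracking / closed-loop effectiveness / competitor signals" and four core numbers.
   - Done: the `/observability/host_health` first screen presents AI runtime health with the short labels "host cascade / self-healing loop / cost throttling" and four core numbers.
   - Done: the `/observability/rag_queries` first screen presents AI knowledge recall status with the short labels "knowledge hits / model-saving loop / feedback learning" and four core numbers.
   - Done: the `/observability/promotion_review` first screen presents AI promotion exception status with the short labels "AI exceptions / dedup gatekeeping / contamination prevention" and four core numbers.
   - Done: the `/observability/ppt_audit_history` first screen presents AI visual QA pipeline status with the short labels "preview ready / visual audit / repair loop".
   - Done: the sitewide visual QA artifact readback feeds the desktop / tablet / mobile overflow results into the AI smoke, the scheduled health summary, and `/metrics`.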

## References

- Google Merchant Center Product data specification: https://support.google.com/merchants/answer/7052112
- Google Search Central Product structured data: https://developers.google.com/search/docs/appearance/structured-data/product
- Google SRE The Four Golden Signals: https://sre.google/sre-book/monitoring-distributed-systems/
- Schema.org Product / Offer / AggregateOffer: https://schema.org/Product, https://schema.org/Offer, https://schema.org/AggregateOffer
- Baymard Product Page UX Best Practices: https://baymard.com/blog/current-state-ecommerce-product-page-ux
- Baymard Product Comparison UX: https://baymard.com/blog/provide-comparison-features
- Grafana Dashboard best practices: https://grafana.com/docs/grafana/latest/visualizations/dashboards/build-dashboards/best-practices/
- Datadog Dashboards: https://docs.datadoghq.com/dashboards/
- New Relic Golden Signals dashboard: https://newrelic.com/instant-observability/golden-signals-dashboard-for-new-relic
- Atlassian Statuspage user guide: https://support.atlassian.com/statuspage/docs/read-the-statuspage-user-guide/