Modified: "2026 Ubuntu 安装.md"
New file: "AI技术/cc-switch&claude-code.md"
Renamed: hermes-agent.md -> "AI技术/hermes-agent.md"
Deleted: "AI技术/准备.md"
Renamed: "电子秤图片识别系统设计.md" -> "AI技术/电子秤图片识别系统设计.md"
@@ -0,0 +1,126 @@
## CC Switch & Claude Code

## 1. CC Switch

Install CC Switch first.

- From the download page https://www.ccswitch.io/zh/, choose [CC-Switch-v3.15.0-Linux-x86_64.deb](https://github.com/farion1231/cc-switch/releases/download/v3.15.0/CC-Switch-v3.15.0-Linux-x86_64.deb)

- Install the package: `sudo dpkg -i CC-Switch-v3.15.0-Linux-x86_64.deb`

- Open CC-Switch and configure the Claude provider:
  1. Select DouBaoSeed.
  1. Enter the API key: ark-8f9...
  1. Under model mapping, fetch the model list and pick the model you want, e.g. doubao-seed-2-0-code-preview-latest.
  1. Test and save. CC Switch then writes the correct base_url and api_key to ~/.claude/settings.json.

> Keep Claude Code from entering the official login check
>
> ```bash
> # 1. Log out of claude and clear all related variables
> claude /logout
> unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_BASE_URL
>
> # 2. Check: this must print nothing
> env | grep ANTHROPIC
>
> # 3. Empty the Claude settings so that CC Switch takes over completely
> > ~/.claude/settings.json
>
> # 4. Key step: skip the official first-run check (required in mainland China)
> #    Note: this overwrites any existing ~/.claude.json
> echo '{"hasCompletedOnboarding": true}' > ~/.claude.json
> ```

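
Steps 3 and 4 above overwrite `~/.claude/settings.json` and `~/.claude.json` in place. A minimal sketch of a helper that keeps a backup copy first; the function name `backup_then_write` and the `.bak` suffix are illustrative assumptions, not part of Claude Code or CC Switch:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy FILE to FILE.bak (only if FILE exists),
# then replace FILE's contents with CONTENT.
backup_then_write() {
  local file="$1" content="$2"
  if [ -e "$file" ]; then
    cp -p "$file" "$file.bak"
  fi
  printf '%s\n' "$content" > "$file"
}
```

For example, `backup_then_write ~/.claude.json '{"hasCompletedOnboarding": true}'` would leave the previous file at `~/.claude.json.bak`, so the old state can be restored if the switch does not work as expected.
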
## 2. Claude Code

Version: 2.1.152

```bash
curl -fsSL https://claude.ai/install.sh | bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
# Start Claude Code. In theory no first-run setup prompt appears; if
# "Select login method:" does show up, choose
# "2. Anthropic Console account · API usage billing".
claude
```

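
The `echo ... >> ~/.bashrc` line appends another PATH entry every time it is run. A minimal sketch of an idempotent variant; `append_once` is a hypothetical helper name, not something the installer provides:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if FILE does not
# already contain exactly that line.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage would look like `append_once ~/.bashrc 'export PATH="$HOME/.local/bin:$PATH"'`; running it a second time leaves the file unchanged.
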
## 3. Common Claude Code commands

`/status` shows the current model and connection status.

`/logout` clears the current authentication state. After logging out and restarting Claude Code, the login-method selection screen usually appears again.

## 4. Plugins

Plugin directory: ~/.claude/plugins

### Compound Engineering

**Installation:** [reference](https://mp.weixin.qq.com/s/TfJAJF_viwpZ6xHDt4WrIw)

```
# Run the following inside a Claude Code chat session.

# Add the plugin marketplace
/plugin marketplace add EveryInc/compound-engineering-plugin

# Install the plugin
/plugin install compound-engineering

/reload-plugins

# Start the setup flow
/ce-setup
```

**Usage:**

> This is a powerful AI engineering toolkit with 38+ skills and 50+ agents.
>
> Quick start
>
> Run /ce-setup first to diagnose the environment, install missing tools, and walk through project configuration.
>
> Core workflow
>
> This is a complete engineering workflow built on the idea that "each unit of work should make the work that follows easier".
>
> | Skill | Purpose |
> |---|---|
> | /ce-strategy | Create STRATEGY.md: define the product's target problem, approach, user personas, and key metrics |
> | /ce-ideate | Big-picture ideation: generate ideas and evaluate them critically |
> | /ce-brainstorm | Interactive Q&A that brainstorms requirements and writes a requirements document before planning |
> | /ce-plan | Produce a detailed implementation plan |
> | /ce-work | Execute work items systematically |
> | /ce-debug | Systematically find the root cause and fix the bug |
> | /ce-code-review | Structured code review using layered persona agents |
> | /ce-compound | Record solved problems to build up team knowledge |
>
> Typical usage examples
>
> Feature development:
> /ce-brainstorm "make background job retries safer"
> /ce-plan docs/brainstorms/background-job-retry-safety-requirements.md
> /ce-work
> /ce-code-review
> /ce-compound
>
> Focused debugging:
> /ce-debug "the checkout webhook sometimes creates duplicate invoices"
> /ce-code-review
> /ce-compound
>
> Other useful skills
>
> - /ce-product-pulse - generate a product usage report (saved to docs/pulse-reports/)
> - /ce-optimize - iterative optimization loop
> - /ce-doc-review - document review
> - /ce-commit-push-pr - commit, push, and open a PR

@@ -0,0 +1,616 @@
# Hermes Agent 0.9.0

https://hermes-agent.nousresearch.com/docs/user-guide/docker?_highlight=docker#docker-compose-example

1.95.126.170

## Deployment

/www/hermes/docker-compose.yml

```yml
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    ports:
      - "8642:8642"
    volumes:
      - ~/.hermes:/opt/data
    networks:
      - hermes-net
    # Uncomment to forward specific env vars instead of using .env file:
    # environment:
    #   - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    #   - OPENAI_API_KEY=${OPENAI_API_KEY}
    #   - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"

  dashboard:
    image: nousresearch/hermes-agent:latest
    container_name: hermes-dashboard
    restart: unless-stopped
    command: dashboard --host 0.0.0.0 --insecure
    ports:
      - "9119:9119"
    volumes:
      - ~/.hermes:/opt/data
    environment:
      - GATEWAY_HEALTH_URL=http://hermes:8642
    networks:
      - hermes-net
    depends_on:
      - hermes
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.5"

networks:
  hermes-net:
    driver: bridge
```

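
When opening firewall rules or checking for port conflicts on the server, it can help to list the host ports that the compose file publishes. A minimal sketch, assuming short-form `"HOST:CONTAINER"` port entries like the ones above; `list_host_ports` is a hypothetical helper, not a Docker or Hermes command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host-side port of every short-form
# port mapping (- "HOST:CONTAINER") found in a compose file.
list_host_ports() {
  local compose_file="$1"
  # Keep only list items made of two numbers separated by a colon,
  # and print the first number (the host port).
  sed -n -E 's/^[[:space:]]*-[[:space:]]*"?([0-9]+):[0-9]+"?[[:space:]]*$/\1/p' "$compose_file"
}
```

Calling `list_host_ports /www/hermes/docker-compose.yml` on the file above should print the gateway and dashboard ports, one per line. Volume entries such as `~/.hermes:/opt/data` are skipped because they are not purely numeric.
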
## Configuration

`~/.hermes/.env` configures the dashboard, models, and messaging platforms (view it with `cat ~/.hermes/.env`):

```ini
|
||||
# Hermes Agent Environment Configuration
|
||||
# Copy this file to .env and fill in your API keys
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (OpenRouter)
|
||||
# =============================================================================
|
||||
# OpenRouter provides access to many models through one API
|
||||
# All LLM calls go through OpenRouter - no direct provider keys needed
|
||||
# Get your key at: https://openrouter.ai/keys
|
||||
# OPENROUTER_API_KEY=
|
||||
|
||||
# Default model is configured in ~/.hermes/config.yaml (model.default).
|
||||
# Use 'hermes model' or 'hermes setup' to change it.
|
||||
# LLM_MODEL is no longer read from .env — this line is kept for reference only.
|
||||
# LLM_MODEL=anthropic/claude-opus-4.6
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Google AI Studio / Gemini)
|
||||
# =============================================================================
|
||||
# Native Gemini API via Google's OpenAI-compatible endpoint.
|
||||
# Get your key at: https://aistudio.google.com/app/apikey
|
||||
# GOOGLE_API_KEY=your_google_ai_studio_key_here
|
||||
# GEMINI_API_KEY=your_gemini_key_here # alias for GOOGLE_API_KEY
|
||||
# Optional base URL override (default: Google's OpenAI-compatible endpoint)
|
||||
# GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (z.ai / GLM)
|
||||
# =============================================================================
|
||||
# z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
|
||||
# Get your key at: https://z.ai or https://open.bigmodel.cn
|
||||
# GLM_API_KEY=
|
||||
# GLM_BASE_URL=https://api.z.ai/api/paas/v4 # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Kimi / Moonshot)
|
||||
# =============================================================================
|
||||
# Kimi Code provides access to Moonshot AI coding models (kimi-k2.5, etc.)
|
||||
# Get your key at: https://platform.kimi.ai (Kimi Code console)
|
||||
# Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
|
||||
# Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
|
||||
# KIMI_API_KEY=
|
||||
# KIMI_BASE_URL=https://api.kimi.com/coding/v1 # Default for sk-kimi- keys
|
||||
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # For legacy Moonshot keys
|
||||
# KIMI_BASE_URL=https://api.moonshot.cn/v1 # For Moonshot China keys
|
||||
# KIMI_CN_API_KEY= # Dedicated Moonshot China key
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Arcee AI)
|
||||
# =============================================================================
|
||||
# Arcee AI provides access to Trinity models (trinity-mini, trinity-large-*)
|
||||
# Get an Arcee key at: https://chat.arcee.ai/
|
||||
# ARCEEAI_API_KEY=
|
||||
# ARCEE_BASE_URL= # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (MiniMax)
|
||||
# =============================================================================
|
||||
# MiniMax provides access to MiniMax models (global endpoint)
|
||||
# Get your key at: https://www.minimax.io
|
||||
# MINIMAX_API_KEY=
|
||||
# MINIMAX_BASE_URL=https://api.minimax.io/v1 # Override default base URL
|
||||
|
||||
# MiniMax China endpoint (for users in mainland China)
|
||||
# MINIMAX_CN_API_KEY=
|
||||
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (OpenCode Zen)
|
||||
# =============================================================================
|
||||
# OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
|
||||
# Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
|
||||
# OPENCODE_ZEN_API_KEY=
|
||||
# OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1 # Override default base URL
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (OpenCode Go)
|
||||
# =============================================================================
|
||||
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
|
||||
|
||||
# $10/month subscription. Get your key at: https://opencode.ai/auth
|
||||
# OPENCODE_GO_API_KEY=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL

# =============================================================================
# LLM PROVIDER (Hugging Face Inference Providers)
# =============================================================================
# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
# Free tier included ($0.10/month), no markup on provider rates.
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
# HF_TOKEN=
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Qwen OAuth)
|
||||
# =============================================================================
|
||||
# Qwen OAuth reuses your local Qwen CLI login (qwen auth qwen-oauth).
|
||||
# No API key needed — credentials come from ~/.qwen/oauth_creds.json.
|
||||
# Optional base URL override:
|
||||
# HERMES_QWEN_BASE_URL=https://portal.qwen.ai/v1
|
||||
|
||||
# =============================================================================
|
||||
# LLM PROVIDER (Xiaomi MiMo)
|
||||
# =============================================================================
|
||||
# Xiaomi MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash).
|
||||
# Get your key at: https://platform.xiaomimimo.com
|
||||
# XIAOMI_API_KEY=your_key_here
|
||||
# Optional base URL override:
|
||||
# XIAOMI_BASE_URL=https://api.xiaomimimo.com/v1
|
||||
|
||||
# =============================================================================
|
||||
# TOOL API KEYS
|
||||
# =============================================================================
|
||||
|
||||
# Exa API Key - AI-native web search and contents
|
||||
# Get at: https://exa.ai
|
||||
# EXA_API_KEY=
|
||||
|
||||
# Parallel API Key - AI-native web search and extract
|
||||
# Get at: https://parallel.ai
|
||||
# PARALLEL_API_KEY=
|
||||
|
||||
# Firecrawl API Key - Web search, extract, and crawl
|
||||
# Get at: https://firecrawl.dev/
|
||||
# FIRECRAWL_API_KEY=
|
||||
|
||||
|
||||
# FAL.ai API Key - Image generation
|
||||
# Get at: https://fal.ai/
|
||||
# FAL_KEY=
|
||||
|
||||
# Honcho - Cross-session AI-native user modeling (optional)
|
||||
# Builds a persistent understanding of the user across sessions and tools.
|
||||
# Get at: https://app.honcho.dev
|
||||
# Also requires ~/.honcho/config.json with enabled=true (see README).
|
||||
# HONCHO_API_KEY=
|
||||
|
||||
# =============================================================================
|
||||
# TERMINAL TOOL CONFIGURATION
|
||||
# =============================================================================
|
||||
# Backend type: "local", "singularity", "docker", "modal", or "ssh"
|
||||
# Terminal backend is configured in ~/.hermes/config.yaml (terminal.backend).
|
||||
# Use 'hermes setup' or 'hermes config set terminal.backend docker' to change.
|
||||
# Supported: local, docker, singularity, modal, ssh
|
||||
#
|
||||
# Only override here if you need to force a backend without touching config.yaml:
|
||||
# TERMINAL_ENV=local
|
||||
|
||||
# Override the container runtime binary (e.g. to use Podman instead of Docker).
|
||||
# Useful on systems where Docker's storage driver is broken or unavailable.
|
||||
# HERMES_DOCKER_BINARY=/usr/local/bin/podman
|
||||
|
||||
# Container images (for singularity/docker/modal backends)
|
||||
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
|
||||
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
|
||||
TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
|
||||
|
||||
|
||||
# Working directory for terminal commands
|
||||
# For local backend: "." means current directory (resolved automatically)
|
||||
# For remote backends (ssh/docker/modal/singularity): use an absolute path
|
||||
# INSIDE the target environment, or leave unset for the backend's default
|
||||
# (/root for modal, / for docker, ~ for ssh). Do NOT use a host-local path.
|
||||
# Usually managed by config.yaml (terminal.cwd) — uncomment to override
|
||||
# TERMINAL_CWD=.
|
||||
|
||||
# Default command timeout in seconds
|
||||
TERMINAL_TIMEOUT=60
|
||||
|
||||
# Cleanup inactive environments after this many seconds
|
||||
TERMINAL_LIFETIME_SECONDS=300
|
||||
|
||||
# =============================================================================
|
||||
# SSH REMOTE EXECUTION (for TERMINAL_ENV=ssh)
|
||||
# =============================================================================
|
||||
# Run terminal commands on a remote server via SSH.
|
||||
# Agent code stays on your machine, commands execute remotely.
|
||||
#
|
||||
# SECURITY BENEFITS:
|
||||
# - Agent cannot read your .env file (API keys protected)
|
||||
# - Agent cannot modify its own code
|
||||
# - Remote server acts as isolated sandbox
|
||||
# - Can safely configure passwordless sudo on remote
|
||||
#
|
||||
# TERMINAL_SSH_HOST=192.168.1.100
|
||||
# TERMINAL_SSH_USER=agent
|
||||
# TERMINAL_SSH_PORT=22
|
||||
# TERMINAL_SSH_KEY=~/.ssh/id_rsa
|
||||
|
||||
# =============================================================================
|
||||
# SUDO SUPPORT (works with ALL terminal backends)
|
||||
# =============================================================================
|
||||
# If set, enables sudo commands by piping password via `sudo -S`.
|
||||
# Works with: local, docker, singularity, modal, and ssh backends.
|
||||
#
|
||||
# SECURITY WARNING: Password stored in plaintext. Only use on trusted machines.
|
||||
#
|
||||
# ALTERNATIVES:
|
||||
# - For SSH backend: Configure passwordless sudo on the remote server
|
||||
# - For containers: Run as root inside the container (no sudo needed)
|
||||
# - For local: Configure /etc/sudoers for specific commands
|
||||
# - For CLI: Leave unset - you'll be prompted interactively with 45s timeout
|
||||
#
|
||||
# SUDO_PASSWORD=your_password_here
|
||||
|
||||
# =============================================================================
|
||||
# MODAL CLOUD BACKEND (Optional - for TERMINAL_ENV=modal)
|
||||
# =============================================================================
|
||||
# Modal uses CLI authentication, not environment variables.
|
||||
# Run: pip install modal && modal setup
|
||||
# This will authenticate via browser and store credentials locally.
|
||||
# No API key needed in .env - Modal handles auth automatically.
|
||||
|
||||
# =============================================================================
|
||||
# BROWSER TOOL CONFIGURATION (agent-browser + Browserbase)
|
||||
# =============================================================================
|
||||
# Browser automation requires Browserbase cloud service for remote browser execution.
|
||||
# This allows the agent to navigate websites, fill forms, and extract information.
|
||||
#
|
||||
# STEALTH MODES:
|
||||
# - Basic Stealth: ALWAYS active (random fingerprints, auto CAPTCHA solving)
|
||||
# - Advanced Stealth: Requires BROWSERBASE_ADVANCED_STEALTH=true (Scale Plan only)
|
||||
|
||||
# Browserbase API Key - Cloud browser execution
|
||||
# Get at: https://browserbase.com/
|
||||
# BROWSERBASE_API_KEY=
|
||||
|
||||
# Browserbase Project ID - From your Browserbase dashboard
|
||||
# BROWSERBASE_PROJECT_ID=
|
||||
|
||||
# Enable residential proxies for better CAPTCHA solving (default: true)
|
||||
# Routes traffic through residential IPs, significantly improves success rate
|
||||
BROWSERBASE_PROXIES=true
|
||||
|
||||
# Enable advanced stealth mode (default: false, requires Scale Plan)
|
||||
# Uses custom Chromium build to avoid bot detection altogether
|
||||
BROWSERBASE_ADVANCED_STEALTH=false
|
||||
|
||||
# Browser session timeout in seconds (default: 300)
|
||||
# Sessions are cleaned up after this duration of inactivity
|
||||
BROWSER_SESSION_TIMEOUT=300
|
||||
|
||||
# Browser inactivity timeout - auto-cleanup inactive sessions (default: 120 = 2 min)
|
||||
# Browser sessions are automatically closed after this period of no activity
|
||||
BROWSER_INACTIVITY_TIMEOUT=120
|
||||
|
||||
# =============================================================================
|
||||
# SESSION LOGGING
|
||||
# =============================================================================
|
||||
# Session trajectories are automatically saved to logs/ directory
|
||||
# Format: logs/session_YYYYMMDD_HHMMSS_UUID.json
|
||||
# Contains full conversation history in trajectory format for debugging/replay
|
||||
|
||||
# =============================================================================
|
||||
# VOICE TRANSCRIPTION & OPENAI TTS
|
||||
# =============================================================================
|
||||
# Required for voice message transcription (Whisper) and OpenAI TTS voices.
|
||||
# Uses OpenAI's API directly (not via OpenRouter).
|
||||
# Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
|
||||
# Get at: https://platform.openai.com/api-keys
|
||||
# VOICE_TOOLS_OPENAI_KEY=
|
||||
|
||||
# =============================================================================
|
||||
# SLACK INTEGRATION
|
||||
# =============================================================================
|
||||
# Slack Bot Token - From Slack App settings (OAuth & Permissions)
|
||||
# Get at: https://api.slack.com/apps
|
||||
# SLACK_BOT_TOKEN=xoxb-...
|
||||
|
||||
# Slack App Token - For Socket Mode (App-Level Tokens in Slack App settings)
|
||||
# SLACK_APP_TOKEN=xapp-...
|
||||
|
||||
# Slack allowed users (comma-separated Slack user IDs)
|
||||
# SLACK_ALLOWED_USERS=
|
||||
|
||||
# =============================================================================
|
||||
# TELEGRAM INTEGRATION
|
||||
# =============================================================================
|
||||
# Telegram Bot Token - From @BotFather (https://t.me/BotFather)
|
||||
# TELEGRAM_BOT_TOKEN=
|
||||
# TELEGRAM_ALLOWED_USERS= # Comma-separated user IDs
|
||||
# TELEGRAM_HOME_CHANNEL= # Default chat for cron delivery
|
||||
# TELEGRAM_HOME_CHANNEL_NAME= # Display name for home channel
|
||||
|
||||
# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
|
||||
# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
|
||||
# TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
|
||||
# TELEGRAM_WEBHOOK_PORT=8443
|
||||
# TELEGRAM_WEBHOOK_SECRET= # Recommended for production
|
||||
|
||||
# WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
|
||||
# WHATSAPP_ENABLED=false
|
||||
# WHATSAPP_ALLOWED_USERS=15551234567
|
||||
|
||||
# Email (IMAP/SMTP — send and receive emails as Hermes)
|
||||
# For Gmail: enable 2FA → create App Password at https://myaccount.google.com/apppasswords
|
||||
# EMAIL_ADDRESS=hermes@gmail.com
|
||||
# EMAIL_PASSWORD=xxxx xxxx xxxx xxxx
|
||||
# EMAIL_IMAP_HOST=imap.gmail.com
|
||||
# EMAIL_IMAP_PORT=993
|
||||
# EMAIL_SMTP_HOST=smtp.gmail.com
|
||||
# EMAIL_SMTP_PORT=587
|
||||
# EMAIL_POLL_INTERVAL=15
|
||||
# EMAIL_ALLOWED_USERS=your@email.com
|
||||
# EMAIL_HOME_ADDRESS=your@email.com
|
||||
|
||||
# Gateway-wide: allow ALL users without an allowlist (default: false = deny)
|
||||
# Only set to true if you intentionally want open access.
|
||||
# GATEWAY_ALLOW_ALL_USERS=false
|
||||
|
||||
# =============================================================================
|
||||
# RESPONSE PACING
|
||||
# =============================================================================
|
||||
# Human-like delays between message chunks on messaging platforms.
|
||||
# Makes the bot feel less robotic.
|
||||
# HERMES_HUMAN_DELAY_MODE=off # off | natural | custom
|
||||
# HERMES_HUMAN_DELAY_MIN_MS=800 # Min delay in ms (custom mode)
|
||||
# HERMES_HUMAN_DELAY_MAX_MS=2500 # Max delay in ms (custom mode)
|
||||
|
||||
# =============================================================================
|
||||
# DEBUG OPTIONS
|
||||
# =============================================================================
|
||||
WEB_TOOLS_DEBUG=false
|
||||
VISION_TOOLS_DEBUG=false
|
||||
MOA_TOOLS_DEBUG=false
|
||||
IMAGE_TOOLS_DEBUG=false
|
||||
|
||||
# =============================================================================
|
||||
# CONTEXT COMPRESSION (Auto-shrinks long conversations)
|
||||
# =============================================================================
|
||||
# When conversation approaches model's context limit, middle turns are
|
||||
# automatically summarized to free up space.
|
||||
#
|
||||
# Context compression is configured in ~/.hermes/config.yaml under compression:
|
||||
# CONTEXT_COMPRESSION_ENABLED=true # Enable auto-compression (default: true)
|
||||
# CONTEXT_COMPRESSION_THRESHOLD=0.85 # Compress at 85% of context limit
|
||||
# Model is set via compression.summary_model in config.yaml (default: google/gemini-3-flash-preview)
|
||||
|
||||
# =============================================================================
|
||||
# RL TRAINING (Tinker + Atropos)
|
||||
# =============================================================================
|
||||
# Run reinforcement learning training on language models using the Tinker API.
|
||||
# Requires the rl-server to be running (from tinker-atropos package).
|
||||
|
||||
# Tinker API Key - RL training service
|
||||
# Get at: https://tinker-console.thinkingmachines.ai/keys
|
||||
# TINKER_API_KEY=
|
||||
|
||||
# Weights & Biases API Key - Experiment tracking and metrics
|
||||
# Get at: https://wandb.ai/authorize
|
||||
# WANDB_API_KEY=
|
||||
|
||||
# RL API Server URL (default: http://localhost:8080)
|
||||
# Change if running the rl-server on a different host/port
|
||||
# RL_API_URL=http://localhost:8080
|
||||
|
||||
# =============================================================================
|
||||
# SKILLS HUB (GitHub integration for skill search/install/publish)
|
||||
# =============================================================================
|
||||
|
||||
# GitHub Personal Access Token — for higher API rate limits on skill search/install
|
||||
# Get at: https://github.com/settings/tokens (Fine-grained recommended)
|
||||
# GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
|
||||
|
||||
# GitHub App credentials (optional — for bot identity on PRs)
|
||||
# GITHUB_APP_ID=
|
||||
# GITHUB_APP_PRIVATE_KEY_PATH=
|
||||
# GITHUB_APP_INSTALLATION_ID=
|
||||
|
||||
# Groq API key (free tier — used for Whisper STT in voice mode)
|
||||
# GROQ_API_KEY=
|
||||
|
||||
# =============================================================================
|
||||
# STT PROVIDER SELECTION
|
||||
# =============================================================================
|
||||
# Default STT provider is "local" (faster-whisper) — runs on your machine, no API key needed.
|
||||
# Install with: pip install faster-whisper
|
||||
# Model downloads automatically on first use (~150 MB for "base").
|
||||
# To use cloud providers instead, set GROQ_API_KEY or VOICE_TOOLS_OPENAI_KEY above.
|
||||
# Provider priority: local > groq > openai
|
||||
# Configure in config.yaml: stt.provider: local | groq | openai
|
||||
|
||||
# =============================================================================
|
||||
# STT ADVANCED OVERRIDES (optional)
|
||||
# =============================================================================
|
||||
# Override default STT models per provider (normally set via stt.model in config.yaml)
|
||||
# STT_GROQ_MODEL=whisper-large-v3-turbo
|
||||
# STT_OPENAI_MODEL=whisper-1
|
||||
|
||||
# Override STT provider endpoints (for proxies or self-hosted instances)
|
||||
# GROQ_BASE_URL=https://api.groq.com/openai/v1
|
||||
# STT_OPENAI_BASE_URL=https://api.openai.com/v1
|
||||
|
||||
# Basic settings
GATEWAY_ALLOW_ALL_USERS=true

# Dashboard authentication
DASHBOARD_USERNAME=admin
DASHBOARD_PASSWORD=your_secure_password_here   # change this to your own password

# Time zone
TZ=Asia/Shanghai
ARCEEAI_API_KEY=dec57933-c3e4-4f4a-a114-b0ae8104e396
ARCEE_BASE_URL=https://ark.cn-beijing.volces.com/api/coding/v3

WECOM_BOT_ID=aib1TzEwFCq0ia1...qlDLt2Fv
WECOM_SECRET=1U12qLqTdV......ZRIt0ts
WECOM_ALLOWED_USERS=LiuRan,ZhangHongGang,WangJian
WECOM_ENABLE_ATTACHMENTS=true
```

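
A long `.env` that grows by appending settings at the bottom can end up assigning the same key more than once, and which assignment wins depends on the loader. A minimal sketch for spotting such keys; `dup_env_keys` is a hypothetical helper, not part of Hermes:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every key that is assigned more than once
# in an env file. Comment lines and lines without "=" are ignored.
dup_env_keys() {
  local env_file="$1"
  awk -F= '
    /^[[:space:]]*#/ || !/=/ { next }
    { key = $1; gsub(/[[:space:]]/, "", key); count[key]++ }
    END { for (k in count) if (count[k] > 1) print k }
  ' "$env_file" | sort
}
```

Running `dup_env_keys ~/.hermes/.env` prints nothing when every active key is set only once. Commented-out templates such as `# GATEWAY_ALLOW_ALL_USERS=false` do not count as assignments.
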
### Feishu

**Feishu developer console**

1. Log in to the [Feishu Open Platform](https://open.feishu.cn/app?lang=zh-CN) and create a **custom enterprise app** (企业自建应用).
2. Note the `App ID` (cli_a968...9cb6) and `App Secret` (Q8ykzs7vJdhnda.....qpxi1yfAJIUQH).
3. Under "Add app capabilities", add **Bot** and allow it to receive direct messages and group messages.
4. Under "Permission management", grant:
   - `im:message:send_as_bot` (send messages as the bot)
   - `im:message:readonly` (read messages)
   - `im:message.group_at_msg:readonly` (group @ mentions)
   - `im:message.p2p_msg:readonly` (direct messages)
   - `contact:user.employee_id:readonly` (read user IDs)
5. Event subscription (WebSocket mode)
   - In the left sidebar: Development configuration → Events and callbacks
   - Receive mode: **use a long connection (WebSocket)**
   - Subscribed event: **`im.message.receive_v1`** (receive messages)
   - Save
6. Version release → **Publish** (custom enterprise apps are usable right away)

**Server: install the Feishu dependencies**

```bash
# Go to the Hermes environment (depends on how you deployed it)
cd /path/to/hermes

# Install the Feishu SDK
uv pip install lark-oapi websockets
# Or with pip (if uv is not available)
pip install lark-oapi websockets
```

### WeCom (企业微信)

1. Open the WeCom admin console and create a bot: https://work.weixin.qq.com/wework_admin/frame#/aiHelper/create. Choose API-based creation and create **CS Agent**.
   - Connection mode: long connection
   - Bot ID: aib1T ...t2Fv
   - Secret: 1U12qLqTdV pd8kKBrcpdk9pZRIt0ts
2. Permissions the bot may use
   - User information
   - Documents

## Running

```
cd /www/hermes/
docker run -it --rm -v ~/.hermes:/opt/data nousresearch/hermes-agent
```

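I'd like you to learn to become a case manager in the field of medical weight loss. What do you think?

I can do this in three steps, "build an authoritative knowledge base → align with your working standards → iterate through hands-on practice", fully tailored to your work needs:

### Step 1: Build an authoritative foundation of medical weight-loss knowledge (I can do this on my own)

I will first search for and study recognized national and international guidelines as the underlying knowledge framework, including:

- 《中国超重/肥胖医学营养治疗指南(2023版)》 (Chinese Guidelines for Medical Nutrition Therapy of Overweight/Obesity, 2023 edition)
- WHO clinical guidelines for the prevention and management of adult obesity
- Individualized intervention standards for obesity with complications such as hypertension, diabetes, or polycystic ovary syndrome
- Practice standards, follow-up protocols, and risk warning indicators for medical weight-loss case managers

I will save all core knowledge to persistent memory and follow these standards strictly when using it, so that I do not make mistakes.

### Step 2: Align with your specific working requirements (you need to provide the materials)

You can send me the working materials you use regularly, and I will learn and retain all of them:

1. Your organization's internal working standards, weight-loss plan templates, and templates for case assessment forms and follow-up records
2. High-quality cases you have handled in the past (private information can be removed)
3. The report formats you need from me and any special requirements to observe

I will turn these rules into dedicated skills so that later tasks conform 100% to your working standards.

### Step 3: Iterate through hands-on practice

You can start by giving me a small number of test cases to handle. After your feedback I will quickly correct and refine my approach, and within a few rounds I should reach the accuracy you require.

### What I can do for you after learning:

- ✅ Initial case screening: assess weight-loss risk automatically from the user's physical examination data and medical history, and determine indications and contraindications
- ✅ Plan generation: produce a personalized diet, exercise, and behavioral intervention plan for each individual, adapting automatically to the special requirements of people with comorbidities
- ✅ Follow-up management: generate follow-up schedules and follow-up scripts automatically, with configurable scheduled tasks that push you the list of cases to follow up each day
- ✅ Statistics: compute the weight-loss target achievement rate, adherence rate, adverse reaction rate, and similar figures for your cases, and generate summary reports

Does this learning path match your expectations? Are there materials you would like me to study first, or functions I should focus on?
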
## Mini machine

https://github.com/NousResearch/hermes-agent

```bash
# Step 1: create a virtual environment (avoids the system Python restrictions)
cd ~/hermes-agent-main
python3 -m venv venv
source venv/bin/activate

# Step 2: install the dependencies inside the virtual environment
pip install -r requirements.txt

# Step 3: run the project
./hermes

# Why this works: the earlier errors came from using the system Python
# directly, whose packages pip is not allowed to modify or uninstall.
# The virtual environment is a separate, clean Python environment, so the
# ModuleNotFound, "cannot uninstall", and permission errors no longer apply.
# This is the standard way to run a Python project on Linux.

# In every new terminal, activate the environment first, then run the project:
source ~/hermes-agent-main/venv/bin/activate
cd ~/hermes-agent-main
./hermes
```

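
Forgetting to activate the environment in a new terminal brings back the original errors. A minimal sketch of a guard that checks for an active virtual environment before launching; `require_venv` is a hypothetical helper, and it relies on the fact that `source venv/bin/activate` exports `VIRTUAL_ENV`:

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only when a virtual environment is active.
# The activate script exports VIRTUAL_ENV, so an empty value means the
# environment has not been activated in this shell.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment active; run: source venv/bin/activate" >&2
    return 1
  fi
}
```

It could be used as `require_venv && ./hermes`, which refuses to start the project from the system Python.
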
-53
@@ -1,53 +0,0 @@
|
||||
# 准备工作
|
||||
|
||||
|
||||
|
||||
## 一、CC Switch
|
||||
|
||||
先完成 CC Switch 安装
|
||||
|
||||
- 下载页面 https://www.ccswitch.io/zh/ 选择 [CC-Switch-v3.15.0-Linux-x86_64.deb ](https://github.com/farion1231/cc-switch/releases/download/v3.15.0/CC-Switch-v3.15.0-Linux-x86_64.deb)
|
||||
|
||||
- sudo dpkg -i CC-Switch-v3.15.0-Linux-x86_64.deb
|
||||
|
||||
- 进入 CC-Switch 配置 claude 供应商
|
||||
1. 选择 DouBaoSeed
|
||||
1. 输入 API Key :ark-8f9...
|
||||
1. 模型映射 - 获取模型列表(选择想用的,如 doubao-seed-2-0-code-preview-latest)
|
||||
1. 测试 & 保存 (CC Switch 会自动把正确的 base_url + api_key 写入:~/.claude/settings.json)
|
||||
|
||||
> Keep Claude Code from entering the official verification flow
>
> ```bash
> # 1. Log out of claude and clear all related variables
> claude /logout
> unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_BASE_URL
>
> # 2. Check: this must print nothing
> env | grep ANTHROPIC
>
> # 3. Empty the Claude settings (let CC Switch take over completely)
> > ~/.claude/settings.json
>
> # 4. Key step: skip the official first-run verification (required in mainland China)
> echo '{"hasCompletedOnboarding": true}' > ~/.claude.json
> ```

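Step 4 above uses `>` redirection, which replaces the whole of `~/.claude.json`. If that file already holds other keys, a merge may be preferable. The following is a sketch only: the function name `with_onboarding_flag` and the `theme` key in the usage note are made up for illustration, and only the `hasCompletedOnboarding` key comes from the notes above.

```python
import json


def with_onboarding_flag(existing_text: str) -> str:
    """Return JSON text with hasCompletedOnboarding set to true.

    Any keys already present in existing_text are kept; an empty or
    whitespace-only input is treated as an empty object.
    """
    data = json.loads(existing_text) if existing_text.strip() else {}
    data["hasCompletedOnboarding"] = True
    return json.dumps(data, indent=2)


# Possible usage (not executed here):
#   from pathlib import Path
#   path = Path.home() / ".claude.json"
#   old = path.read_text(encoding="utf-8") if path.exists() else ""
#   path.write_text(with_onboarding_flag(old) + "\n", encoding="utf-8")
```

For example, an existing file containing `{"theme": "dark"}` would keep that key and gain the onboarding flag.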
## 2. CLAUDE CODE

Version: 2.1.152

```bash
curl -fsSL https://claude.ai/install.sh | bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc
# Run claude. In theory the initial setup prompts should not appear; if
# "Select login method:" does show up, choose
# 2. Anthropic Console account · API usage billing
claude
```

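## 3. Common commands in Claude Code

`/status` shows the current model and connection status.

`/logout` clears the current authentication state. After logging out and restarting Claude Code, the login-method selection screen usually appears again.
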
@@ -0,0 +1,271 @@
# Electronic Scale Image Recognition System Design Document

> Version: v1.0
> Last updated: 2026-05-20
> Status: Draft (to be iterated)

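## 1. Project background

### 1.1 Business scenario
Users photograph the display of an electronic scale, and the **reading (number + unit)** needs to be extracted automatically for data entry or verification.
Image sources include household bathroom scales, bench scales, and smartphone app screenshots.

### 1.2 Current state
- The existing script `preprocess.py` only performs ROI cropping + contrast enhancement and has no recognition capability
- 122 samples were recognized by manual work + a multimodal model with about 80% accuracy; the main error sources are listed in §2

### 1.3 Goals
- **Short term**: raise automatic recognition accuracy to **≥85%**, with the remainder checked manually
- **Mid term**: reach **≥95%**, so that only extremely blurry or abnormal displays need manual handling
- **Long term**: support automatic model identification for scales of any brand + unit disambiguation
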
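## 2. Problem analysis (based on 122 samples)

### 2.1 Distribution of error sources

| Error type | Share | Typical example | Severity |
|---|---|---|---|
| Upside-down shot | ~25% | "119.35" read upside down and recognized reversed | 🔴 High |
| ROI cropped too tightly | ~12% | Only "6" remains; the other digits are lost | 🔴 High |
| 7-segment character confusion | ~16% | `7↔L`, `2↔5`, `6↔9` | 🟡 Medium |
| Display glare / blur | ~8% | Small screen on the pink scale | 🟡 Medium |
| Unit ambiguity (kg / jin 斤) | ~5% | Xiaomi scale looks the same in both modes | 🟢 Low |
| Abnormal display (startup / error) | ~2% | Incomplete segment strokes | 🟢 Low |

### 2.2 Distribution of device types

| Device type | Share | Display characteristics | Difficulty |
|---|---|---|---|
| Blue-backlit LCD bathroom scale | ~40% | Black digits on blue | Glare; adaptive thresholding is hard |
| LED 7-segment scale (orange/white) | ~20% | High-contrast glowing digits | Often photographed upside down |
| Xiangshan (香山) bench scale (multi-screen) | ~10% | White digits on black, three-section LCD | Interference from multiple windows |
| App screenshot | ~25% | High resolution, many fields | Need to locate the "main weight" field |
| Other (pink, HUAWEI, LEFU, etc.) | ~5% | Varies | Long-tail problems |
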
## 3. Technical approach

### 3.1 Overall pipeline

```
┌───────────┐   ┌──────────┐   ┌────────────┐   ┌────────────┐   ┌─────────┐   ┌─────┐
│ Raw image │ → │ EXIF fix │ → │ ROI detect │ → │ Orient fix │ → │ Enhance │ → │ OCR │
└───────────┘   └──────────┘   └────────────┘   └────────────┘   └─────────┘   └─────┘
                                                                                  ↓
                                                                      ┌──────────────────────┐
                                                                      │ Value + unit parsing │
                                                                      │ Plausibility check   │
                                                                      └──────────────────────┘
                                                                                  ↓
                                                                      ┌──────────────────────┐
                                                                      │ Out: value/unit/conf │
                                                                      └──────────────────────┘
```

### 3.2 Technology choices per stage

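#### Step 1: EXIF orientation correction
**Purpose**: JPGs taken with a phone carry Orientation metadata, so they can be turned upright at zero cost.
**Approach**: read the Orientation tag via `PIL.ExifTags` and rotate according to the three cases 3/6/8.
**Expected benefit**: resolves about 30% of the orientation problems at almost no cost.
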
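The 3/6/8 cases can be sketched as a lookup table. This is an illustrative sketch, not the code in `preprocess_v2.py`: the names `rotation_for_orientation` and `upright` are made up here. The angles are counter-clockwise because Pillow's `Image.rotate` turns counter-clockwise, and `upright` expects an already opened Pillow image.

```python
# EXIF Orientation value -> counter-clockwise degrees needed to turn the
# image upright (Pillow's Image.rotate rotates counter-clockwise).
ORIENTATION_TO_CCW_DEGREES = {3: 180, 6: 270, 8: 90}

ORIENTATION_TAG = 0x0112  # numeric id of the EXIF Orientation tag


def rotation_for_orientation(orientation):
    """Return the rotation for an Orientation value; 0 for upright or unknown."""
    return ORIENTATION_TO_CCW_DEGREES.get(orientation, 0)


def upright(image):
    """Rotate a Pillow image according to its EXIF Orientation tag.

    `image` is assumed to be a PIL.Image.Image; mirrored orientations
    (2, 4, 5, 7) are left untouched in this sketch.
    """
    orientation = image.getexif().get(ORIENTATION_TAG, 1)
    degrees = rotation_for_orientation(orientation)
    return image.rotate(degrees, expand=True) if degrees else image
```

Pillow also ships `ImageOps.exif_transpose`, which handles all eight Orientation values including the mirrored ones, so it may be a simpler choice than a hand-written table.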
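#### Step 2: ROI detection (locating the screen area)
**Purpose**: crop the display rectangle out of the full image to reduce OCR interference.
**Approach**: weighted scoring across 4 strategies:
- **Strategy A**: LED high brightness + low saturation (white/yellow 7-segment digits), boost ×1.3
- **Strategy B**: blue-backlit LCD (HSV: H∈[85,140])
- **Strategy C**: white / light LCD (V>180, S<60)
- **Strategy D**: Canny edges + horizontal dilation (for white-on-black bench scales), boost ×0.8

**Scoring dimensions**: area ratio, aspect ratio, contrast, brightness.
**Risk**: the current 15% margin is on the tight side; the new version raises it to 20%, which still needs monitoring.
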
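One way the weighted scoring and the 20% margin could fit together is sketched below. Only the ×1.3 and ×0.8 boosts and the 20% margin come from the text above. The equal weights, the acceptable area and aspect-ratio bands, the ×1.0 boost for strategies B and C, and all names are assumptions for illustration.

```python
from dataclasses import dataclass

# Boosts for A and D are from the design text; B and C are assumed neutral.
STRATEGY_BOOST = {"A": 1.3, "B": 1.0, "C": 1.0, "D": 0.8}


@dataclass
class Candidate:
    strategy: str        # "A", "B", "C" or "D"
    area_ratio: float    # candidate area / image area
    aspect_ratio: float  # width / height
    contrast: float      # normalised to 0..1
    brightness: float    # normalised to 0..1


def score(c: Candidate) -> float:
    """Combine the four scoring dimensions, then apply the strategy boost."""
    area_ok = 1.0 if 0.01 <= c.area_ratio <= 0.6 else 0.0    # assumed band
    aspect_ok = 1.0 if 1.2 <= c.aspect_ratio <= 5.0 else 0.0  # assumed band
    base = 0.25 * area_ok + 0.25 * aspect_ok + 0.25 * c.contrast + 0.25 * c.brightness
    return base * STRATEGY_BOOST[c.strategy]


def pick_best(candidates):
    """Return the highest-scoring candidate."""
    return max(candidates, key=score)


def expand_box(x, y, w, h, img_w, img_h, margin=0.20):
    """Grow a box by `margin` on every side, clamped to the image bounds."""
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0
```

With identical features, a strategy A candidate outranks a strategy D candidate purely because of the boost, which is the behaviour the boost factors describe.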
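#### Step 3: Orientation correction (brute-force 4 directions + scoring)
**Purpose**: handle upside-down and sideways shots.
**Scoring items**:
1. **Shape prior**: displays are usually `w>h` (landscape) → +1
2. **7-segment gravity feature**: in the horizontal projection, the bottom third of the rows ≥ the top third → +1 (the typical distribution for upright digits)
3. **Tesseract OSD**: if it returns `rotate=0` → +2 (strong evidence)

**Selection**: take the highest-scoring orientation among 0°/90°/180°/270°.
**Advantage**: even if OSD fails, the 7-segment feature still serves as a fallback.
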
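The three scoring items can be sketched on a tiny binary grid (1 = lit pixel) without any imaging library. This illustrates the scoring idea only. The function names, the use of plain nested lists, and the `osd` dictionary (candidate angle → rotation reported by OSD for that candidate) are assumptions made here.

```python
def rotate90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def orientation_score(grid, osd_rotate=None):
    """Score one candidate orientation using the three items above."""
    h, w = len(grid), len(grid[0])
    score = 0
    if w > h:                       # 1. shape prior: landscape display
        score += 1
    third = max(1, h // 3)
    top = sum(sum(row) for row in grid[:third])
    bottom = sum(sum(row) for row in grid[-third:])
    if bottom >= top:               # 2. 7-segment "gravity" feature
        score += 1
    if osd_rotate == 0:             # 3. OSD agrees the candidate is upright
        score += 2
    return score


def best_rotation(grid, osd=None):
    """Try 0/90/180/270 degrees clockwise; return (angle, score) of the best."""
    osd = osd or {}
    best_angle, best_score = 0, -1
    current = grid
    for angle in (0, 90, 180, 270):
        s = orientation_score(current, osd.get(angle))
        if s > best_score:
            best_angle, best_score = angle, s
        current = rotate90(current)
    return best_angle, best_score
```

When OSD returns nothing, `osd` stays empty and the decision rests on the shape and gravity items alone, which is the fallback behaviour described above.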
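#### Step 4: Image enhancement
**Purpose**: make the 7-segment digits recognizable by OCR.
**Key steps**:
1. CLAHE (contrast-limited adaptive histogram equalization)
2. Automatic polarity detection: if the image mean is <127, treat it as a dark background and invert
3. Adaptive thresholding (Gaussian, blockSize=25)
4. **Key**: dilate once with a 2×2 kernel to bridge the gaps between the 7 segments; otherwise Tesseract splits one digit into several characters
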
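Steps 2 and 4 are simple enough to show on nested lists. This is a plain-Python illustration of the idea, not the OpenCV calls a real pipeline would use. The function names are made up, foreground pixels are assumed to be 1, and the 2×2 window is assumed to cover each pixel plus its upper and left neighbours.

```python
def normalize_polarity(gray):
    """Invert an 8-bit grayscale image (list of rows) when its mean is below 127."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    if mean < 127:
        return [[255 - p for p in row] for row in gray]
    return [row[:] for row in gray]


def dilate_2x2(binary):
    """Dilate a 0/1 image once with a 2x2 window.

    An output pixel becomes 1 when any pixel in the window formed by
    itself and its upper/left neighbours is 1, so one-pixel gaps close.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (0, 1):
                for dx in (0, 1):
                    yy, xx = y - dy, x - dx
                    if 0 <= yy < h and 0 <= xx < w and binary[yy][xx]:
                        out[y][x] = 1
    return out
```

On the row `[1, 0, 1]` the dilation fills the middle pixel, which is the segment-bridging effect step 4 relies on.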
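#### Step 5: OCR recognition (multiple backends)
| Backend | Strengths | Weaknesses | Recommended scenario |
|---|---|---|---|
| **Tesseract** | Easy to deploy; LCD trained data available | 7-segment digits need parameter tuning | Default |
| **SSOCR** | Designed specifically for 7-segment displays | Command-line only; needs manual compilation | Clean LED displays |
| **PaddleOCR** | Strong on both Chinese and digits | Large footprint | App screenshots |
| **Custom-trained CNN** | High accuracy ceiling | Needs training data | Long-term investment |

**Current default**: Tesseract, PSM=7 (single line), whitelist restricted to digits + unit characters.
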
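The default above maps onto Tesseract's `--psm` option and its `tessedit_char_whitelist` config variable. A small sketch of building that config string follows. The helper name and the exact whitelist (`0123456789.kglb`, ASCII only) are assumptions; the whitelist actually used in `preprocess_v2.py` is not given in this document.

```python
def tesseract_config(psm=7, whitelist="0123456789.kglb"):
    """Build a Tesseract config string: page segmentation mode + whitelist.

    psm 7 tells Tesseract to treat the image as a single text line.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


# Possible usage with pytesseract (not executed here):
#   import pytesseract
#   text = pytesseract.image_to_string(image, config=tesseract_config())
```

Chinese unit characters such as 斤 would also need the `chi_sim` language data, so this sketch leaves them out of the whitelist.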
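#### Step 6: Value plausibility check
**Regex extraction**: `(\d{1,3}\.?\d{0,2})\s*(kg|KG|公斤|千克|斤|两|lb)?`
**Plausible ranges**:
- kg / 公斤 / 千克: 10–200
- 斤 / 两: 20–400
- lb: 20–440

**Purpose**: intercept typical OCR errors that push the value out of range (e.g. misreading `1` as `7` so that 110 kg becomes 710 kg, which is rejected outright). A misread that stays inside the range, such as 170 kg, passes this check and has to be caught elsewhere.
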
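The regex and ranges above translate almost directly into code. In this sketch the function name `parse_reading` and the handling of a missing unit (accept anything inside the widest range) are assumptions; the pattern and the per-unit ranges are copied from the text.

```python
import re

# Pattern and ranges as listed in the section above.
READING_RE = re.compile(r"(\d{1,3}\.?\d{0,2})\s*(kg|KG|公斤|千克|斤|两|lb)?")
RANGES = {
    "kg": (10, 200), "KG": (10, 200), "公斤": (10, 200), "千克": (10, 200),
    "斤": (20, 400), "两": (20, 400),
    "lb": (20, 440),
}


def parse_reading(text):
    """Return (value, unit, is_plausible), or None when no number is found."""
    match = READING_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2)
    if unit is None:
        # Assumption: without a unit, accept anything inside the widest range.
        low = min(lo for lo, _ in RANGES.values())
        high = max(hi for _, hi in RANGES.values())
    else:
        low, high = RANGES[unit]
    return value, unit, low <= value <= high
```

`parse_reading("710kg")` is flagged as implausible, while `parse_reading("170kg")` passes, which shows the limit of a pure range check.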
## 4. File structure

```
新建文件夹 1/
├── 未识别/              # Original images awaiting processing
├── 预处理/              # Old output (v1)
├── 预处理v2/            # New output
│   ├── *.png           # Binarized preprocessing result for each image
│   └── _results.csv    # Recognition summary (file name / rotation / OCR / confidence / parsed value)
├── preprocess_v2.py    # New version (preprocessing + recognition + validation)
└── 设计文档.md          # This document
```

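## 5. Evaluation metrics

### 5.1 Accuracy metrics
- **Top-1 accuracy**: share of single-pass recognitions that are completely correct (both number and unit)
- **Digit accuracy**: only the number is correct (unit errors tolerated)
- **±5% tolerance accuracy**: relative error between the recognized value and the ground truth ≤5% (avoids the negative impression caused by errors in the decimal places)
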
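The three metrics can be computed in one pass over prediction/truth pairs. This is a sketch under assumptions: the function name, the pair layout `((pred_value, pred_unit), (true_value, true_unit))`, and the use of `None` for a rejected reading are all made up here.

```python
def accuracy_metrics(pairs):
    """Compute Top-1, digit and ±5% tolerance accuracy.

    pairs: list of ((pred_value, pred_unit), (true_value, true_unit));
    pred_value is None when the reading was rejected.
    """
    n = len(pairs)
    top1 = digit = tolerant = 0
    for (pred_value, pred_unit), (true_value, true_unit) in pairs:
        if pred_value is None:
            continue
        same_value = abs(pred_value - true_value) < 1e-9
        if same_value:
            digit += 1
            if pred_unit == true_unit:
                top1 += 1
        if true_value != 0 and abs(pred_value - true_value) / abs(true_value) <= 0.05:
            tolerant += 1
    return {"top1": top1 / n, "digit": digit / n, "tolerance_5pct": tolerant / n}
```

A reading of 72.0 against a truth of 72.5 counts toward the ±5% metric but not toward digit or Top-1 accuracy, which is why the three numbers can differ widely.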
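### 5.2 Recall / rejection rate
- **Rejection rate**: share of outputs marked "invalid" after the plausibility check (a high rejection rate is better than wrong recognition)
- **Confidence threshold**: suggest accepting only when Tesseract conf > 60; otherwise fall back to manual review

### 5.3 Performance
- Per-image processing time < 1s (single CPU thread)
- Full run over all 122 images < 2 minutes
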
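## 6. Known limitations and open issues

| # | Problem | Current countermeasure | Long-term improvement |
|---|---|---|---|
| L1 | The pink scale's display is too small (<1% of the image) | Loosen the margin | Preprocess with super-resolution (ESRGAN) |
| L2 | Startup screens / abnormal displays are misrecognized | Intercepted by the plausibility check | Add a classifier for "is this a valid reading" |
| L3 | The Xiangshan bench scale has three screens; only the weight field is needed | ROI takes the largest LCD | Train a dedicated detector and crop by field label |
| L4 | The Xiaomi scale's kg/jin mode cannot be determined from the display | Guess the unit from the value range by default | Add brand + model recognition (CNN or logo matching) |
| L5 | OCR reads "6/9" reversed | Brute-force 4-direction rotation | Train an orientation detector |
| L6 | App screenshots contain many fields | For now, general OCR + pick the largest number | Train a layout model dedicated to app screenshots |
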
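## 7. Improvement roadmap

### v2.0 (current): General pipeline
- ✅ EXIF orientation correction
- ✅ 4-strategy ROI detection
- ✅ Brute-force 4-direction rotation + scoring
- ✅ Tesseract / SSOCR / PaddleOCR (pick one of the three)
- ✅ Value plausibility check
- ✅ CSV summary output

### v2.1: Fallback mechanisms
- [ ] Automatic hand-off to manual review at low confidence: emit a flag when confidence <60
- [ ] Voting across multiple OCR backends (Tesseract + PaddleOCR)
- [ ] Unit inference from context: infer from the numeric range (e.g. 50–100 is most likely kg, 120+ is most likely jin)
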
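The three v2.1 items could look roughly like this. This is a sketch, not a planned implementation: the function names, the `(backend, text, conf)` tuple layout, and summing confidences per candidate text are assumptions. Only the <60 threshold and the 50–100 / 120+ ranges come from the list above.

```python
from collections import defaultdict


def guess_unit(value):
    """Range heuristic from the roadmap: 50-100 -> kg, 120+ -> 斤, else undecided."""
    if 50 <= value <= 100:
        return "kg"
    if value >= 120:
        return "斤"
    return None


def vote(results, min_conf=60):
    """Pick a reading from several backends.

    results: list of (backend_name, text, confidence).
    Returns (text, needs_manual_review). Results below min_conf are
    ignored; if nothing is left, the sample is flagged for manual review.
    """
    totals = defaultdict(float)
    for _backend, text, conf in results:
        if conf >= min_conf:
            totals[text] += conf
    if not totals:
        return None, True
    return max(totals, key=totals.get), False
```

With only two backends this is closer to "trust the more confident one" than to a real majority vote; a third backend would make disagreement easier to resolve.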
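### v2.2: Targeted optimizations
- [ ] **App screenshot branch**: when App UI elements are detected, take a dedicated path (directly locate the "target number" color block)
- [ ] **Scale brand recognition**: use logos in the image (mi, SENSSUN, HUAWEI, LEFU, HCS, etc.) to help choose OCR parameters
- [ ] **Super-resolution enhancement**: for small displays such as the pink scale, apply 4× super-resolution before recognition

### v3.0: Deep learning
- [ ] Build a training set from the existing 122 images + ongoing accumulation (target ≥1000 images)
- [ ] Fine-tune a 7-segment-specific CNN (see [Seven-Segment-OCR](https://github.com/SachaIZADI/Seven-Segment-OCR))
- [ ] End-to-end YOLO model combining detection + recognition (see Goodfellow's multi-digit number recognition paper)

### v4.0: Productionization
- [ ] Provide an HTTP API (FastAPI)
- [ ] Provide a Web UI for annotation + verification
- [ ] Active learning: samples corrected by humans are added to the training set automatically
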
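## 8. Testing and validation

### 8.1 Current test set
- 122 sample images (in `未识别/`, each named "<recognition result>.jpg" so that the file name serves as ground truth)
- Ground truth source: double-checked by a human + a multimodal model
- **Note**: the ground truth of about 10 upside-down images may still contain errors and needs later proofreading

### 8.2 Validation workflow
```bash
# 1. Run recognition
python preprocess_v2.py --backend tesseract

# 2. Compare against the ground truth (the file name is the ground truth)
python evaluate.py     # To be implemented: extract the ground truth from file names and compare with _results.csv
```
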
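Since `evaluate.py` is still to be implemented, the following is only a sketch of its core comparison. The file-name layout (a number, optionally followed by a unit, e.g. `72.5kg.jpg`) and the CSV column names `filename` and `parsed_value` are hypothetical; the document does not specify either.

```python
import csv
import io
import re
from pathlib import PurePath

TRUTH_RE = re.compile(r"(\d+(?:\.\d+)?)")


def truth_from_filename(name):
    """Extract the ground-truth value from a file name such as '72.5kg.jpg'."""
    match = TRUTH_RE.search(PurePath(name).stem)
    return float(match.group(1)) if match else None


def digit_accuracy(csv_text):
    """Share of rows whose parsed value equals the value in the file name."""
    scored = correct = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        truth = truth_from_filename(row["filename"])
        if truth is None:
            continue  # no ground truth in the name; skip the row
        scored += 1
        try:
            predicted = float(row["parsed_value"])
        except (TypeError, ValueError):
            continue  # rejected or empty reading counts as wrong
        if abs(predicted - truth) < 1e-9:
            correct += 1
    return correct / scored if scored else 0.0
```

In a real script the CSV text would come from `预处理v2/_results.csv`, read with an explicit UTF-8 encoding.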
### 8.3 Acceptance criteria (v2.0)
- Top-1 accuracy ≥ 70%
- Digit accuracy ≥ 80%
- ±5% tolerance accuracy ≥ 85%
- Rejection rate ≤ 20% (the rest goes to manual review)

## 9. Dependencies

### Required
```bash
pip install opencv-python pillow pytesseract imutils numpy
```

### System dependencies (Linux)
```bash
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
# Optional: download the LCD-specific trained data
wget https://github.com/Shreeshrii/tessdata_ssd/raw/master/ssd.traineddata
# The tessdata path depends on the installed Tesseract version; adjust it if needed
sudo mv ssd.traineddata /usr/share/tesseract-ocr/4.00/tessdata/
```

### Optional (stronger backends)
```bash
pip install paddlepaddle paddleocr     # PaddleOCR
sudo apt install ssocr                  # Dedicated 7-segment OCR (packaged in Debian/Ubuntu)
```

## 10. References

### Papers
- Goodfellow et al., [Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks](https://arxiv.org/abs/1312.6082)

### Open-source projects
- [SSOCR](https://www.unix-ag.uni-kl.de/~auerswal/ssocr/): dedicated 7-segment OCR (C)
- [tessdata_ssd](https://github.com/Shreeshrii/tessdata_ssd): Tesseract 7-segment trained data
- [LCD-OCR](https://github.com/DevashishPrasad/LCD-OCR): Tesseract + OpenCV pipeline
- [Seven-Segment-OCR](https://github.com/SachaIZADI/Seven-Segment-OCR): CNN, MNIST-style
- [seven-segment-ocr](https://github.com/renjithsasidharan/seven-segment-ocr): TensorFlow Lite

### Engineering articles
- [PyImageSearch: Correcting Text Orientation with Tesseract](https://pyimagesearch.com/2022/01/31/correcting-text-orientation-with-tesseract-and-python/)
- [Medium: Correct image orientation using Pytesseract + Imutils](https://medium.com/@birenmer/correct-image-orientation-using-python-pytesseract-and-imutils-5a524d386857)

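## 11. Maintenance and collaboration

### 11.1 Document maintenance
- Every plan change must be ticked off in the §7 roadmap, with the version number noted
- New known issues are added to the §6 known-limitations table
- Major architectural changes require updating the §3 pipeline diagram

### 11.2 Code maintenance
- `preprocess.py` is kept as the v1 baseline (no further changes)
- New features evolve on `preprocess_v2.py`; major versions move to `preprocess_v3.py`
- Experimental features go in the `experiments/` subdirectory

### 11.3 Data maintenance
- New ground-truth samples go in the `未识别/` directory; the file name is the ground truth
- Misrecognized samples are flagged separately as a hard-example set (for later training)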