modified: "2026 Ubuntu 安装.md"

new file:   "Docker技术.md"
	new file:   "GIT技术.md"
	new file:   "Linux技术.md"
	new file:   "Python技术.md"
	new file:   hermes-agent.md
	new file:   "备份与恢复.md"
	new file:   "电子秤图片识别系统设计.md"
2026-05-26 14:00:03 +08:00
parent e4078070c8
commit 832a3db830
8 changed files with 1410 additions and 88 deletions
+87 -88
@@ -1,3 +1,88 @@
20260525
Download the latest Ubuntu Desktop 26.04 LTS from https://cn.ubuntu.com/download
sudo umount /dev/sdX*
sudo dd if=/path/to/image.iso of=/dev/sdX bs=4M status=progress
sync    # make sure all buffered data has been written to the drive
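Before writing the image with `dd`, it can be worth checking that the download is intact. The following is only a sketch: it assumes `sha256sum` from GNU coreutils is available, `verify_sha256` is a hypothetical helper name, and the expected hash has to be copied by hand from the checksum file published next to the ISO.

```bash
# verify_sha256 FILE EXPECTED_HEX
# Succeeds (exit 0) only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest field.
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

Usage might look like `verify_sha256 /path/to/image.iso "<published hash>" && echo "image OK"`.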
## Symbolic links
```bash
touch ~/资源/@/@000000/2026.******.txt
ln -s ~/资源/@/@000000/2026.******.txt ~    # create the symlink in the home directory
```
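Running `ln -s` a second time fails because the link already exists. A minimal sketch of a repeatable variant, assuming an `ln` that supports `-f` and `-n` (GNU and BSD both do); `link_into` is a hypothetical helper, not part of the original notes.

```bash
# link_into DIR TARGET
# Create, or silently replace, a symlink named after TARGET inside DIR.
link_into() {
    ln -sfn "$2" "$1/$(basename "$2")"
}
```

For example, `link_into "$HOME" /path/to/file.txt` can be run repeatedly without an error.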
## WeChat for Linux
https://linux.weixin.qq.com/
Download the deb package WeChatLinux_x86_64.deb
sudo dpkg -i WeChatLinux_x86_64.deb
## Mail
NetEase Mail Master (网易邮箱大师)
https://dashi.163.com/
sudo dpkg -i mail.deb
sudo apt-get install -f    # fix missing dependencies
## Install Typora
[Download](https://release-assets.githubusercontent.com/github-production-release-asset/387719913/8fc9123c-8ab6-42c4-beb8-51aed8a90380?sp=r&sv=2018-11-09&sr=b&spr=https&se=2026-02-23T12%3A07%3A58Z&rscd=attachment%3B+filename%3DTypora_Linux_0.11.18_amd64.deb&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skt=2026-02-23T11%3A07%3A11Z&ske=2026-02-23T12%3A07%3A58Z&sks=b&skv=2018-11-09&sig=WkCLwB1oETFZhzyEfwv%2Bg6rgXToyxVjTEoHs%2FhNv5Vk%3D&jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmVsZWFzZS1hc3NldHMuZ2l0aHVidXNlcmNvbnRlbnQuY29tIiwia2V5Ijoia2V5MSIsImV4cCI6MTc3MTg0ODIwNCwibmJmIjoxNzcxODQ2NDA0LCJwYXRoIjoicmVsZWFzZWFzc2V0cHJvZHVjdGlvbi5ibG9iLmNvcmUud2luZG93cy5uZXQifQ.t0qa0XCAmHMMaI1riV8M9-VMB4v932b0Uhb-5QJtbeM&response-content-disposition=attachment%3B%20filename%3DTypora_Linux_0.11.18_amd64.deb&response-content-type=application%2Foctet-stream) 0.11.18, or install from /home/zimy/Tdisk/deb安装备份/Typora_Linux_0.11.18_amd64.deb
```sh
sudo apt install ./Typora_Linux_0.11.18_amd64.deb
```
sudo apt install pandoc
Install the zeus and Tailwind themes
## Install VPN
https://ikuuu.win/user/tutorial?os=linux&client=ikuuu-vpn
sudo dpkg -i ikuuu_vpn-0.16.4-b3a29b9a-linux-amd64.deb
sudo apt-get install -f    # fix missing dependencies
ikuuuvpn    # launch the client
### DNS
```
resolvectl status    # show the DNS servers actually in use
sudo nano /etc/systemd/resolved.conf
# add:
DNS=1.1.1.1 8.8.8.8
FallbackDNS=1.0.0.1 8.8.4.4
sudo systemctl restart systemd-resolved    # restart the service
```
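Instead of editing `/etc/systemd/resolved.conf` in place, the same settings can live in a drop-in file under `/etc/systemd/resolved.conf.d/`. A small sketch, where `render_resolved_dropin` and the file name `dns.conf` are illustrative choices rather than anything taken from these notes:

```bash
# render_resolved_dropin "DNS SERVERS" "FALLBACK SERVERS"
# Print a [Resolve] drop-in to stdout; it does not touch the system by itself.
render_resolved_dropin() {
    printf '[Resolve]\nDNS=%s\nFallbackDNS=%s\n' "$1" "$2"
}

# Example (needs root, so it is left commented out):
# sudo mkdir -p /etc/systemd/resolved.conf.d
# render_resolved_dropin "1.1.1.1 8.8.8.8" "1.0.0.1 8.8.4.4" \
#     | sudo tee /etc/systemd/resolved.conf.d/dns.conf
# sudo systemctl restart systemd-resolved
```

Keeping the override in its own file leaves the packaged configuration untouched, which makes it easier to revert.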
## Sticky notes
Install the Jorts sticky-notes app (短裤便签)
```bash
sudo apt install flatpak    # install flatpak
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo    # add the flathub remote repository
flatpak install flathub io.github.ellie_commons.jorts
```
##
sudo apt upgrade
Other package managers
@@ -42,41 +127,16 @@ https://extensions.gnome.org/extension/261/kimpanel/
https://blog.csdn.net/KaminZzz/article/details/160000087?spm=1001.2101.3001.6650.2&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EYuanLiJiHua%7EPosition-2-160000087-blog-158711659.235%5Ev43%5Epc_blog_bottom_relevance_base5&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EYuanLiJiHua%7EPosition-2-160000087-blog-158711659.235%5Ev43%5Epc_blog_bottom_relevance_base5&utm_relevant_index=5
## Install WeChat
Download 4.1.0: https://dldir1v6.qq.com/weixin/Universal/Linux/WeChatLinux_x86_64.deb
sudo dpkg -i WeChatLinux_x86_64.deb
## Install Typora
[Download](https://release-assets.githubusercontent.com/github-production-release-asset/387719913/8fc9123c-8ab6-42c4-beb8-51aed8a90380?sp=r&sv=2018-11-09&sr=b&spr=https&se=2026-02-23T12%3A07%3A58Z&rscd=attachment%3B+filename%3DTypora_Linux_0.11.18_amd64.deb&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skt=2026-02-23T11%3A07%3A11Z&ske=2026-02-23T12%3A07%3A58Z&sks=b&skv=2018-11-09&sig=WkCLwB1oETFZhzyEfwv%2Bg6rgXToyxVjTEoHs%2FhNv5Vk%3D&jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmVsZWFzZS1hc3NldHMuZ2l0aHVidXNlcmNvbnRlbnQuY29tIiwia2V5Ijoia2V5MSIsImV4cCI6MTc3MTg0ODIwNCwibmJmIjoxNzcxODQ2NDA0LCJwYXRoIjoicmVsZWFzZWFzc2V0cHJvZHVjdGlvbi5ibG9iLmNvcmUud2luZG93cy5uZXQifQ.t0qa0XCAmHMMaI1riV8M9-VMB4v932b0Uhb-5QJtbeM&response-content-disposition=attachment%3B%20filename%3DTypora_Linux_0.11.18_amd64.deb&response-content-type=application%2Foctet-stream) 0.11.18, or install from /home/zimy/Tdisk/deb安装备份/Typora_Linux_0.11.18_amd64.deb
```sh
sudo apt install ./Typora_Linux_0.11.18_amd64.deb
```
## Install Dove
See /home/zimy/Tdisk/deb安装备份/dove
## Install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
## Install office
Remove the existing version
sudo apt remove --purge 'libreoffice*'
sudo apt autoremove
Update the package list
sudo apt update
Install the Chinese language pack and core components
sudo apt install libreoffice-l10n-zh-cn libreoffice-help-zh-cn
Install the core program plus Writer, Calc, and Impress
sudo apt install libreoffice-core libreoffice-writer libreoffice-calc libreoffice-impress
## Wrap web-based AI apps as desktop apps
```
# Node.js and npm must be installed first
@@ -132,16 +192,6 @@ https://atrustcdn.sangfor.com/standard/linux/2.5.16.20/ubuntu/amd64/aTrustInstal
sudo dpkg -i aTrustInstaller_amd64.deb
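## Sticky notes
Install the Jorts sticky-notes app (短裤便签)
```bash
sudo apt install flatpak    # install flatpak
flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo    # add the flathub remote repository
flatpak install flathub io.github.ellie_commons.jorts
```
## Change the keyboard layout for Pinyin input
Switch the keyboard layout to standard US to avoid "dead key" behavior: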
@@ -150,61 +200,10 @@ $ flatpak install flathub io.github.ellie_commons.jorts
gsettings set org.gnome.desktop.input-sources sources "[('xkb', 'us'), ('ibus', 'libpinyin')]"
```
## Mail
NetEase Mail Master (网易邮箱大师)
```
cd /home/zimy/Tdisk/deb安装备份
sudo dpkg -i mail.deb
sudo apt-get install -f    # fix missing dependencies
```
## Development environment
### DNS & Docker
```
resolvectl status    # show the DNS servers actually in use
sudo nano /etc/systemd/resolved.conf
# add:
DNS=1.1.1.1 8.8.8.8
FallbackDNS=1.0.0.1 8.8.4.4
sudo systemctl restart systemd-resolved    # restart the service
```
Install Docker
```
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add the Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
# Verify
docker version            # 29.2.1
docker compose version    # v5.1.0
```
sudo vim /etc/docker/daemon.json
```
{
"registry-mirrors": [
"https://docker.1ms.run",
"https://docker-0.unsee.tech",
"https://docker.m.daocloud.io"
],
"live-restore": true,
"features": { "buildkit": true }
}
```
##
### python
+84
@@ -0,0 +1,84 @@
# Docker
## Installation
```
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Add the Docker repository
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
# Verify
docker version            # 29.5.2
docker compose version    # v5.1.4
```
sudo vim /etc/docker/daemon.json
```
{
"registry-mirrors": [
"https://docker.1ms.run",
"https://docker-0.unsee.tech",
"https://docker.m.daocloud.io"
],
"live-restore": true,
"features": { "buildkit": true }
}
```
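### Add the current user to the docker group
This is the recommended approach; once the user is in the group, `sudo` is no longer needed for every docker command.
```
# 1. Add the current user to the docker group
sudo usermod -aG docker $USER
# 2. Refresh group membership (or log out of the terminal completely and log back in)
newgrp docker
```
## Usage
```
# Docker notes:
# Give the current user permission to run docker
sudo usermod -aG docker $USER
newgrp docker
# SaaS image release: submit- then, on the remote host, docker load -i my*.tar and restart the container
docker compose logs coze-server --tail 10 | grep -i "error"    # look for errors
docker logs container_name | grep -C 20 "error"    # show 20 lines of context around each error
docker compose exec -it ? /bin/sh    # or /bin/bash; run commands inside the container (replace ? with the service name)
docker image prune -a    # remove all unused images
docker rmi $(docker images -q)    # remove all images
docker stop $(docker ps -q) ; docker rm $(docker ps -aq)    # stop, then remove all containers (-a includes the stopped ones)
docker pull coleifer/sqlite-web:latest    # pull the image
docker save coleifer/sqlite-web:latest -o sqlite_web_latest.tar    # save the image, then upload it
docker load -i sqlite_web_latest.tar    # load the image on the server
# Clear logs:
docker compose logs --no-log-prefix > /dev/null 2>&1 && truncate -s 0 $(docker inspect --format='{{.LogPath}}' $(docker compose ps -q))
```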
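A command of the form `docker rm $(docker ps ...)` errors out when the list is empty, because `docker rm` then receives no arguments. A hedged sketch that guards against that case; `remove_all_containers` is a hypothetical helper, and the `DOCKER` variable is only an assumption added here so the function can be dry-run with a stand-in command.

```bash
# remove_all_containers
# Force-remove every container (running or stopped); do nothing if there are none.
remove_all_containers() {
    docker_cmd=${DOCKER:-docker}          # set DOCKER=... to substitute a stand-in
    ids=$("$docker_cmd" ps -aq) || return 1
    if [ -z "$ids" ]; then
        echo "no containers to remove"
        return 0
    fi
    # Left unquoted on purpose: one container ID per word.
    "$docker_cmd" rm -f $ids
}
```

Because `rm -f` also removes running containers, a separate `docker stop` step is not needed in this variant.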
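- Restart all services: `docker compose restart`
- Restart only the gateway service: `docker compose restart hermes`
- Follow live logs for all services: `docker compose logs -f`
- Upgrade to the latest image version: `docker compose pull && docker compose up -d`
- Stop all services: `docker compose down`
Use a specific production compose file (this example lists its containers): docker compose -f docker-compose.prod.yml ps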
+185
@@ -0,0 +1,185 @@
# Git usage
## Installation
```bash
sudo apt install git
sudo apt install gitk git-cola    # graphical tools
# Global settings
git config --global user.email "zimyx@.com"
git config --global user.name "Your Name"
```
Company: http://117.78.60.236:8000/users/sign_in zhanghonggang@csbr.cn w~7
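## Branch management conventions
Branch types and naming conventions
### Branch naming format
| Branch type | Naming format | Example |
|---------|---------|------|
| Main branch | `main` | `main` |
| Development branch | `dev` | `dev` |
| Feature branch | `feat/<feature-name>` | `feat/event`, `feat/stream` |
| Hotfix branch | `hotfix/<issue-name>` | `hotfix/event` |
| Release branch | `release/<version>` | `release/v1.0.0` |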
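The naming table can be turned into a quick check, for example from a Git hook. This is a minimal sketch: `valid_branch_name` is a hypothetical helper, and it only verifies the prefix and that something follows the slash, not the version format of release branches.

```bash
# valid_branch_name NAME
# Exit 0 when NAME follows one of the naming formats in the table.
valid_branch_name() {
    case "$1" in
        main|dev)                      return 0 ;;
        feat/?*|hotfix/?*|release/?*)  return 0 ;;
        *)                             return 1 ;;
    esac
}
```

It could be called as `valid_branch_name "$(git rev-parse --abbrev-ref HEAD)" || echo "unexpected branch name"`.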
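### Branch purposes
| Branch type | Purpose |
|---------|---------|
| `main` | Stable branch; corresponds directly to the production code |
| `dev` | Main development branch; all feature branches are merged here first |
| `feat/*` | Feature branches, used to develop new functionality |
| `hotfix/*` | Hotfix branches, used for urgent fixes to production issues |
| `release/*` | Release branches, used to prepare a version release |
### Branch workflow rules
#### Core rule: dev must not be merged into main
**Important constraint: the `dev` branch must never be merged directly into the `main` branch**
- `dev` is the development branch; its code has not been fully tested and is not fit to ship straight to production
- All code merged into `main` must come through a `release/*` branch or directly from a `hotfix/*` branch
#### Standard merge flow
```
Feature development flow:
feat/* → dev → release/* → main
Hotfix flow:
hotfix/* → main (may be merged directly in an emergency)
hotfix/* → dev (sync the fix back to the development branch)
```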
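#### Merge rules per branch
| From | Into | Allowed? | Notes |
|-------|-------|---------|------|
| `feat/*` | `dev` | ✅ Allowed | Merge into the development branch once the feature is complete |
| `dev` | `main` | ❌ Forbidden | dev must not be merged directly into main |
| `dev` | `release/*` | ✅ Allowed | The development branch may be merged into a release branch |
| `release/*` | `main` | ✅ Allowed | A release branch is merged into main after it passes testing |
| `release/*` | `dev` | ✅ Allowed | A release branch may be synced back to the development branch |
| `hotfix/*` | `main` | ✅ Allowed | A hotfix may be merged directly into main |
| `hotfix/*` | `dev` | ✅ Allowed | A hotfix must also be synced to the development branch |
| `feat/*` | `main` | ❌ Forbidden | Feature branches must not be merged directly into main |
| `feat/*` | other `feat/*` | ❌ Forbidden | Feature branches must not be merged into one another |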
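The merge table lends itself to a small lookup function. The sketch below is one possible encoding, not an existing tool: `merge_allowed` is a hypothetical name, and it assumes that any pair the table does not explicitly allow should be rejected.

```bash
# merge_allowed SOURCE TARGET
# Exit 0 if the table permits merging SOURCE into TARGET, 1 otherwise.
merge_allowed() {
    case "$1:$2" in
        feat/*:dev)                    return 0 ;;
        dev:release/*)                 return 0 ;;
        release/*:main|release/*:dev)  return 0 ;;
        hotfix/*:main|hotfix/*:dev)    return 0 ;;
        *)                             return 1 ;;   # everything else is denied
    esac
}
```

For example, `merge_allowed dev main || echo "blocked: go through release/*"` prints the warning, since `dev` into `main` is forbidden.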
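### Current project branch structure
```
Local branches:
- dev (main development branch, current working branch)
- feat/event (event feature branch)
- feat/stream (stream feature branch)
- hotfix/event (event hotfix branch)
- main (production branch)
Remote branches:
- origin/main
- origin/dev
- origin/feat/event
```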
### Git operations guide
#### Create a feature branch
```bash
# Create the feature branch from main
git checkout main
git pull origin main
git checkout -b feat/your-feature-name
```
#### Merge a feature into dev
```bash
# Once the feature is complete, switch to the dev branch
git checkout dev
git pull origin dev
# Merge the feature branch
git merge feat/your-feature-name
# Push the dev branch
git push origin dev
```
#### Create a release branch
```bash
# Create the release branch from dev
git checkout dev
git pull origin dev
git checkout -b release/v1.0.0
```
#### Release to main
```bash
# After testing is complete on the release branch, merge it into main
git checkout main
git pull origin main
git merge release/v1.0.0
git push origin main
# Delete the release branch
git branch -d release/v1.0.0
git push origin --delete release/v1.0.0
```
#### Hotfix procedure
```bash
# Create the hotfix branch from main
git checkout main
git pull origin main
git checkout -b hotfix/issue-description
# Once the fix is done, merge it into main
git checkout main
git merge hotfix/issue-description
git push origin main
# Also merge it into dev
git checkout dev
git merge hotfix/issue-description
git push origin dev
# Delete the hotfix branch
git branch -d hotfix/issue-description
```
```
git add .
git commit -m "Update code"
git push
```
+44
@@ -0,0 +1,44 @@
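ss -tunlp | grep -E '9090|789.'    # show listeners on port 9090 and ports 789*
find / -name '*.php'    # find every file with a .php suffix under the root directory
for file in *.docx; do pandoc "$file" -o "${file%.docx}.md"; done    # batch-convert docx to md
pdf2docx convert a.pdf a.docx    # convert pdf to docx
ps aux --sort=-%cpu | head -n 6    # top 5 processes by CPU usage
free -h    # memory usage
openssl rand -base64 12    # generate a random password from 12 random bytes (16 base64 characters)
curl 4.ipw.cn    # look up this machine's public egress IPv4 address; change 4 to 6 for IPv6
watch -n 1 'ls -l'    # rerun the directory listing every second
tar -cvf coze-studio.tar coze-studio    # create a tar archive
tar -xvf coze-studio.tar    # extract it
Mount a remote directory: in Nautilus (GNOME Files), under "Other Locations", enter sftp://1.95.127.205
sudo apt update
sudo apt upgrade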
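The batch docx conversion above relies on the `${file%.docx}` suffix-stripping expansion. A short sketch that isolates that step and also skips the loop when no `.docx` file matches; `md_name` and `convert_all_docx` are hypothetical helper names, and `pandoc` is assumed to be installed.

```bash
# md_name FILE.docx: print the matching Markdown file name.
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

# convert_all_docx: convert every .docx file in the current directory.
convert_all_docx() {
    for file in *.docx; do
        [ -e "$file" ] || continue    # the pattern stays literal when nothing matches
        pandoc "$file" -o "$(md_name "$file")"
    done
}
```

Quoting `"$file"` keeps file names with spaces intact.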
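===================================================Shortcuts
Alt + F2: run an application    Copilot (AI key): open a terminal (cmd)
WIN + Alt + → / ←: switch between workspaces    WIN + `: switch between windows of the same application
Shift + numpad 0-5: 0 screenshot, 1 directory, 2 Firefox, 3 mail, 4 Typora, 5 notes, . camera
Emoji symbols: entered through a Firefox extension
```
Vim shortcuts
yy -> p (paste)    copy the current line      nyy -> p (paste)    copy n lines starting at the current one
dd    delete the current line      u    undo      5 -> Shift+g    jump to line 5
/xx -> n (next)    search for text; press N for the previous match
G    end of file      gg    first line      :set nu    show line numbers
==========================================================
```
==============
Install the ufw firewall: sudo pacman -S ufw (Arch; on Ubuntu use sudo apt install ufw)
sudo ufw enable    # enable ufw
sudo ufw default deny incoming    # set the default policy to deny all incoming and outgoing connections
sudo ufw default deny outgoing
Allow specific services and ports. For example, to allow SSH connections: sudo ufw allow ssh, or: sudo ufw allow 22/tcp. Finally, check the ufw status: sudo ufw status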
+21
@@ -0,0 +1,21 @@
```
# PYTHON
apt install python3-pip python3-dev python3-venv    # package manager, build headers, and venv support (3.12.3)
python3 -m venv .venv    # create a virtual environment for the current directory
source .venv/bin/activate    # once activated, the terminal prompt shows it
# Manage multiple Python versions with pyenv
curl https://pyenv.run | bash    # install pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc
source ~/.bashrc
pyenv install 3.10.0    # install Python 3.10
pyenv global 3.10.0    # set the global Python version
# From inside the project directory, create a virtual environment with Python 3.10
python3.10 -m venv .venv_py310
source .venv_py310/bin/activate
```
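Whether a virtual environment is active can be checked from a script through the `VIRTUAL_ENV` variable that the `activate` script exports. A minimal sketch; `in_venv` is a hypothetical helper name.

```bash
# in_venv: exit 0 when a virtual environment is currently activated.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}
```

A script could then guard itself with something like `in_venv || . .venv/bin/activate`.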
+616
@@ -0,0 +1,616 @@
# Hermes Agent 0.9.0
https://hermes-agent.nousresearch.com/docs/user-guide/docker?_highlight=docker#docker-compose-example
1.95.126.170
## Deployment
/www/hermes/docker-compose.yml
```yml
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    ports:
      - "8642:8642"
    volumes:
      - ~/.hermes:/opt/data
    networks:
      - hermes-net
    # Uncomment to forward specific env vars instead of using .env file:
    # environment:
    #   - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    #   - OPENAI_API_KEY=${OPENAI_API_KEY}
    #   - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"
  dashboard:
    image: nousresearch/hermes-agent:latest
    container_name: hermes-dashboard
    restart: unless-stopped
    command: dashboard --host 0.0.0.0 --insecure
    ports:
      - "9119:9119"
    volumes:
      - ~/.hermes:/opt/data
    environment:
      - GATEWAY_HEALTH_URL=http://hermes:8642
    networks:
      - hermes-net
    depends_on:
      - hermes
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
networks:
  hermes-net:
    driver: bridge
```
## Configuration
cat ~/.hermes/.env    (configures the dashboard, models, and messaging platforms)
```ini
# Hermes Agent Environment Configuration
# Copy this file to .env and fill in your API keys
# =============================================================================
# LLM PROVIDER (OpenRouter)
# =============================================================================
# OpenRouter provides access to many models through one API
# All LLM calls go through OpenRouter - no direct provider keys needed
# Get your key at: https://openrouter.ai/keys
# OPENROUTER_API_KEY=
# Default model is configured in ~/.hermes/config.yaml (model.default).
# Use 'hermes model' or 'hermes setup' to change it.
# LLM_MODEL is no longer read from .env — this line is kept for reference only.
# LLM_MODEL=anthropic/claude-opus-4.6
# =============================================================================
# LLM PROVIDER (Google AI Studio / Gemini)
# =============================================================================
# Native Gemini API via Google's OpenAI-compatible endpoint.
# Get your key at: https://aistudio.google.com/app/apikey
# GOOGLE_API_KEY=your_google_ai_studio_key_here
# GEMINI_API_KEY=your_gemini_key_here # alias for GOOGLE_API_KEY
# Optional base URL override (default: Google's OpenAI-compatible endpoint)
# GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
# =============================================================================
# LLM PROVIDER (z.ai / GLM)
# =============================================================================
# z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
# Get your key at: https://z.ai or https://open.bigmodel.cn
# GLM_API_KEY=
# GLM_BASE_URL=https://api.z.ai/api/paas/v4 # Override default base URL
# =============================================================================
# LLM PROVIDER (Kimi / Moonshot)
# =============================================================================
# Kimi Code provides access to Moonshot AI coding models (kimi-k2.5, etc.)
# Get your key at: https://platform.kimi.ai (Kimi Code console)
# Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
# Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
# KIMI_API_KEY=
# KIMI_BASE_URL=https://api.kimi.com/coding/v1 # Default for sk-kimi- keys
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # For legacy Moonshot keys
# KIMI_BASE_URL=https://api.moonshot.cn/v1 # For Moonshot China keys
# KIMI_CN_API_KEY= # Dedicated Moonshot China key
# =============================================================================
# LLM PROVIDER (Arcee AI)
# =============================================================================
# Arcee AI provides access to Trinity models (trinity-mini, trinity-large-*)
# Get an Arcee key at: https://chat.arcee.ai/
# ARCEEAI_API_KEY=
# ARCEE_BASE_URL= # Override default base URL
# =============================================================================
# LLM PROVIDER (MiniMax)
# =============================================================================
# MiniMax provides access to MiniMax models (global endpoint)
# Get your key at: https://www.minimax.io
# MINIMAX_API_KEY=
# MINIMAX_BASE_URL=https://api.minimax.io/v1 # Override default base URL
# MiniMax China endpoint (for users in mainland China)
# MINIMAX_CN_API_KEY=
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Zen)
# =============================================================================
# OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
# Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
# OPENCODE_ZEN_API_KEY=
# OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Go)
# =============================================================================
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
# $10/month subscription. Get your key at: https://opencode.ai/auth
# OPENCODE_GO_API_KEY=
# =============================================================================
# LLM PROVIDER (Hugging Face Inference Providers)
# =============================================================================
# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
# Free tier included ($0.10/month), no markup on provider rates.
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
# HF_TOKEN=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (Qwen OAuth)
# =============================================================================
# Qwen OAuth reuses your local Qwen CLI login (qwen auth qwen-oauth).
# No API key needed — credentials come from ~/.qwen/oauth_creds.json.
# Optional base URL override:
# HERMES_QWEN_BASE_URL=https://portal.qwen.ai/v1
# =============================================================================
# LLM PROVIDER (Xiaomi MiMo)
# =============================================================================
# Xiaomi MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash).
# Get your key at: https://platform.xiaomimimo.com
# XIAOMI_API_KEY=your_key_here
# Optional base URL override:
# XIAOMI_BASE_URL=https://api.xiaomimimo.com/v1
# =============================================================================
# TOOL API KEYS
# =============================================================================
# Exa API Key - AI-native web search and contents
# Get at: https://exa.ai
# EXA_API_KEY=
# Parallel API Key - AI-native web search and extract
# Get at: https://parallel.ai
# PARALLEL_API_KEY=
# Firecrawl API Key - Web search, extract, and crawl
# Get at: https://firecrawl.dev/
# FIRECRAWL_API_KEY=
# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
# FAL_KEY=
# Honcho - Cross-session AI-native user modeling (optional)
# Builds a persistent understanding of the user across sessions and tools.
# Get at: https://app.honcho.dev
# Also requires ~/.honcho/config.json with enabled=true (see README).
# HONCHO_API_KEY=
# =============================================================================
# TERMINAL TOOL CONFIGURATION
# =============================================================================
# Backend type: "local", "singularity", "docker", "modal", or "ssh"
# Terminal backend is configured in ~/.hermes/config.yaml (terminal.backend).
# Use 'hermes setup' or 'hermes config set terminal.backend docker' to change.
# Supported: local, docker, singularity, modal, ssh
#
# Only override here if you need to force a backend without touching config.yaml:
# TERMINAL_ENV=local
# Override the container runtime binary (e.g. to use Podman instead of Docker).
# Useful on systems where Docker's storage driver is broken or unavailable.
# HERMES_DOCKER_BINARY=/usr/local/bin/podman
# Container images (for singularity/docker/modal backends)
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# Working directory for terminal commands
# For local backend: "." means current directory (resolved automatically)
# For remote backends (ssh/docker/modal/singularity): use an absolute path
# INSIDE the target environment, or leave unset for the backend's default
# (/root for modal, / for docker, ~ for ssh). Do NOT use a host-local path.
# Usually managed by config.yaml (terminal.cwd) — uncomment to override
# TERMINAL_CWD=.
# Default command timeout in seconds
TERMINAL_TIMEOUT=60
# Cleanup inactive environments after this many seconds
TERMINAL_LIFETIME_SECONDS=300
# =============================================================================
# SSH REMOTE EXECUTION (for TERMINAL_ENV=ssh)
# =============================================================================
# Run terminal commands on a remote server via SSH.
# Agent code stays on your machine, commands execute remotely.
#
# SECURITY BENEFITS:
# - Agent cannot read your .env file (API keys protected)
# - Agent cannot modify its own code
# - Remote server acts as isolated sandbox
# - Can safely configure passwordless sudo on remote
#
# TERMINAL_SSH_HOST=192.168.1.100
# TERMINAL_SSH_USER=agent
# TERMINAL_SSH_PORT=22
# TERMINAL_SSH_KEY=~/.ssh/id_rsa
# =============================================================================
# SUDO SUPPORT (works with ALL terminal backends)
# =============================================================================
# If set, enables sudo commands by piping password via `sudo -S`.
# Works with: local, docker, singularity, modal, and ssh backends.
#
# SECURITY WARNING: Password stored in plaintext. Only use on trusted machines.
#
# ALTERNATIVES:
# - For SSH backend: Configure passwordless sudo on the remote server
# - For containers: Run as root inside the container (no sudo needed)
# - For local: Configure /etc/sudoers for specific commands
# - For CLI: Leave unset - you'll be prompted interactively with 45s timeout
#
# SUDO_PASSWORD=your_password_here
# =============================================================================
# MODAL CLOUD BACKEND (Optional - for TERMINAL_ENV=modal)
# =============================================================================
# Modal uses CLI authentication, not environment variables.
# Run: pip install modal && modal setup
# This will authenticate via browser and store credentials locally.
# No API key needed in .env - Modal handles auth automatically.
# =============================================================================
# BROWSER TOOL CONFIGURATION (agent-browser + Browserbase)
# =============================================================================
# Browser automation requires Browserbase cloud service for remote browser execution.
# This allows the agent to navigate websites, fill forms, and extract information.
#
# STEALTH MODES:
# - Basic Stealth: ALWAYS active (random fingerprints, auto CAPTCHA solving)
# - Advanced Stealth: Requires BROWSERBASE_ADVANCED_STEALTH=true (Scale Plan only)
# Browserbase API Key - Cloud browser execution
# Get at: https://browserbase.com/
# BROWSERBASE_API_KEY=
# Browserbase Project ID - From your Browserbase dashboard
# BROWSERBASE_PROJECT_ID=
# Enable residential proxies for better CAPTCHA solving (default: true)
# Routes traffic through residential IPs, significantly improves success rate
BROWSERBASE_PROXIES=true
# Enable advanced stealth mode (default: false, requires Scale Plan)
# Uses custom Chromium build to avoid bot detection altogether
BROWSERBASE_ADVANCED_STEALTH=false
# Browser session timeout in seconds (default: 300)
# Sessions are cleaned up after this duration of inactivity
BROWSER_SESSION_TIMEOUT=300
# Browser inactivity timeout - auto-cleanup inactive sessions (default: 120 = 2 min)
# Browser sessions are automatically closed after this period of no activity
BROWSER_INACTIVITY_TIMEOUT=120
# =============================================================================
# SESSION LOGGING
# =============================================================================
# Session trajectories are automatically saved to logs/ directory
# Format: logs/session_YYYYMMDD_HHMMSS_UUID.json
# Contains full conversation history in trajectory format for debugging/replay
# =============================================================================
# VOICE TRANSCRIPTION & OPENAI TTS
# =============================================================================
# Required for voice message transcription (Whisper) and OpenAI TTS voices.
# Uses OpenAI's API directly (not via OpenRouter).
# Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
# Get at: https://platform.openai.com/api-keys
# VOICE_TOOLS_OPENAI_KEY=
# =============================================================================
# SLACK INTEGRATION
# =============================================================================
# Slack Bot Token - From Slack App settings (OAuth & Permissions)
# Get at: https://api.slack.com/apps
# SLACK_BOT_TOKEN=xoxb-...
# Slack App Token - For Socket Mode (App-Level Tokens in Slack App settings)
# SLACK_APP_TOKEN=xapp-...
# Slack allowed users (comma-separated Slack user IDs)
# SLACK_ALLOWED_USERS=
# =============================================================================
# TELEGRAM INTEGRATION
# =============================================================================
# Telegram Bot Token - From @BotFather (https://t.me/BotFather)
# TELEGRAM_BOT_TOKEN=
# TELEGRAM_ALLOWED_USERS= # Comma-separated user IDs
# TELEGRAM_HOME_CHANNEL= # Default chat for cron delivery
# TELEGRAM_HOME_CHANNEL_NAME= # Display name for home channel
# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
# TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
# TELEGRAM_WEBHOOK_PORT=8443
# TELEGRAM_WEBHOOK_SECRET= # Recommended for production
# WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
# WHATSAPP_ENABLED=false
# WHATSAPP_ALLOWED_USERS=15551234567
# Email (IMAP/SMTP — send and receive emails as Hermes)
# For Gmail: enable 2FA → create App Password at https://myaccount.google.com/apppasswords
# EMAIL_ADDRESS=hermes@gmail.com
# EMAIL_PASSWORD=xxxx xxxx xxxx xxxx
# EMAIL_IMAP_HOST=imap.gmail.com
# EMAIL_IMAP_PORT=993
# EMAIL_SMTP_HOST=smtp.gmail.com
# EMAIL_SMTP_PORT=587
# EMAIL_POLL_INTERVAL=15
# EMAIL_ALLOWED_USERS=your@email.com
# EMAIL_HOME_ADDRESS=your@email.com
# Gateway-wide: allow ALL users without an allowlist (default: false = deny)
# Only set to true if you intentionally want open access.
# GATEWAY_ALLOW_ALL_USERS=false
# =============================================================================
# RESPONSE PACING
# =============================================================================
# Human-like delays between message chunks on messaging platforms.
# Makes the bot feel less robotic.
# HERMES_HUMAN_DELAY_MODE=off # off | natural | custom
# HERMES_HUMAN_DELAY_MIN_MS=800 # Min delay in ms (custom mode)
# HERMES_HUMAN_DELAY_MAX_MS=2500 # Max delay in ms (custom mode)
# =============================================================================
# DEBUG OPTIONS
# =============================================================================
WEB_TOOLS_DEBUG=false
VISION_TOOLS_DEBUG=false
MOA_TOOLS_DEBUG=false
IMAGE_TOOLS_DEBUG=false
# =============================================================================
# CONTEXT COMPRESSION (Auto-shrinks long conversations)
# =============================================================================
# When conversation approaches model's context limit, middle turns are
# automatically summarized to free up space.
#
# Context compression is configured in ~/.hermes/config.yaml under compression:
# CONTEXT_COMPRESSION_ENABLED=true # Enable auto-compression (default: true)
# CONTEXT_COMPRESSION_THRESHOLD=0.85 # Compress at 85% of context limit
# Model is set via compression.summary_model in config.yaml (default: google/gemini-3-flash-preview)
# =============================================================================
# RL TRAINING (Tinker + Atropos)
# =============================================================================
# Run reinforcement learning training on language models using the Tinker API.
# Requires the rl-server to be running (from tinker-atropos package).
# Tinker API Key - RL training service
# Get at: https://tinker-console.thinkingmachines.ai/keys
# TINKER_API_KEY=
# Weights & Biases API Key - Experiment tracking and metrics
# Get at: https://wandb.ai/authorize
# WANDB_API_KEY=
# RL API Server URL (default: http://localhost:8080)
# Change if running the rl-server on a different host/port
# RL_API_URL=http://localhost:8080
# =============================================================================
# SKILLS HUB (GitHub integration for skill search/install/publish)
# =============================================================================
# GitHub Personal Access Token — for higher API rate limits on skill search/install
# Get at: https://github.com/settings/tokens (Fine-grained recommended)
# GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
# GitHub App credentials (optional — for bot identity on PRs)
# GITHUB_APP_ID=
# GITHUB_APP_PRIVATE_KEY_PATH=
# GITHUB_APP_INSTALLATION_ID=
# Groq API key (free tier — used for Whisper STT in voice mode)
# GROQ_API_KEY=
# =============================================================================
# STT PROVIDER SELECTION
# =============================================================================
# Default STT provider is "local" (faster-whisper) — runs on your machine, no API key needed.
# Install with: pip install faster-whisper
# Model downloads automatically on first use (~150 MB for "base").
# To use cloud providers instead, set GROQ_API_KEY or VOICE_TOOLS_OPENAI_KEY above.
# Provider priority: local > groq > openai
# Configure in config.yaml: stt.provider: local | groq | openai
# =============================================================================
# STT ADVANCED OVERRIDES (optional)
# =============================================================================
# Override default STT models per provider (normally set via stt.model in config.yaml)
# STT_GROQ_MODEL=whisper-large-v3-turbo
# STT_OPENAI_MODEL=whisper-1
# Override STT provider endpoints (for proxies or self-hosted instances)
# GROQ_BASE_URL=https://api.groq.com/openai/v1
# STT_OPENAI_BASE_URL=https://api.openai.com/v1
# Basic configuration
GATEWAY_ALLOW_ALL_USERS=true
# Dashboard authentication
DASHBOARD_USERNAME=admin
DASHBOARD_PASSWORD=your_secure_password_here # change this to your own password
# Time zone
TZ=Asia/Shanghai
ARCEEAI_API_KEY=dec57933-c3e4-4f4a-a114-b0ae8104e396
ARCEE_BASE_URL=https://ark.cn-beijing.volces.com/api/coding/v3
WECOM_BOT_ID=aib1TzEwFCq0ia1...qlDLt2Fv
WECOM_SECRET=1U12qLqTdV......ZRIt0ts
GATEWAY_ALLOW_ALL_USERS=true
WECOM_ALLOWED_USERS=LiuRan,ZhangHongGang,WangJian
WECOM_ENABLE_ATTACHMENTS=true
```
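The configuration above sets one key twice and carries a trailing comment on a value line. Whether a trailing `# ...` is stripped from the value depends on the dotenv parser in use, and the source does not say how Hermes reads this file. As a rough aid, the sketch below flags both situations; `parse_env`, `lint_env` and the sample text are hypothetical names and data made up for illustration.

```python
from collections import Counter


def parse_env(text):
    """Return (key, value) pairs from dotenv-style text, skipping comments and blanks."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def lint_env(text):
    """Report keys that are set more than once and values with a trailing ' #' comment."""
    pairs = parse_env(text)
    counts = Counter(key for key, _ in pairs)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    inline_comments = sorted(k for k, v in pairs if " #" in v)
    return {"duplicates": duplicates, "inline_comments": inline_comments}


# Hypothetical sample mirroring the shape of the file above (no real secrets).
sample = """\
# Basic configuration
GATEWAY_ALLOW_ALL_USERS=true
DASHBOARD_PASSWORD=your_secure_password_here # change me
TZ=Asia/Shanghai
GATEWAY_ALLOW_ALL_USERS=true
"""
report = lint_env(sample)
print(report)
```

Moving such comments onto their own line sidesteps the parser question entirely.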
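### Feishu
**Feishu developer console**
1. Log in to the [Feishu Open Platform](https://open.feishu.cn/app?lang=zh-CN) and create a **custom enterprise app** (企业自建应用).
2. Note the `App ID` cli_a968...9cb6 and the `App Secret` Q8ykzs7vJdhnda.....qpxi1yfAJIUQH.
3. Go to "Add app capabilities" (添加应用能力) → **Bot**, then allow receiving private chats and group chats.
4. Go to "Permission management" (权限管理) and grant:
- `im:message:send_as_bot` (send messages as the bot)
- `im:message:readonly` (read messages)
- `im:message.group_at_msg:readonly` (group @ mentions)
- `im:message.p2p_msg:readonly` (private chats)
- `contact:user.employee_id:readonly` (read user IDs)
5. Event subscription (WebSocket mode)
- Left sidebar: Development configuration → Events and callbacks
- Receive mode: **long connection (WebSocket)**
- Subscribed event: **`im.message.receive_v1`** (receive messages)
- Save
6. Release a version → **Publish** (a custom enterprise app is usable immediately).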
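**Server: install the Feishu dependencies**
```bash
# Enter the Hermes environment (depends on how you deployed it)
cd /path/to/hermes
# Install the Feishu SDK
uv pip install lark-oapi websockets
# Or use pip (if uv is not available)
pip install lark-oapi websockets
```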
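### WeCom (企业微信)
1. Open the WeCom admin console and create a bot: https://work.weixin.qq.com/wework_admin/frame#/aiHelper/create and choose the API method to create **CS Agent**
- Connection mode: long connection
- Bot ID aib1T ...t2Fv
- Secret 1U12qLqTdV pd8kKBrcpdk9pZRIt0ts
2. Available permissions
- User information
- Documents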
## Running
```
cd /www/hermes/
docker run -it --rm -v ~/.hermes:/opt/data nousresearch/hermes-agent
```
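I would like you to learn to become a case manager in the field of medical weight loss. What do you think?
I can complete the learning in three steps, "build an authoritative knowledge base → adapt to your working standards → iterate through hands-on practice", fully tailored to your work needs:
### Step 1: Build an authoritative medical weight-loss knowledge base (I can do this on my own)
I will first search for and study nationally and internationally recognized guidelines as the underlying knowledge framework, including:
- *Chinese Guidelines for Medical Nutrition Therapy of Overweight/Obesity* (2023 edition)
- WHO clinical guidelines for the prevention and management of adult obesity
- Individualized intervention standards for obesity with comorbidities such as hypertension, diabetes, or polycystic ovary syndrome
- Practice standards, follow-up protocols, and risk warning indicators for medical weight-loss case managers

I will save all core knowledge to persistent memory and follow the standards strictly when using it, so that no mistakes are made.
### Step 2: Align with your own working requirements (you need to provide the materials)
You can send me the working materials you use regularly, and I will study and retain all of them:
1. Your organization's internal working standards, weight-loss plan templates, and case assessment / follow-up record templates
2. High-quality cases you have handled in the past (private information can be removed)
3. The report formats you need me to produce and any special requirements to follow

I will turn these rules into a dedicated skill, so that later tasks fully match your working standards.
### Step 3: Iterate through hands-on practice
You can start by giving me a small number of test cases to handle. After you give feedback I will correct and refine quickly, and after a few rounds I can reach the accuracy you require.
### Work I can do for you once the learning is complete:
- ✅ Case screening: automatically assess weight-loss risk from the user's physical examination data and medical history, and determine indications / contraindications
- ✅ Plan generation: produce personalized diet / exercise / behavioral intervention plans for each individual, automatically adapting to the special requirements of people with comorbidities
- ✅ Follow-up management: automatically generate follow-up plans and scripts, with configurable scheduled tasks that push you the list of cases to follow up each day
- ✅ Statistics: automatically compute each case's target attainment rate, adherence rate, adverse reaction rate and similar figures, and generate summary reports

Does this learning path match your expectations? Are there materials you want me to study first, or functions to emphasize?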
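## mini machine
https://github.com/NousResearch/hermes-agent
```
Step 1: create a virtual environment (avoids the system Python restrictions)
cd ~/hermes-agent-main
python3 -m venv venv
source venv/bin/activate
Step 2: install all dependencies
pip install -r requirements.txt
Step 3: run the project
./hermes
Why this works
The earlier error occurred because the system Python was used directly, and the system does not allow pip to modify or uninstall system packages.
The virtual environment is an independent, clean Python environment, so those permission conflicts do not arise and the packages install normally.
Before each use, activate the environment first
In every new terminal, run this line first:
source ~/hermes-agent-main/venv/bin/activate
Then, from ~/hermes-agent-main, run:
./hermes
Summary
Run these commands:
cd ~/hermes-agent-main
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./hermes
This avoids the ModuleNotFound, "cannot uninstall", and permission errors seen earlier.
It is the standard way to run a Python project on Linux.
```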
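To confirm that a terminal really is using the virtual environment rather than the system interpreter, a small check such as the following can help. It relies on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a venv; the function names are hypothetical.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_environment():
    """Return a one-line description of which interpreter environment is active."""
    kind = "virtual environment" if in_virtualenv() else "system interpreter"
    return f"{kind}: {sys.prefix}"


print(describe_environment())
```

If this reports the system interpreter after activation, the `source .../venv/bin/activate` step was probably run in a different shell.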
+102
View File
@@ -0,0 +1,102 @@
Export the Firefox bookmarks
Back up the Typora themes
```
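==============
Backup
Step 1: safely remove and reconnect the drive
# 1. Make sure the device is not mounted
sudo umount /dev/sda1 2>/dev/null
# 2. Power off the device safely
sudo udisksctl power-off -b /dev/sda
# 3. Wait 5 seconds, then unplug and replug the USB cable
sleep 5
# (replug the USB cable by hand)
Step 2: verify that UAS is disabled
# Check which driver the USB device is using
sudo dmesg | tail -20
# The output should show "usb-storage" rather than "uas"
# Or list the SCSI hosts
sudo lsscsi -t
# The output should show "usb: " rather than "uas: "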
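Step 3: repartition and format
# Caution: the commands below erase /dev/sda; confirm the device name with lsblk first
# 1. Create a GPT partition table
sudo parted /dev/sda --script mklabel gpt
# 2. Create a single partition
sudo parted /dev/sda --script mkpart primary 0% 100%
# 3. Wait for the kernel to pick up the new partition
sleep 2
sudo partprobe /dev/sda
# 4. Format as ext4 (label BACKUP_DISK, no reserved blocks)
sudo mkfs.ext4 -F -L "BACKUP_DISK" -m 0 /dev/sda1
# 5. Check the result
sudo blkid /dev/sda1
sudo fsck -f /dev/sda1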
Step 4: mount with tuned options
# 1. Create the mount point
sudo mkdir -p /mnt/backup
# 2. Mount (conservative options)
sudo mount -o noatime,nodiratime,errors=remount-ro /dev/sda1 /mnt/backup
# 3. Verify
df -h /mnt/backup
mount | grep sda1
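Step 5: run the backup in stages
# 1. Create the backup script
cat > /tmp/backup.sh << 'EOF'
#!/bin/bash
LOG="/tmp/backup_$(date +%Y%m%d_%H%M%S).log"
{
echo "=== Backup started: $(date) ==="
# Back up in stages, starting with the large directories
echo "Stage 1: home directories"
# Exclude patterns are anchored at the transfer root (/home/), hence the leading /*/
rsync -av --no-owner --no-group --timeout=180 \
--exclude='/*/.cache/' \
--exclude='/*/.local/share/Trash/' \
/home/ /mnt/backup/
echo "Stage 2: cache directories (excluding large browser caches)"
# One rsync per user: a glob in the destination path is not expanded per source
for src in /home/*/.cache/; do
user="$(basename "$(dirname "$src")")"
mkdir -p "/mnt/backup/$user/.cache"
rsync -av --no-owner --no-group --timeout=180 \
--exclude='chromium/Default/Cache/' \
--exclude='google-chrome/Default/Cache/' \
--exclude='/yay/pkg/' \
"$src" "/mnt/backup/$user/.cache/"
done
echo "=== Backup finished: $(date) ==="
} > "$LOG" 2>&1 &
EOF
# 2. Run the backup
sudo bash /tmp/backup.sh
# 3. Follow the log
tail -f /tmp/backup_*.log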
```
+271
View File
@@ -0,0 +1,271 @@
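# Electronic Scale Image Recognition System: Design Document
> Version: v1.0
> Last updated: 2026-05-20
> Status: draft (to be iterated)
## 1. Project Background
### 1.1 Business Scenario
Users photograph the display of an electronic scale, and the system must automatically extract the **reading (number + unit)** for data entry or verification.
Image sources include household bathroom scales, bench scales, and smartphone app screenshots.
### 1.2 Current State
- The existing script `preprocess.py` only performs ROI cropping and contrast enhancement; it has no recognition capability
- 122 samples were recognized by a combination of manual reading and a multimodal model, with about 80% accuracy; the main error sources are listed in §2
### 1.3 Goals
- **Short term**: raise automatic recognition accuracy to **≥85%**, with the remainder checked manually
- **Mid term**: reach **≥95%**, so that only extremely blurry or abnormal displays need manual handling
- **Long term**: support automatic model identification for scales of any brand, plus unit disambiguation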
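## 2. Problem Analysis (based on 122 samples)
### 2.1 Distribution of Error Sources
| Error type | Share | Typical example | Severity |
|---|---|---|---|
| Photo taken upside down | ~25% | "119.35" read upside down and recognized reversed | 🔴 High |
| ROI cropped too tightly | ~12% | Only "6" remains, the other digits are lost | 🔴 High |
| 7-segment character confusion | ~16% | `7↔L`, `2↔5`, `6↔9` | 🟡 Medium |
| Display glare / blur | ~8% | Small screen on the pink scale | 🟡 Medium |
| Unit ambiguity (kg / 斤) | ~5% | Xiaomi scale looks the same in both modes | 🟢 Low |
| Abnormal display (startup / error) | ~2% | Incomplete segment strokes | 🟢 Low |
### 2.2 Distribution of Device Types
| Device type | Share | Display characteristics | Difficulty |
|---|---|---|---|
| Blue-backlit LCD bathroom scale | ~40% | Black digits on blue | Glare; adaptive thresholding is hard |
| LED 7-segment scale (orange/white) | ~20% | High-contrast luminous digits | Often photographed upside down |
| Xiangshan (香山) bench scale (multi-screen) | ~10% | White-on-black three-panel LCD | Interference from multiple windows |
| App screenshot | ~25% | High resolution, many fields | Must locate the "main weight" field |
| Other (pink scale, HUAWEI, LEFU, etc.) | ~5% | Varies | Long-tail problem |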
## 3. Technical Approach
### 3.1 Overall Pipeline
```
┌───────────┐   ┌──────────┐   ┌────────────┐   ┌────────────┐   ┌─────────┐   ┌─────┐
│ Raw image │ → │ EXIF fix │ → │ ROI detect │ → │ Orient fix │ → │ Enhance │ → │ OCR │
└───────────┘   └──────────┘   └────────────┘   └────────────┘   └─────────┘   └─────┘
                                                                                  ↓
                                                 ┌───────────────────────────────────┐
                                                 │ Value + unit parsing              │
                                                 │ Plausibility check                │
                                                 └───────────────────────────────────┘
                                                                   ↓
                                                 ┌───────────────────────────────────┐
                                                 │ Output: value / unit / confidence │
                                                 └───────────────────────────────────┘
```
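### 3.2 Technology Choices per Stage
#### Step 1: EXIF Orientation Correction
**Purpose**: JPEGs taken on a phone carry Orientation metadata, so they can be turned upright at essentially no cost.
**Approach**: read the Orientation tag via `PIL.ExifTags` and rotate according to the three cases 3/6/8.
**Expected benefit**: resolves roughly 30% of the orientation problems at almost no cost.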
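A minimal sketch of the three-case rotation, assuming Pillow is installed. The mapping covers only Orientation values 3, 6 and 8 as named above; `rotation_for` and `upright` are hypothetical helper names, and the actual `preprocess_v2.py` may do this differently.

```python
# Counter-clockwise rotation (degrees) that undoes each EXIF Orientation value.
# Only the three pure-rotation cases are handled; mirrored orientations
# (2, 4, 5, 7) are left untouched in this sketch.
ROTATION_FOR_ORIENTATION = {3: 180, 6: 270, 8: 90}
ORIENTATION_TAG = 0x0112  # standard EXIF tag id for Orientation


def rotation_for(orientation):
    """Return the counter-clockwise angle to apply, or 0 when no rotation is needed."""
    return ROTATION_FOR_ORIENTATION.get(orientation, 0)


def upright(path):
    """Open an image and rotate it upright according to its EXIF Orientation tag."""
    from PIL import Image  # imported lazily so the mapping is usable without Pillow

    img = Image.open(path)
    orientation = img.getexif().get(ORIENTATION_TAG, 1)
    angle = rotation_for(orientation)
    # Image.rotate turns counter-clockwise; expand=True keeps the whole frame.
    return img.rotate(angle, expand=True) if angle else img
```

Pillow also ships `ImageOps.exif_transpose`, which handles all eight Orientation values and may be the simpler choice in practice.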
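#### Step 2: ROI Detection (locating the screen region)
**Purpose**: crop the display rectangle out of the full image to reduce OCR interference.
**Approach**: four strategies with weighted scoring:
- **Strategy A**: LED highlight + low saturation (white/yellow 7-segment digits), boost ×1.3
- **Strategy B**: blue-backlit LCD (HSV: H∈[85,140])
- **Strategy C**: white/light LCD (V>180, S<60)
- **Strategy D**: Canny edges + horizontal dilation (for white-on-black bench scales), boost ×0.8
**Scoring dimensions**: area ratio, aspect ratio, contrast, brightness.
**Risk**: the current 15% margin is too tight; the new version raises it to 20%, and the effect still needs to be monitored.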
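The weighted scoring and the margin expansion might look roughly like the sketch below. Only the ×1.3 and ×0.8 boosts and the 20% margin come from the text. The weights, the area saturation point, the aspect-ratio window, and the idea that the margin is applied per side relative to the box size are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Boosts for strategies A and D come from the text; B and C are assumed to be 1.0.
STRATEGY_BOOST = {"A": 1.3, "B": 1.0, "C": 1.0, "D": 0.8}
# Hypothetical weights for the four scoring dimensions (not from the source).
WEIGHTS = {"area": 0.3, "aspect": 0.3, "contrast": 0.3, "brightness": 0.1}


@dataclass
class Candidate:
    strategy: str      # "A", "B", "C" or "D"
    box: tuple         # (x, y, w, h) in pixels
    contrast: float    # normalized to 0..1
    brightness: float  # normalized to 0..1


def score(c, img_w, img_h):
    """Weighted score of one candidate rectangle, multiplied by its strategy boost."""
    _, _, w, h = c.box
    area = min(1.0, (w * h) / (img_w * img_h) / 0.25)  # saturates at 25% of the image (assumed)
    aspect = 1.0 if 1.5 <= w / h <= 4.0 else 0.3       # assumed typical display shape
    base = (WEIGHTS["area"] * area + WEIGHTS["aspect"] * aspect
            + WEIGHTS["contrast"] * c.contrast + WEIGHTS["brightness"] * c.brightness)
    return base * STRATEGY_BOOST[c.strategy]


def expand(box, img_w, img_h, margin_pct=20):
    """Grow a box by margin_pct of its size on every side, clipped to the image."""
    x, y, w, h = box
    dx, dy = w * margin_pct // 100, h * margin_pct // 100
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


candidates = [
    Candidate("A", (100, 200, 300, 100), contrast=0.8, brightness=0.9),
    Candidate("D", (50, 50, 400, 300), contrast=0.8, brightness=0.9),
]
best = max(candidates, key=lambda c: score(c, 1000, 1000))
roi = expand(best.box, 1000, 1000)
print(best.strategy, roi)
```

Keeping the margin as an integer percentage avoids floating-point rounding when converting to pixel offsets.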
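#### Step 3: Orientation Correction (brute force over 4 directions + scoring)
**Purpose**: handle upside-down and sideways photos.
**Scoring terms**:
1. **Shape prior**: a display is usually landscape (`w>h`) → +1
2. **7-segment "gravity" feature**: in the horizontal projection, the lower third of rows ≥ the upper third → +1 (the typical distribution when the digits are upright)
3. **Tesseract OSD**: if it returns `rotate=0` → +2 (strong evidence)
**Selection**: take the highest-scoring direction among 0°/90°/180°/270°.
**Advantage**: even if OSD fails, the 7-segment feature still serves as a fallback.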
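The three scoring terms can be sketched on a plain binary image (a list of rows, 1 = ink) without any image library. The `osd` argument is a hypothetical callable returning Tesseract's `rotate` value; in practice it could be backed by `pytesseract.image_to_osd`, but that wiring is not shown and is an assumption here.

```python
def rotate90(img):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def orientation_score(img, osd_rotate=None):
    """Score one candidate orientation using the three terms from the text."""
    h, w = len(img), len(img[0])
    score = 0
    if w > h:                                    # shape prior: landscape display
        score += 1
    rows = [sum(r) for r in img]                 # horizontal projection (ink per row)
    third = max(1, h // 3)
    if sum(rows[-third:]) >= sum(rows[:third]):  # lower third vs upper third
        score += 1
    if osd_rotate == 0:                          # OSD vote, when available
        score += 2
    return score


def best_rotation(img, osd=None):
    """Try 0/90/180/270 degrees clockwise and return (angle, score) of the best one."""
    best_angle, best_score, current = 0, -1, img
    for angle in (0, 90, 180, 270):
        s = orientation_score(current, osd(current) if osd else None)
        if s > best_score:                       # ties keep the earlier angle
            best_angle, best_score = angle, s
        current = rotate90(current)
    return best_angle, best_score


# A landscape image whose ink sits in the top rows, i.e. upside down by this heuristic.
upside_down = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(best_rotation(upside_down))
```

Without OSD the maximum score is 2, so the projection feature alone decides between 0° and 180° for landscape crops.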
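#### Step 4: Image Enhancement
**Purpose**: make the 7-segment digits recognizable by OCR.
**Key steps**:
1. CLAHE (contrast-limited adaptive histogram equalization)
2. Automatic polarity detection: if the image mean is <127, treat it as a dark background and invert
3. Adaptive thresholding (Gaussian, blockSize=25)
4. **Critical**: one pass of dilation with a 2×2 kernel, which bridges the gaps between the 7 segments; otherwise Tesseract splits one digit into several characters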
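The four steps could be chained with OpenCV roughly as follows. The `clipLimit`, `tileGridSize` and threshold constant `C=10` are assumed values, and thresholding with `THRESH_BINARY_INV` (so that the digits are white while dilating, then inverted back) is one interpretation of "dilation bridges the gaps", not something the source specifies.

```python
def is_dark_background(mean_intensity, threshold=127):
    """Polarity rule from the text: a mean below the threshold means a dark background."""
    return mean_intensity < threshold


def enhance(gray):
    """gray: 2-D uint8 NumPy array. Returns a binary image with dark digits on white."""
    import cv2          # imported lazily; requires opencv-python
    import numpy as np

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed parameters
    eq = clahe.apply(gray)
    if is_dark_background(float(eq.mean())):
        eq = cv2.bitwise_not(eq)                 # make the background light
    # Inverted output: digits become white so that dilation thickens them.
    binary = cv2.adaptiveThreshold(eq, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 25, 10)
    joined = cv2.dilate(binary, np.ones((2, 2), np.uint8), iterations=1)
    return cv2.bitwise_not(joined)               # back to dark digits on white
```

Dilation grows white regions, so applying it to dark-on-white digits would thin the segments instead of joining them; that is why this sketch dilates on the inverted image.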
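#### Step 5: OCR Recognition (multiple backends)
| Backend | Strengths | Weaknesses | Recommended for |
|---|---|---|---|
| **Tesseract** | Easy to deploy; LCD trained data available | 7-segment digits need tuning | Default |
| **SSOCR** | Designed for 7-segment displays | Command-line only; must be compiled by hand | Clean LED displays |
| **PaddleOCR** | Strong on both Chinese and digits | Large footprint | App screenshots |
| **Custom-trained CNN** | Highest accuracy ceiling | Needs training data | Long-term investment |
**Current default**: Tesseract, PSM=7 (single line), with the whitelist restricted to digits and unit characters.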
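A sketch of how the default backend might be invoked through `pytesseract`, assuming it is installed. The ASCII-only whitelist is an assumption (recognizing the Chinese unit characters would additionally need a Chinese language pack), `read_display` is a hypothetical name, and the `conf > 60` acceptance rule mirrors the threshold suggested in §5.2.

```python
def tesseract_config(psm=7, whitelist="0123456789.kgKGlb"):
    """Build the Tesseract option string: page segmentation mode plus character whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_display(image, min_conf=60):
    """Return (text, confidence); text is None when the confidence is too low."""
    import pytesseract                  # imported lazily; requires pytesseract + tesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image, config=tesseract_config(),
                                     output_type=Output.DICT)
    words = [(t, float(c)) for t, c in zip(data["text"], data["conf"]) if t.strip()]
    if not words:
        return None, 0.0
    text = "".join(t for t, _ in words)
    conf = min(c for _, c in words)     # the weakest word decides (assumed policy)
    return (text if conf > min_conf else None), conf


print(tesseract_config())
```

Taking the minimum word confidence is a conservative choice; averaging would accept more readings at the cost of more errors.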
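#### Step 6: Value Plausibility Check
**Regex extraction**: `(\d{1,3}\.?\d{0,2})\s*(kg|KG|公斤|千克|斤|两|lb)?`
**Plausible ranges**:
- kg / 公斤 / 千克: 10–200
- 斤 / 两: 20–400
- lb: 20–440
**Purpose**: intercept typical OCR errors that push the value out of range. Note that the example given originally (misreading `1` as `7` to obtain 170 kg) falls inside the 10–200 kg range, so this check alone would not reject it.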
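The regex and the ranges above translate directly into a small parser. The pattern and the ranges are taken from the text; defaulting to `kg` when no unit is recognized is an assumption, and `parse_reading` is a hypothetical name.

```python
import re

READING = re.compile(r"(\d{1,3}\.?\d{0,2})\s*(kg|KG|公斤|千克|斤|两|lb)?")
RANGES = {"kg": (10, 200), "斤": (20, 400), "lb": (20, 440)}
# Map every accepted unit spelling to the range group it is validated against.
UNIT_GROUP = {"kg": "kg", "KG": "kg", "公斤": "kg", "千克": "kg",
              "斤": "斤", "两": "斤", "lb": "lb"}


def parse_reading(text, default_unit="kg"):
    """Return (value, unit) when a plausible reading is found, otherwise None."""
    m = READING.search(text)
    if not m:
        return None
    value = float(m.group(1))
    unit = m.group(2) or default_unit   # assumed fallback when no unit was recognized
    lo, hi = RANGES[UNIT_GROUP[unit]]
    if not lo <= value <= hi:
        return None                     # out of range: reject instead of guessing
    return value, unit


print(parse_reading("65.4kg"), parse_reading("710 kg"), parse_reading("170kg"))
```

The last call shows the limit of a pure range check: 170 kg is accepted because it lies inside 10–200.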
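## 4. File Structure
```
新建文件夹 1/
├── 未识别/ # original images waiting to be processed
├── 预处理/ # output of the old version (v1)
├── 预处理v2/ # output of the new version
│ ├── *.png # binarized preprocessing result for each image
│ └── _results.csv # recognition summary (file name / rotation / OCR / confidence / parsed value)
├── preprocess_v2.py # new version (preprocessing + recognition + validation)
└── 设计文档.md # this document
```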
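## 5. Evaluation Metrics
### 5.1 Accuracy Metrics
- **Top-1 accuracy**: share of single-pass recognitions that are fully correct (both number and unit)
- **Digit accuracy**: only the number is correct (unit errors tolerated)
- **±5% tolerance accuracy**: relative error between the recognized and true value ≤5% (softens the impact of errors in the decimal places)
### 5.2 Recall / Rejection Rate
- **Rejection rate**: share of outputs marked "invalid" after the plausibility check (a high rejection rate is preferable to wrong recognitions)
- **Confidence threshold**: accept only when Tesseract conf > 60; otherwise fall back to manual review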
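The metrics in §5.1 and §5.2 can be computed from pairs of prediction and ground truth as sketched below. Using the total sample count as the denominator for every metric is an assumption, as is representing a rejected sample by `None`; the sample data is made up.

```python
def evaluate(results):
    """results: list of (pred, truth); pred is (value, unit) or None when rejected,
    truth is always (value, unit). Every rate is relative to all samples."""
    n = len(results)
    accepted = [(p, t) for p, t in results if p is not None]
    top1 = sum(p == t for p, t in accepted)                  # number and unit match
    digits = sum(p[0] == t[0] for p, t in accepted)          # number matches
    within5 = sum(abs(p[0] - t[0]) <= 0.05 * t[0] for p, t in accepted)
    return {
        "top1": top1 / n,
        "digits": digits / n,
        "within_5pct": within5 / n,
        "rejection_rate": (n - len(accepted)) / n,
    }


# Hypothetical results: one exact hit, one unit error, one small value error, one rejection.
sample_results = [
    ((65.4, "kg"), (65.4, "kg")),
    ((65.4, "斤"), (65.4, "kg")),
    ((66.0, "kg"), (65.4, "kg")),
    (None, (80.0, "kg")),
]
metrics = evaluate(sample_results)
print(metrics)
```

Reporting the rejection rate next to the accuracy figures makes it visible when accuracy is bought by rejecting more samples.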
### 5.3 Performance
- Processing time per image < 1 s (single CPU thread)
- Full run over all 122 images < 2 minutes
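## 6. Known Limitations and Open Issues
| # | Problem | Current mitigation | Long-term improvement |
|---|---|---|---|
| L1 | Pink scale display is too small (<1% of the image) | Wider margin | Super-resolution (ESRGAN) preprocessing |
| L2 | Startup screens / abnormal displays are misread | Caught by the plausibility check | Add a classifier for "is this a valid reading" |
| L3 | Xiangshan bench scale has three screens; only the weight field is needed | ROI takes the largest LCD | Train a dedicated detector that crops by field label |
| L4 | Xiaomi scale kg/斤 mode cannot be told from the display | Guess the unit from the value range by default | Add brand + model recognition (CNN or logo matching) |
| L5 | OCR reads "6/9" reversed | Brute-force rotation over 4 directions | Train an orientation detector |
| L6 | App screenshots contain many fields | For now, generic OCR + pick the largest number | Train a layout model dedicated to app screenshots |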
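## 7. Improvement Roadmap
### v2.0 (current): General Pipeline
- ✅ EXIF orientation correction
- ✅ 4-strategy ROI detection
- ✅ Brute-force rotation over 4 directions + scoring
- ✅ Choice of Tesseract / SSOCR / PaddleOCR
- ✅ Value plausibility check
- ✅ CSV summary output
### v2.1: Fallback Mechanisms
- [ ] Automatic hand-off to manual review at low confidence: emit a flag when confidence <60
- [ ] Voting across OCR backends (Tesseract + PaddleOCR)
- [ ] Unit inference from context: infer from the numeric range (e.g. 50–100 is most likely kg, 120+ is most likely 斤)
### v2.2: Targeted Optimizations
- [ ] **App screenshot branch**: when app UI elements are detected, take a dedicated path (locate the "target number" color block directly)
- [ ] **Scale brand recognition**: use logos in the image (mi, SENSSUN, HUAWEI, LEFU, HCS, etc.) to help choose OCR parameters
- [ ] **Super-resolution enhancement**: for small displays such as the pink scale, upscale 4× before recognition
### v3.0: Deep Learning
- [ ] Build a training set from the existing 122 images plus ongoing collection (target ≥1000 images)
- [ ] Fine-tune a CNN dedicated to 7-segment digits (see [Seven-Segment-OCR](https://github.com/SachaIZADI/Seven-Segment-OCR))
- [ ] End-to-end YOLO model combining detection and recognition (see the Goodfellow multi-digit recognition paper)
### v4.0: Productionization
- [ ] Provide an HTTP API (FastAPI)
- [ ] Provide a Web UI for annotation and verification
- [ ] Active learning: samples corrected by a human are added to the training set automatically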
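## 8. Testing and Validation
### 8.1 Current Test Set
- 122 sample images (in `未识别/`, named "<recognition result>.jpg" so that the file name serves as ground truth)
- Ground-truth source: double-checked by a human and a multimodal model
- **Note**: the ground truth for about 10 upside-down images may still contain errors and needs later review
### 8.2 Validation Procedure
```bash
# 1. Run recognition
python preprocess_v2.py --backend tesseract
# 2. Compare against ground truth (the file name is the ground truth)
python evaluate.py # to be implemented: extract the truth from the file name and compare with _results.csv
```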
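Since `evaluate.py` is still to be written, the following is only a possible starting point. The CSV column names `filename` and `parsed_value`, and the assumption that each file name starts with the numeric reading, are hypothetical; the real `_results.csv` layout may differ.

```python
import csv
import io
import re
from pathlib import Path


def truth_from_filename(name):
    """Extract the leading number of the file stem, e.g. '65.4kg.jpg' -> 65.4."""
    m = re.match(r"(\d+(?:\.\d+)?)", Path(name).stem)
    return float(m.group(1)) if m else None


def compare(csv_text):
    """Return (number of exact numeric matches, number of rows)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    correct = 0
    for row in rows:
        truth = truth_from_filename(row["filename"])
        pred = row["parsed_value"]          # empty string means the sample was rejected
        if truth is not None and pred and float(pred) == truth:
            correct += 1
    return correct, len(rows)


# Hypothetical CSV content standing in for _results.csv.
sample_csv = "filename,parsed_value\n65.4kg.jpg,65.4\n119.35.jpg,119.85\n72.0.jpg,\n"
print(compare(sample_csv))
```

Reading from a real file would replace `io.StringIO(csv_text)` with an opened file handle.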
### 8.3 Acceptance Criteria (v2.0)
- Top-1 accuracy ≥ 70%
- Digit accuracy ≥ 80%
- ±5% tolerance accuracy ≥ 85%
- Rejection rate ≤ 20% (the rest goes to manual review)
## 9. Dependencies
### Required
```bash
pip install opencv-python pillow pytesseract imutils numpy
```
### System dependencies (Linux)
```bash
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
# Optional: download the 7-segment (LCD) trained data
wget https://github.com/Shreeshrii/tessdata_ssd/raw/master/ssd.traineddata
# The tessdata directory depends on the installed Tesseract version; adjust the path if needed
sudo mv ssd.traineddata /usr/share/tesseract-ocr/4.00/tessdata/
```
### Optional (stronger backends)
```bash
pip install paddlepaddle paddleocr # PaddleOCR
sudo apt install ssocr # 7-segment OCR (packaged in Debian/Ubuntu)
```
## 10. References
### Papers
- Goodfellow et al., [Multi-digit Number Recognition from Street View Imagery using Deep CNNs](https://arxiv.org/abs/1312.6082)
### Open-Source Projects
- [SSOCR](https://www.unix-ag.uni-kl.de/~auerswal/ssocr/): dedicated 7-segment OCR (C)
- [tessdata_ssd](https://github.com/Shreeshrii/tessdata_ssd): Tesseract trained data for 7-segment digits
- [LCD-OCR](https://github.com/DevashishPrasad/LCD-OCR): Tesseract + OpenCV pipeline
- [Seven-Segment-OCR](https://github.com/SachaIZADI/Seven-Segment-OCR): CNN, MNIST style
- [seven-segment-ocr](https://github.com/renjithsasidharan/seven-segment-ocr): TensorFlow Lite
### Engineering Articles
- [PyImageSearch: Correcting Text Orientation with Tesseract](https://pyimagesearch.com/2022/01/31/correcting-text-orientation-with-tesseract-and-python/)
- [Medium: Correct image orientation using Pytesseract + Imutils](https://medium.com/@birenmer/correct-image-orientation-using-python-pytesseract-and-imutils-5a524d386857)
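## 11. Maintenance and Collaboration
### 11.1 Document Maintenance
- Every change of approach must be ticked off in the §7 roadmap, with the version number noted
- Newly discovered issues go into the §6 known-limitations table
- Major architectural changes require updating the §3 pipeline diagram
### 11.2 Code Maintenance
- `preprocess.py` is kept as the v1 baseline (no further changes)
- New features evolve in `preprocess_v2.py`; a major version moves to `preprocess_v3.py`
- Experimental features go into the `experiments/` subdirectory
### 11.3 Data Maintenance
- New ground-truth samples go into the `未识别/` directory, with the file name as the ground truth
- Misrecognized samples are flagged separately as a hard-example set (for later training)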