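# Architecture Analysis Methodology

This document describes the core methodology and best practices for analyzing the architecture of a codebase.

## Contents

- [1. Subsystem Boundary Identification](#1-subsystem-boundary-identification)
- [2. Workflow Pattern Identification](#2-workflow-pattern-identification)
- [3. Communication Mechanism Identification](#3-communication-mechanism-identification)
- [4. Mermaid Diagram Selection Strategy](#4-mermaid-diagram-selection-strategy)
- [5. Business Function Identification Rules](#5-business-function-identification-rules)
- [6. Call-Chain Tracing Depth Control](#6-call-chain-tracing-depth-control)
- [7. Best Practices](#7-best-practices)
- [8. Troubleshooting](#8-troubleshooting)
- [9. Recommended Tools](#9-recommended-tools)
- [10. Reference Resources](#10-reference-resources)

---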
## 1. Subsystem Boundary Identification

### Rule 1: Directory Naming Patterns

Identify subsystems by their directory names:

| Directory pattern | Subsystem type | Common stack |
|-------------------|----------------|--------------|
| `frontend/`, `ui/`, `web/`, `client/` | Frontend | React, Vue, Angular |
| `backend/`, `api/`, `server/` | Backend | FastAPI, Express, Django |
| `agent/`, `agents/`, `workers/` | Agent system | LangGraph, CrewAI, Celery |
| `services/` with multiple subdirectories | Microservice cluster | Spring Boot, Go microservices |
| `database/`, `models/`, `schema/` | Data layer | SQLAlchemy, Prisma |
| `shared/`, `common/`, `lib/` | Shared library | Utility functions, type definitions |

### Rule 2: Independent Configuration Files

Each subsystem usually has its own configuration and dependency files:

```
frontend/
├── package.json      → standalone Node.js project
├── tsconfig.json
└── vite.config.js

backend/
├── requirements.txt  → standalone Python project
├── pyproject.toml
└── Dockerfile
```

**How to identify**:

```bash
# Find independent dependency manifests, skipping vendored third-party packages
find . \( -name "package.json" -o -name "requirements.txt" -o -name "pyproject.toml" -o -name "go.mod" \) \
  -not -path "*/node_modules/*"
```
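Rules 1 and 2 can be combined in a small script. It finds the dependency manifests, then guesses each directory's subsystem type from the Rule 1 naming table. This is a minimal sketch that uses only the standard library. `find_subsystems` and the pattern dictionary are illustrative names. Classifying a directory by the first path component that matches a pattern is an assumption, not a fixed rule.

```python
import tempfile
from pathlib import Path

# Directory name -> subsystem type, transcribed from the Rule 1 table.
DIR_PATTERNS = {
    "frontend": "frontend", "ui": "frontend", "web": "frontend", "client": "frontend",
    "backend": "backend", "api": "backend", "server": "backend",
    "agent": "agent system", "agents": "agent system", "workers": "agent system",
    "services": "microservice cluster",
    "database": "data layer", "models": "data layer", "schema": "data layer",
    "shared": "shared library", "common": "shared library", "lib": "shared library",
}

# Dependency manifests from Rule 2.
MANIFESTS = {"package.json", "requirements.txt", "pyproject.toml", "go.mod"}


def find_subsystems(root):
    """Map each directory that holds a manifest to a guessed subsystem type."""
    root = Path(root)
    found = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name not in MANIFESTS:
            continue
        rel = path.relative_to(root)
        if "node_modules" in rel.parts:  # skip vendored third-party packages
            continue
        subdir = rel.parent
        # The first path component that matches a Rule 1 pattern decides the type.
        kind = next((DIR_PATTERNS[p] for p in subdir.parts if p in DIR_PATTERNS), "unknown")
        found[subdir.as_posix()] = kind
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["frontend/package.json", "backend/requirements.txt",
                     "frontend/node_modules/react/package.json"]:
            target = Path(tmp, name)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        print(find_subsystems(tmp))  # {'backend': 'backend', 'frontend': 'frontend'}
```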
### Rule 3: Docker Compose Service Boundaries

Analyze `docker-compose.yml` to identify service boundaries:

```yaml
services:
  frontend:        # frontend service
    build: ./frontend
  api:             # API service
    build: ./backend
  worker:          # background jobs
    build: ./worker
  db:              # database
    image: postgres:15
```
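Once the file is parsed, the service list can be split mechanically. The sketch below assumes the YAML has already been loaded into a plain dict, for example with PyYAML's `yaml.safe_load`. `classify_services` is a hypothetical helper. It treats services with `build:` as the project's own subsystems and services that only pull an `image:` as infrastructure. That split is a heuristic, not a rule of Compose.

```python
def classify_services(compose):
    """Split docker-compose services into locally built subsystems and prebuilt images.

    `compose` is the already-parsed docker-compose document (a plain dict).
    Returns two dicts: {service: build context} and {service: image}.
    """
    own, infra = {}, {}
    for name, spec in (compose.get("services") or {}).items():
        spec = spec or {}
        build = spec.get("build")
        if build is not None:
            # `build` may be a path string or a mapping with a `context` key;
            # "." is an assumed default when the context is omitted.
            own[name] = build.get("context", ".") if isinstance(build, dict) else build
        elif "image" in spec:
            infra[name] = spec["image"]
    return own, infra


if __name__ == "__main__":
    compose = {
        "services": {
            "frontend": {"build": "./frontend"},
            "api": {"build": "./backend"},
            "worker": {"build": {"context": "./worker"}},
            "db": {"image": "postgres:15"},
        }
    }
    own, infra = classify_services(compose)
    print(own)    # {'frontend': './frontend', 'api': './backend', 'worker': './worker'}
    print(infra)  # {'db': 'postgres:15'}
```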
Each service usually corresponds to an independent subsystem.

---
## 2. Workflow Pattern Identification

### Pattern 1: Multi-Agent Orchestration

**Indicators**:

- Imports of `langgraph`, `crewai`, or `autogen`
- Calls such as `StateGraph()`, `Crew()`, or `Process.sequential` (CrewAI)
- Use of graph-building methods such as `add_node()` and `add_edge()`

**Extraction steps**:

1. **Locate the orchestration definition**

```bash
grep -rn "StateGraph\|Crew(\|autogen" --include="*.py" .
```

2. **Extract the agent nodes**

```python
# Example (LangGraph)
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("researcher", research_agent)  # Agent 1
graph.add_node("writer", writing_agent)       # Agent 2
graph.add_node("reviewer", review_agent)      # Agent 3
```

3. **Analyze the state transitions**

```python
graph.add_edge("researcher", "writer")  # sequential transition
graph.add_conditional_edges("writer", should_revise, {
    "revise": "researcher",  # conditional branch
    "finish": END,
})
```
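Steps 2 and 3 can be partly automated with Python's standard `ast` module. It finds the graph-building calls without importing or running the project. This is a minimal sketch with illustrative function names. It only recognizes `add_edge` and `add_conditional_edges` calls whose arguments are string literals or the `START`/`END` names.

```python
import ast

SOURCE = '''
graph = StateGraph(AgentState)
graph.add_node("researcher", research_agent)
graph.add_node("writer", writing_agent)
graph.add_edge("researcher", "writer")
graph.add_conditional_edges("writer", should_revise, {
    "revise": "researcher",
    "finish": END,
})
'''


def _state(node):
    """Turn an AST argument into a Mermaid state name, or None if it is not static."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        # LangGraph's START/END sentinels map to Mermaid's start/end marker.
        return "[*]" if node.id in ("START", "END") else node.id
    return None


def extract_state_diagram(source):
    """Collect edge definitions statically and render them as stateDiagram-v2."""
    lines = ["stateDiagram-v2"]
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        method, args = node.func.attr, node.args
        if method == "add_edge" and len(args) >= 2:
            lines.append(f"    {_state(args[0])} --> {_state(args[1])}")
        elif (method == "add_conditional_edges" and len(args) >= 3
              and isinstance(args[2], ast.Dict)):
            src = _state(args[0])
            # Each mapping entry becomes a labelled transition: src --> target: key
            for key, target in zip(args[2].keys, args[2].values):
                lines.append(f"    {src} --> {_state(target)}: {_state(key)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(extract_state_diagram(SOURCE))
```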
**Diagram type**: `stateDiagram-v2`

---

### Pattern 2: Data Pipeline (ETL)

**Indicators**:

- Directory names containing `pipeline`, `etl`, or `workflow`
- Function names containing `extract`, `transform`, or `load`
- Use of orchestration frameworks such as Airflow, Prefect, or Dagster

**Extraction steps**:

1. **Identify the pipeline stages**

```python
# Example
def run_pipeline(source):
    raw = extract(source)    # stage 1
    clean = transform(raw)   # stage 2
    load(clean)              # stage 3
```
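For a pipeline written as straight-line calls like the one above, `ast` can read the stage order from the function body and turn it into a `flowchart LR`. This is a minimal sketch with illustrative function names. It assumes each stage is a direct call of the form `x = stage(...)` or `stage(...)`. It does not handle nested calls, loops, or DAGs defined through Airflow, Prefect, or Dagster.

```python
import ast

PIPELINE = '''
def run_pipeline(source):
    raw = extract(source)
    clean = transform(raw)
    load(clean)
'''


def pipeline_stages(source, func_name="run_pipeline"):
    """Return the names of functions called directly, in order, in func_name's body."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree)
                if isinstance(n, ast.FunctionDef) and n.name == func_name)
    stages = []
    for stmt in func.body:
        # Assign (`raw = extract(...)`) and Expr (`load(...)`) both keep the call in .value.
        call = getattr(stmt, "value", None)
        if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
            stages.append(call.func.id)
    return stages


def pipeline_flowchart(source, func_name="run_pipeline"):
    """Chain consecutive stages left to right."""
    stages = pipeline_stages(source, func_name)
    edges = [f"    {a} --> {b}" for a, b in zip(stages, stages[1:])]
    return "\n".join(["flowchart LR"] + edges)


if __name__ == "__main__":
    print(pipeline_flowchart(PIPELINE))
```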
2. **Trace the data transformations**
   - Input: CSV file
   - → DataFrame (pandas)
   - → cleaned DataFrame
   - → PostgreSQL table

**Diagram type**: `flowchart LR` (left to right)

---

### Pattern 3: Event-Driven Architecture

**Indicators**:

- Use of `EventEmitter`, `event_bus`, or `@subscribe`
- Publish/subscribe calls: `emit()`, `on()`, `subscribe()`
- Message queues: RabbitMQ, Kafka, Redis Pub/Sub

**Extraction steps**:

1. **Identify where events are published**

```python
# Publish an event
event_bus.emit("order_created", order)
```

2. **Identify the subscribers**

```python
@event_bus.on("order_created")
def handle_order(order):
    send_confirmation(order)


@event_bus.on("order_created")
def update_inventory(order):
    inventory.decrease(order.items)
```
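Both sides can be collected in one pass with `ast`, producing publisher → event → subscriber edges for a `graph TB`. This is a minimal sketch. It assumes the `emit`/`publish` and `on`/`subscribe` method names seen above and string-literal event names. Events named through variables or constants would need extra resolution.

```python
import ast
from collections import defaultdict

EVENTS_SRC = '''
def place_order(order):
    event_bus.emit("order_created", order)

@event_bus.on("order_created")
def handle_order(order):
    send_confirmation(order)

@event_bus.on("order_created")
def update_inventory(order):
    inventory.decrease(order.items)
'''

PUBLISH_METHODS = {"emit", "publish"}    # assumed publisher method names
SUBSCRIBE_METHODS = {"on", "subscribe"}  # assumed subscriber decorator names


def _event_call(node, methods):
    """Return the event name if node is `<obj>.<method>("event", ...)`, else None."""
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
            and node.func.attr in methods and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)):
        return node.args[0].value
    return None


def event_graph(source):
    """Render publisher --> event --> subscriber edges as Mermaid graph TB."""
    publishers, subscribers = defaultdict(set), defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in func.decorator_list:      # @event_bus.on("x") marks a subscriber
            event = _event_call(dec, SUBSCRIBE_METHODS)
            if event:
                subscribers[event].add(func.name)
        for stmt in func.body:               # event_bus.emit("x", ...) marks a publisher
            for node in ast.walk(stmt):
                event = _event_call(node, PUBLISH_METHODS)
                if event:
                    publishers[event].add(func.name)
    lines = ["graph TB"]
    for event in sorted(set(publishers) | set(subscribers)):
        for pub in sorted(publishers[event]):
            lines.append(f"    {pub} --> {event}(({event}))")
        for sub in sorted(subscribers[event]):
            lines.append(f"    {event} --> {sub}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(event_graph(EVENTS_SRC))
```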
**Diagram type**: `graph TB` (shows publish/subscribe relationships)

---

### Pattern 4: Microservice Call Chains

**Indicators**:

- `docker-compose.yml` defines multiple services
- Other services are called with `requests.get()` or `httpx.get()`
- gRPC calls through a generated stub: `stub.MethodCall()`

**Extraction steps**:

1. **Identify inter-service calls**

```python
# api-gateway calling other services
user = requests.get(f"http://user-service/users/{user_id}")
order = requests.post("http://order-service/orders", ...)
```
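When service URLs appear as string literals, a regular expression plus `urllib.parse` is often enough for a first topology. This is a minimal sketch under several assumptions. The input maps each hypothetical service name to the source text it owns. The hostname in a URL is taken to be the callee's service name, as with Docker Compose service DNS names. URLs assembled from configuration or environment variables will be missed.

```python
import re
from urllib.parse import urlparse

# Hypothetical input: source text grouped by the service that owns it.
SOURCES = {
    "api-gateway": '''
user = requests.get(f"http://user-service/users/{user_id}")
order = requests.post("http://order-service/orders", json=payload)
''',
    "order-service": '''
resp = httpx.post("http://payment-service/charges", json=charge)
''',
}

# requests/httpx call whose first argument is a literal (or f-string) URL.
CALL_RE = re.compile(
    r"""\b(?:requests|httpx)\.(?:get|post|put|patch|delete)\(\s*f?["']([^"']+)["']"""
)


def call_topology(sources):
    """Return sorted (caller, callee) pairs inferred from literal URLs."""
    edges = set()
    for caller, text in sources.items():
        for url in CALL_RE.findall(text):
            host = urlparse(url).hostname
            if host and host != caller:
                edges.add((caller, host))
    return sorted(edges)


def to_mermaid(edges):
    """Render edges as graph TB; hyphens are replaced in node ids but kept in labels."""
    def node(name):
        return f'{name.replace("-", "_")}["{name}"]'
    return "\n".join(["graph TB"] + [f"    {node(a)} --> {node(b)}" for a, b in edges])


if __name__ == "__main__":
    print(to_mermaid(call_topology(SOURCES)))
```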
2. **Map the call topology**
   - api-gateway → user-service
   - api-gateway → order-service
   - order-service → payment-service

**Diagram type**: `graph TB` (service topology)

---

### Pattern 5: Traditional Layered Architecture (MVC)

**Indicators**:

- Directory structure: `controllers/`, `services/`, `models/`
- Decorators: `@app.route`, `@Controller`
- ORM usage: SQLAlchemy, Sequelize

**Extraction steps**:

1. **Identify the three layers**
   - Controller: handles HTTP requests
   - Service: business logic
   - Model: data model

2. **Trace the call chain**

```
routes.py (@app.route)
  → UserService.create()
    → User model
      → Database
```

**Diagram type**: `graph TD` (layered diagram)

---
## 3. Communication Mechanism Identification

### HTTP/REST API

**Search patterns**:

```bash
# Server side (-E enables the (a|b) alternation syntax)
grep -rnE "@app\.(get|post|put|delete)" --include="*.py" .
grep -rnE "router\.(get|post)" --include="*.js" .
grep -rn "@RestController" --include="*.java" .

# Client side
grep -rnE "axios\.(get|post)" --include="*.js" .
grep -rnE "requests\.(get|post)" --include="*.py" .
grep -rn "fetch(" --include="*.ts" .
```
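For Python services, `ast` can cross-check the server-side greps. It also captures the handler name and ignores matches inside comments or strings. This is a minimal sketch with an illustrative `find_routes` name. It recognizes FastAPI-style `@app.get("/path")` decorators and Flask-style `@app.route("/path", methods=[...])`, with GET as the default method. Only the decorator's method name is checked, so routers bound to other object names (such as a blueprint `bp`) are found too.

```python
import ast

ROUTES_SRC = '''
@app.get("/users/{user_id}")
def get_user(user_id: int): ...

@app.post("/orders")
async def create_order(order: dict): ...

@app.route("/health")
def health(): ...

@bp.route("/items", methods=["GET", "POST"])
def items(): ...
'''

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def find_routes(source):
    """Return (METHOD, path, handler) tuples for route decorators in source."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            if not (isinstance(dec, ast.Call) and isinstance(dec.func, ast.Attribute)
                    and dec.args and isinstance(dec.args[0], ast.Constant)):
                continue
            path, attr = dec.args[0].value, dec.func.attr
            if attr in HTTP_METHODS:        # FastAPI style: @app.post("/x")
                routes.append((attr.upper(), path, node.name))
            elif attr == "route":           # Flask style: @app.route("/x", methods=[...])
                methods = ["GET"]
                for kw in dec.keywords:
                    if kw.arg == "methods" and isinstance(kw.value, (ast.List, ast.Tuple)):
                        methods = [e.value for e in kw.value.elts
                                   if isinstance(e, ast.Constant)]
                routes.extend((m.upper(), path, node.name) for m in methods)
    return routes


if __name__ == "__main__":
    for route in find_routes(ROUTES_SRC):
        print(*route)
```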
### WebSocket

**Search patterns**:

```bash
grep -rn "WebSocket\|socket.io\|ws://" --include="*.py" --include="*.js" --include="*.ts" .
```

### Message Queues

**Search patterns**:

```bash
# RabbitMQ
grep -rn "pika\|RabbitMQ" --include="*.py" .

# Kafka
grep -rn "kafka\|KafkaProducer" --include="*.py" --include="*.java" .

# Redis Pub/Sub
grep -rn "redis.*publish\|redis.*subscribe" --include="*.py" .
```

### gRPC

**Search patterns**:

```bash
grep -rn "grpc" --include="*.py" --include="*.go" --include="*.java" .
# .proto files define the service contracts
find . -name "*.proto" -not -path "*/node_modules/*"
```

---
## 4. Mermaid Diagram Selection Strategy

Choose the diagram type that matches the identified pattern:

| Scenario | Diagram type | When to use |
|----------|--------------|-------------|
| Clear state transitions (multi-agent) | `stateDiagram-v2` | Explicit state nodes and transition conditions |
| Sequential + conditional flow (business process) | `flowchart TD` | Sequential steps with if/else branches |
| Data pipeline (ETL) | `flowchart LR` | Linear data transformation flow |
| System topology (microservices) | `graph TB` | Interactions between multiple systems or services |
| Time-ordered interactions (API calls) | `sequenceDiagram` | Messages exchanged between multiple participants over time |
| Concurrent tasks | `graph TB` + `subgraph` | Grouping of parallel tasks |

### Syntax Notes

**⚠️ Mermaid syntax constraints (based on version 11.x)**:

- **stateDiagram-v2**: do not use the `--` separator (it fails with "No such shape: divider")
- **stateDiagram-v2**: `<br/>` tags are not supported
- **sequenceDiagram**: every `alt`/`loop`/`par` block must be closed with a matching `end`
- **All types**: avoid deeply nested control structures

**stateDiagram-v2 limitations**:

- ❌ `<br/>` tags are not supported; use a `note` block instead
- ❌ The `--` separator is not supported
- ❌ Avoid comparison operators such as `!=` and `==` in label text
- ✅ Put detailed information in `note right of NodeName` / `note left of NodeName` blocks

**flowchart/graph**:

- ✅ `<br/>` can be used for line breaks
- ✅ Rich-text labels are supported
- ✅ Node colors can be set with `style`

### Example: Selection Decision Tree

```
Detected features:
├─ State variables + transition logic?
│   → Yes: use stateDiagram-v2
│   → No: continue
│
├─ Multiple participants + time-ordered messages?
│   → Yes: use sequenceDiagram
│   → No: continue
│
├─ Linear data transformation?
│   → Yes: use flowchart LR
│   → No: continue
│
└─ Default: use graph TB or flowchart TD
```
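The decision tree translates directly into an ordered chain of checks. This sketch assumes the analysis has already reduced its findings to a few booleans, and the `DetectedFeatures` fields are hypothetical. The tree's default branch allows either diagram. Choosing `flowchart TD` over `graph TB` there when the code has sequential steps with branches follows the table above, not the tree itself.

```python
from dataclasses import dataclass


@dataclass
class DetectedFeatures:
    """Hypothetical summary of what the analysis found (field names are illustrative)."""
    has_state_transitions: bool = False  # state variables + transition logic
    has_timed_messages: bool = False     # multiple participants exchanging ordered messages
    is_linear_pipeline: bool = False     # linear data transformation
    is_top_down_process: bool = False    # sequential steps with if/else branches


def choose_diagram(f: DetectedFeatures) -> str:
    """Walk the decision tree in order; the first matching rule wins."""
    if f.has_state_transitions:
        return "stateDiagram-v2"
    if f.has_timed_messages:
        return "sequenceDiagram"
    if f.is_linear_pipeline:
        return "flowchart LR"
    # Default branch: the tree allows either graph TB or flowchart TD.
    return "flowchart TD" if f.is_top_down_process else "graph TB"


if __name__ == "__main__":
    print(choose_diagram(DetectedFeatures(is_linear_pipeline=True)))  # flowchart LR
```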
---

## 5. Business Function Identification Rules

### Excluding Helper Functions

**Rule**: the following kinds of functions do not count as business logic:

```python
# 1. Private functions (leading underscore)
def _internal_helper():
    pass

# 2. Utility functions
def format_date(date):
    pass

def validate_email(email):
    pass

# 3. Getters/setters
def get_user():
    pass

def set_config():
    pass

# 4. Very short functions (< 5 lines)
def simple_wrapper():
    return another_function()
```

### Identifying Business Functions

**Indicator 1: the function name contains a business verb**

```text
Business verbs:
- process_*, handle_*, execute_*, run_*
- create_*, update_*, delete_*
- generate_*, analyze_*, calculate_*
- search_*, fetch_*, query_*
- build_*, transform_*, convert_*
```

**Indicator 2: it operates on core data models**

```python
# Parameters or return values involve core domain classes
def process_order(order: Order) -> OrderResult:  # ✓ business function
    ...

def format_string(s: str) -> str:  # ✗ utility function
    ...
```

**Indicator 3: the function body is long (> 10 lines)**

Business logic is usually more complex, so its function bodies tend to be longer.

**Indicator 4: it calls external services or the database**

```python
def create_user(data):
    user = User(**data)
    db.session.add(user)     # ✓ database operation
    db.session.commit()
    send_email(user.email)   # ✓ external service
    return user
```
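The name- and length-based rules are easy to apply with `ast` (Python 3.8+, which provides `end_lineno`). This minimal sketch implements the exclusion rules plus indicators 1 and 3 only. Several details are assumptions: length is counted including the `def` line, and `get_`/`set_`/`format_`/`validate_` are treated as helper prefixes. Indicators 2 and 4 need type or call information and are left out.

```python
import ast

BUSINESS_PREFIXES = (
    "process_", "handle_", "execute_", "run_", "create_", "update_", "delete_",
    "generate_", "analyze_", "calculate_", "search_", "fetch_", "query_",
    "build_", "transform_", "convert_",
)
HELPER_PREFIXES = ("get_", "set_", "format_", "validate_")  # assumed helper names


def is_business_function(func):
    """Heuristic: apply the exclusion rules first, then indicators 1 and 3."""
    length = func.end_lineno - func.lineno + 1  # includes the def line
    if func.name.startswith("_") or func.name.startswith(HELPER_PREFIXES):
        return False
    if length < 5:  # very short wrappers
        return False
    return func.name.startswith(BUSINESS_PREFIXES) or length > 10


def business_functions(source):
    """Names of functions in source that the heuristic classifies as business logic."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and is_business_function(node)]


SAMPLE = '''
def _internal_helper():
    pass

def format_date(date):
    return date.isoformat()

def process_order(order):
    total = 0
    for item in order.items:
        total += item.price
    order.total = total
    return order

def simple_wrapper():
    return another_function()
'''

if __name__ == "__main__":
    print(business_functions(SAMPLE))  # ['process_order']
```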
---

## 6. Call-Chain Tracing Depth Control

### Maximum Depth Strategy

**Problem**: recursive tracing can fall into an infinite loop, for example when functions call each other cyclically.

**Solution**: limit the maximum tracing depth.

```python
MAX_DEPTH = 5  # trace at most 5 levels


def trace_calls(func_name, current_depth=0):
    if current_depth >= MAX_DEPTH:
        return

    # extract_function_calls is an assumed helper that returns the names of
    # the functions called directly by func_name.
    calls = extract_function_calls(func_name)
    for call in calls:
        trace_calls(call, current_depth + 1)
```
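The tracer above is depth-first. For breadth-first tracing, a queue replaces the recursion. This sketch also keeps a visited set, so cycles and shared callees are expanded only once. It runs over a hypothetical in-memory call graph instead of the assumed `extract_function_calls` helper. It groups functions by the depth at which they are first reached.

```python
from collections import deque

# Hypothetical call graph: function name -> functions it calls directly.
CALL_GRAPH = {
    "handle_request": ["validate", "process_order"],
    "process_order": ["charge", "save_order"],
    "charge": ["process_order"],  # cycle back to process_order
    "save_order": [],
    "validate": [],
}


def trace_bfs(graph, entry, max_depth=5):
    """Breadth-first tracing: returns {depth: [functions first reached at that depth]}."""
    levels = {0: [entry]}
    visited = {entry}
    queue = deque([(entry, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth >= max_depth:  # same cap as MAX_DEPTH above
            continue
        for callee in graph.get(name, []):
            if callee in visited:  # already reached: skip (this also breaks cycles)
                continue
            visited.add(callee)
            levels.setdefault(depth + 1, []).append(callee)
            queue.append((callee, depth + 1))
    return levels


if __name__ == "__main__":
    for depth, names in trace_bfs(CALL_GRAPH, "handle_request").items():
        print(depth, names)
```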
### Breadth-First vs. Depth-First

**Breadth-first (BFS)**:

- Shows all functions at the same call level first
- Well suited to identifying concurrent tasks

**Depth-first (DFS)**:

- Follows a single call chain to its end first
- Well suited to tracing data transformation flows

**Recommendation**: use **breadth-first** tracing for flow analysis.

---

## 7. Best Practices

### ✅ Do

1. **Breadth before depth** - understand the overall structure first, then dive into details
2. **Read-only analysis** - do not modify any code
3. **Cite locations** - attach `file:line` to every conclusion
4. **Visualize first** - use diagrams to express complex relationships
5. **Report in stages** - show progress while the analysis is running

### ❌ Don't

1. **Don't speculate** - every conclusion must be grounded in the code
2. **Don't analyze `node_modules/`** - skip third-party dependencies
3. **Don't expose sensitive information** - do not read `.env` or `secrets.yaml`
4. **Don't go too deep** - there is no need to analyze every helper function
5. **Don't modify code** - the analysis is strictly read-only

---

## 8. Troubleshooting

### Problem 1: Entry Point Not Found

**Symptom**: the main function cannot be identified

**Solution**:

```bash
# Try several search strategies ("." matches either quote style)
grep -rn "if __name__ == .__main__." --include="*.py" .
grep -rn "@app.route\|@app.post" --include="*.py" .
grep -rn "func main()" --include="*.go" .
ls -la | grep -i "main\|index\|app"
```
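The first grep is text-based, so it can also match commented-out code or strings. Below is a sketch of a stricter check with `ast`. It assumes the files have already been read into a `{path: source}` dict, for example with `Path.read_text()`. `find_entry_points` is an illustrative name.

```python
import ast


def has_main_guard(source):
    """True if the module has a top-level `if __name__ == "__main__":` block."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (isinstance(test, ast.Compare)
                and isinstance(test.left, ast.Name) and test.left.id == "__name__"
                and len(test.ops) == 1 and isinstance(test.ops[0], ast.Eq)
                and isinstance(test.comparators[0], ast.Constant)
                and test.comparators[0].value == "__main__"):
            return True
    return False


def find_entry_points(files):
    """Given {path: source}, return the sorted paths that look like script entry points."""
    return sorted(path for path, src in files.items() if has_main_guard(src))


if __name__ == "__main__":
    files = {
        "app/cli.py": 'def main():\n    pass\n\nif __name__ == "__main__":\n    main()\n',
        "app/utils.py": "def helper():\n    return 1\n",
    }
    print(find_entry_points(files))  # ['app/cli.py']
```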
### Problem 2: Call Chain Too Complex

**Symptom**: the call chain runs more than 100 levels deep

**Solution**:

- Trace only business functions and ignore utility functions
- Limit the maximum depth to 5
- Focus on the core path (the most frequently used call chain)

### Problem 3: Dynamic Imports Cannot Be Traced

**Symptom**: modules are loaded dynamically, for example through Python's `importlib` or JavaScript's `require(variable)`

**Solution**:

- Combine the analysis with runtime logs
- Search for string patterns to guess the likely module names
- Mark them in the report as "dynamic import, cannot be statically analyzed"

---

## 9. Recommended Tools

### Python Projects

- `ast` module - parses Python source into an AST
- `radon` - complexity analysis
- `pydeps` - dependency visualization

### JavaScript/TypeScript Projects

- `@typescript-eslint/parser` - parses TypeScript into an AST
- `madge` - dependency graphs
- `dependency-cruiser` - dependency validation

### Go Projects

- `go-callvis` - call graph visualization
- `gocyclo` - cyclomatic complexity

### General-Purpose Tools

- `tree` - directory structure visualization
- `tokei` - code statistics
- `cloc` - line-of-code counting

---

## 10. Reference Resources

- **C4 model**: a four-level model for visualizing system architecture
- **UML class diagrams**: the standard notation for object-oriented systems
- **Event Storming**: a modeling method from domain-driven design
- **Architecture Decision Records (ADR)**: a best practice for recording architectural decisions