2025-11-12 10:27:56 +08:00
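# Detailed Guide to Workflow Extraction
This guide provides a general-purpose workflow extraction algorithm that applies to a variety of architectural patterns.
## ⚠️ Important Constraints
**When calling the Task tool, the prompt must begin with the following constraint:**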
```
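⚠️ Important constraint: this analysis returns text results only. Do not generate any files (.md, .txt, etc.).
All Mermaid diagrams, lists, and analysis conclusions must be included in your text reply. Do not use Write or any other file-creation tool.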
```
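**The Explore agent returns text results only and must not generate any files.**
---
## Core Idea
**Do not presuppose an architecture type. Instead, extract the execution-flow characteristics from the code.**
---
## Algorithm Overview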
```
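1. Locate entry points (CLI/Web/scheduled tasks)
2. Extract the function call chain (recursive tracing)
3. Identify business functions (exclude helper functions)
4. Analyze flow patterns (sequential/branching/loop/concurrent)
5. Generate Mermaid diagrams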
```
---
## Step 1: Locate Entry Points
### 1.1 CLI Applications
**Python**:
```bash
grep -rnE "if __name__ == ['\"]__main__['\"]" --include="*.py"
grep -rn "@click.command\|@click.group" --include="*.py"
grep -rn "argparse.ArgumentParser" --include="*.py"
```
**Example**:
```python
# main.py:45
if __name__ == "__main__":
    main()  # ← entry function
```
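Text search can miss guards that are formatted unusually. As a complement, the following is a minimal sketch that finds `if __name__ == "__main__":` guards with Python's standard `ast` module. The helper name `find_main_guards` and the sample source are illustrative assumptions, not part of the original workflow.

```python
import ast


def find_main_guards(source: str) -> list[int]:
    """Return the line numbers of `if __name__ == "__main__":` statements."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical sample file content.
sample = 'def main():\n    pass\n\nif __name__ == "__main__":\n    main()\n'
print(find_main_guards(sample))  # [4]
```

Because the check works on the syntax tree, it accepts both single-quoted and double-quoted forms of `"__main__"`.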
**Go**:
```bash
grep -rn "func main()" --include="*.go"
```
---
### 1.2 Web Applications
**FastAPI/Flask**:
```bash
grep -rnE "@app\.(get|post|put|delete)" --include="*.py"
grep -rn "@router\." --include="*.py"
```
**Example**:
```python
# routes.py:42
@app.post("/api/research")  # ← entry point
def research_endpoint(query: str):
    ...
```
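When the HTTP method and path are needed as structured data rather than raw grep output, the same decorator pattern can be scanned in Python. This is a rough sketch: the regular expression, the `find_routes` helper, and the sample source are assumptions for illustration, and it only covers decorators whose path is a string literal on the same line.

```python
import re

# Matches decorators such as @app.post("/path") or @router.get('/path').
ROUTE_RE = re.compile(
    r"@(?P<obj>\w+)\.(?P<method>get|post|put|delete|patch)"
    r"\(\s*['\"](?P<path>[^'\"]+)['\"]"
)


def find_routes(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, HTTP method, path) for each route decorator."""
    routes = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ROUTE_RE.search(line)
        if match:
            routes.append((lineno, match["method"].upper(), match["path"]))
    return routes


# Hypothetical sample file content.
sample = '''@app.post("/api/research")
def research_endpoint(query: str):
    ...

@router.get("/health")
def health():
    ...
'''
print(find_routes(sample))
# [(1, 'POST', '/api/research'), (5, 'GET', '/health')]
```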
**Express.js**:
```bash
grep -rnE "app\.(get|post)" --include="*.js"
grep -rn "router\." --include="*.js"
```
---
### 1.3 Scheduled Tasks
**Celery**:
```bash
grep -rn "@celery.task\|@shared_task" --include="*.py"
```
**Airflow**:
```bash
grep -rn "DAG(" --include="*.py"
```
**Cron/APScheduler**:
```bash
grep -rn "@schedule\|@cron" --include="*.py"
```
---
### 1.4 Message Consumers
**RabbitMQ/Kafka**:
```bash
grep -rn "basic_consume\|KafkaConsumer" --include="*.py"
```
**Example**:
```python
# consumer.py:25
def callback(ch, method, properties, body):  # ← entry point
    process_message(body)
```
---
## Step 2: Extract the Function Call Chain
### 2.1 Read the Entry Function
Start from the entry point identified in Step 1:
```python
# Example: routes.py:42
@app.post("/api/research")
def research_endpoint(query: str):
    task = create_task(query)         # call 1
    results = executor.run(task)      # call 2
    return format_response(results)   # call 3
```
**Extracted call list**:
```
research_endpoint
├── create_task
├── executor.run
└── format_response
```
---
### 2.2 Recursive Tracing
Read the implementation of `executor.run`:
```python
# executor.py:78
def run(self, task):
    data = self.fetch_data(task)       # call 2.1
    processed = self.process(data)     # call 2.2
    return self.finalize(processed)    # call 2.3
```
**Updated call tree**:
```
research_endpoint
├── create_task
├── executor.run
│   ├── fetch_data
│   ├── process
│   └── finalize
└── format_response
```
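A call tree like the one above can be kept as a nested dictionary and rendered on demand. The sketch below assumes a `{name: {child: {...}}}` layout. The `render_tree` helper is a hypothetical name introduced here for illustration.

```python
def render_tree(tree: dict, prefix: str = "") -> list[str]:
    """Render a nested {name: children} dict as lines with tree connectors."""
    lines = []
    items = list(tree.items())
    for index, (name, children) in enumerate(items):
        last = index == len(items) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last item get blank padding instead of a vertical bar.
        lines.extend(render_tree(children, prefix + ("    " if last else "│   ")))
    return lines


call_tree = {
    "create_task": {},
    "executor.run": {"fetch_data": {}, "process": {}, "finalize": {}},
    "format_response": {},
}
output = "\n".join(["research_endpoint", *render_tree(call_tree)])
print(output)
```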
---
### 2.3 Limit the Recursion Depth
**Problem**: tracing may fall into infinite recursion
**Solution**: cap the maximum depth
```python
MAX_DEPTH = 5  # trace at most 5 levels deep

def trace_calls(func_name, depth=0):
    if depth >= MAX_DEPTH:
        return []
    # extract_calls_from_function stands in for the call-extraction step
    calls = extract_calls_from_function(func_name)
    result = []
    for call in calls:
        result.append(call)
        result.extend(trace_calls(call, depth + 1))
    return result
```
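A depth cap alone still revisits functions that sit on a call cycle until the cap is reached. One possible refinement, sketched below, combines the depth cap with a visited set and uses the standard `ast` module to extract calls. The names `build_call_map` and `trace_calls` with these signatures are assumptions for illustration, and the sketch only resolves plain `name()` calls to top-level functions in a single file.

```python
import ast

MAX_DEPTH = 5  # same cap as in the depth-limited version


def build_call_map(source: str) -> dict[str, list[str]]:
    """Map each top-level function to the plain names it calls, in source order."""
    call_map = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = [
                child for child in ast.walk(node)
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name)
            ]
            calls.sort(key=lambda call: (call.lineno, call.col_offset))
            call_map[node.name] = [call.func.id for call in calls]
    return call_map


def trace_calls(name, call_map, depth=0, visited=None):
    """Depth-first trace that stops at MAX_DEPTH or at an already expanded function."""
    visited = set() if visited is None else visited
    if depth >= MAX_DEPTH or name in visited:
        return []
    visited.add(name)
    result = []
    for callee in call_map.get(name, []):
        result.append(callee)
        result.extend(trace_calls(callee, call_map, depth + 1, visited))
    return result


# Hypothetical module with a cycle between a() and b().
sample = """
def a():
    b()
    c()

def b():
    a()

def c():
    pass
"""
print(trace_calls("a", build_call_map(sample)))  # ['b', 'a', 'c']
```

The call from `b` back to `a` is still recorded as an edge, but `a` is not expanded a second time.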
---
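## Step 3: Identify Business Functions
### 3.1 Exclude Helper Functions
**Rules**:
| Function characteristic | Keep? |
|---------|---------|
| Private function `_helper()` | ❌ Exclude |
| Utility function `format_date()` | ❌ Exclude |
| Getter/Setter | ❌ Exclude |
| Function body shorter than 5 lines | ❌ Exclude |
| Contains a business keyword | ✅ Keep |
| Calls a database or external API | ✅ Keep |
| Handles a core data model | ✅ Keep |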
---
### 3.2 Business Keyword List
```python
BUSINESS_KEYWORDS = [
    # Processing verbs
    "process", "handle", "execute", "run",
    # CRUD operations
    "create", "update", "delete", "query", "fetch",
    # Business logic
    "calculate", "analyze", "generate", "transform",
    "search", "filter", "validate", "verify",
    # Workflow
    "orchestrate", "coordinate", "schedule"
]
```
**Example classifications**:
```python
✅ process_order()      # contains "process"
✅ create_user()        # contains "create"
✅ analyze_data()       # contains "analyze"
❌ format_string()      # utility function
❌ _internal_helper()   # private function
❌ get_config()         # getter
```
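The exclusion rules and the keyword list can be combined into a single predicate. The sketch below is one possible ordering of the checks, not a definitive implementation: `HELPER_PREFIXES`, `MIN_BODY_LINES`, and `is_business_function` are hypothetical names, the prefix list is an assumption about how utility functions and getters/setters are named, and the size threshold is assumed to count lines.

```python
BUSINESS_KEYWORDS = {
    "process", "handle", "execute", "run",
    "create", "update", "delete", "query", "fetch",
    "calculate", "analyze", "generate", "transform",
    "search", "filter", "validate", "verify",
    "orchestrate", "coordinate", "schedule",
}
# Assumption: these prefixes mark utility functions and getters/setters.
HELPER_PREFIXES = ("format_", "get_", "set_")
# Assumption: the size rule is measured in lines of the function body.
MIN_BODY_LINES = 5


def is_business_function(name: str, body_lines: int) -> bool:
    """Apply the exclusion rules first, then look for a business keyword."""
    if name.startswith("_") or name.startswith(HELPER_PREFIXES):
        return False
    if body_lines < MIN_BODY_LINES:
        return False
    return any(word in BUSINESS_KEYWORDS for word in name.split("_"))


for name in ("process_order", "create_user", "format_string",
             "_internal_helper", "get_config"):
    print(name, is_business_function(name, body_lines=12))
```

Splitting the name on underscores avoids false positives from substrings, for example `rerun` would not match the keyword `run`.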
---
### 3.3 Detect Database/API Calls
**Database calls**:
```python
def create_user(data):
    user = User(**data)
    db.session.add(user)  # ✅ database operation
    db.session.commit()
    return user
```
**External API calls**:
```python
def fetch_weather(city):
    response = requests.get(  # ✅ external API
        f"https://api.weather.com/{city}"
    )
    return response.json()
```
**Search patterns**:
```bash
# Database
grep -n "db\.session\|query(\|execute(" file.py
# HTTP requests
grep -n "requests\.\|httpx\.\|fetch(" file.py
```
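The same patterns can be applied per function in Python, which makes it possible to tag each function with the kind of I/O it performs. This is a minimal sketch: `classify_io` and the two sample sources are illustrative assumptions, and the patterns simply mirror the grep expressions with word boundaries added.

```python
import re

# Python equivalents of the grep patterns, with word boundaries added.
DB_RE = re.compile(r"db\.session|\bquery\(|\bexecute\(")
HTTP_RE = re.compile(r"\brequests\.|\bhttpx\.|\bfetch\(")


def classify_io(source: str) -> set[str]:
    """Return the kinds of external I/O that appear in a function's source."""
    kinds = set()
    if DB_RE.search(source):
        kinds.add("database")
    if HTTP_RE.search(source):
        kinds.add("http")
    return kinds


# Hypothetical function sources taken from the two examples in this section.
create_user_src = "def create_user(data):\n    db.session.add(User(**data))\n"
fetch_weather_src = "def fetch_weather(city):\n    return requests.get(url).json()\n"

print(classify_io(create_user_src))    # {'database'}
print(classify_io(fetch_weather_src))  # {'http'}
```

The word boundary keeps a function name such as `fetch_weather(` from matching the `fetch(` pattern by accident.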
---
## Step 4: Analyze Flow Patterns
### 4.1 Sequential Flow
**Code characteristics**:
```python
def process():
    step1()
    step2()
    step3()
```
**Detection**: consecutive function calls with no branching
**Diagram**: `flowchart TD` (top to bottom)
---
### 4.2 Conditional Branching
**Code characteristics**:
```python
def process(data):
    if validate(data):
        path_a()
    else:
        path_b()
```
**Detection**: `if/else`, `match/case`, ternary operator
**Diagram**: `flowchart TD` (with diamond decision nodes)
---
### 4.3 Iterative Optimization Loop
**Code characteristics**:
```python
def optimize(data):
    while not is_good_enough(data):
        data = improve(data)
    return data
```
**Detection**: `while` + condition check
**Diagram**: `flowchart TD` (with a loop-back edge)
---
### 4.4 State Machine
**Code characteristics**:
```python
state = "init"
if state == "init":
    state = "processing"
elif state == "processing":
    if condition:
        state = "done"
    else:
        state = "error"
```
**Detection**: a state variable + state transition logic
**Diagram**: `stateDiagram-v2`
---
### 4.5 Concurrent Orchestration
**Python asyncio**:
```python
results = await asyncio.gather(
    task1(),
    task2(),
    task3()
)
```
**JavaScript Promise.all**:
```javascript
const results = await Promise.all([
    fetchUser(),
    fetchOrders(),
    fetchProducts()
]);
```
**Detection**: `asyncio.gather`, `Promise.all`, `WaitGroup`
**Diagram**: `graph TB` + subgraph (parallel tasks grouped together)
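
For the concurrent pattern, the parallel tasks can be grouped in a Mermaid `subgraph` with fan-out and fan-in edges around them. The following is a rough sketch. The function name `generate_concurrent_graph`, its parameters, and the node names in the usage line are hypothetical.

```python
def generate_concurrent_graph(before: str, tasks: list[str], after: str) -> str:
    """Build a `graph TB` diagram that fans out to parallel tasks and joins again."""
    lines = ["graph TB", "    subgraph parallel_tasks [Parallel tasks]"]
    for task in tasks:
        lines.append(f"        {task}")
    lines.append("    end")
    for task in tasks:
        lines.append(f"    {before} --> {task}")  # fan-out edge
        lines.append(f"    {task} --> {after}")   # fan-in edge
    return "\n".join(lines) + "\n"


diagram = generate_concurrent_graph("start", ["task1", "task2", "task3"], "merge")
print(diagram)
```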
---
### 4.6 Graph Orchestration (LangGraph)
**Code characteristics**:
```python
graph = StateGraph(State)
graph.add_node("a", func_a)
graph.add_node("b", func_b)
graph.add_edge("a", "b")
graph.add_conditional_edges("b", router, {
    "continue": "a",
    "end": END
})
```
**Detection**: `add_node`, `add_edge`, `add_conditional_edges`
**Diagram**: `stateDiagram-v2`
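
Several of the detection rules in this step can be approximated by walking the syntax tree of a function. The sketch below covers only three of the patterns (branching, `while` loops, and `asyncio.gather`-style concurrency). `detect_flow_patterns` and the pattern labels are assumptions introduced for illustration, and `match/case`, state machines, and graph orchestration are deliberately left out.

```python
import ast


def detect_flow_patterns(source: str) -> set[str]:
    """Return the flow patterns found in Python source, or {'sequential'} if none."""
    patterns = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp)):
            patterns.add("branch")          # if/else or ternary expression
        elif isinstance(node, ast.While):
            patterns.add("loop")            # while + condition
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute) and func.attr == "gather":
                patterns.add("concurrent")  # e.g. asyncio.gather(...)
    return patterns or {"sequential"}


# Hypothetical sample sources.
concurrent_src = (
    "async def run():\n"
    "    results = await asyncio.gather(a(), b())\n"
    "    if results:\n"
    "        c()\n"
)
sequential_src = "def process():\n    step1()\n    step2()\n"

print(sorted(detect_flow_patterns(concurrent_src)))  # ['branch', 'concurrent']
print(sorted(detect_flow_patterns(sequential_src)))  # ['sequential']
```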
---
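## Step 5: Generate Mermaid Diagrams
**⚠️ Mermaid syntax constraints (version 11.x)**
- **stateDiagram-v2**: do not use the `--` separator; `<br/>` is not supported
- **sequenceDiagram**: every `alt/loop/par` block must be closed with a matching `end`
- **All types**: use `<br/>` for line breaks (except stateDiagram-v2)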
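
Two of these constraints can be checked or enforced before a diagram is emitted. The helpers below are a sketch under stated assumptions: `format_label` and `blocks_balanced` are hypothetical names, and the opener list extends the `alt/loop/par` keywords named above with other block keywords that also close with `end` in sequence diagrams.

```python
# Assumption: block keywords that are closed by `end` in a sequenceDiagram.
BLOCK_OPENERS = ("alt", "loop", "par", "opt", "critical", "break", "rect", "box")


def format_label(text: str, diagram_type: str) -> str:
    """Join a multi-line label with <br/>, except in stateDiagram-v2."""
    parts = [line.strip() for line in text.splitlines() if line.strip()]
    separator = " " if diagram_type == "stateDiagram-v2" else "<br/>"
    return separator.join(parts)


def blocks_balanced(diagram: str) -> bool:
    """Check that every block opener in a sequenceDiagram has a matching `end`."""
    depth = 0
    for line in diagram.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] in BLOCK_OPENERS:
            depth += 1
        elif words[0] == "end":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


print(format_label("fetch data\nroutes.py:42", "flowchart TD"))     # fetch data<br/>routes.py:42
print(format_label("fetch data\nroutes.py:42", "stateDiagram-v2"))  # fetch data routes.py:42
```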
### 5.1 Choose the Diagram Type
```python
def select_diagram_type(flow_pattern):
    if flow_pattern == "state_machine":
        return "stateDiagram-v2"
    elif flow_pattern == "concurrent":
        return "graph TB"  # with subgraph
    elif flow_pattern == "sequential_with_conditions":
        return "flowchart TD"
    elif flow_pattern == "linear_pipeline":
        return "flowchart LR"
    else:
        return "graph TB"  # default
```
---
### 5.2 Generate a State Diagram (Multi-Agent)
**Input data**:
```python
nodes = [
    {"name": "researcher", "file": "agents/nodes.py", "line": 45},
    {"name": "writer", "file": "agents/nodes.py", "line": 78}
]
edges = [
    {"from": "researcher", "to": "writer"},
    {
        "from": "writer",
        "to": "researcher",
        "condition": "quality < 7"
    }
]
```
**Generator code**:
```python
def generate_state_diagram(nodes, edges):
    mermaid = "stateDiagram-v2\n"
    # The first node is treated as the initial state
    mermaid += f"    [*] --> {nodes[0]['name']}\n"
    for edge in edges:
        if "condition" in edge:
            mermaid += f"    {edge['from']} --> {edge['to']}: {edge['condition']}\n"
        else:
            mermaid += f"    {edge['from']} --> {edge['to']}\n"
    # Annotate each node with its source location
    for node in nodes:
        mermaid += f"\n    note right of {node['name']}\n"
        mermaid += f"        {node['file']}:{node['line']}\n"
        mermaid += "    end note\n"
    return mermaid
```
---
### 5.3 Generate a Flowchart (Sequential + Branching)
**Input data**:
```python
steps = [
    {"name": "extract", "type": "process"},
    {"name": "validate", "type": "decision"},
    {"name": "transform", "type": "process"},
    {"name": "load", "type": "process"}
]
connections = [
    {"from": "extract", "to": "validate"},
    {"from": "validate", "to": "transform", "condition": "valid"},
    {"from": "validate", "to": "error_handler", "condition": "invalid"}
]
```
**Generator code**:
```python
def generate_flowchart(steps, connections):
    mermaid = "flowchart TD\n"
    # Define the nodes
    for step in steps:
        if step["type"] == "decision":
            mermaid += f"    {step['name']}{{{step['name']}}}\n"  # diamond
        else:
            mermaid += f"    {step['name']}[{step['name']}]\n"  # rectangle
    # Connect the nodes
    for conn in connections:
        if "condition" in conn:
            mermaid += f"    {conn['from']} -->|{conn['condition']}| {conn['to']}\n"
        else:
            mermaid += f"    {conn['from']} --> {conn['to']}\n"
    return mermaid
```
---
## Complete Example
### Example Code
```python
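# routes.py:42
@app.post("/api/research")
async def research_endpoint(query: str):
    task = create_research_task(query)
    results = await execute_research(task)
    return results

# research.py:15
import asyncio

async def execute_research(task):
    # Concurrent search: gather must be awaited inside a coroutine
    google_results, arxiv_results = await asyncio.gather(
        search_google(task.query),
        search_arxiv(task.query)
    )
    # Merge both result lists (assumes each search returns a list)
    search_results = google_results + arxiv_results
    # Generate the report
    if len(search_results) > 5:
        report = generate_detailed_report(search_results)
    else:
        report = generate_summary(search_results)
    return report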
```
---
### Extraction Result
**Entry point**: `research_endpoint` (routes.py:42)
**Call chain**:
```
research_endpoint
├── create_research_task
└── execute_research
    ├── search_google (concurrent)
    ├── search_arxiv (concurrent)
    └── generate_detailed_report OR generate_summary (conditional)
```
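Before labels and decision nodes are added by hand, an extracted call chain can be turned into a first-draft Mermaid skeleton mechanically. The sketch below assumes the call chain is stored as a nested dictionary. `call_tree_to_edges` is a hypothetical helper, and it emits plain edges only, without the concurrency or condition annotations.

```python
def call_tree_to_edges(parent: str, children: dict) -> list[str]:
    """Emit one Mermaid edge per parent-to-child call, depth first."""
    edges = []
    for child, grandchildren in children.items():
        edges.append(f"    {parent} --> {child}")
        edges.extend(call_tree_to_edges(child, grandchildren))
    return edges


call_tree = {
    "create_research_task": {},
    "execute_research": {
        "search_google": {},
        "search_arxiv": {},
        "generate_detailed_report": {},
        "generate_summary": {},
    },
}
diagram = "\n".join(
    ["flowchart TD", *call_tree_to_edges("research_endpoint", call_tree)]
)
print(diagram)
```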
**Flow pattern**: sequential + concurrent + conditional branching
---
### Generated Mermaid Diagram
```mermaid
flowchart TD
    Start([API request]) --> CreateTask[Create research task]
    CreateTask --> Concurrent{Concurrent search}
    Concurrent --> Google[Search Google]
    Concurrent --> ArXiv[Search arXiv]
    Google --> Merge[Merge results]
    ArXiv --> Merge
    Merge --> Decision{Result count}
    Decision -->|more than 5| Detailed[Generate detailed report]
    Decision -->|5 or fewer| Summary[Generate summary]
    Detailed --> End([Return results])
    Summary --> End
```
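## Best Practices
1. **Breadth first, then depth** - understand the overall flow before drilling down
2. **Limit recursion depth** - avoid infinite loops
3. **Filter out helper functions** - focus only on business logic
4. **Annotate code locations** - makes the results easy to verify
5. **Visualization first** - diagrams are more intuitive than text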