sync: update all skills from latest workspace code

doc_parser_skill:
- New: verify_flowchart.py (flowchart validation)
- Updated: LLM.py (multi-provider: DeepSeek + DashScope)
- Updated: image_parser.py (logic tree support, external prompts)
- Updated: SKILL.md, prompts/image_prompt.md

conflict_detection_skill:
- Updated: LLM.py (multi-provider sync)
- Updated: detect_conflicts.py (logic tree text conversion)

ir_generation_skill:
- Replaced old scripts/LLM.py + ir_generator.py with standalone project
- New: main.py, config.py, step1-3_*.py, ensemble_merge.py
- New: prompts/, tests/ subdirectories

tests:
- New: acceptance/ test suite with schema validation
- Fixed: conftest no longer globally skips non-acceptance tests
- Updated: test_sample.py for new ir_generation structure

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-30 22:45:08 +08:00

name	description
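Document Parsing Skill	Parses documents (.docx, .pdf) to extract images and text structure, and uses a vision LLM to analyze the type and description of each image.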

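Document Parsing Skill

Overview

This skill extracts content from documents (.docx, .pdf) and prepares it for further analysis. It extracts the text content and the embedded images, then runs an initial analysis on each image to determine its type and content.

Features

This skill:

- Extracts the text structure of the document (paragraphs, tables, headings)
- Identifies and extracts embedded images
- Uses a vision LLM to analyze each image and determine its type and content description
- Produces structured output that maps each image to its location in the document
- Creates an initial parsed representation of the document for later processing stages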
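
The image extraction step can be illustrated with the standard library alone. A .docx file is a ZIP package, and Word typically stores embedded pictures under `word/media/`. The sketch below lists those entries; it is not taken from `image_parser.py`, and the names `list_docx_images`, `build_demo_docx` and `IMAGE_SUFFIXES` are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of image extensions worth reporting; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".emf", ".wmf"}


def list_docx_images(docx_file):
    """Return the archive paths of images embedded in a .docx package.

    `docx_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("word/media/")
            and PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
        )


def build_demo_docx():
    """Build a tiny in-memory stand-in for a .docx (not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")
        package.writestr("word/media/image1.png", b"\x89PNG placeholder")
        package.writestr("word/media/image2.jpeg", b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_docx_images(build_demo_docx()))
```

PDF input needs a different extraction path, which this sketch does not cover.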

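Input Requirements

- Path to the document file (required; .docx and .pdf are supported)
- Optional output directory (defaults to 'output/')
- Optional dry-run flag that previews the LLM prompts without calling the API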
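
The three inputs above map naturally onto a command-line interface. The following is a minimal sketch using `argparse`; the flag names `--output-dir` and `--dry-run` and the helper names are assumptions, since the source does not state how the skill's script spells them.

```python
import argparse


def build_parser():
    """Build a parser for the three documented inputs (flag names are assumed)."""
    parser = argparse.ArgumentParser(description="Parse a .docx or .pdf document.")
    parser.add_argument("document", help="path to the input document (.docx or .pdf)")
    parser.add_argument(
        "--output-dir",
        default="output/",
        help="directory for the parsed JSON (default: output/)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="preview the LLM prompts without calling the API",
    )
    return parser


def parse_args(argv=None):
    """Parse arguments and reject unsupported document extensions."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.document.lower().endswith((".docx", ".pdf")):
        parser.error("document must be a .docx or .pdf file")
    return args


if __name__ == "__main__":
    print(parse_args(["spec.docx", "--dry-run"]))
```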

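Output

The skill produces a structured JSON file named after the input document's base name followed by '_parsed.json'. It contains:

- sections: the document's text structure, grouped by heading
- image_sources: a mapping from image identifiers to their locations in the document
- image_analysis: for each image, the type, content description and (where applicable) structured logic tree determined by the vision LLM
  - type: the image type (flowchart/architecture/state/sequence/activity/other)
  - description: a textual description of the image
  - logic_tree (optional, diagram types only): a structured logic tree in JSON, containing root (a description of the root node) and a nodes array. Node types: decision, action, state, start, end. A decision node carries condition and branches fields; every other node carries a description field.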
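
The naming rule and the field constraints above can be checked mechanically. The sketch below derives the output path and validates one image_analysis entry. It assumes each node stores its node type under a `type` key and leaves the shape of `branches` unchecked, because the source does not specify either; all function names and the sample entry are hypothetical.

```python
from pathlib import Path

IMAGE_TYPES = {"flowchart", "architecture", "state", "sequence", "activity", "other"}
NODE_TYPES = {"decision", "action", "state", "start", "end"}


def parsed_output_path(document_path, output_dir="output/"):
    """Return <output_dir>/<document base name>_parsed.json."""
    return Path(output_dir) / f"{Path(document_path).stem}_parsed.json"


def validate_logic_tree(tree):
    """Return a list of problems; an empty list means the tree looks valid."""
    problems = []
    if "root" not in tree:
        problems.append("logic_tree: missing root")
    nodes = tree.get("nodes")
    if not isinstance(nodes, list):
        problems.append("logic_tree: nodes must be an array")
        return problems
    for index, node in enumerate(nodes):
        node_type = node.get("type")  # assumed key name for the node type
        if node_type not in NODE_TYPES:
            problems.append(f"nodes[{index}]: unknown type {node_type!r}")
        elif node_type == "decision":
            for field in ("condition", "branches"):
                if field not in node:
                    problems.append(f"nodes[{index}]: decision missing {field}")
        elif "description" not in node:
            problems.append(f"nodes[{index}]: {node_type} missing description")
    return problems


def validate_image_analysis(entry):
    """Validate one image_analysis entry, including its optional logic tree."""
    problems = []
    if entry.get("type") not in IMAGE_TYPES:
        problems.append(f"unknown image type {entry.get('type')!r}")
    if "description" not in entry:
        problems.append("missing description")
    if "logic_tree" in entry:
        problems.extend(validate_logic_tree(entry["logic_tree"]))
    return problems


# Hypothetical entry; the shape of "branches" is an assumption.
EXAMPLE_ENTRY = {
    "type": "flowchart",
    "description": "Login flow",
    "logic_tree": {
        "root": "User login",
        "nodes": [
            {"type": "start", "description": "User opens the login page"},
            {
                "type": "decision",
                "condition": "Password correct?",
                "branches": {"yes": "Show dashboard", "no": "Show error"},
            },
            {"type": "end", "description": "Flow finished"},
        ],
    },
}

if __name__ == "__main__":
    print(parsed_output_path("docs/spec.docx"))
    print(validate_image_analysis(EXAMPLE_ENTRY))
```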

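Integration Points

This skill is the initial processing step in the document analysis pipeline. Its output is consumed by the conflict detection skill to identify discrepancies between textual and visual content.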
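
A downstream consumer might read the parsed file along these lines. This is only a sketch and not the conflict detection skill's actual code: it assumes image_analysis is a JSON object keyed by the same image identifiers as image_sources, and the location values in the demo data are invented for illustration.

```python
import json


def load_parsed(path):
    """Load a *_parsed.json file produced by the document parsing step."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def images_with_logic_trees(parsed):
    """Yield (image_id, location, analysis) for images that carry a logic tree."""
    sources = parsed.get("image_sources", {})
    for image_id, analysis in parsed.get("image_analysis", {}).items():
        if "logic_tree" in analysis:
            yield image_id, sources.get(image_id), analysis


# Hypothetical parsed document; identifiers and locations are made up.
DEMO_PARSED = {
    "sections": [],
    "image_sources": {"img_1": "Section 2", "img_2": "Section 3"},
    "image_analysis": {
        "img_1": {
            "type": "flowchart",
            "description": "Login flow",
            "logic_tree": {"root": "User login", "nodes": []},
        },
        "img_2": {"type": "other", "description": "Company logo"},
    },
}

if __name__ == "__main__":
    for image_id, location, analysis in images_with_logic_trees(DEMO_PARSED):
        print(image_id, location, analysis["type"])
```

Only diagram-type images carry a logic tree, so filtering on that key is one simple way to pick out the images whose structure can be compared against the surrounding text.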
