[bug] run_pipeline.py subprocess GBK encoding causes stdout=None on Windows #84
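Problem
When running python scripts/run_pipeline.py --input input/doc.docx, stdout capture for the step1_semantic_index.py subprocess fails in a GBK environment. The stderr reader thread crashes with UnicodeDecodeError, result.stdout ends up as None, and the pipeline aborts.
Root cause
The subprocess.run(capture_output=True, text=True) call at scripts/run_pipeline.py:83 does not specify encoding='utf-8'. Under a Windows GBK locale, when the subprocess output contains characters that cannot be decoded as GBK, the internal reader thread raises UnicodeDecodeError: 'gbk' codec can't decode byte, and stdout becomes None. The same problem exists in run_acceptance_tests() (line 114).
Fix
Add the encoding='utf-8' argument to the subprocess.run calls. This is a code-level fix and can be verified with unit tests.
[da-0603-1426]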
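A minimal sketch of the proposed fix, not the actual code in run_pipeline.py. The helper name run_step and the child command are hypothetical stand-ins for a pipeline step that writes UTF-8 to stdout; the only part taken from the issue is passing encoding='utf-8' to subprocess.run.

```python
import subprocess
import sys

# Hypothetical stand-in for a pipeline step: it writes UTF-8 bytes to stdout
# regardless of the console code page. Unicode escapes keep the command line ASCII.
CHILD_CODE = (
    r"import sys; "
    r"sys.stdout.buffer.write('\u8bed\u4e49\u7d22\u5f15 \u2713\n'.encode('utf-8'))"
)


def run_step(cmd):
    """Run a child process and decode its output as UTF-8 (hypothetical helper)."""
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        # Without this, text mode decodes with the locale's preferred encoding,
        # which is GBK on the affected Windows machine.
        encoding="utf-8",
    )


result = run_step([sys.executable, "-c", CHILD_CODE])
print(ascii(result.stdout))  # ascii() keeps the printout safe on any console
```

With the explicit encoding, result.stdout is a str no matter which code page the parent process runs under.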
PR created: http://localhost:3000/pzhang_zywl/document_analyzer/pulls/85
Changes:
run_ir_pipeline(): add encoding='utf-8' to subprocess.run
run_acceptance_tests(): add encoding='utf-8' + PYTHONIOENCODING env to subprocess.run
Will merge once CI passes.
[da-0603-1426]
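Problem
The subprocess.run call in run_pipeline.py hits a GBK encoding error on Windows, leaving stdout=None and aborting the pipeline.
Root cause
subprocess.run(capture_output=True, text=True) does not specify an encoding, so under a Windows GBK locale the internal reader thread raises UnicodeDecodeError on output that cannot be decoded as GBK.
Fix
Add encoding='utf-8' + PYTHONIOENCODING env to the subprocess.run calls in run_ir_pipeline() and run_acceptance_tests().
Verification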
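The decoding failure described in the root cause can be reproduced without a subprocess. The sketch below uses a hypothetical helper try_decode and an illustrative payload; it only shows that the same UTF-8 bytes decode cleanly as UTF-8 but raise UnicodeDecodeError under the gbk codec. Not every UTF-8 sequence fails this way, since some byte pairs happen to be valid GBK and decode silently to the wrong characters.

```python
def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, error) on a decode failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Illustrative payload: a check mark followed by ASCII, encoded as UTF-8.
# The bytes are b'\xe2\x9c\x93 ok'; under GBK the third byte starts a
# two-byte sequence whose second byte (a space) is not a valid trail byte.
payload = "\u2713 ok".encode("utf-8")

gbk_text, gbk_error = try_decode(payload, "gbk")
utf8_text, utf8_error = try_decode(payload, "utf-8")

print("gbk failed:", gbk_error is not None)
print("utf-8 ok:", utf8_error is None)
```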
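A sketch of how the two parts of the fix could fit together, assuming the child is itself a Python process. The helper run_python_child and the child snippet are hypothetical; the issue only states that encoding='utf-8' and a PYTHONIOENCODING environment variable were added. PYTHONIOENCODING makes the child write UTF-8 from print(), while encoding='utf-8' makes the parent read it as UTF-8, so both sides agree.

```python
import os
import subprocess
import sys

# Hypothetical child: uses plain print(), so its output encoding depends on
# its own stdio settings rather than on explicit bytes.
CHILD_CODE = r"print('\u6d4b\u8bd5\u901a\u8fc7')"


def run_python_child(code):
    """Run a Python snippet with UTF-8 forced on both ends (hypothetical helper)."""
    # Copy the current environment and force the child's stdio to UTF-8.
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        encoding="utf-8",  # parent side: decode captured output as UTF-8
        env=env,           # child side: encode stdout/stderr as UTF-8
    )


result = run_python_child(CHILD_CODE)
print(ascii(result.stdout))
```

PYTHONIOENCODING only affects Python children; a non-Python subprocess would need its own way of selecting the output encoding.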
[da-0603-1426]