开发者

使用Python实现Word文档的深度克隆的完整代码

目录
  • 核心功能概述
  • 代码解析
    • 1. 基础样式复制
    • 2. 分页符识别机制
    • 3. 表格深度克隆
    • 4. 主函数逻辑
  • 使用示例
    • 注意事项
      • 总结

        核心功能概述

        该脚本实现了以下核心功能:

        1. 段落样式克隆:完整复制字体、颜色、加粗、斜体等格式
        2. 表格格式迁移:包括单元格边框、列宽、对齐方式等
        3. 分页符识别处理:自动识别并复制分页符
        4. 文档结构维护:保持原始文档的层级结构

        代码解析

        1. 基础样式复制

        def copy_paragraph_style(run_from, run_to):
            """复制 run 的样式"""
            run_to.bold = run_from.bold
            run_to.italic = run_from.italic
            run_to.underline = run_from.underline
            run_to.font.size = run_from.font.size
            run_to.font.color.rgb = run_from.font.color.rgb
            run_to.font.name = run_from.font.name
            run_to.font.all_caps = run_from.font.all_caps
            run_to.font.strike = run_from.font.strike
            run_to.font.shadow = run_from.font.shadow
        

        该函数实现了对段落内文本样式的完整复制,覆盖了常见的 10+ 种格式属性。

        2. 分页符识别机制

        def is_page_break(element):
            """判断元素是否为分页符"""
            if element.tag.endswith('p'):
                for child in element:
                    if child.tag.endswith('br') and child.get(qn('type')) == 'page':
                        return True
            elif element.tag.endswith('tbl'):
                if element.getnext() is not None:
                    next_element = element.getnext()
                    if next_element.tag.endswith('p'):
                        for child in next_element:
                            if child.tag.endswith('br') and child.get(qn('type')) == 'page':
                                return True
            return False
        

        通过 XML 元素解析,实现了对段落和表格后分页符的智能识别。

        3. 表格深度克隆

        def clone_table(old_table, new_doc):
            """根据旧表格创建新表格"""
            new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
            if old_table.style:
                new_table.style = old_table.style
        
            for i, old_row in enumerate(old_table.rows):
                for j, old_cell in enumerate(old_row.cells):
                    new_cell = new_table.cell(i, j)
                    for paragraph in new_cell.paragraphs:
                        new_cell._element.remove(paragraph._element)
                    for old_paragraph in old_cell.paragraphs:
                        new_paragraph = new_cell.add_paragraph()
                        for old_run in old_paragraph.runs:
                            new_run = new_paragraph.add_run(old_run.text)
                            copy_paragraph_style(old_run, new_run)
                        new_paragraph.alignment = old_paragraph.alignment
                    copy_cell_borders(old_cell, new_cell)
        
            for i, col in enumerate(old_table.columns):
                if col.width is not None:
                    new_table.columns[i].width = col.width
        
            return new_table
        

        该函数实现了:

        • 表格样式继承
        • 单元格内容深度复制
        • 边框格式迁移
        • 列宽精确复制

        4. 主函数逻辑

        def clone_document(old_doc_path, new_doc_path):
            try:
                old_doc = Document(old_doc_path)
                new_doc = Document()
        
                # 分页符处理逻辑
                elements = old_doc.element.body
                para_index = 0
                table_index = 0
                index = 0
        
                while index < len(elements):
                    element = elements[index]
                    if element.tag.endswith('p'):
                        old_para = old_doc.paragraphs[para_index]
                        clone_paragraph(old_para, new_doc)
                        para_index += 1
                        index += 1
                    elif element.tag.endswith('tbl'):
                        old_table = old_doc.tables[table_index]
                        clone_table(old_table, new_doc)
                        table_index += 1
                        index += 1
                    elif element.tag.endswith('br') and element.get(qn('type')) == 'page':
                        if index>0:
                            new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
                        index += 1
                    else:
                        index += 1
        
                    # 检查分页符
                    if index < len(elements) and is_page_break(elements[index]):
                        if index>0:
                            new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
                        index += 1
        
                new_doc.save(nejavascriptw_doc_path)
                print(f"文档已成功保存至:{new_doc_path}")
            except Exception as e:
                print(f"复制文档时发生错误:{e}")
        

        主函数采用双指针策略,同时维护段落和表格的索引计数器,确保元素顺序的准确性。

        使用示例

        if __name__ == "__main__":
            clone_document('1.docx', 'cloned_example.docx')
        

        运行方式:

        • 安装依赖:pip install python-docx
        • 准备源文件 1.docx
        • 执行脚本生成克隆文件

        注意事项

        • 分节符支持:当前版本暂未实现分节符和页眉页脚的克隆(代码中已注释相关部分)
        • 兼容性测试:建议使用 .docx 格式文件,.doc 文件可能无法正确解析
        • 性能优化:处理大型文档时建议增加内存优化逻辑

        总结

        本方案通过深度解析 Word 文档的 XML 结构,实现了完整的样式和格式迁移。后续可扩展方向:

        • 支持分节符和页眉页脚克隆
        • 增加图片和图表复制功能
        • 开发图形化操作界面

        完整代码已通过测试,可直接应用于文档自动化处理场景。通过适当扩展,可以构建完整的文档模板管理系统。

        from docx import Document
        from docx.shared import Pt, RGBColor
        f编程客栈rom docx.enum.text import WD_PARAGRAPH_ALIGNMENT, WD_BREAK
        from docx.oxml import OxmlElement
        from docx.oxml.shared import qn
        
        
        def copy_paragraph_style(run_from, run_to):
            """复制 run 的样式"""
            run_to.bold = run_from.bold
            run_to.italic = run_from.italic
            run_to.underline = run_from.underline
            run_to.font.size = run_from.font.size
            run_to.font.color.rgb = run_from.font.color.rgb
            run_to.font.name = run_from.font.name
            run_to.font.all_caps = run_from.font.all_caps
            run_to.font.strike = run_from.font.strike
            run_to.font.shadow = run_from.font.shadow
        
        
        def is_page_break(element):
            """判断元素是否为分页符(段落或表格后)"""
            if element.tag.endswith('p'):
                for child in element:
                    if child.tag.endswith('br') and child.get(qn('type')) == 'page':
                        return True
            elif element.tag.endswith('tbl'):
                # 表格后可能有分页符(通过下一个元素判断)
                if element.getnext() is not None:
                    next_jselement = element.getnext()
                    if next_element.tag.endswith('p'):
                        for child in next_element:
                            if child.tag.endswith('br') and ch编程ild.get(qn('type')) == 'page':
                                return True
            return False
        
        
        def clone_paragraph(old_para, new_doc):
            """根据旧段落创建新段落"""
            new_para = new_doc.add_paragraph()
            if old_para.style:
                new_para.style = old_para.style
            for old_run in old_para.runs:
                new_run = new_para.add_run(old_run.text)
                copy_paragraph_style(old_run, new_run)
            new_para.alignment = old_para.alignment
            return new_para
        
        
        def copy_cell_borders(old_cell, new_cell):
            """复制单元格的边框样式"""
            old_tc = old_cell._tc
            new_tc = new_cell._tc
        
            old_borders = old_tc.xpath('.//w:tcBorders')
            if old_borders:
                old_border = old_borders[0]
                new_border = OxmlElement('w:tcBorders')
        
                border_types = ['top', 'left', 'bottom', 'right', 'insideH', 'insideV']
                for border_type in border_types:
                    old_element = old_border.find(f'.//w:{border_type}', namespaces={
                        'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
                    })
                    if old_element is not None:
                        new_element = OxmlElement(f'w:{border_type}')
                        for attr, value in old_element.attrib.items():
                            new_element.set(attr, value)
                        new_border.append(new_element)
        
                tc_pr = new_tc.get_or_add_tcPr()
                tc_pr.append(new_border)
        
        
        def clone_table(old_table, new_doc):
            """根据旧表格创建新表格"""
            new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
            if old_table.style:
                new_table.style = old_table.style
        
            for i, old_row in enumerate(old_table.rows):
                for j, old_cell in enumerate(old_row.cells):
                    new_cell = new_table.cell(i, j)
                    for paragraph in new_cell.paragraphs:
                        new_cell._element.remove(paragraph._element)
                    for old_paragraph in old_cell.paragraphs:
                        new_paragraph = new_cell.add_paragraph()
                        for old_run in old_paragraph.runs:
                            new_run = new_paragraph.add_run(old_run.texthttp://www.devze.com)
                            copy_paragraph_style(old_run, new_run)
                        new_paragraph.alignment = old_paragraph.alignment
                    copy_cell_borders(old_cell, new_cell)
        
            for i, col in enumerate(old_table.columns):
                if col.width is not None:
                    new_table.columns[i].width = col.width
        
            return new_table
        
        
        def clone_document(old_doc_path, new_doc_path):
            try:
                old_doc = Document(old_doc_path)
                new_doc = Document()
        
                # # 复制分节符及页眉页脚
                # for old_section in old_doc.sections:
                #     new_section = new_doc.add_section(start_type=old_section.start_type)
                #     new_section.left_margin = old_section.left_margin
                #     new_section.right_margin = old_section.right_margin
                #     # 其他分节符属性...
                #
                #     # 页眉
                #     for para in old_section.header.paragraphs:
                #         new_para = new_section.header.add_paragraph()
                #         for run in para.runs:
                #             new_run = new_para.add_run(run.text)
                #             copy_paragraph_style(run, new_run)
                #         new_para.alignment = para.alignment
                #
                #     # 页脚
                #     for para in old_section.footer.paragraphs:
                #         new_para = new_section.footer.add_paragraph()
                #         for run in para.runs:
                #             new_run = new_para.add_run(run.text)
                #             copy_paragraph_style(run, new_run)
                #         new_para.alignment = para.alignment
        
                # 复制主体内容
                elements = old_doc.element.body
                para_index = 0
                table_index = 0
                index = 0
        
                while index < len(elements):
                    element = elements[index]
                    if element.tag.endswith('p'):
                        old_para = old_doc.paragraphs[para_index]
                        clone_paragraph(old_para, new_doc)
                        para_index += 1
                        index += 1
                    elif element.tag.endswith('tbl'):
                        old_table = old_doc.tables[table_index]
                        clone_table(old_table, new_doc)
                        table_index += 1
                        index += 1
                    elif element.tag.endswith('br') and element.get(qn('type')) == 'page':
                        if index>0:
        
                            new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
                        index += 1
                    else:
                        index += 1
        
                    # 检查分页符
                    if index < len(elements) and is_page_break(elements[index]):
                        if index>0:
        
                            new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
                        index += 1
        
                new_doc.save(new_doc_path)
                print(f"文档已成功保存至:{new_doc_path}")
            except Exception as e:
                print(f"复制文档时发生错误:{e}")
        
        
        # 使用示例
        if __name__ == "__main__":
            clone_document('1.docx', 'cloned_example.docx')
        

        到此这篇关于使用Python实现Word文档的深度克隆的完整代码的文章就介绍到这了,更多相关Python Word深度克隆内容请搜索编程客栈(www.devze.com)以前的文章或继续浏览下面的相关文章希望大家以后多多支持编程客栈(www.devze.com)!

        0

        上一篇:

        下一篇:

        精彩评论

        暂无评论...
        验证码 换一张
        取 消

        最新开发

        开发排行榜