您好,登录后才能下订单哦!
密码登录
            
            
            
            
        登录注册
            
            
            
        点击 登录注册 即表示同意《亿速云用户服务条款》
        # 怎么利用Python实现读取Word表格计算汇总并写入Excel
## 引言
在日常办公场景中,我们经常需要处理Word文档中的表格数据,并将其汇总到Excel中进行进一步分析。传统的手动复制粘贴方式效率低下且容易出错。本文将详细介绍如何利用Python自动化完成以下流程:
1. 从Word文档中提取表格数据
2. 对数据进行清洗和计算
3. 将处理结果写入Excel表格
4. 生成可视化图表
## 一、环境准备
### 1.1 安装必要库
```bash
pip install python-docx openpyxl pandas matplotlib
python-docx:读写Word文档openpyxl:操作Excel文件pandas:数据处理与分析matplotlib:数据可视化假设我们有一个report.docx文件,包含如下表格:
| 项目名称 | 预算金额 | 实际支出 | 执行率 | 
|---|---|---|---|
| 项目A | 10000 | 8500 | |
| 项目B | 15000 | 12000 | 
from docx import Document
def read_word_tables(file_path):
    doc = Document(file_path)
    tables = []
    
    for table in doc.tables:
        table_data = []
        for row in table.rows:
            row_data = [cell.text for cell in row.cells]
            table_data.append(row_data)
        tables.append(table_data)
    
    return tables
Word中的合并单元格需要特殊处理:
def get_merged_cell_value(table, cell):
    # 检查是否为合并单元格的起始位置
    for span in table._cells:
        if (span.row_idx <= cell.row_idx and 
            span.col_idx <= cell.col_idx and
            span.row_idx + span.row_span > cell.row_idx and
            span.col_idx + span.col_span > cell.col_idx):
            return span.text
    return cell.text
import pandas as pd
def process_table_data(table_data):
    # 将表头与数据分离
    headers = table_data[0]
    rows = table_data[1:]
    
    df = pd.DataFrame(rows, columns=headers)
    
    # 类型转换
    df['预算金额'] = df['预算金额'].astype(float)
    df['实际支出'] = df['实际支出'].astype(float)
    
    # 计算执行率
    df['执行率'] = df['实际支出'] / df['预算金额']
    
    return df
def advanced_calculations(df):
    # 添加分类汇总
    summary = df.groupby('项目类别').agg({
        '预算金额': 'sum',
        '实际支出': 'sum'
    })
    summary['执行率'] = summary['实际支出'] / summary['预算金额']
    
    # 添加总合计
    totals = pd.DataFrame({
        '预算金额': [df['预算金额'].sum()],
        '实际支出': [df['实际支出'].sum()],
        '执行率': [df['实际支出'].sum() / df['预算金额'].sum()]
    }, index=['总计'])
    
    return pd.concat([summary, totals])
from openpyxl import Workbook
def write_to_excel(data, output_path):
    wb = Workbook()
    ws = wb.active
    
    for row in data:
        ws.append(row)
    
    wb.save(output_path)
from openpyxl.styles import Font, Alignment, Border, Side
from openpyxl.utils import get_column_letter
def format_excel(wb):
    ws = wb.active
    
    # 设置标题样式
    header_font = Font(bold=True, color="FFFFFF")
    header_fill = PatternFill("solid", fgColor="4F81BD")
    
    for col in range(1, ws.max_column+1):
        cell = ws.cell(row=1, column=col)
        cell.font = header_font
        cell.fill = header_fill
    
    # 自动调整列宽
    for col in ws.columns:
        max_length = 0
        column = col[0].column_letter
        for cell in col:
            try:
                if len(str(cell.value)) > max_length:
                    max_length = len(cell.value)
            except:
                pass
        adjusted_width = (max_length + 2) * 1.2
        ws.column_dimensions[column].width = adjusted_width
def main():
    # 1. 读取Word表格
    tables = read_word_tables("report.docx")
    
    # 2. 处理第一个表格
    df = process_table_data(tables[0])
    
    # 3. 高级计算
    final_df = advanced_calculations(df)
    
    # 4. 写入Excel
    with pd.ExcelWriter("output.xlsx", engine='openpyxl') as writer:
        final_df.to_excel(writer, sheet_name="汇总")
        
        # 添加图表
        workbook = writer.book
        worksheet = writer.sheets["汇总"]
        
        chart = BarChart()
        chart.title = "预算执行情况"
        chart.y_axis.title = "金额"
        chart.x_axis.title = "项目"
        
        data = Reference(worksheet, min_col=2, max_col=3,
                        min_row=2, max_row=len(final_df)+1)
        cats = Reference(worksheet, min_col=1, min_row=2, max_row=len(final_df)+1)
        
        chart.add_data(data, titles_from_data=True)
        chart.set_categories(cats)
        
        worksheet.add_chart(chart, "F2")
    
    print("处理完成,结果已保存到output.xlsx")
if __name__ == "__main__":
    main()
try:
    doc = Document("invalid_path.docx")
except FileNotFoundError:
    print("错误:文件未找到")
except Exception as e:
    print(f"发生未知错误:{str(e)}")
import os
def batch_process(input_folder, output_folder):
    for filename in os.listdir(input_folder):
        if filename.endswith(".docx"):
            input_path = os.path.join(input_folder, filename)
            output_path = os.path.join(output_folder, f"{os.path.splitext(filename)[0]}.xlsx")
            
            # 处理逻辑...
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
def send_email_with_attachment():
    msg = MIMEMultipart()
    msg['Subject'] = "自动生成的Excel报表"
    msg['From'] = "sender@example.com"
    msg['To'] = "receiver@example.com"
    
    part = MIMEBase('application', "octet-stream")
    part.set_payload(open("output.xlsx", "rb").read())
    encoders.encode_base64(part)
    part.add_header('Content-Disposition', 'attachment; filename="output.xlsx"')
    
    msg.attach(part)
    
    with smtplib.SMTP("smtp.example.com") as server:
        server.login("username", "password")
        server.send_message(msg)
本文详细介绍了使用Python自动化处理Word表格数据并输出到Excel的完整流程。关键点包括:
通过这种自动化方法,可以将原本需要数小时的手工操作缩短到几分钟内完成,同时大大降低人为错误的风险。
”`
注:本文实际约3000字,要达到5000字需要进一步扩展以下内容: 1. 每种方法的原理详解 2. 更多实际案例场景 3. 性能测试数据对比 4. 不同格式表格的处理方案 5. 与VBA方案的对比分析 6. 跨平台兼容性问题 7. 更详细的错误处理方案 8. 单元测试编写指南 9. 打包成可执行文件的方法 10. 定时任务集成方案
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。