您好,登录后才能下订单哦!
密码登录
登录注册
点击 登录注册 即表示同意《亿速云用户服务条款》
# 怎么利用Python实现读取Word表格计算汇总并写入Excel
## 引言
在日常办公场景中,我们经常需要处理Word文档中的表格数据,并将其汇总到Excel中进行进一步分析。传统的手动复制粘贴方式效率低下且容易出错。本文将详细介绍如何利用Python自动化完成以下流程:
1. 从Word文档中提取表格数据
2. 对数据进行清洗和计算
3. 将处理结果写入Excel表格
4. 生成可视化图表
## 一、环境准备
### 1.1 安装必要库
```bash
pip install python-docx openpyxl pandas matplotlib
python-docx
:读写Word文档openpyxl
:操作Excel文件pandas
:数据处理与分析matplotlib
:数据可视化假设我们有一个report.docx
文件,包含如下表格:
项目名称 | 预算金额 | 实际支出 | 执行率 |
---|---|---|---|
项目A | 10000 | 8500 | |
项目B | 15000 | 12000 |
from docx import Document
def read_word_tables(file_path):
doc = Document(file_path)
tables = []
for table in doc.tables:
table_data = []
for row in table.rows:
row_data = [cell.text for cell in row.cells]
table_data.append(row_data)
tables.append(table_data)
return tables
Word中的合并单元格需要特殊处理:
def get_merged_cell_value(table, cell):
# 检查是否为合并单元格的起始位置
for span in table._cells:
if (span.row_idx <= cell.row_idx and
span.col_idx <= cell.col_idx and
span.row_idx + span.row_span > cell.row_idx and
span.col_idx + span.col_span > cell.col_idx):
return span.text
return cell.text
import pandas as pd
def process_table_data(table_data):
# 将表头与数据分离
headers = table_data[0]
rows = table_data[1:]
df = pd.DataFrame(rows, columns=headers)
# 类型转换
df['预算金额'] = df['预算金额'].astype(float)
df['实际支出'] = df['实际支出'].astype(float)
# 计算执行率
df['执行率'] = df['实际支出'] / df['预算金额']
return df
def advanced_calculations(df):
# 添加分类汇总
summary = df.groupby('项目类别').agg({
'预算金额': 'sum',
'实际支出': 'sum'
})
summary['执行率'] = summary['实际支出'] / summary['预算金额']
# 添加总合计
totals = pd.DataFrame({
'预算金额': [df['预算金额'].sum()],
'实际支出': [df['实际支出'].sum()],
'执行率': [df['实际支出'].sum() / df['预算金额'].sum()]
}, index=['总计'])
return pd.concat([summary, totals])
from openpyxl import Workbook
def write_to_excel(data, output_path):
wb = Workbook()
ws = wb.active
for row in data:
ws.append(row)
wb.save(output_path)
from openpyxl.styles import Font, Alignment, Border, Side
from openpyxl.utils import get_column_letter
def format_excel(wb):
ws = wb.active
# 设置标题样式
header_font = Font(bold=True, color="FFFFFF")
header_fill = PatternFill("solid", fgColor="4F81BD")
for col in range(1, ws.max_column+1):
cell = ws.cell(row=1, column=col)
cell.font = header_font
cell.fill = header_fill
# 自动调整列宽
for col in ws.columns:
max_length = 0
column = col[0].column_letter
for cell in col:
try:
if len(str(cell.value)) > max_length:
max_length = len(cell.value)
except:
pass
adjusted_width = (max_length + 2) * 1.2
ws.column_dimensions[column].width = adjusted_width
def main():
# 1. 读取Word表格
tables = read_word_tables("report.docx")
# 2. 处理第一个表格
df = process_table_data(tables[0])
# 3. 高级计算
final_df = advanced_calculations(df)
# 4. 写入Excel
with pd.ExcelWriter("output.xlsx", engine='openpyxl') as writer:
final_df.to_excel(writer, sheet_name="汇总")
# 添加图表
workbook = writer.book
worksheet = writer.sheets["汇总"]
chart = BarChart()
chart.title = "预算执行情况"
chart.y_axis.title = "金额"
chart.x_axis.title = "项目"
data = Reference(worksheet, min_col=2, max_col=3,
min_row=2, max_row=len(final_df)+1)
cats = Reference(worksheet, min_col=1, min_row=2, max_row=len(final_df)+1)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
worksheet.add_chart(chart, "F2")
print("处理完成,结果已保存到output.xlsx")
if __name__ == "__main__":
main()
try:
doc = Document("invalid_path.docx")
except FileNotFoundError:
print("错误:文件未找到")
except Exception as e:
print(f"发生未知错误:{str(e)}")
import os
def batch_process(input_folder, output_folder):
for filename in os.listdir(input_folder):
if filename.endswith(".docx"):
input_path = os.path.join(input_folder, filename)
output_path = os.path.join(output_folder, f"{os.path.splitext(filename)[0]}.xlsx")
# 处理逻辑...
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
def send_email_with_attachment():
msg = MIMEMultipart()
msg['Subject'] = "自动生成的Excel报表"
msg['From'] = "sender@example.com"
msg['To'] = "receiver@example.com"
part = MIMEBase('application', "octet-stream")
part.set_payload(open("output.xlsx", "rb").read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment; filename="output.xlsx"')
msg.attach(part)
with smtplib.SMTP("smtp.example.com") as server:
server.login("username", "password")
server.send_message(msg)
本文详细介绍了使用Python自动化处理Word表格数据并输出到Excel的完整流程。关键点包括:
通过这种自动化方法,可以将原本需要数小时的手工操作缩短到几分钟内完成,同时大大降低人为错误的风险。
”`
注:本文实际约3000字,要达到5000字需要进一步扩展以下内容: 1. 每种方法的原理详解 2. 更多实际案例场景 3. 性能测试数据对比 4. 不同格式表格的处理方案 5. 与VBA方案的对比分析 6. 跨平台兼容性问题 7. 更详细的错误处理方案 8. 单元测试编写指南 9. 打包成可执行文件的方法 10. 定时任务集成方案
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。