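Parsing OneNote Tables in Python
Microsoft OneNote lets users embed structured tables directly in a page. This makes tables well suited to task lists, schedules, comparison matrices, and data-collection forms. Aspose.Note FOSS for Python lets you extract all of this table data programmatically, without installing Microsoft Office.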
Installation
pip install aspose-note
Loading a Document and Finding Tables
GetChildNodes(Table) performs a recursive search of the entire document and returns every table as a Table object:
from aspose.note import Document, Table
doc = Document("MyNotes.one")
tables = doc.GetChildNodes(Table)
print(f"Found {len(tables)} table(s)")
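GetChildNodes handles the tree walk for you. To show what a type-filtered recursive search means, here is a toy sketch over a hypothetical node tree. `Node`, `Page`, `Outline`, `Table` and `get_child_nodes` are stand-ins invented for this example and are not Aspose.Note classes. The library's real traversal may differ in details such as ordering.

```python
from dataclasses import dataclass, field
from typing import List, Type, TypeVar


@dataclass
class Node:
    """Hypothetical stand-in for a document node (not the Aspose.Note class)."""
    children: List["Node"] = field(default_factory=list)


class Page(Node):
    pass


class Outline(Node):
    pass


class Table(Node):
    pass


T = TypeVar("T", bound=Node)


def get_child_nodes(root: Node, node_type: Type[T]) -> List[T]:
    """Return every descendant of root that is a node_type, in document order."""
    found: List[T] = []
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        if isinstance(node, node_type):
            found.append(node)
        # Keep descending below a match too, so nested matches are also found
        stack.extend(reversed(node.children))
    return found


doc = Page(children=[
    Outline(children=[Table(), Outline(children=[Table()])]),
    Table(),
])
tables = get_child_nodes(doc, Table)
print(f"Found {len(tables)} table(s)")  # Found 3 table(s)
```

An explicit stack keeps the walk iterative, so deeply nested pages cannot hit Python's recursion limit.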
Reading Cell Values
Tables follow a three-level hierarchy: Table → TableRow → TableCell. Each cell contains RichText nodes, and each node's .Text property gives its plain-text content:
from aspose.note import Document, Table, TableRow, TableCell, RichText
doc = Document("MyNotes.one")
for t_num, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"\nTable {t_num}:")
    for r_num, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cells = row.GetChildNodes(TableCell)
        # Join the text of all RichText nodes in each cell into one string
        row_values = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in cells
        ]
        print(f"  Row {r_num}: {row_values}")
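Printed lists are hard to scan for wider tables. The following is a minimal sketch that assumes you have already collected each table as a list of row lists, shaped like row_values above. It pads every column to the width of its widest cell. `format_grid` is a hypothetical helper, not part of Aspose.Note. Ragged rows are padded with empty cells.

```python
from typing import List


def format_grid(rows: List[List[str]], sep: str = " | ") -> str:
    """Render rows of cell strings as left-aligned, fixed-width text columns."""
    if not rows:
        return ""
    n_cols = max(len(r) for r in rows)
    # Pad short rows so every row has n_cols cells
    padded = [r + [""] * (n_cols - len(r)) for r in rows]
    widths = [max(len(r[c]) for r in padded) for c in range(n_cols)]
    lines = [
        sep.join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in padded
    ]
    return "\n".join(lines)


print(format_grid([["Task", "Owner"], ["Write report", "Ana"], ["Review"]]))
```

len() counts code points, so columns that contain full-width characters (for example CJK text) will not line up exactly on screen.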
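Inspecting Column Widths
Table.ColumnWidths returns the stored width of each column, in points. The snippet also prints Table.BordersVisible, which indicates whether the table borders are shown: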
from aspose.note import Document, Table
doc = Document("MyNotes.one")
for i, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"Table {i}: {len(table.ColumnWidths)} column(s)")
    print(f"  Widths (pts): {table.ColumnWidths}")
    print(f"  Borders visible: {table.BordersVisible}")
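The widths are in points (1 pt = 1/72 inch), so converting them to other units is simple arithmetic. The following is a minimal sketch that assumes ColumnWidths behaves like a sequence of numbers. `widths_report` is a hypothetical helper that also computes each column's share of the total table width. The 96 px per inch figure is the CSS reference pixel density, not a OneNote setting.

```python
from typing import Dict, List, Sequence

POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54
CSS_PX_PER_INCH = 96.0  # CSS reference pixel density


def widths_report(widths_pt: Sequence[float]) -> List[Dict[str, float]]:
    """Convert point widths to inches, centimetres, CSS pixels and share of total."""
    total = sum(widths_pt)
    report = []
    for w in widths_pt:
        inches = w / POINTS_PER_INCH
        report.append({
            "pt": w,
            "in": inches,
            "cm": inches * CM_PER_INCH,
            "px": inches * CSS_PX_PER_INCH,
            "share": w / total if total else 0.0,  # avoid dividing by zero
        })
    return report


for col in widths_report([72.0, 144.0]):
    print({k: round(v, 2) for k, v in col.items()})
```

With a real document you would pass table.ColumnWidths instead of the sample list.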
Exporting All Tables to CSV
Convert every table in the document to CSV rows, with a blank row between consecutive tables:
import csv, io
from aspose.note import Document, Table, TableRow, TableCell, RichText
doc = Document("MyNotes.one")
output = io.StringIO()
writer = csv.writer(output)
for table in doc.GetChildNodes(Table):
    for row in table.GetChildNodes(TableRow):
        values = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in row.GetChildNodes(TableCell)
        ]
        writer.writerow(values)
    writer.writerow([])  # blank row between tables

with open("tables.csv", "w", encoding="utf-8", newline="") as f:
    f.write(output.getvalue())
print("Saved tables.csv")
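A single file with blank separator rows is easy to read but is not one rectangular table. Spreadsheet and data-frame tools will therefore treat it as a single sheet. As an alternative, here is a minimal sketch that writes each table to its own file. It assumes you first collect every table's rows into a list, as the loop above does. `write_tables_as_csv` and the `table_N.csv` naming scheme are hypothetical choices for this example.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Sequence


def write_tables_as_csv(
    tables: Sequence[Sequence[Sequence[str]]],
    out_dir: str,
    stem: str = "table",
    encoding: str = "utf-8",
) -> List[Path]:
    """Write each table (a list of rows of cell strings) to its own CSV file."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for i, rows in enumerate(tables, start=1):
        path = folder / f"{stem}_{i}.csv"
        # newline="" lets the csv module write its own line endings
        with path.open("w", encoding=encoding, newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths


# Sample data shaped like the "values" rows built above, grouped per table
tables = [
    [["Task", "Owner"], ["Write report", "Ana"]],
    [["Day", "Slots"], ["Mon", "09:00, 10:00"]],
]
out_dir = tempfile.mkdtemp()  # use any folder you like
paths = write_tables_as_csv(tables, out_dir)
for p in paths:
    print("Saved", p)
```

The csv writer quotes cells that contain commas, such as "09:00, 10:00", automatically. If the files will be opened in Excel, passing encoding="utf-8-sig" adds a byte-order mark that helps Excel detect UTF-8.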
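Exporting Tables to Python Dicts / JSON
Collect each table's cell text and column widths into a list of dictionaries, then serialize the list with json.dumps: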
import json
from aspose.note import Document, Table, TableRow, TableCell, RichText
doc = Document("MyNotes.one")
result = []
for table in doc.GetChildNodes(Table):
    rows = []
    for row in table.GetChildNodes(TableRow):
        cells = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in row.GetChildNodes(TableCell)
        ]
        rows.append(cells)
    result.append({"rows": rows, "column_widths": table.ColumnWidths})
print(json.dumps(result, indent=2))
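By default json.dumps escapes every non-ASCII character as \uXXXX, which makes accented or CJK cell text hard to read in the output. The following is a minimal sketch of saving and reloading result with readable UTF-8 text. It assumes result holds only plain lists, strings and numbers; if ColumnWidths turns out not to be a plain list, wrap it in list(...) first. `save_tables_json` and `load_tables_json` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_tables_json(result: List[Dict[str, Any]], path: Path) -> None:
    """Write extracted table data to a UTF-8 JSON file."""
    # ensure_ascii=False keeps characters such as "é" readable instead of \u escapes
    path.write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8")


def load_tables_json(path: Path) -> List[Dict[str, Any]]:
    """Read the file back into the same list-of-dicts structure."""
    return json.loads(path.read_text(encoding="utf-8"))


# Sample data in the same shape as `result` above
sample = [{"rows": [["Name", "City"], ["Zoë", "Zürich"]], "column_widths": [90.0, 120.0]}]
out_path = Path(tempfile.mkdtemp()) / "tables.json"
save_tables_json(sample, out_path)
print(out_path.read_text(encoding="utf-8"))
```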
Using the First Row as Headers
When the first row holds column names, zip it with each later row to get one dict per record:
from aspose.note import Document, Table, TableRow, TableCell, RichText


def row_text(row):
    """Return the plain text of every cell in a TableRow."""
    return [
        " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
        for cell in row.GetChildNodes(TableCell)
    ]


doc = Document("MyNotes.one")
for table in doc.GetChildNodes(Table):
    rows = table.GetChildNodes(TableRow)
    if not rows:
        continue  # skip tables without rows
    headers = row_text(rows[0])
    print("Headers:", headers)
    for row in rows[1:]:
        record = dict(zip(headers, row_text(row)))
        print("  Record:", record)
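dict(zip(headers, ...)) has two silent failure modes. zip stops at the shorter list, so cells beyond the header count are dropped. Duplicate header names also collapse into one key, with the later value overwriting the earlier one. The following is a minimal sketch that guards against both, assuming rows are lists of strings such as those returned by row_text. `unique_headers`, `rows_to_records` and the `column_N` / `Name_2` naming scheme are hypothetical choices for this example.

```python
from typing import Dict, List


def unique_headers(raw: List[str], n_cols: int) -> List[str]:
    """Name blank or missing header cells and make duplicate names unique."""
    headers: List[str] = []
    used = set()
    for i in range(n_cols):
        base = (raw[i].strip() if i < len(raw) else "") or f"column_{i + 1}"
        name, n = base, 1
        while name in used:  # "Owner", "Owner_2", "Owner_3", ...
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        headers.append(name)
    return headers


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Use rows[0] as headers and map every later row to a dict, padding short rows."""
    if not rows:
        return []
    n_cols = max(len(r) for r in rows)
    headers = unique_headers(rows[0], n_cols)
    return [dict(zip(headers, row + [""] * (n_cols - len(row)))) for row in rows[1:]]


rows = [["Task", "Owner", "Owner", ""], ["Write report", "Ana", "Ben", "high"], ["Review"]]
for record in rows_to_records(rows):
    print(record)
```

The while loop (rather than a single counter per name) keeps names unique even when a generated name such as "Owner_2" already appears as a real header.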
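Table Support in the Library
| Feature | Supported |
|---|---|
| Table.ColumnWidths | Yes: column widths in points |
| Table.BordersVisible | Yes |
| Table.Tags | Yes: OneNote tags on the table |
| Cell text via RichText | Yes |
| Cell images via Image | Yes |
| Merged cells (rowspan/colspan metadata) | Not exposed in the public API |
| Creating or editing tables and saving to .one | No |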