Extract Text from OneNote Files Using Python

If you need to read text from Microsoft OneNote .one files in a Python script without installing Microsoft Office or running Windows, Aspose.Note FOSS for Python is one way to do it. It is a free, open-source library that parses the OneNote binary format directly and exposes a Python API.

Install

pip install aspose-note

No API key. No license file. No Microsoft Office.


The Simplest Approach: GetChildNodes(RichText)

OneNote text is stored in RichText nodes distributed across pages, outlines, and outline elements. GetChildNodes(RichText) performs a recursive search of the entire document tree and returns every text node as a flat list:

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
for rt in doc.GetChildNodes(RichText):
    if rt.Text:
        print(rt.Text)

This is the shortest way to get all text content out of a .one file.


Save Text to a File

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
lines = [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]

with open("extracted.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))

print(f"Saved {len(lines)} text blocks to extracted.txt")
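
To apply the same extraction to a whole folder of notebooks, the snippet above can be wrapped in a small batch helper. The following is a minimal sketch, not part of the library: convert_folder and extract_blocks are hypothetical names, and the extractor is passed in as a callable so the folder logic does not itself depend on aspose.note.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def extract_blocks(one_path: Path) -> List[str]:
    """Return the non-empty text blocks of one .one file (same calls as above)."""
    # Imported lazily so convert_folder can be used without the library installed.
    from aspose.note import Document, RichText

    doc = Document(str(one_path))
    return [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]


def convert_folder(
    src_dir: Path,
    out_dir: Path,
    extract: Callable[[Path], Iterable[str]] = extract_blocks,
) -> Dict[str, int]:
    """Write <name>.txt for every *.one file in src_dir; return block counts per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = {}
    for one_path in sorted(src_dir.glob("*.one")):
        blocks = list(extract(one_path))
        target = out_dir / (one_path.stem + ".txt")
        target.write_text("\n".join(blocks), encoding="utf-8")
        counts[one_path.name] = len(blocks)
    return counts
```

Calling convert_folder(Path("notebooks"), Path("text")) would then write one .txt file per notebook; both folder names here are placeholders.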

Extract Text Per Page

When you need to know which page each text block came from, iterate over the Page nodes and search within each one:

from aspose.note import Document, Page, RichText

doc = Document("MyNotes.one")
for page in doc.GetChildNodes(Page):
    title = (
        page.Title.TitleText.Text
        if page.Title and page.Title.TitleText
        else "(untitled)"
    )
    page_texts = [rt.Text for rt in page.GetChildNodes(RichText) if rt.Text]
    print(f"\n=== {title} ===")
    for text in page_texts:
        print(text)
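
If the per-page text is going to another program rather than the terminal, the same loop can build a list of records instead of printing. This is a rough sketch using only the calls shown above; text_by_page and dump_pages are hypothetical helper names, and the node classes are passed in as parameters so the grouping logic can be exercised without the library.

```python
import json
from typing import Any, Dict, List


def text_by_page(doc: Any, page_type: type, text_type: type) -> List[Dict[str, Any]]:
    """Return one {"title", "blocks"} record per page, in document order."""
    records = []
    for page in doc.GetChildNodes(page_type):
        # Same guard as the loop above: a page may have no title.
        has_title = page.Title and page.Title.TitleText
        title = page.Title.TitleText.Text if has_title else "(untitled)"
        blocks = [rt.Text for rt in page.GetChildNodes(text_type) if rt.Text]
        records.append({"title": title, "blocks": blocks})
    return records


def dump_pages(path: str) -> str:
    """Load a .one file and return its per-page text as a JSON string."""
    # Lazy import keeps text_by_page usable without the library installed.
    from aspose.note import Document, Page, RichText

    records = text_by_page(Document(path), Page, RichText)
    return json.dumps(records, ensure_ascii=False, indent=2)
```

ensure_ascii=False keeps non-ASCII note text readable in the JSON output instead of escaping it.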

Extract Hyperlinks

Hyperlinks are stored on individual TextRun objects within RichText nodes. Check run.Style.IsHyperlink and read the target from run.Style.HyperlinkAddress:

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.Runs:
        if run.Style.IsHyperlink and run.Style.HyperlinkAddress:
            print(f"{run.Text!r}  ->  {run.Style.HyperlinkAddress}")
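
A notebook often links to the same address more than once, so it can be useful to collect links into a de-duplicated list rather than print each run. The sketch below relies only on the Runs, Style.IsHyperlink, Style.HyperlinkAddress, and Text attributes used above; collect_links is a hypothetical helper name, not a library function.

```python
from typing import Any, Iterable, List, Tuple


def collect_links(rich_texts: Iterable[Any]) -> List[Tuple[str, str]]:
    """Return (link text, address) pairs, keeping the first occurrence of each address."""
    seen = set()
    links: List[Tuple[str, str]] = []
    for rt in rich_texts:
        for run in rt.Runs:
            style = run.Style
            # Skip runs that are not links or that carry no target address.
            if not (style.IsHyperlink and style.HyperlinkAddress):
                continue
            address = style.HyperlinkAddress
            if address not in seen:
                seen.add(address)
                links.append((run.Text.strip(), address))
    return links
```

With a loaded document, this would be called as collect_links(doc.GetChildNodes(RichText)).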

Detect Formatting: Bold, Italic, Underline

Each TextRun carries per-character formatting through its TextStyle:

from aspose.note import Document, RichText

doc = Document("MyNotes.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.Runs:
        s = run.Style
        if any([s.Bold, s.Italic, s.Underline]):
            flags = ", ".join(f for f, v in [
                ("bold", s.Bold), ("italic", s.Italic), ("underline", s.Underline)
            ] if v)
            print(f"[{flags}] {run.Text.strip()!r}")
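
The same style flags can drive a simple conversion to Markdown emphasis. The following is a minimal sketch under several assumptions: run.Text is always a string, concatenating the runs in order reproduces the paragraph, and underline is dropped because Markdown has no syntax for it. run_to_markdown and rich_text_to_markdown are hypothetical helper names, and Markdown special characters in the note text are not escaped.

```python
from typing import Any


def run_to_markdown(text: str, bold: bool = False, italic: bool = False) -> str:
    """Wrap the non-whitespace core of text in Markdown emphasis markers."""
    core = text.strip()
    if not core:
        return text  # whitespace-only runs are passed through unchanged
    # Keep leading/trailing whitespace outside the markers so "**bold** " renders.
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if bold and italic:
        marker = "***"
    elif bold:
        marker = "**"
    elif italic:
        marker = "*"
    else:
        marker = ""
    return f"{lead}{marker}{core}{marker}{trail}"


def rich_text_to_markdown(rt: Any) -> str:
    """Join all runs of one RichText-like node into a Markdown string."""
    return "".join(
        run_to_markdown(run.Text, run.Style.Bold, run.Style.Italic)
        for run in rt.Runs
    )
```

Moving the whitespace outside the markers matters because most Markdown renderers ignore emphasis markers that are followed or preceded by a space.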

Read from a Stream

Document also accepts a binary stream instead of a file path, which is useful for cloud storage downloads, HTTP response bodies, or in-memory buffers:

import io
from aspose.note import Document, RichText

# Example: load from bytes already in memory
with open("MyNotes.one", "rb") as f:
    one_bytes = f.read()
doc = Document(io.BytesIO(one_bytes))
texts = [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]
print(f"Extracted {len(texts)} text block(s)")
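
For the HTTP case, one option is to download the response body with the standard library and wrap it in io.BytesIO before handing it to Document. This is a sketch under assumptions: fetch_to_buffer and load_remote_text are hypothetical helper names, the whole file is assumed to fit in memory, and a real script would also need authentication and error handling for its storage service.

```python
import io
import urllib.request
from typing import List


def fetch_to_buffer(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Download url fully into memory and return it as a seekable buffer."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return io.BytesIO(response.read())


def load_remote_text(url: str) -> List[str]:
    """Fetch a .one file over HTTP(S) and return its non-empty text blocks."""
    # Lazy import keeps fetch_to_buffer usable without the library installed.
    from aspose.note import Document, RichText

    doc = Document(fetch_to_buffer(url))
    return [rt.Text for rt in doc.GetChildNodes(RichText) if rt.Text]
```

Reading the full body first gives the parser a seekable stream, which a raw HTTP response object is not.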

Windows Encoding Fix

On Windows terminals, sys.stdout may use a legacy encoding such as cp1252, so printing characters outside that encoding raises UnicodeEncodeError. Add this at the start of your script:

import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

What the Library Supports

| Feature | Supported |
| --- | --- |
| Read .one files (path or stream) | Yes |
| Extract RichText.Text (plain text) | Yes |
| Inspect TextRun.Style (bold, italic, hyperlink, font) | Yes |
| Extract text from table cells | Yes |
| Read page titles | Yes |
| Write back to .one | No |
| Encrypted documents | No |

Next Steps