Introduction

Aspose.Words FOSS is an open-source Python library for working with Word documents. It reads DOCX, DOC, RTF, TXT, and Markdown files, and can export them to PDF, Markdown, or plain text — all without requiring Microsoft Word or any native dependencies.

The library is released under the MIT License and is available on PyPI. Install it with:

pip install "aspose-words-foss>=26.4.0"

Aspose.Words FOSS requires Python 3.10 or later and depends on three pure-Python packages (olefile, fpdf2, pydantic), installed automatically by pip.
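To confirm that an environment meets these requirements before you import the library, a small standard-library check like the one below can help. It is a sketch with two assumptions: that importlib.metadata reports the distribution under the pip name aspose-words-foss, and that versions are plain dotted numbers such as 26.4.0. Pre-release suffixes are not handled.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
MIN_VERSION = "26.4.0"
# Assumption: the installed distribution name matches the pip package name.
DIST_NAME = "aspose-words-foss"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '26.4.0' into (26, 4, 0).

    Pre-release or local suffixes (e.g. '26.4.0rc1') are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the requirements are met."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    try:
        installed = metadata.version(DIST_NAME)
    except metadata.PackageNotFoundError:
        problems.append(f"{DIST_NAME} is not installed")
    else:
        if not version_at_least(installed, MIN_VERSION):
            problems.append(f"{DIST_NAME} {installed} is older than {MIN_VERSION}")
    return problems
```

Call check_environment() at startup and print any problems it returns. Comparing integer tuples avoids the string-comparison trap where "26.10.0" would sort before "26.4.0".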


Key Features

Document Loading and Conversion

The Document class is the primary entry point. Load a file in any supported input format and call save() to convert it to a different output format.

import aspose.words_foss as aw

doc = aw.Document("input.docx")  # or .doc, .rtf, .txt, .md
doc.save("output.md", aw.SaveFormat.MARKDOWN)
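Because save() takes the target format explicitly, a thin wrapper can choose the SaveFormat constant from the output file's extension. This helper is hypothetical and not part of the library. Its extension table covers only the three SaveFormat members shown on this page (MARKDOWN, PDF, TEXT). The library is imported inside convert() so that the lookup logic can run without the package installed.

```python
from pathlib import Path

# Hypothetical mapping from output extension to SaveFormat member name,
# limited to the members documented on this page.
SAVE_FORMAT_BY_EXT = {
    ".md": "MARKDOWN",
    ".pdf": "PDF",
    ".txt": "TEXT",
}


def save_format_name(output_path: str) -> str:
    """Return the SaveFormat member name for an output path, e.g. 'PDF'."""
    ext = Path(output_path).suffix.lower()
    try:
        return SAVE_FORMAT_BY_EXT[ext]
    except KeyError:
        raise ValueError(f"unsupported output extension: {ext or '(none)'}") from None


def convert(input_path: str, output_path: str) -> None:
    """Load any supported input and save it in the format implied by output_path."""
    import aspose.words_foss as aw  # imported lazily; requires the package

    # Resolve the format first so an unsupported extension fails before loading.
    fmt = getattr(aw.SaveFormat, save_format_name(output_path))
    doc = aw.Document(input_path)
    doc.save(output_path, fmt)
```

For example, convert("input.rtf", "output.pdf") loads the RTF file and saves it with SaveFormat.PDF.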

PDF Export

Export Word documents to PDF using SaveFormat.PDF for default settings or PdfSaveOptions for fine-grained control.

import aspose.words_foss as aw

doc = aw.Document("input.docx")
doc.save("output.pdf", aw.SaveFormat.PDF)

Markdown and PDF Export with Save Options

For fine-grained control over output formatting, pass a MarkdownSaveOptions or PdfSaveOptions instance to save() instead of a SaveFormat constant.

import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions, PdfSaveOptions

doc = aw.Document("input.docx")

md_opts = MarkdownSaveOptions()
doc.save("output.md", md_opts)

pdf_opts = PdfSaveOptions()
doc.save("output.pdf", pdf_opts)

Text Extraction

Extract plain text from any supported document format using Document.get_text().

import aspose.words_foss as aw

doc = aw.Document("input.docx")
text = doc.get_text()
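The string that get_text() returns can be post-processed with ordinary Python. The sketch below counts characters, words, and non-empty lines. This page does not document which separator characters get_text() emits between paragraphs or pages. The helper therefore assumes they may be \r\n, \r, \n, or a form feed (\f), and treats each of them as a line break.

```python
import re

# Assumption: paragraph/page separators in extracted text may be \r\n, \r, \n, or \f.
_BREAKS = re.compile(r"\r\n|[\r\n\f]")


def normalize_breaks(text: str) -> str:
    """Convert every assumed break sequence to a plain newline."""
    return _BREAKS.sub("\n", text)


def text_stats(text: str) -> dict[str, int]:
    """Return simple counts for a block of extracted text."""
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in normalize_breaks(text).split("\n") if line.strip()]
    return {
        "characters": len(text),
        "words": len(text.split()),  # str.split() treats \r and \f as whitespace too
        "lines": len(lines),
    }
```

Typical use is text_stats(aw.Document("input.docx").get_text()).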

Document Structure Parsing

Specialized parsers extract structured data from DOCX internals. NumberingParser reads list numbering definitions and StyleParser parses style names into structured objects.

Multi-Format Input Support

Load documents from five input formats — DOCX, DOC, RTF, TXT, and Markdown — using the same Document constructor. The LoadFormat enum provides constants for explicit format selection (LoadFormat.DOCX, LoadFormat.DOC, LoadFormat.RTF, LoadFormat.TEXT, LoadFormat.MARKDOWN).
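When a file's extension is reliable, you can look up the matching LoadFormat constant from a table. Note that .txt maps to TEXT, not TXT. This helper is hypothetical. This page does not show how the constant is passed when loading a document, so the sketch stops at resolving it.

```python
from pathlib import Path

# Hypothetical mapping from input extension to the LoadFormat member names listed above.
LOAD_FORMAT_BY_EXT = {
    ".docx": "DOCX",
    ".doc": "DOC",
    ".rtf": "RTF",
    ".txt": "TEXT",
    ".md": "MARKDOWN",
}


def load_format_name(input_path: str) -> str:
    """Return the LoadFormat member name for an input path, e.g. 'DOCX'."""
    ext = Path(input_path).suffix.lower()
    if ext not in LOAD_FORMAT_BY_EXT:
        raise ValueError(f"unsupported input extension: {ext or '(none)'}")
    return LOAD_FORMAT_BY_EXT[ext]


def load_format_for(input_path: str):
    """Resolve the actual aw.LoadFormat constant (requires the package)."""
    import aspose.words_foss as aw  # imported lazily

    return getattr(aw.LoadFormat, load_format_name(input_path))
```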


Quick Start

Install the package and convert a DOCX file to all three output formats:

pip install "aspose-words-foss>=26.4.0"

import aspose.words_foss as aw

# Load a Word document
doc = aw.Document("report.docx")

# Export to Markdown
doc.save("report.md", aw.SaveFormat.MARKDOWN)

# Export to PDF
doc.save("report.pdf", aw.SaveFormat.PDF)

# Export to plain text
doc.save("report.txt", aw.SaveFormat.TEXT)

# Extract text directly
text = doc.get_text()
print(f"Extracted {len(text)} characters")
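The same calls scale to a whole folder. The sketch below converts every supported input file in one directory (non-recursively) to Markdown and writes each result next to its source. The function names and the folder layout are illustrative assumptions. Existing .md files are skipped, because saving them to Markdown would overwrite the source.

```python
from pathlib import Path

# Input extensions listed in the Supported Formats table.
INPUT_EXTS = {".docx", ".doc", ".rtf", ".txt", ".md"}


def find_inputs(folder: str) -> list[Path]:
    """Return supported input files in folder (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in INPUT_EXTS
    )


def convert_folder_to_markdown(folder: str) -> list[Path]:
    """Convert every supported non-Markdown input in folder to a sibling .md file."""
    import aspose.words_foss as aw  # imported lazily; requires the package

    written = []
    for src in find_inputs(folder):
        if src.suffix.lower() == ".md":
            continue  # already Markdown; saving would overwrite the source
        dst = src.with_suffix(".md")
        aw.Document(str(src)).save(str(dst), aw.SaveFormat.MARKDOWN)
        written.append(dst)
    return written
```

If two inputs share a stem, such as report.docx and report.rtf, the later one overwrites the earlier output. Adjust the naming scheme if that matters.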

Supported Formats

Format     Extension  Read  Write
DOCX       .docx      Yes   No
DOC        .doc       Yes   No
RTF        .rtf       Yes   No
TXT        .txt       Yes   Yes
Markdown   .md        Yes   Yes
PDF        .pdf       No    Yes

Open Source & Licensing

Aspose.Words FOSS for Python is released under the MIT License. You can use it freely in personal, internal, and commercial projects without license fees. The full source code is available on GitHub under the Aspose Words FOSS organization.


Getting Started