Introduction
Aspose.Words FOSS is an open-source Python library for working with Word documents. It reads DOCX, DOC, RTF, TXT, and Markdown files, and can export them to PDF, Markdown, or plain text — all without requiring Microsoft Word or any native dependencies.
The library is released under the MIT License and is available on PyPI. Install it with:
pip install "aspose-words-foss>=26.4.0"
Aspose.Words FOSS requires Python 3.10 or later and depends on three pure-Python packages (olefile, fpdf2, pydantic), installed automatically by pip.
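The requirements above can be checked before installing or importing the library. The following is a minimal sketch, not part of Aspose.Words FOSS: the helper names python_is_supported and missing_modules are hypothetical, and the import names are an assumption based on how the three packages are normally imported (fpdf2 is imported as fpdf).

```python
import sys
from importlib.util import find_spec

# Import names of the three dependencies listed above (assumption: fpdf2 is
# imported as "fpdf"; the other two import under their distribution names).
DEPENDENCY_MODULES = ("olefile", "fpdf", "pydantic")
MIN_PYTHON = (3, 10)


def python_is_supported(version=None):
    """Return True if the given (or running) Python version is 3.10 or later."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def missing_modules(names=DEPENDENCY_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Missing dependencies:", missing_modules() or "none")
```

Since pip installs the dependencies automatically, a non-empty result from missing_modules() most likely indicates that the check is running in a different environment from the one where the package was installed.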
Key Features
Document Loading and Conversion
The Document class is the primary entry point. Load a file in any supported input format and call save() to convert it to a different output format.
import aspose.words_foss as aw
doc = aw.Document("input.docx") # or .doc, .rtf, .txt, .md
doc.save("output.md", aw.SaveFormat.MARKDOWN)
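When the output path is only known at run time, the SaveFormat constant can be chosen from the file extension. This is a minimal sketch under stated assumptions: save_format_name and convert are hypothetical helpers, not library functions, and the only library calls used are the Document constructor, save(), and the SaveFormat.MARKDOWN, SaveFormat.PDF, and SaveFormat.TEXT constants that appear in this document.

```python
from pathlib import Path

# Output extensions mapped to the SaveFormat member names used in this document.
SAVE_FORMAT_NAMES = {".md": "MARKDOWN", ".pdf": "PDF", ".txt": "TEXT"}


def save_format_name(output_path):
    """Return the SaveFormat member name implied by the output extension."""
    suffix = Path(output_path).suffix.lower()
    try:
        return SAVE_FORMAT_NAMES[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None


def convert(input_path, output_path):
    """Load input_path and save it in the format implied by output_path."""
    # Imported here so save_format_name() works without the package installed.
    import aspose.words_foss as aw

    save_format = getattr(aw.SaveFormat, save_format_name(output_path))
    doc = aw.Document(str(input_path))
    doc.save(str(output_path), save_format)
```

With the package installed, convert("input.docx", "output.pdf") would behave like the two-line example above with SaveFormat.PDF selected. An unsupported extension raises ValueError before any file is opened.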
PDF Export
Export Word documents to PDF using SaveFormat.PDF for default settings or PdfSaveOptions for fine-grained control.
import aspose.words_foss as aw
doc = aw.Document("input.docx")
doc.save("output.pdf", aw.SaveFormat.PDF)
Markdown Export with Save Options
Pass a MarkdownSaveOptions or PdfSaveOptions object to save() in place of a SaveFormat constant for fine-grained control over output formatting.
import aspose.words_foss as aw
from aspose.words_foss.saving import MarkdownSaveOptions, PdfSaveOptions
doc = aw.Document("input.docx")
md_opts = MarkdownSaveOptions()
doc.save("output.md", md_opts)
pdf_opts = PdfSaveOptions()
doc.save("output.pdf", pdf_opts)
Text Extraction
Extract plain text from any supported document format using Document.get_text().
import aspose.words_foss as aw
doc = aw.Document("input.docx")
text = doc.get_text()
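Extracted text can be passed straight to ordinary string processing. The sketch below assumes that get_text() returns a Python str; text_stats is a hypothetical helper, not part of the library.

```python
def text_stats(text):
    """Return character, word, and non-empty line counts for a string."""
    non_empty_lines = [line for line in text.splitlines() if line.strip()]
    return {
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(non_empty_lines),
    }


# Usage with the library (requires aspose-words-foss to be installed):
#   import aspose.words_foss as aw
#   stats = text_stats(aw.Document("input.docx").get_text())
#   print(stats)
```

Words are counted by splitting on whitespace, which is a rough measure and will differ from the word count reported by a word processor.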
Document Structure Parsing
Specialized parsers extract structured data from DOCX internals. NumberingParser reads list numbering definitions and StyleParser parses style names into structured objects.
Multi-Format Input Support
Load documents from five input formats — DOCX, DOC, RTF, TXT, and Markdown — using the same Document constructor. The LoadFormat enum provides constants for explicit format selection (LoadFormat.DOCX, LoadFormat.DOC, LoadFormat.RTF, LoadFormat.TEXT, LoadFormat.MARKDOWN).
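A batch job can filter files by extension before handing them to the Document constructor. The following sketch maps the five input extensions to the LoadFormat constant names listed above. load_format_name and split_supported are hypothetical helpers, and because this document does not show how a LoadFormat constant is passed to the constructor, the sketch stops at the constant name.

```python
from pathlib import Path

# Input extensions mapped to the LoadFormat member names listed above.
LOAD_FORMAT_NAMES = {
    ".docx": "DOCX",
    ".doc": "DOC",
    ".rtf": "RTF",
    ".txt": "TEXT",
    ".md": "MARKDOWN",
}


def load_format_name(path):
    """Return the LoadFormat member name for path, or None if unsupported."""
    return LOAD_FORMAT_NAMES.get(Path(path).suffix.lower())


def split_supported(paths):
    """Split paths into (supported, skipped) lists based on their extension."""
    supported, skipped = [], []
    for path in paths:
        if load_format_name(path) is None:
            skipped.append(path)
        else:
            supported.append(path)
    return supported, skipped
```

With the package installed, getattr(aw.LoadFormat, load_format_name(path)) would look up the matching enum member. Detection here relies on the extension alone, so a misnamed file will be classified incorrectly.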
Quick Start
Install the package and convert a DOCX file to all three output formats:
pip install "aspose-words-foss>=26.4.0"
import aspose.words_foss as aw
# Load a Word document
doc = aw.Document("report.docx")
# Export to Markdown
doc.save("report.md", aw.SaveFormat.MARKDOWN)
# Export to PDF
doc.save("report.pdf", aw.SaveFormat.PDF)
# Export to plain text
doc.save("report.txt", aw.SaveFormat.TEXT)
# Extract text directly
text = doc.get_text()
print(f"Extracted {len(text)} characters")
Supported Formats
| Format | Extension | Read | Write |
|---|---|---|---|
| DOCX | .docx | ✓ | — |
| DOC | .doc | ✓ | — |
| RTF | .rtf | ✓ | — |
| TXT | .txt | ✓ | ✓ |
| Markdown | .md | ✓ | ✓ |
| PDF | .pdf | — | ✓ |
Open Source & Licensing
Aspose.Words FOSS for Python is released under the MIT License. You can use it freely in personal, internal, and commercial projects without license fees. The full source code is available on GitHub at the Aspose Words FOSS organization.