September 8, 2025
Document Ingestion now supports XML, DOC, and Markdown files
Document ingestion now supports XML, legacy DOC, and Markdown files with the same parsing capabilities as existing formats.
Key Highlights
- Native XML parsing for config files and structured data exports
- Legacy DOC file support for older document repositories
- Markdown processing for documentation and technical specs
What's new
Document ingestion now handles three additional file formats: XML documents, legacy DOC files (Microsoft Word 97-2003), and native Markdown files. These join our existing support for PDF, DOCX, images, and other formats in a unified parsing pipeline.
Why it matters
- XML files are everywhere in enterprise workflows (config files, data exports, structured documents)
- Legacy DOC files still appear in older systems and document repositories
- Markdown files are standard for documentation, README files, and technical specs
- Unified processing means fewer custom preprocessing steps in your pipeline
Highlights
- Native parsing preserves document structure and metadata
- Same extraction and classification capabilities as other formats
- Automatic format detection: no manual format specification required
- Full compatibility with structured extraction and summarization features
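To make "automatic format detection" concrete, here is a minimal sketch of how a file's type can be guessed from its leading bytes. This is an illustration only, not Tensorlake's detection logic, which the changelog does not describe; the function name `guess_format` and its return labels are hypothetical. It relies on two well-known signatures (the OLE2 container header used by Word 97-2003 files, and the `<?xml` declaration) and falls back to the file extension for Markdown, which has no signature.

```python
from pathlib import Path

# Well-known file signatures (magic bytes).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 container: legacy .doc, .xls, .ppt
UTF8_BOM = b"\xef\xbb\xbf"


def guess_format(path: str) -> str:
    """Hypothetical helper: guess a coarse format label for a local file."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(64)

    if head.startswith(OLE2_MAGIC):
        # The container is shared with other legacy Office files,
        # so this is a hint rather than proof of a Word document.
        return "doc"

    # Drop an optional UTF-8 byte-order mark and leading whitespace
    # before looking for an XML declaration.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"

    # Markdown has no signature, so the extension is the only cheap hint.
    if p.suffix.lower() in {".md", ".markdown"}:
        return "markdown"
    return "unknown"
```

XML files that omit the `<?xml` declaration would come back as "unknown" here, which is one reason a production detector needs more than signature checks.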
How to use
Works automatically when you upload any of these file types. No configuration changes needed.
```python
# Assumed import path for the Tensorlake Python SDK
from tensorlake.documentai import DocumentAI

doc_ai = DocumentAI()

# XML, legacy DOC, and Markdown all use the same upload-and-parse calls
xml_file_id = doc_ai.upload(path="/path/to/config.xml")
xml_result = doc_ai.parse_and_wait(xml_file_id)

doc_file_id = doc_ai.upload(path="/path/to/legacy_report.doc")
doc_result = doc_ai.parse_and_wait(doc_file_id)

md_file_id = doc_ai.upload(path="/path/to/README.md")
md_result = doc_ai.parse_and_wait(md_file_id)
```
Status
✅ Live now. All existing parsing features work across the new formats.
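Because the new formats use the same upload-and-parse calls shown under How to use, a mixed directory can be ingested in one loop. The sketch below is an illustration under stated assumptions: `find_new_format_files` and `ingest_all` are hypothetical helpers, not part of the SDK, and the upload and parse functions are passed in as callables that follow the call shapes in the example above (`upload(path=...)` returning a file ID, `parse_and_wait(file_id)` returning a result).

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List

# Extensions for the three formats added in this release.
NEW_EXTENSIONS = {".xml", ".doc", ".md"}


def find_new_format_files(root: str) -> List[str]:
    """Hypothetical helper: list files under root with a newly supported extension."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in NEW_EXTENSIONS
    )


def ingest_all(
    paths: Iterable[str],
    upload: Callable[..., Any],
    parse_and_wait: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Hypothetical helper: upload and parse each path, returning {path: result}."""
    results: Dict[str, Any] = {}
    for path in paths:
        file_id = upload(path=path)               # same keyword argument as in the example above
        results[path] = parse_and_wait(file_id)   # blocks until parsing finishes
    return results
```

With a client like the one in the example above, this would be called as `ingest_all(find_new_format_files("/path/to/docs"), doc_ai.upload, doc_ai.parse_and_wait)`. Passing the two methods in as arguments keeps the helper independent of any particular client object, which also makes it easy to exercise with stand-in functions.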