Changelog

TENSORLAKE UPDATES

Keep up to date with all the progress and developments underway or recently completed at Tensorlake labs HQ.

New: Barcode Detection & Reading in the Document Ingestion API

Tensorlake DocumentAI can now automatically detect and decode barcodes from your documents and scans, returning barcode type, value, and bounding boxes as structured output.

New: Vision Language Models for Document Processing

10/16/25

Tensorlake now uses Vision Language Models (VLMs) across multiple features including page classification, figure/table summarization, and structured extraction, enabling faster and more intelligent document understanding.

New: Tracked Changes Parsing for Word Documents

10/10/25

Tensorlake now preserves tracked changes (insertions, deletions, and comments) from Word documents as structured HTML, enabling programmatic access to document revision history.
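As a rough illustration of what programmatic access to revisions could look like, the sketch below walks an HTML string and collects inserted and deleted text. It assumes the tracked changes are marked up with standard `<ins>` and `<del>` elements; the changelog does not specify the exact markup, so treat the tag names, the `TrackedChangeParser` class, and the `extract_changes` helper as hypothetical. Comments are not handled here.

```python
from html.parser import HTMLParser


class TrackedChangeParser(HTMLParser):
    """Collects text found inside <ins> and <del> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.changes = []   # list of (kind, text) tuples, in document order
        self._stack = []    # currently open <ins>/<del> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("ins", "del"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open change tag.
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Text outside any change element is ignored.
        if self._stack and data.strip():
            kind = "insertion" if self._stack[-1] == "ins" else "deletion"
            self.changes.append((kind, data.strip()))


def extract_changes(html):
    """Return the (kind, text) revisions found in an HTML string."""
    parser = TrackedChangeParser()
    parser.feed(html)
    parser.close()
    return parser.changes


sample = "<p>Pay within <del>30</del><ins>45</ins> days.</p>"
changes = extract_changes(sample)
```

Only the Python standard library is used, so the same approach can be adapted once the actual element and attribute names in the output are known.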

New: Header Detection and Correction for accurate document hierarchy

9/30/25

Tensorlake now detects and corrects document headers across pages, maintaining proper hierarchy even when OCR misidentifies header levels.
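To make the idea of hierarchy correction concrete, here is a minimal sketch of one simple heuristic: a header may be at most one level deeper than the header before it. This is not Tensorlake's method, which the changelog does not describe; the `normalize_header_levels` function and the `(level, text)` input shape are assumptions made for illustration.

```python
def normalize_header_levels(headers):
    """Repair header levels so the outline never skips a level.

    headers: list of (level, text) tuples in reading order, level >= 1.
    Returns a new list with the same text and corrected levels.
    """
    fixed = []
    previous_level = 0  # nothing seen yet, so the first header becomes level 1
    for level, text in headers:
        # Clamp to the range [1, previous_level + 1].
        level = max(1, min(level, previous_level + 1))
        fixed.append((level, text))
        previous_level = level
    return fixed


detected = [(1, "Report"), (3, "Scope"), (2, "Method"), (4, "Data")]
corrected = normalize_header_levels(detected)
```

Here "Scope" was detected as level 3 directly under a level 1 header, so it is pulled up to level 2, and "Data" is pulled up to level 3 for the same reason.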

Fixed: Citation filtering now respects page classification limits

9/19/25

Fixed a bug where citations ignored page classification filtering. Citations now reference only the pages you are actually extracting from.
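The intended behavior can be sketched as a simple filter: keep a citation only if its page belongs to one of the classes selected for extraction. The data shapes below (a `page` key on each citation, a page-to-classes mapping) and the `filter_citations` name are hypothetical and are not taken from the Tensorlake API.

```python
def filter_citations(citations, page_classes, allowed_classes):
    """Keep only citations that point at pages selected for extraction.

    citations: list of dicts, each with a "page" key (assumed shape).
    page_classes: dict mapping page number -> list of class labels.
    allowed_classes: iterable of class labels being extracted from.
    """
    allowed = set(allowed_classes)
    allowed_pages = {
        page for page, classes in page_classes.items() if allowed & set(classes)
    }
    return [c for c in citations if c["page"] in allowed_pages]


page_classes = {1: ["cover"], 2: ["invoice"], 3: ["invoice", "terms"]}
citations = [
    {"field": "title", "page": 1},
    {"field": "total", "page": 2},
    {"field": "due_date", "page": 3},
]
kept = filter_citations(citations, page_classes, ["invoice"])
```

In this example the citation on page 1 is dropped because that page was classified only as a cover page.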

Fixed token limit issues with large CSV/Excel tables

9/17/25

Fixed token limit issues with large, dense CSV and Excel tables through automatic splitting and intelligent result merging.
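A minimal sketch of the split-and-merge idea, under the assumption that a table is split by rows and that each chunk repeats the header so it can be processed on its own. The changelog does not say how Tensorlake splits or merges, so `split_table`, `merge_results`, and the row limit are illustrative only.

```python
def split_table(header, rows, max_rows):
    """Split rows into chunks of at most max_rows, repeating the header."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return [
        [header] + rows[start:start + max_rows]
        for start in range(0, len(rows), max_rows)
    ]


def merge_results(chunk_results):
    """Concatenate per-chunk result lists, preserving the original row order."""
    merged = []
    for result in chunk_results:
        merged.extend(result)
    return merged


header = ["sku", "qty"]
rows = [["A1", 4], ["B2", 0], ["C3", 7], ["D4", 2], ["E5", 0]]
chunks = split_table(header, rows, max_rows=2)

# Stand-in for the per-chunk extraction step: keep SKUs that are in stock.
per_chunk = [[sku for sku, qty in chunk[1:] if qty > 0] for chunk in chunks]
in_stock = merge_results(per_chunk)
```

A production version would size chunks by an estimated token count rather than a fixed row count, but the shape of the workflow is the same.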

Page classification now includes reasoning explanations

9/15/25

Page classification results now include the model's reasoning for each decision to help with debugging and prompt engineering.

Page classification now defaults to multi-label (multiple classes per page)

9/12/25

Page classification now defaults to multi-label mode, allowing pages to receive multiple classification labels simultaneously.

Summaries now include optional full-page image context

9/10/25

Optionally reference the full-page image during figure and table summarization to preserve spatial context in complex layouts.

Document Ingestion now supports XML, DOC, and Markdown files

9/8/25

Document ingestion now supports XML, legacy DOC, and Markdown files with the same parsing capabilities as existing formats.

Table Recognition now parses ~1,500-cell tables (with structure preserved)

8/13/25

A new table recognition model is live. It reliably extracts very large, dense tables from PDFs (including scans) while preserving header hierarchy, row and column spans, and cell boundaries, with fast HTML/CSV export and bounding boxes for citations.

DocumentAI API v2

8/11/25

V2 of the DocumentAI API is fully in production in the Python SDK and on the Playground, offering unified document processing with advanced structured extraction, page classification, and enrichment capabilities.

Advanced Schema Extraction

3/15/24

Extract structured data from any document using Pydantic schemas, with improved accuracy and multi-format support.
