Keep up to date with all the progress and developments underway or recently completed at Tensorlake labs HQ.
Tensorlake DocumentAI can now automatically detect and decode barcodes from your documents and scans, returning barcode type, value, and bounding boxes as structured output.
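As a rough illustration of how such output might be handled once loaded into Python, the sketch below models one detected barcode. The class names, field names, and the sample payload are hypothetical and are not taken from the Tensorlake API; only the three kinds of information (type, value, bounding box) come from the entry above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    # Hypothetical page coordinates of the detected barcode.
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


@dataclass
class Barcode:
    barcode_type: str  # e.g. "QR_CODE" or "CODE_128" (illustrative labels)
    value: str         # decoded payload
    bbox: BoundingBox


def parse_barcodes(payload: List[dict]) -> List[Barcode]:
    """Convert plain dicts (an assumed shape) into typed records."""
    return [
        Barcode(
            barcode_type=item["type"],
            value=item["value"],
            bbox=BoundingBox(*item["bbox"]),
        )
        for item in payload
    ]


# Made-up sample data, for illustration only.
sample = [
    {"type": "CODE_128", "value": "INV-0001", "bbox": [10, 20, 110, 60]},
    {"type": "QR_CODE", "value": "https://example.com", "bbox": [200, 200, 300, 300]},
]
barcodes = parse_barcodes(sample)
```

Typed records like these make it straightforward to filter by barcode type or to map a decoded value back to its location on the page.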
Tensorlake now uses Vision Language Models (VLMs) across multiple features including page classification, figure/table summarization, and structured extraction, enabling faster and more intelligent document understanding.
Tensorlake now preserves tracked changes (insertions, deletions, and comments) from Word documents as structured HTML, enabling programmatic access to document revision history.
Tensorlake now detects and corrects document headers across pages, maintaining proper hierarchy even when OCR misidentifies header levels.
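To make the idea of repairing a header hierarchy concrete, here is a minimal sketch of one simple heuristic: re-deriving each header's depth from its position relative to the headers before it, so that skipped levels are closed up. This is an assumed illustration, not Tensorlake's actual correction method, and the function name and sample headers are hypothetical.

```python
from typing import List, Tuple


def normalize_header_levels(headers: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Re-derive header depth so that no level is skipped.

    A stack holds the raw levels of the currently open ancestors.
    The corrected level of a header is its depth in that stack.
    """
    stack: List[int] = []
    fixed: List[Tuple[int, str]] = []
    for level, text in headers:
        # Close every open header that is at the same raw level or deeper.
        while stack and stack[-1] >= level:
            stack.pop()
        stack.append(level)
        fixed.append((len(stack), text))
    return fixed


# Raw levels as an OCR step might report them (made-up example).
raw = [
    (1, "Report"),
    (3, "Background"),
    (3, "Scope"),
    (2, "Methods"),
    (5, "Sampling"),
]
corrected = normalize_header_levels(raw)
```

With this heuristic, headers that shared a raw level under the same parent stay siblings, and a header that jumps several levels deeper becomes a direct child of the header before it.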
Fixed a bug where citations ignored page classification filtering, ensuring citations only reference pages you're actually extracting from.
Fixed token limit issues with large, dense CSV and Excel tables through automatic splitting and intelligent result merging.
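The general split-and-merge pattern can be sketched as follows. This is an illustrative approximation only: it uses a fixed row count as a stand-in for a token budget, and `process_chunk` is a hypothetical placeholder for whatever per-chunk processing is applied. None of the names come from the Tensorlake SDK.

```python
import csv
import io
from typing import Dict, Iterator, List


def split_rows(rows: List[List[str]], max_rows: int) -> Iterator[List[List[str]]]:
    """Yield chunks of at most max_rows data rows, repeating the header in each."""
    if not rows:
        return
    header, body = rows[0], rows[1:]
    for start in range(0, len(body), max_rows):
        yield [header] + body[start:start + max_rows]


def process_chunk(chunk: List[List[str]]) -> List[Dict[str, str]]:
    """Placeholder for per-chunk processing: turn each row into a record."""
    header, body = chunk[0], chunk[1:]
    return [dict(zip(header, row)) for row in body]


def merge_results(chunk_results: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Concatenate per-chunk results in their original order."""
    merged: List[Dict[str, str]] = []
    for result in chunk_results:
        merged.extend(result)
    return merged


csv_text = "id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n"
rows = list(csv.reader(io.StringIO(csv_text)))
chunks = list(split_rows(rows, max_rows=2))
merged = merge_results([process_chunk(chunk) for chunk in chunks])
```

Repeating the header row in every chunk keeps each piece interpretable on its own, which is what allows the results to be merged by simple concatenation.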
Page classification results now include the model's reasoning for each decision to help with debugging and prompt engineering.
Page classification now defaults to multi-label mode, allowing pages to receive multiple classification labels simultaneously.
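In multi-label mode a single page can appear under more than one label, which changes how downstream code should index the results. The sketch below assumes a simple mapping from page number to a list of labels; that shape, the label names, and the helper function are hypothetical and not taken from the Tensorlake response format.

```python
from typing import Dict, List


def pages_by_label(page_labels: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Invert a page -> labels mapping into label -> sorted page numbers."""
    index: Dict[str, List[int]] = {}
    for page, labels in sorted(page_labels.items()):
        for label in labels:
            index.setdefault(label, []).append(page)
    return index


# Made-up classification results: page 2 carries two labels, page 3 none.
page_labels = {
    1: ["invoice"],
    2: ["invoice", "signature_page"],
    3: [],
}
index = pages_by_label(page_labels)
```

Code written for single-label output often assumes exactly one label per page, so it is worth checking for pages with several labels, or none, before relying on that assumption.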
Optionally reference the full page during figure and table summarization to preserve spatial context in complex layouts.
Document ingestion now supports XML, legacy DOC, and Markdown files with the same parsing capabilities as existing formats.
A new model is live, reliably extracting very large, dense tables from PDFs (including scans) while preserving header hierarchy, row and column spans, and cell boundaries, with fast HTML/CSV export and bounding boxes for citations.
V2 of the DocumentAI API is fully in production in the Python SDK and on the Playground, offering unified document processing with advanced structured extraction, page classification, and enrichment capabilities.
Extract structured data from any document using Pydantic schemas with improved accuracy and multi-format support.
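For readers new to Pydantic, a schema of the kind described here might look like the sketch below. The `Invoice` and `LineItem` models and the sample data are hypothetical. How a schema is passed to Tensorlake is not shown, since that call is not described in these notes; the example only demonstrates defining a schema and validating extracted data against it.

```python
from typing import List, Optional

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    amount: float


class Invoice(BaseModel):
    invoice_number: str
    line_items: List[LineItem]
    vendor: Optional[str] = None  # may be absent from some documents


# Made-up extraction result; note that the amount arrives as a string.
raw = {
    "invoice_number": "INV-0001",
    "line_items": [{"description": "Widget", "amount": "12.50"}],
}

# Validation coerces the numeric string to a float and fills in defaults.
invoice = Invoice(**raw)
total = sum(item.amount for item in invoice.line_items)
```

Declaring optional fields with a default keeps validation from failing on documents where a value is simply not present.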