September 12, 2025

Page classification now defaults to multi-label (multiple classes per page)

Page classification now defaults to multi-label mode, allowing pages to receive multiple classification labels simultaneously.

Key Highlights

  • Single pages can be classified as multiple page types (e.g., account_info AND transactions)
  • Better handling of complex documents like bank statements and legal docs
  • Backward compatible - multi-class mode still available via configuration

What's new

Page classification now defaults to multi-label mode, allowing each page to be assigned multiple page classes simultaneously. Previously, pages could only receive one classification label (multi-class mode). Multi-class classification remains available as a configuration option.

Why it matters

  • Complex documents often have overlapping content types on single pages
  • Bank statements - first page contains both account info AND transaction data
  • Legal documents - pages mix contract terms, signatures, and exhibits
  • Better extraction targeting - extract account_info AND transactions from the same page

Highlights

  • A single page can receive multiple relevant classifications
  • More accurate representation of complex document structure
  • Improved downstream extraction accuracy
  • Backward compatible - multi-class mode still available

Technical details

Multi-class (old default): Each page gets exactly one label - mutually exclusive
Multi-label (new default): Each page can get multiple labels - non-exclusive

Example: A bank statement's first page that contains account details AND the start of transaction history now gets both account_info and transactions labels instead of forcing a choice.
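
The difference between the two modes can be illustrated with a small, generic sketch. It assumes a classifier that produces one score per class; the scores, the 0.5 threshold, and the function names are hypothetical and say nothing about how Tensorlake decides labels internally.

```python
from typing import Dict, List


def multiclass_label(scores: Dict[str, float]) -> str:
    """Multi-class: return exactly one label, the highest-scoring one."""
    return max(scores, key=scores.get)


def multilabel_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: return every label whose score meets the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


# Hypothetical scores for the first page of a bank statement.
first_page = {"account_info": 0.92, "transactions": 0.81, "summary": 0.10}

print(multiclass_label(first_page))   # account_info
print(multilabel_labels(first_page))  # ['account_info', 'transactions']
```

In the multi-class case the page is forced into a single class even though two classes score highly; in the multi-label case both are kept.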

How to use

Multi-label classification works automatically with existing page classification prompts:

# Import path is an assumption; check it against your SDK version
from tensorlake.documentai import DocumentAI, PageClassificationConfig

doc_ai = DocumentAI()

result = doc_ai.parse_and_wait(
    file="bank_statement.pdf",
    page_classification=PageClassificationConfig(
        page_classes=["account_info", "transactions", "summary"]
    )
    # Multi-label is now the default, so no config change is needed
)

# Each page can now carry multiple classifications
for page in result.pages:
    classifications = page.classifications  # may contain multiple labels
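
For extraction targeting, it can help to invert the per-page labels into a label-to-pages map, so that each extractor runs on every page carrying its label. The sketch below uses a plain dictionary as a stand-in for the parse result; the data shape and the `pages_by_label` helper are assumptions for illustration, not part of the SDK.

```python
from collections import defaultdict
from typing import Dict, List


def pages_by_label(page_labels: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Invert {page_number: [labels]} into {label: [page_numbers]}."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for page_number in sorted(page_labels):
        for label in page_labels[page_number]:
            grouped[label].append(page_number)
    return dict(grouped)


# Hypothetical labels, standing in for what page.classifications might hold.
labels = {1: ["account_info", "transactions"], 2: ["transactions"], 3: ["summary"]}
routing = pages_by_label(labels)
print(routing["transactions"])  # [1, 2]
```

Page 1 appears under both account_info and transactions, which is exactly the case that single-label classification could not represent.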

To revert to multi-class behavior, set the classification mode explicitly (see Status below for availability in the API and SDK):

page_classification=PageClassificationConfig(
    page_classes=["account_info", "transactions", "summary"],
    classification_mode="multiclass"  # Explicitly set multi-class
)
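
If downstream code expects exactly one label per page, one client-side option is to collapse a multi-label result using a priority order. This is a hypothetical helper, not an SDK feature, and it is not equivalent to server-side multi-class mode: it picks by your priority list rather than by the classifier's best single class.

```python
from typing import Optional, Sequence


def pick_single_label(labels: Sequence[str], priority: Sequence[str]) -> Optional[str]:
    """Return the first label in `priority` that appears in `labels`, or None."""
    present = set(labels)
    for candidate in priority:
        if candidate in present:
            return candidate
    return None


# Hypothetical priority: prefer transactions when a page has several labels.
priority = ["transactions", "account_info", "summary"]
print(pick_single_label(["account_info", "transactions"], priority))  # transactions
```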

Status

✅ Multi-label by default is live now. Existing prompts will automatically benefit from multi-label classification.
🚧 Choosing between multi-label and multi-class is coming to the API and SDK soon.
