Gemini 3 OCR - Quick Findings

Dec 5, 2025 | 2 min read

TL;DR

Gemini 3 offers strong OCR, but its raw output is loosely structured and it gives limited control over which pages are parsed. Tensorlake adds precise page slicing and well-structured JSON output that needs no cleanup.

Tensorlake Integration vs Direct Gemini 3

Gemini 3 Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it’s usable for downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I captured a few quick observations comparing direct Gemini 3 usage with running it through Tensorlake’s unified extraction layer.

PDF Handling

Gemini 3 accepts PDFs directly but does not handle page slicing.
To parse only a subset of pages, you have to split the PDF manually and stitch the results back together.

Tensorlake supports precise page slicing out-of-the-box:

parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",            # Parse only pages 1–3
    parsing_options=parsing_options,
)

Result: Users can extract specific pages or ranges without processing the entire document.
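
For contrast, here is a rough sketch of the bookkeeping the manual route described above involves when calling a model directly: expanding a page-range string into page numbers, then merging per-chunk results back into page order. The helper names (`expand_page_range`, `stitch_results`) and the per-page result shape are hypothetical, and the range syntax handled here is an assumption for illustration, not a statement of what Tensorlake's `page_range` accepts. Splitting the PDF itself and the OCR call are left out.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7" into sorted, de-duplicated 1-based page numbers.

    The syntax is an assumption made for this sketch.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


def stitch_results(chunks: list[list[dict]]) -> list[dict]:
    """Merge per-chunk page results (hypothetical shape) into page order."""
    merged = [page for chunk in chunks for page in chunk]
    return sorted(merged, key=lambda page: page["page_number"])


# Results may come back chunk by chunk and out of order.
chunks = [
    [{"page_number": 3, "text": "page three"}],
    [{"page_number": 1, "text": "page one"}, {"page_number": 2, "text": "page two"}],
]
wanted = expand_page_range("1-3")
stitched = stitch_results(chunks)
```

Even in this reduced form, the caller owns range validation and ordering, which is the work a built-in `page_range` parameter takes off their hands.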

Output Structure

Gemini 3 can generate HTML, but the structure is not well organized for downstream use:

  • sections are not clearly separated
  • layout elements aren’t grouped
  • users must manually reorganize the structure

Let’s look at an example: the top portion of an invoice PDF.

[Image: Google 2024 Environmental Report - Water Use Table]

HTML generated by Gemini 3:

<!-- Page 1 -->
<div class="page-container">
  <div class="header">
    <div class="company-info">
      <h1>ARK GLOSS CLOTHING</h1>
      <p>123 SAN SEBASTIAN ST.</p>
      <p>LOS ANGELES, CA 90015 (US)</p>
      <p>(123) 555-1234</p>
      <p>info@arkglossclothing.com</p>
      <p style="margin-top: 10px;">Sales Rep. :</p>
    </div>

    <div class="invoice-title">
      <h1>I N V O I C E</h1>
      <h2>INV-20212</h2>

      <div class="invoice-details">
        <table>
          <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
          <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
          <tr><td>PO NUMBER</td><td></td></tr>
          <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
        </table>
      </div>
    </div>
  </div>
</div>
...

The same page as Tensorlake’s unified JSON (with Gemini 3 plugged in):

{
  "page_number": 1,
  "page_fragments": [
    {
      "fragment_type": "title",
      "content": {
        "content": "INVOICE"
      },
      "reading_order": 1
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "INV-20212"
      },
      "reading_order": 2
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
      },
      "reading_order": 3
    },
    {
      "fragment_type": "table",
      "content": {
        "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
        "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
        "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER |  |\n| SHIP DATE | 01/26/2024 |"
      },
      "reading_order": 4
    }
  ]
}
...

How Tensorlake Differs

Tensorlake’s integration produces clean, well-organized structured output, including:

  • clear layout groups
  • well-defined document sections
  • table structures represented cleanly in both HTML and Markdown
  • consistent fragment types that work across all OCR/VLM backends

Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.
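
As a minimal sketch of what consuming this structure might look like, the snippet below walks a trimmed copy of the invoice page shown earlier. The field names (`page_fragments`, `fragment_type`, `reading_order`, `content`, `markdown`) come from that example; the helper functions are hypothetical, and real responses may carry fields not shown here.

```python
# Trimmed copy of the example page, deliberately listed out of order.
PAGE = {
    "page_number": 1,
    "page_fragments": [
        {
            "fragment_type": "table",
            "content": {
                "content": "INVOICE DATE01/23/2024SHIP DATE01/26/2024",
                "markdown": "| INVOICE DATE | 01/23/2024 |\n| SHIP DATE | 01/26/2024 |",
            },
            "reading_order": 4,
        },
        {"fragment_type": "title", "content": {"content": "INVOICE"}, "reading_order": 1},
        {"fragment_type": "text", "content": {"content": "INV-20212"}, "reading_order": 2},
    ],
}


def fragments_in_order(page: dict) -> list[dict]:
    """Return the page's fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda frag: frag["reading_order"])


def plain_text(page: dict) -> str:
    """Join title and text fragments in reading order, skipping tables."""
    return "\n".join(
        frag["content"]["content"]
        for frag in fragments_in_order(page)
        if frag["fragment_type"] in ("title", "text")
    )


def table_markdown(page: dict) -> list[str]:
    """Collect the Markdown rendering of every table fragment."""
    return [
        frag["content"]["markdown"]
        for frag in fragments_in_order(page)
        if frag["fragment_type"] == "table"
    ]


text = plain_text(PAGE)
tables = table_markdown(PAGE)
```

Because every fragment carries a type and a reading order, the selection logic stays a filter and a sort rather than a layout heuristic.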

Advanced Usage vs. Simple Usage

Advanced Gemini users can approximate a similar level of structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.
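
To give a sense of what "custom post-processing" can mean in practice, here is a small sketch that pulls the key/value rows out of the `invoice-details` table in the Gemini HTML shown earlier, using only Python's standard `html.parser`. The class and function names are hypothetical. It assumes two-cell rows like those in the example and would need to be adapted for any other layout the model emits.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")  # start a new, initially empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def invoice_details(html: str) -> dict[str, str]:
    """Map the first cell of each two-cell row to the second cell."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {
        row[0].strip(): row[1].strip()
        for row in parser.rows
        if len(row) == 2
    }


GEMINI_HTML = """
<div class="invoice-details">
  <table>
    <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
    <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
    <tr><td>PO NUMBER</td><td></td></tr>
    <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
  </table>
</div>
"""

details = invoice_details(GEMINI_HTML)
```

This handles one table in one layout. Headers, address blocks, and multi-page documents each need their own handling, which is where the prompt iteration and post-processing effort tends to go.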
