Gemini 3 OCR - Quick Findings

TL;DR
Gemini 3 has good OCR, but its output is unstructured and its page-level control is limited. Tensorlake provides precise page slicing and well-structured JSON output with no cleanup required.
Tensorlake Integration vs Direct Gemini 3
Gemini 3-Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it’s usable for downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I captured a few quick observations comparing direct Gemini 3 usage vs. running it through Tensorlake’s unified extraction layer.
PDF Handling
Gemini 3 accepts PDFs directly but does not handle page slicing.
To parse only a subset of pages, you have to split the PDF manually and stitch the results back together.
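To make the manual route concrete, here is a minimal sketch of the bookkeeping it involves: expanding a page-range string and keying each per-page result by its original page number. The helper names `expand_page_range` and `stitch_results` are hypothetical, and the `ocr_page` callable is a stand-in for "extract this page into its own PDF and send it to the model", which would need a PDF library and is not shown here.

```python
from typing import Callable, Dict, List


def expand_page_range(spec: str) -> List[int]:
    """Expand a spec such as "1-3,7" into sorted, de-duplicated 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def stitch_results(pages: List[int], ocr_page: Callable[[int], str]) -> Dict[int, str]:
    """Run OCR one page at a time and key each result by its original page number."""
    return {page: ocr_page(page) for page in pages}


if __name__ == "__main__":
    # Placeholder OCR function (assumption): a real one would call the model.
    def fake_ocr(page: int) -> str:
        return f"<text of page {page}>"

    print(stitch_results(expand_page_range("1-3"), fake_ocr))
```

Keeping the original page numbers as keys is what lets the stitched output line up with the source document after the split.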
Tensorlake supports precise page slicing out-of-the-box:

    parse_id = doc_ai.read(
        file_url=file_url,
        page_range="1-3",  # Parse only pages 1–3
        parsing_options=parsing_options,
    )

Result: Users can extract specific pages or ranges without processing the entire document.
Output Structure
Gemini 3 can generate HTML, but the structure is not well organized for downstream use:
- sections are not clearly separated
- layout elements aren’t grouped
- users must manually reorganize the structure
Consider this example, the top portion of an invoice PDF.

HTML output generated by Gemini 3:

    <!-- Page 1 -->
    <div class="page-container">
      <div class="header">
        <div class="company-info">
          <h1>ARK GLOSS CLOTHING</h1>
          <p>123 SAN SEBASTIAN ST.</p>
          <p>LOS ANGELES, CA 90015 (US)</p>
          <p>(123) 555-1234</p>
          <p>info@arkglossclothing.com</p>
          <p style="margin-top: 10px;">Sales Rep. :</p>
        </div>

        <div class="invoice-title">
          <h1>I N V O I C E</h1>
          <h2>INV-20212</h2>

          <div class="invoice-details">
            <table>
              <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
              <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
              <tr><td>PO NUMBER</td><td></td></tr>
              <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
            </table>
          </div>
        </div>
      </div>
    </div>
    ...

Output as Tensorlake’s unified JSON (Gemini-3 plugged in):

    {
      "page_number": 1,
      "page_fragments": [
        {
          "fragment_type": "title",
          "content": {
            "content": "INVOICE"
          },
          "reading_order": 1
        },
        {
          "fragment_type": "text",
          "content": {
            "content": "INV-20212"
          },
          "reading_order": 2
        },
        {
          "fragment_type": "text",
          "content": {
            "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
          },
          "reading_order": 3
        },
        {
          "fragment_type": "table",
          "content": {
            "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
            "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
            "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER | |\n| SHIP DATE | 01/26/2024 |"
          },
          "reading_order": 4
        }
      ]
    }
    ...

How Tensorlake Differs
Tensorlake’s integration produces clean, well-organized structured output, including:
- clear layout groups
- well-defined document sections
- table structures represented cleanly in both HTML and Markdown
- consistent fragment types that work across all OCR/VLM backends
Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.
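As an illustration of what "no custom parsing" can look like on the consuming side, the sketch below walks a page in reading order and turns the table fragment's Markdown into rows. It relies only on the field names visible in the JSON above (`page_fragments`, `fragment_type`, `content`, `markdown`, `reading_order`); the helper functions are hypothetical, and the sample is a trimmed copy of that output rather than a live API response.

```python
import json

# Trimmed copy of the page JSON shown above; fragments are deliberately out of order.
PAGE_JSON = """
{
  "page_number": 1,
  "page_fragments": [
    {"fragment_type": "table",
     "content": {"markdown": "| INVOICE DATE | 01/23/2024 |\\n| SHIP DATE | 01/26/2024 |"},
     "reading_order": 4},
    {"fragment_type": "title", "content": {"content": "INVOICE"}, "reading_order": 1},
    {"fragment_type": "text", "content": {"content": "INV-20212"}, "reading_order": 2}
  ]
}
"""


def fragments_in_reading_order(page: dict) -> list:
    """Return the page's fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda f: f["reading_order"])


def fragment_text(fragment: dict) -> str:
    """Prefer the Markdown rendering for tables, plain content otherwise."""
    content = fragment["content"]
    if fragment["fragment_type"] == "table":
        return content.get("markdown") or content.get("content", "")
    return content.get("content", "")


def table_rows(markdown: str) -> list:
    """Split a simple pipe-delimited Markdown table into rows of cell strings."""
    rows = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(cells)
    return rows


if __name__ == "__main__":
    page = json.loads(PAGE_JSON)
    for fragment in fragments_in_reading_order(page):
        print(fragment["fragment_type"], "->", fragment_text(fragment))
```

Because every fragment carries the same keys regardless of backend, the same loop should work unchanged if the OCR model behind the pipeline is swapped.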
Advanced Usage vs. Simple Usage
Advanced Gemini users can approximate a similar level of structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.
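For a sense of what that custom post-processing involves, here is a small sketch that pulls label/value pairs out of an HTML table like the one in the Gemini output above, using only Python's standard `html.parser`. The class and function names are hypothetical, and it assumes two-cell rows with no nested tables, which real model output will not always respect.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_fields(html: str) -> dict:
    """Map the first cell of each two-cell row to the second cell."""
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Table rows copied from the Gemini HTML example above.
SAMPLE = """
<table>
<tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
<tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
<tr><td>PO NUMBER</td><td></td></tr>
<tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_fields(SAMPLE))
```

This covers only one table; headers, address blocks, and reading order would each need their own handling, which is where the extra effort accumulates.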