How to chat with scanned (OCR'd) PDFs
Scanned PDFs are images of text. Here's what makes them work — or fail — when you try to ask an AI questions about them.
A large share of the PDFs people work with are scans of paper documents: court filings, old contracts, receipts, tax forms, medical records. Many AI tools quietly fail on these because there's no text layer, just pixels.
Here's what actually works.
Why scanned PDFs break naive AI tools
A "born-digital" PDF has a text layer the AI can read directly. A scanned PDF is just an image; the AI needs OCR (optical character recognition) to extract text before it can answer anything. Without OCR, the model sees an empty document and either refuses or hallucinates.
SeekFiles AI runs OCR automatically on upload. That covers ~95% of cases. The remaining 5% needs technique.
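One way to tell the two kinds of PDF apart before uploading is to look at how much text each page yields when you extract it. Here is a minimal sketch, assuming you have already pulled per-page text with a PDF library of your choice (that step is not shown). The function name and the 20-character threshold are illustrative, not part of SeekFiles.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is too short
    to be a real text layer.

    page_texts: list of strings, one per page, from whatever PDF text
    extractor you use (assumed; not shown here).
    min_chars: illustrative threshold; tune it for your documents.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only non-whitespace characters so blank lines don't pass as text.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = ["This page was born digital and has plenty of text.", "", "  \n "]
    print(pages_needing_ocr(sample))  # [2, 3]
```

Pages that come back flagged are the ones that depend entirely on OCR, so they are the ones worth capturing carefully.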
Best practices for scan quality
- Resolution: 300 DPI or higher. Most phone cameras capture more than enough pixels by default; flatbed scanners often default to 200 DPI, so raise the setting.
- Straight, well-lit, full-frame. Skewed or shadowed scans halve OCR accuracy.
- One page per image when shooting on a phone. Cramming four pages into one photo destroys readability.
- Use scan apps over raw photos. Apps like Adobe Scan, Microsoft Lens, or your phone's Notes app auto-correct skew and contrast.
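The 300 DPI target can be checked with simple arithmetic: divide the image's pixel dimensions by the physical page size in inches. Here is a minimal sketch, assuming a US Letter page (8.5 x 11 in) that fills the whole frame. The function names are illustrative.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in=8.5, page_height_in=11.0):
    """Estimate scan resolution, assuming the page fills the whole image."""
    # Take the smaller of the two axes so the estimate is conservative.
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def meets_target(pixel_width, pixel_height, target_dpi=300):
    """Return True if the estimated resolution reaches the target DPI."""
    return effective_dpi(pixel_width, pixel_height) >= target_dpi


if __name__ == "__main__":
    # A 12-megapixel phone photo (3024 x 4032) of a Letter page.
    print(round(effective_dpi(3024, 4032)))  # 356
    # A flatbed scan of the same page at 200 DPI (1700 x 2200).
    print(meets_target(1700, 2200))  # False
```

If the page only fills part of the photo, the real resolution is lower than this estimate, which is one more reason to shoot full-frame.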
Workflow: scanned contracts
- Photograph each page with a scan app and export a clean PDF.
- Upload to a SeekFiles assistant scoped to "Contracts."
- Ask: "Find the indemnification clause. Quote it verbatim and cite the page."
- If the OCR pulled it cleanly, the answer comes with the quote and the page number. If OCR struggled (handwritten edits, faded ink), the assistant may say "not found"; in that case, re-scan the offending page at higher contrast.
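Because OCR can introduce stray line breaks and uneven spacing, it helps to confirm that a quoted clause really appears on the cited page. Here is a minimal sketch, assuming you have the OCR text of each page as a plain string (how you obtain it depends on your tooling). The function names are illustrative and not a SeekFiles API.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so OCR line breaks don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_on_page(quote, page_texts, cited_page):
    """Return True if the quote appears on the cited 1-based page."""
    if not 1 <= cited_page <= len(page_texts):
        return False
    needle = normalize(quote)
    if not needle:
        # An empty quote would trivially match every page; treat it as a miss.
        return False
    return needle in normalize(page_texts[cited_page - 1])


if __name__ == "__main__":
    pages = [
        "MASTER SERVICES AGREEMENT",
        "9. Indemnification.\nThe Supplier shall  indemnify\nthe Customer against all claims.",
    ]
    quote = "The Supplier shall indemnify the Customer"
    print(quote_on_page(quote, pages, 2))  # True
    print(quote_on_page(quote, pages, 1))  # False
```

This only tolerates whitespace and case differences. A misread character (for example "rn" read as "m") will still fail the match, which is a useful signal that the page should be re-scanned.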
Workflow: old printed reviewers
Bar exam reviewers, board exam materials, old textbooks — these scan beautifully because the print is high-contrast and the layout is regular.
- Scan or photograph the entire reviewer (yes, all 400 pages).
- Upload as one PDF (we handle up to 200 MB per file).
- Build a study assistant scoped to it.
- Quiz yourself — the model will cite the reviewer page for every answer.
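A 400-page scan can get large, so it is worth checking the file against the 200 MB per-file limit before uploading. Here is a minimal sketch. It assumes "200 MB" means 200 x 1024 x 1024 bytes, which the text above does not specify, and the names are illustrative.

```python
import os

# Assumption: the 200 MB limit is read as 200 MiB.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is within the size limit."""
    return os.path.getsize(path) <= limit_bytes
```

If the file is over the limit, re-exporting from the scan app at a lower quality setting (while staying at or above 300 DPI) or splitting the reviewer into volumes are the usual ways to bring it under.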
What scanned PDFs can't do (yet)
- Handwritten margin notes. OCR misses these.
- Stamped seals or signatures. AI won't "read" a signature; it sees it as a glyph.
- Tables with merged cells from low-resolution scans. Cell structure often gets mangled.
For those, you still need a human eye on the source. SeekFiles will tell you when it's unsure rather than guess.
Bonus — multilingual scans
If your scan is in a supported non-English language, OCR and retrieval work the same way. Keep your questions in the same language as the document for best recall.
Scanned PDFs are a solved problem with the right preparation. The trick is a good capture going in; after that, trust the system.