Keep AI grounded: the 4 techniques that actually matter
Why some AI tools confidently make things up — and the four techniques that meaningfully reduce hallucination in document Q&A.
"Grounded" is a word the AI industry uses a lot, with a lot of marketing-speak around it. Boiled down, it means: the AI's answers come from a specific source, not from the model's training data or imagination.
Four techniques meaningfully improve grounding. Most "AI for documents" tools use one or two. Real grounding needs all four.
1. Retrieval-augmented generation (RAG)
Don't stuff the whole document into context. Index it into chunks. At question time, retrieve only the chunks relevant to the question and feed those to the model.
Why this matters: when the model only sees relevant context, it has far less to bluff with. When it sees a 200-page document at once, it tends to skim and invent connections between passages.
Watch out: RAG is only as good as its retrieval. With poor chunking or a weak embedding model, the right answer can be sitting in your library and the retrieval still never surfaces it.
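To make the chunk-then-retrieve flow concrete, here is a minimal sketch in Python. It assumes a toy word-count "embedding" in place of a real embedding model, and the names chunk_text, embed, cosine, and retrieve are made up for illustration, not taken from any particular library. A production system would swap embed for a real model and store vectors in an index.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]


if __name__ == "__main__":
    library = [
        "The warranty lasts 24 months from purchase.",
        "Shipping takes five business days.",
        "Returns are accepted within 30 days.",
    ]
    for score, chunk in retrieve("How long does the warranty last?", library, k=1):
        print(f"{score:.2f}  {chunk}")
```

Only the chunks returned by retrieve would be passed to the model. The overlap between windows is one common way to avoid cutting an answer in half at a chunk boundary. The right sizes depend on your documents.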
2. Citation-required prompting
Tell the model, in its system prompt, that every answer must cite the chunk(s) it used. Reject any answer that comes back without citations.
Why this matters: the act of having to cite forces the model to actually use what it retrieved instead of pattern-matching from training data.
Watch out: models can fake citations if you don't verify. Always click through to the cited passage and check that it says what the answer claims.
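Part of that verification can be automated. The sketch below assumes a citation format of a verbatim quote in double quotes followed by a [chunk:N] marker. That format, the prompt wording, and the function names are all illustrative assumptions, not a standard. It rejects answers with no citations, citations to chunks that don't exist, and quotes that don't appear in the cited chunk.

```python
import re

# Hypothetical system prompt; the wording and the [chunk:N] format are assumptions.
SYSTEM_PROMPT = (
    "Answer only from the numbered chunks provided. "
    "Support every claim with a verbatim quote in double quotes followed by "
    'its chunk marker, like "quoted text" [chunk:3]. '
    "If you cannot cite a chunk for a claim, do not make the claim."
)

# Matches: "some quoted text" [chunk:12]
CITATION = re.compile(r'"([^"]+)"\s*\[chunk:(\d+)\]')


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor spacing differences don't matter."""
    return " ".join(text.lower().split())


def verify_citations(answer: str, chunks: dict[int, str]) -> tuple[bool, list[str]]:
    """Check that every cited quote really appears in the chunk it cites."""
    problems: list[str] = []
    cites = CITATION.findall(answer)
    if not cites:
        problems.append("no citations found")
    for quote, chunk_id in cites:
        source = chunks.get(int(chunk_id))
        if source is None:
            problems.append(f"chunk {chunk_id} does not exist")
        elif normalize(quote) not in normalize(source):
            problems.append(f"quote not found in chunk {chunk_id}: {quote!r}")
    return (not problems, problems)


if __name__ == "__main__":
    chunks = {1: "The warranty lasts 24 months from purchase."}
    answer = 'It runs two years: "warranty lasts 24 months" [chunk:1]'
    print(verify_citations(answer, chunks))
```

A check like this only confirms that the quote exists where the model says it does. It can't tell you whether the quote actually supports the claim, so a human click-through still matters for anything important.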
3. Refusal training
Tell the model to refuse, explicitly and with a templated message, when the retrieved chunks don't actually answer the question. Despite the name, this is done in the prompt here, not by retraining the model.
Why this matters: the most common failure mode is "model finds chunks tangentially related to the question and bluffs an answer." Explicit refusal prompts cut this dramatically.
Watch out: over-refusal. Tune carefully. A model that refuses too often becomes unusable.
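One way to wire this up is sketched below. It assumes a fixed refusal sentence that the model is told to return verbatim. The message text, prompt wording, and function names are illustrative assumptions. An exact template makes refusals easy to detect in code, which in turn makes over-refusal measurable: run a set of questions you know are answerable and see how often the refusal comes back.

```python
# Hypothetical refusal template; pick wording that suits your product.
REFUSAL_MESSAGE = "I can't find that in the provided documents."


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that includes the chunks and the explicit refusal rule."""
    numbered = "\n".join(f"[chunk:{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the chunks below.\n"
        "If they do not answer the question, reply with exactly: "
        f"{REFUSAL_MESSAGE}\n\n"
        f"{numbered}\n\n"
        f"Question: {question}"
    )


def is_refusal(answer: str) -> bool:
    """True when the model returned the templated refusal (ignoring outer whitespace)."""
    return answer.strip() == REFUSAL_MESSAGE


def refusal_rate(answers: list[str]) -> float:
    """Share of answers that are refusals. Run on answerable questions to spot over-refusal."""
    if not answers:
        return 0.0
    return sum(is_refusal(a) for a in answers) / len(answers)


if __name__ == "__main__":
    print(build_prompt("How long is the warranty?", ["The warranty lasts 24 months."]))
    print(refusal_rate(["24 months.", REFUSAL_MESSAGE]))
```

If refusal_rate is high on questions whose answers are in the library, the prompt is too strict or the retrieval is missing the right chunks. Check retrieval first before loosening the refusal rule.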
4. Re-ranking
Between retrieval and the LLM, run a re-ranker that scores each retrieved chunk against the question and culls weak matches.
Why this matters: vector search finds semantically similar chunks, which often include near-misses. A re-ranker filters those out so the LLM gets a tight, focused set.
Watch out: re-rankers add latency and cost. Worth it for accuracy-critical use cases (legal, medical); overkill for casual Q&A.
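The score-and-cull step can be sketched as follows. Real re-rankers are usually trained models that read the question and chunk together. Here a toy word-overlap scorer stands in for one, and overlap_score, rerank, min_score, and top_n are hypothetical names and parameters chosen for illustration.

```python
import re
from typing import Callable

# Small stop-word list so filler words don't count as matches (illustrative only).
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "what", "how", "does", "do"}


def overlap_score(question: str, chunk: str) -> float:
    """Toy scorer (assumption): fraction of the question's content words found in the chunk."""
    q_words = set(re.findall(r"[a-z0-9]+", question.lower())) - STOP_WORDS
    c_words = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank(
    question: str,
    chunks: list[str],
    scorer: Callable[[str, str], float] = overlap_score,
    min_score: float = 0.5,
    top_n: int = 3,
) -> list[tuple[float, str]]:
    """Score each retrieved chunk against the question, drop weak ones, keep the best top_n."""
    scored = [(scorer(question, c), c) for c in chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    retrieved = [
        "Warranty claims go to support.",
        "The warranty period is 24 months.",
        "Shipping takes five days.",
    ]
    for score, chunk in rerank("What is the warranty period?", retrieved, min_score=0.6):
        print(f"{score:.2f}  {chunk}")
```

Because the scorer is passed in as a function, a model-based re-ranker can replace overlap_score without changing rerank. The min_score cutoff is where the latency and accuracy trade-off gets tuned. If nothing clears the cutoff, the empty result is a natural trigger for the refusal message.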
What "good" looks like in practice
A grounded AI document tool will:
- Refuse when your library is silent on a topic.
- Quote the literal text it retrieved, with the page number.
- Give the same answer when asked the same question twice (no drift).
- Not invent facts that aren't in your files, even when the user pushes for an answer.
If a tool you're evaluating fails any of these tests, it's not actually grounded — it's just a chatbot with file attachments.
A test you can run
Upload a single PDF you know intimately. Ask it a question that has a definite answer on page X. Ask it a follow-up question about something that's not in the PDF at all. A grounded tool answers the first with the page citation and refuses the second.
Most tools fail on the second. The ones that pass are the ones worth paying for.
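If the tool you're evaluating has an API, the same test can be scripted. The sketch below assumes a hypothetical ask function that takes a question and returns the tool's answer as text, and it assumes page citations look like "p. 12". Both are placeholders to adapt to whatever the tool actually returns.

```python
import re
from typing import Callable


def grounding_check(
    ask: Callable[[str], str],
    answerable: str,
    expected_page: int,
    unanswerable: str,
    refusal_marker: str,
) -> dict[str, bool]:
    """Run the two-question test plus a repeat of the first question."""
    first = ask(answerable)
    second = ask(unanswerable)
    repeat = ask(answerable)
    # Assumed citation format: "p. 12" (the word boundary avoids matching "p. 120").
    page_pattern = rf"\bp\.\s*{expected_page}\b"
    return {
        "cites_expected_page": re.search(page_pattern, first) is not None,
        "refuses_unanswerable": refusal_marker.lower() in second.lower(),
        "stable_on_repeat": repeat == first,
    }


if __name__ == "__main__":
    # Fake tool for demonstration only; replace with a call to the real tool.
    def fake_tool(question: str) -> str:
        if "warranty" in question:
            return "The warranty runs 24 months (p. 12)."
        return "I can't find that in the document."

    print(grounding_check(fake_tool, "How long is the warranty?", 12,
                          "Who is the CEO?", "can't find"))
```

The exact-match check on repeated answers is deliberately strict. A tool that rewords the same correct answer would fail it, so treat that result as a prompt to read both answers rather than as a verdict.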