Hybrid search explained — combining vector + BM25
The retrieval pattern every production RAG system converges on. Why neither vector search nor keyword search wins alone.
If you've built a vector-only RAG system, you've probably hit the same wall everyone hits: it crushes some queries and inexplicably fails others. The failures are almost always queries with literal, specific terms — a section number, an ID, a code, a rare proper noun.
Hybrid search fixes this by running vector search and keyword search in parallel and merging.
Why pure vector search fails on specific terms
Vector embeddings are designed to capture meaning. "Severance pay" and "termination compensation" become close vectors because they mean similar things. That's the strength.
The weakness: rare, specific terms get smoothed out. "G.R. No. 12345" might embed as a number-y vector that doesn't differentiate from "G.R. No. 12346" or even "case 12345." The semantic similarity is high; the user wanted exact match.
Why pure keyword search fails on conceptual queries
BM25 / TF-IDF rank by literal word overlap. "Severance pay" finds chunks with "severance" or "pay" but misses chunks talking about "termination compensation" or "separation benefits" — which is what the user actually wanted.
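Keyword search is high-precision, low-recall on concepts: what it returns usually contains the words you typed, but it misses every chunk that phrases the same idea differently.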
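To make the vocabulary gap concrete, here is a minimal BM25 scorer in plain Python. Treat it as a sketch: the whitespace tokeniser, the defaults k1 = 1.5 and b = 0.75, and the three sample chunks are all assumptions made for illustration, not anything taken from a real index.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokeniser (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with the standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no literal overlap means no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up sample chunks.
chunks = [
    "severance pay is due within 30 days",
    "termination compensation equals one month of salary per year",
    "office parking rules",
]
scores = bm25_scores("severance pay", chunks)
```

The second chunk is the one a user asking about severance probably wants as well, yet it scores exactly zero because it shares no literal term with the query.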
The hybrid pattern
Run both, merge intelligently.
query → vector search (top 30) →┐
├→ dedupe → re-rank → top 10 → LLM
query → keyword search (top 30) →┘
Variations on the merge:
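- Reciprocal rank fusion (RRF): each chunk scores the sum of 1 / (k + rank) over the result lists it appears in, so only rank positions matter and the raw scores never need to be comparable. Robust, and it rarely needs tuning beyond the constant k (60 is the usual default).
- Score interpolation: final_score = α × vector_score + (1-α) × keyword_score. Both scores must be normalised to the same range first, and α needs tuning (typically 0.5–0.7 in our testing).
- Cascade: keyword first, vector as a fallback (or vice versa). Simpler but loses some recall.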
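The score-interpolation variant can be sketched as follows. One assumption is added here that the formula leaves implicit: cosine similarities and BM25 scores live on different scales, so this sketch min-max normalises each list before mixing. The function names, the doc ids, and the sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(vector_scores, keyword_scores, alpha=0.6):
    """final = alpha * vector + (1 - alpha) * keyword, on normalised scores."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    fused = {}
    for doc in set(v) | set(k):
        # A doc missing from one retriever contributes 0 from that side.
        fused[doc] = alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up scores: cosine similarity on the left, BM25 on the right.
vector_scores = {"a": 0.82, "b": 0.80, "c": 0.40}
keyword_scores = {"b": 12.0, "d": 7.0, "c": 2.0}
ranked = interpolate(vector_scores, keyword_scores, alpha=0.6)
```

Chunk "b" ends up first because it ranks well in both lists, while "a" and "d" each rely on a single retriever.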
We use RRF + a re-ranker for final ordering.
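For reference, RRF fits in a few lines. This is a minimal sketch of the general technique, not our production code: the function name, the chunk ids, and the choice of k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each doc scores the sum of 1 / (k + rank) over the lists it appears in,
    with rank starting at 1. Duplicates across lists merge automatically.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = rrf([vector_hits, keyword_hits])
```

Chunks found by both retrievers ("c1" and "c3") rise above chunks found by only one, and no score normalisation is needed because only rank positions are used.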
What "good" looks like
Test cases that should all work in a hybrid system:
| Query type | Example | What should retrieve |
|---|---|---|
| Conceptual | "What does the contract say about indemnity?" | Indemnity-discussing chunks even if "indemnity" isn't the heading |
| Exact phrase | "Where does it say 'gross negligence'?" | The literal clause |
| ID / code | "G.R. No. 12345" | The case digest with that number |
| Synonym | "Remote work policy" | Chunks labelled "telecommute" or "work from anywhere" |
| Multi-term | "Late fee for first month rent" | Chunks discussing late fees specifically for first-month rent |
A vector-only system tends to miss the exact-phrase and ID / code cases. A keyword-only system tends to miss the conceptual and synonym cases. Hybrid should catch all five.
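These cases are easy to turn into a regression check. The sketch below assumes your pipeline exposes some function that maps a query to an ordered list of chunk ids. The `fake_retrieve` stand-in and every chunk id are hypothetical, there only so the example runs.

```python
def hit_at_k(retrieve, cases, k=10):
    """Return {query_type: bool}: did the expected chunk appear in the top k?"""
    results = {}
    for case in cases:
        top = retrieve(case["query"])[:k]
        results[case["type"]] = case["expected"] in top
    return results

# Stand-in retriever (hypothetical); swap in your real pipeline here.
FAKE_INDEX = {
    "G.R. No. 12345": ["digest-12345", "digest-12346"],
    "Remote work policy": ["hr-telecommute", "hr-leave"],
}

def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])

cases = [
    {"type": "ID / code", "query": "G.R. No. 12345",
     "expected": "digest-12345"},
    {"type": "Synonym", "query": "Remote work policy",
     "expected": "hr-telecommute"},
    {"type": "Exact phrase", "query": "Where does it say 'gross negligence'?",
     "expected": "clause-7"},
]
report = hit_at_k(fake_retrieve, cases)
```

Running the same cases against vector-only, keyword-only, and hybrid configurations gives a per-query-type picture of where each one falls short.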
The re-ranker
After merging, a cross-encoder (a small dedicated model) scores each candidate against the query for relevance. This catches cases where the retrieval pulled tangentially-related chunks. The re-ranker is the difference between "decent" and "tight" retrieval.
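Structurally, the re-ranking step is just "score every (query, candidate) pair, then sort". The sketch below keeps the scorer pluggable. `toy_score` is a placeholder word-overlap function, not a cross-encoder; in a real system `score_fn` would wrap a call to whatever re-ranking model you use. All names here are hypothetical.

```python
def rerank(query, candidates, score_fn, top_n=10):
    """Order candidates by score_fn(query, text), highest first."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]

def toy_score(query, text):
    # Placeholder scorer (NOT a cross-encoder): the fraction of query
    # words that also appear in the candidate.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)

# Made-up merged candidates from the two retrievers.
candidates = [
    "parking fees for visitors",
    "a late fee applies to the first month rent",
    "rent is due on the first of the month",
]
top = rerank("late fee first month rent", candidates, toy_score, top_n=2)
```

The tangential "parking fees" chunk drops out, which is the job the re-ranker does with a far better notion of relevance than word overlap.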
Implementation note (Postgres-native)
If you're building this yourself on Postgres:
- pgvector for vectors with an HNSW index.
- tsvector + tsquery for keyword search.
- Compute scores from both, RRF in SQL or application code.
- Cohere or BGE re-ranker for the final pass.
This stack scales to millions of chunks before you need a dedicated vector DB. It's what SeekFiles AI runs on.
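One way the "RRF in SQL" option could look, held in a Python string so it can be passed to a database driver. This is a sketch under loud assumptions: the table `chunks` and its columns `id`, `embedding` (a pgvector column) and `tsv` (a tsvector column) are hypothetical, and it is not necessarily the query SeekFiles runs. Note that `ts_rank_cd` is Postgres's built-in ranking function, which is not BM25.

```python
# Hypothetical schema: chunks(id, embedding vector, tsv tsvector).
HYBRID_RRF_SQL = """
WITH vec AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS rnk
    FROM chunks
    ORDER BY embedding <=> %(qvec)s::vector   -- <=> is cosine distance
    LIMIT 30
),
kw AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, q) DESC) AS rnk
    FROM chunks, websearch_to_tsquery('english', %(qtext)s) AS q
    WHERE tsv @@ q
    ORDER BY ts_rank_cd(tsv, q) DESC
    LIMIT 30
)
SELECT COALESCE(vec.id, kw.id) AS id,
       COALESCE(1.0 / (%(k)s + vec.rnk), 0.0)
     + COALESCE(1.0 / (%(k)s + kw.rnk), 0.0) AS rrf_score
FROM vec
FULL OUTER JOIN kw ON vec.id = kw.id
ORDER BY rrf_score DESC
LIMIT 10;
"""

def hybrid_params(query_text, query_embedding, k=60):
    """Build the named parameters for HYBRID_RRF_SQL."""
    # pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to vector.
    qvec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return {"qtext": query_text, "qvec": qvec, "k": k}

params = hybrid_params("G.R. No. 12345", [0.1, 0.2, 0.3])
# With a driver that supports named placeholders (psycopg, for example):
#     cur.execute(HYBRID_RRF_SQL, params)
```

The FULL OUTER JOIN keeps chunks that only one retriever found, and the ten fused results would then go to the re-ranker.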
Cost
Running both retrievers is cheap. The hot cost is the re-ranker (one model call per query). For high-volume use cases, cache the re-ranking by (assistant_id, normalised_query) hash — most user queries repeat.
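A minimal sketch of that cache, assuming an in-process dict and a simple normalisation (lowercase, collapsed whitespace). The function names are hypothetical, and a real deployment would more likely use a shared store with expiry.

```python
import hashlib

def normalise_query(query):
    # One possible normalisation (an assumption): lowercase, collapse spaces.
    return " ".join(query.lower().split())

def cache_key(assistant_id, query):
    """Hash of (assistant_id, normalised_query)."""
    raw = f"{assistant_id}:{normalise_query(query)}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

_rerank_cache = {}

def cached_rerank(assistant_id, query, candidates, rerank_fn):
    """Call rerank_fn only on a cache miss; otherwise reuse the stored order."""
    key = cache_key(assistant_id, query)
    if key not in _rerank_cache:
        _rerank_cache[key] = rerank_fn(query, candidates)
    return _rerank_cache[key]

# Demo with a stand-in re-ranker that records how often it is called.
calls = []

def fake_rerank(query, candidates):
    calls.append(query)
    return sorted(candidates)

first = cached_rerank("a1", "Late  Fee", ["b", "a"], fake_rerank)
second = cached_rerank("a1", "late fee", ["b", "a"], fake_rerank)
```

Because the key ignores the candidate list, cached orderings go stale when the underlying files change, so entries should be invalidated or expired whenever an assistant's library is updated.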
Hybrid is the default for production RAG in 2026. If a system you're evaluating is vector-only, you'll feel it on the queries where literal match matters.