May 16, 2026 · 4 min read · RAG, engineering

Hybrid search explained — combining vector + BM25

The retrieval pattern every production RAG system converges on. Why neither vector search nor keyword search wins alone.

If you've built a vector-only RAG system, you've probably hit the same wall everyone hits: it crushes some queries and inexplicably fails others. The failures are almost always queries with literal, specific terms — a section number, an ID, a code, a rare proper noun.

Hybrid search fixes this by running vector search and keyword search in parallel and merging.

Why pure vector search fails on specific terms

Vector embeddings are designed to capture meaning. "Severance pay" and "termination compensation" become close vectors because they mean similar things. That's the strength.

The weakness: rare, specific terms get smoothed out. "G.R. No. 12345" might embed as a number-y vector that doesn't differentiate from "G.R. No. 12346" or even "case 12345." The semantic similarity is high; the user wanted exact match.

Why pure keyword search fails on conceptual queries

BM25 / TF-IDF rank by literal word overlap. "Severance pay" finds chunks with "severance" or "pay" but misses chunks talking about "termination compensation" or "separation benefits" — which is what the user actually wanted.

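Keyword search is high-precision but low-recall on conceptual queries: what it returns does contain the words, but it misses relevant chunks that are phrased differently.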
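To make the vocabulary gap concrete, here is a minimal BM25 scorer run on three toy chunks. It is a sketch, not a production ranker: the tokenizer is a naive whitespace split, the `k1` and `b` values are common defaults rather than tuned ones, and the sample chunks are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; real systems usually also stem and drop stopwords.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every doc against the query with BM25 (k1, b are assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no literal overlap, no contribution
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "severance pay is due within thirty days",
    "termination compensation is owed to the employee",
    "separation benefits are listed in the annex",
]
scores = bm25_scores("severance pay", docs)
```

Only the first chunk gets a non-zero score. The other two are about the same concept but share no query terms, so a keyword ranker cannot see them.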

The hybrid pattern

Run both, merge intelligently.

query → vector search (top 30) →┐
                                 ├→ dedupe → re-rank → top 10 → LLM
query → keyword search (top 30) →┘
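The first half of that diagram (two retrievers in parallel, then dedupe) could be sketched as below. The `vector_search` and `keyword_search` callables are hypothetical stand-ins for whatever retrievers you already have; each is assumed to take a query and a limit and return chunk IDs in ranked order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed retriever shape: (query, k) -> chunk IDs, best first.
Retriever = Callable[[str, int], list[str]]


def retrieve_candidates(
    query: str,
    vector_search: Retriever,
    keyword_search: Retriever,
    k: int = 30,
) -> list[str]:
    """Run both retrievers concurrently and return a deduplicated candidate pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec_future = pool.submit(vector_search, query, k)
        kw_future = pool.submit(keyword_search, query, k)
        vec_hits = vec_future.result()
        kw_hits = kw_future.result()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(vec_hits + kw_hits))


# Stub retrievers for illustration only.
def fake_vector_search(query: str, k: int) -> list[str]:
    return ["c1", "c2", "c3"][:k]


def fake_keyword_search(query: str, k: int) -> list[str]:
    return ["c3", "c4"][:k]


candidates = retrieve_candidates("late fee", fake_vector_search, fake_keyword_search)
```

The order of the pool does not matter much at this stage, since the merge and re-rank steps decide the final ordering.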

Variations on the merge:

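  • Reciprocal rank fusion (RRF): combine the rank positions from each searcher into a unified score, so the two raw score scales never have to be compared. Robust, and needs little to no tuning.
  • Score interpolation: final_score = α × vector_score + (1-α) × keyword_score. Requires normalising both scores to a common range first (BM25 scores and cosine similarities are on different scales) and tuning α (typically 0.5–0.7 in our testing).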
  • Cascade: keyword first, vector as a fallback (or vice versa). Simpler but loses some recall.
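As an illustration of the score interpolation variant, here is a minimal sketch. It assumes min-max normalisation, which is one common choice among several, and treats a chunk that one retriever did not return as scoring 0 for that retriever. The function names and sample scores are made up for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def interpolate(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.6,
) -> list[tuple[str, float]]:
    """final = alpha * vector + (1 - alpha) * keyword, on normalised scores."""
    v = min_max(vector_scores)
    kw = min_max(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in v.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Cosine similarities on one side, BM25-style scores on the other.
ranked = interpolate({"a": 0.9, "b": 0.5}, {"b": 12.0, "c": 3.0}, alpha=0.6)
```

Note that min-max pushes the weakest hit from each retriever to 0, which is part of why this variant is more sensitive to tuning than rank-based fusion.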

We use RRF + a re-ranker for final ordering.
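For reference, RRF itself is only a few lines. The sketch below gives each chunk a score of 1 / (k + rank) per list it appears in and sums them. The constant `k = 60` is a widely used default, not a value taken from our system, and the chunk IDs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["c7", "c2", "c9"]
keyword_hits = ["c2", "c4", "c7"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Chunks that both retrievers return (`c2`, `c7`) rise above chunks that only one of them found, without either raw score ever being compared to the other.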

What "good" looks like

Test cases that should all work in a hybrid system:

| Query type | Example | What should retrieve |
|---|---|---|
| Conceptual | "What does the contract say about indemnity?" | Indemnity-discussing chunks even if "indemnity" isn't the heading |
| Exact phrase | "Where does it say 'gross negligence'?" | The literal clause |
| ID / code | "G.R. No. 12345" | The case digest with that number |
| Synonym | "Remote work policy" | Chunks labelled "telecommute" or "work from anywhere" |
| Multi-term | "Late fee for first month rent" | Chunks discussing late fees specifically for first-month rent |

A vector-only system misses cases 2 and 3. A keyword-only system misses cases 1 and 4. Hybrid catches all five.

The re-ranker

After merging, a cross-encoder (a small dedicated model that reads the query and a candidate chunk together) scores each candidate against the query for relevance. This catches cases where retrieval pulled in tangentially related chunks. The re-ranker is the difference between "decent" and "tight" retrieval.
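The re-ranking step could look roughly like this. To keep the sketch runnable without downloading a model, the scorer is passed in as a function. The `toy_overlap_scorer` here is a deliberately crude stand-in that counts shared words, not a cross-encoder.

```python
from typing import Callable, Sequence

# Assumed scorer shape: a batch of (query, chunk) pairs in, one relevance score per pair out.
ScoreFn = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, chunks: list[str], score_pairs: ScoreFn, top_n: int = 10) -> list[str]:
    """Score every (query, chunk) pair and keep the top_n highest-scoring chunks."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)
    ranked = sorted(zip(chunks, scores), key=lambda item: item[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]


def toy_overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    # Stand-in for a real model: counts words shared by query and chunk.
    return [float(len(set(q.lower().split()) & set(c.lower().split()))) for q, c in pairs]


chunks = [
    "late fee applies after five days",
    "the tenant pays rent monthly",
    "late fee for first month rent is waived",
]
top = rerank("late fee for first month rent", chunks, toy_overlap_scorer, top_n=2)
```

With the sentence-transformers library, `CrossEncoder(model_name).predict` accepts a list of (query, passage) pairs and returns one score per pair, so it should slot in as `score_pairs`. Which model to load is your choice.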

Implementation note (Postgres-native)

If you're building this yourself on Postgres:

  • pgvector for vectors with an HNSW index.
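  • tsvector + tsquery for keyword search (ranked with ts_rank or ts_rank_cd, which are not BM25 but fill the same role).
  • Compute ranked results from both, then fuse with RRF in SQL or application code.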
  • Cohere or BGE re-ranker for the final pass.

This stack scales to millions of chunks before you need a dedicated vector DB. It's what SeekFiles AI runs on.
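The bullets above can be sketched as a single query. This is an illustrative layout, not SeekFiles AI's actual schema: the `chunks` table, its `embedding` and `tsv` columns, the `'english'` text search configuration, and the RRF constant 60 are all assumptions. The HNSW index would be created separately, for example with `CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);`.

```python
# Hypothetical schema: chunks(id, content, embedding vector, tsv tsvector).
# <=> is pgvector's cosine distance operator; ts_rank_cd is Postgres full-text ranking.
HYBRID_SQL = """
WITH vec AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(emb)s::vector) AS rnk
    FROM chunks
    ORDER BY embedding <=> %(emb)s::vector
    LIMIT 30
),
kw AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, q) DESC) AS rnk
    FROM chunks, plainto_tsquery('english', %(q)s) AS q
    WHERE tsv @@ q
    ORDER BY ts_rank_cd(tsv, q) DESC
    LIMIT 30
)
SELECT COALESCE(vec.id, kw.id) AS id,
       COALESCE(1.0 / (60 + vec.rnk), 0.0)
     + COALESCE(1.0 / (60 + kw.rnk), 0.0) AS rrf_score
FROM vec
FULL OUTER JOIN kw ON vec.id = kw.id
ORDER BY rrf_score DESC
LIMIT 10;
"""


def to_vector_literal(embedding: list[float]) -> str:
    """Format floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def hybrid_params(query_text: str, query_embedding: list[float]) -> dict[str, str]:
    """Named parameters for HYBRID_SQL."""
    return {"q": query_text, "emb": to_vector_literal(query_embedding)}


# With a driver such as psycopg, this would run roughly as:
#     cur.execute(HYBRID_SQL, hybrid_params(user_query, embedding))
params = hybrid_params("late fee", [0.5, 1.0])
```

The `FULL OUTER JOIN` is what keeps chunks that only one retriever found. The top 10 rows would then go to the re-ranker.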

Cost

Running both retrievers is cheap. The hot cost is the re-ranker (one model call per query). For high-volume use cases, cache the re-ranking by (assistant_id, normalised_query) hash — most user queries repeat.
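A minimal sketch of that cache key follows. The normalisation rules here (trim, lowercase, collapse whitespace) are an assumption about what "normalised" should mean, and the in-memory dict stands in for whatever cache store you actually use.

```python
import hashlib
import re
from typing import Callable


def normalise_query(query: str) -> str:
    # Assumed normalisation: trim, lowercase, collapse runs of whitespace.
    return re.sub(r"\s+", " ", query.strip().lower())


def cache_key(assistant_id: str, query: str) -> str:
    """Hash of (assistant_id, normalised_query); \\x1f separates the two fields."""
    raw = f"{assistant_id}\x1f{normalise_query(query)}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class RerankCache:
    """Illustrative in-memory cache; no eviction or invalidation."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def get_or_compute(
        self, assistant_id: str, query: str, compute: Callable[[], list[str]]
    ) -> list[str]:
        key = cache_key(assistant_id, query)
        if key not in self._store:
            self._store[key] = compute()  # the expensive re-ranker call
        return self._store[key]


calls = []
cache = RerankCache()


def expensive_rerank() -> list[str]:
    calls.append(1)
    return ["c2", "c7"]


first = cache.get_or_compute("a1", "  Late   Fee? ", expensive_rerank)
second = cache.get_or_compute("a1", "late fee?", expensive_rerank)
```

Both calls hit the same key, so the re-ranker runs once. A real cache would also need eviction, plus invalidation when an assistant's files change.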

Hybrid is the default for production RAG in 2026. If a system you're evaluating is vector-only, you'll feel it on the queries where literal match matters.
