Blog » AI » Hybrid Retrieval in PostgreSQL

Hybrid Retrieval in PostgreSQL

Integrating BM25 and Dense Vectors with RRF

26. Juni 2026

Dense vector search is good at meaning. Lexical search is good at exact evidence. Real search and RAG systems usually need both.

A user might search for a concept:

"how do I retry a failed payment webhook"

Dense retrieval can find passages that say the same thing in different words:

"rerun the checkout callback after a transient gateway error"

But users also search for exact strings:

"ERR-8492 webhook_timeout_seconds"

For that query, exact lexical matching matters. An embedding might blur the identifier; BM25 will not.

m w5 Kurze Unterbrechung · mAIstack Von PoC zu Produktion. In Wochen, nicht Quartalen. Median-Time-to-first-Usecase: 5 Wochen · On-Premise · 100+ Connectors mAIstack Ansehen →

pgturbohybrid combines these signals inside PostgreSQL. It retrieves dense candidates from a TurboQuant graph, retrieves lexical candidates from a BM25-style text branch, and fuses both result lists inside one turbohybrid index access method. The default fusion method is reciprocal-rank fusion, or RRF.

Project status: dense + BM25 hybrid retrieval in pgturbohybrid is beta. It is suitable for serious evaluation and controlled deployments, but you should validate relevance, latency, memory, and operational behaviour on your own data.

Why hybrid retrieval exists

Dense retrieval and lexical retrieval have different failure modes.

Dense vector retrieval is strong when the query and the document use different words for the same idea:

Query:    "reset customer password"
Document: "users can recover account access through the forgot-password flow"

Lexical retrieval is strong when exact words matter:

Query:    "OAuthTokenRefreshError"
Document: "OAuthTokenRefreshError is raised when the refresh token is invalid"

Hybrid retrieval tries to reduce both kinds of misses.

The key is not that one branch is always better. The key is that they provide different evidence.

BM25 in one paragraph

BM25 is a classic keyword-ranking function. It rewards documents that contain the query terms, gives more weight to rare terms, and normalizes for document length so long documents do not win just because they contain more words.

A simplified mental model is:

BM25 contribution ≈ IDF(term) * normalized_term_frequency(term, document)

Where IDF increases when a term is rare in the corpus. Exact identifiers, product names, class names, error codes, and rare technical terms often become strong lexical signals.

Why RRF is a practical default

Dense vector distances and BM25 scores do not naturally live on the same scale. A cosine distance of 0.18 and a BM25 score of 7.4 are not directly comparable. You can calibrate scores, but calibration is workload-dependent.

RRF avoids that problem by fusing ranks rather than raw scores.

For two ranked lists, a simple RRF score is:

RRF(doc) = 1 / (k + dense_rank(doc))
         + 1 / (k + bm25_rank(doc))

If a document appears near the top of both lists, it gets a strong fused score. If it appears only in one list, it can still rank if that branch found it confidently.

This is why RRF is a good default for hybrid search: it is simple, robust, and does not require dense and lexical scores to be calibrated before fusion.

The PostgreSQL API

Start with a table containing text, embeddings, and a generated tsvector column:

CREATE EXTENSION vector;
CREATE EXTENSION pgturbohybrid;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    title     text,
    body      text,
    embedding vector(1536),
    body_tsv  tsvector GENERATED ALWAYS AS (
        to_tsvector(
            'english',
            coalesce(title, '') || ' ' || coalesce(body, '')
        )
    ) STORED
);

Create a hybrid TurboHybrid index with one dense key and one BM25 text key:

CREATE INDEX documents_turbohybrid_idx
ON documents
USING turbohybrid (
    embedding vector_cosine_turbohybrid_ops,
    body_tsv  bm25_tsvector_turbohybrid_ops
);

Query both branches through one turbohybrid_query(...) value:

SELECT id, title
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2),
    fusion       => 'rrf'
)
LIMIT 10;

The ORDER BY still references the dense column because that is how the access method is invoked. The query payload carries both the dense vector query and the text query.

What happens inside the index

A common PostgreSQL hybrid setup uses two index paths:

pgvector HNSW or IVFFlat for dense search;
PostgreSQL full-text search, often with GIN, for lexical search;
SQL to combine both result lists.

That works, but it means the two branches are planned and executed as separate retrieval operations. Candidate budgets, ranking, and fusion happen outside the index access method.

pgturbohybrid explores a different shape:

The dense branch and lexical branch are both internal to the turbohybrid index scan. That gives the access method direct control over candidate budgets, fusion, and diagnostics.

Candidate budgets

The query constructor accepts optional candidate budgets:

SELECT id, title
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2),
    fusion       => 'rrf',
    dense_k      => 200,
    bm25_k       => 200,
    rrf_k        => 60,
    final_k      => 10
)
LIMIT 10;

The important fields are:

Parameter	Meaning
`dense_k`	How many dense candidates to collect before fusion
`bm25_k`	How many lexical candidates to collect before fusion
`rrf_k`	RRF smoothing constant
`final_k`	Final top-k target, usually inferred from SQL `LIMIT`
`fusion`	Fusion strategy, usually `'rrf'` for the default path

For most applications, the best first version is the simpler query:

SELECT id, title
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2)
)
LIMIT 10;

Let the profile defaults and SQL LIMIT drive the first experiment. Then tune candidate budgets only after measuring relevance and latency.

The BM25 branch is not just a wrapper around `ts_rank`

The lexical branch in pgturbohybrid has its own BM25-oriented storage inside the index. Conceptually it includes:

BM25 metadata
    -> document statistics
    -> lexicon
    -> postings
    -> block-max data
    -> impact data
    -> delta pages for incremental changes

This layout allows the query path to choose different strategies depending on query shape and corpus statistics.

Examples:

A single-term query may use an impact shortcut.
A pure OR query can use impact-style traversal when safe.
A pure AND query can use the rarest term as the driver and verify other terms.
General multi-term queries can use WAND or DAAT-style scoring.
SIMD scoring paths can accelerate encoded postings.
Hot postings and doc statistics can be cached.

For a search engineer, the important point is that BM25 candidate generation is an index-native part of the retrieval path, not a separate SQL subquery bolted on after dense search.

RRF fusion inside the access method

After both branches produce candidates, pgturbohybrid fuses them by document identity.

This matters because a document that appears in both branches gets evidence from both. A document that appears only in one branch is still eligible, but it must be strong enough in that branch to survive fusion.

In practice, that gives hybrid search a useful behaviour:

semantic-only match       -> can rank through dense branch
exact-only match          -> can rank through BM25 branch
semantic + exact evidence -> often ranks strongly

Other fusion modes

RRF is the safe default. pgturbohybrid also exposes score-level fusion modes such as weighted, fast_weighted, and calibrated.

For example:

SELECT id, title
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2),
    fusion       => 'calibrated'
)
LIMIT 10;

These modes are useful for experiments where you want normalized dense and BM25 scores to interact directly. They are not the same as RRF. RRF fuses ranks. Score-level modes fuse normalized scores. That distinction matters when interpreting results.

As a rule of thumb:

Fusion mode	Best first use
`rrf`	Default hybrid retrieval; robust across score scales
`weighted`	Controlled score-fusion experiments
`fast_weighted`	Score-fusion path with additional pruning opportunities
`calibrated`	Query-shape-aware score fusion experiments

For a first production evaluation, start with RRF.

Query examples

General RAG search

SELECT id, title, body
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2),
    fusion       => 'rrf'
)
LIMIT 10;

Identifier-heavy query

When you expect exact identifiers to matter, keep the BM25 branch active and consider requiring a BM25 match:

SELECT id, title, body
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query       => $1::vector,
    text_query         => websearch_to_tsquery('simple', $2),
    fusion             => 'rrf',
    require_bm25_match => true
)
LIMIT 10;

This is useful for workflows where semantic neighbours without the exact identifier are not acceptable.

Quality-oriented run

SET turbohybrid.profile = 'quality';

SELECT id, title, body
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2),
    fusion       => 'rrf',
    dense_k      => 400,
    bm25_k       => 400,
    rrf_k        => 60,
    final_k      => 10
)
LIMIT 10;

Use this kind of query for relevance evaluation, not as a default before you know the latency tradeoff.

Diagnostics

Hybrid retrieval has more moving parts than dense-only search. You want to know which branch did work, how many candidates were scored, whether caches were cold, whether WAND pruned postings, and whether candidate budgets are too high or too low.

pgturbohybrid exposes scan diagnostics:

SELECT turbohybrid_last_scan_stats();

SELECT turbohybrid_last_scan_diagnosis();

The detailed stats are JSON. A typical workflow is:

SET turbohybrid.profile = 'latency';

SELECT id, title
FROM documents
ORDER BY embedding <~> turbohybrid_query(
    vector_query => $1::vector,
    text_query   => websearch_to_tsquery('english', $2)
)
LIMIT 10;

SELECT turbohybrid_last_scan_stats();

Look for fields that tell you:

how many dense candidates were collected;
how many BM25 terms and postings were processed;
whether WAND or impact paths were used;
whether caches were cold or reused;
whether exact rescore dominated latency;
whether the scan was candidate-budget-heavy.

The purpose is not to memorize every field. The purpose is to build a tuning loop:

run representative queries
    -> inspect branch and fusion stats
    -> adjust profile or candidate budgets
    -> measure relevance and latency again

Benchmark framing without turning it into a fight

It is tempting to make hybrid retrieval a competition between engines. That is usually less useful than comparing architecture tradeoffs.

A traditional PostgreSQL hybrid setup might be:

pgvector HNSW
    + PostgreSQL full-text search
    + SQL-level RRF

pgturbohybrid’s experiment is:

TurboQuant dense graph
    + BM25 branch
    + index-internal RRF
    inside one PostgreSQL access method

The right question is not “which engine wins every benchmark?” The better question is:

For this application, do we want retrieval to stay inside PostgreSQL, and can one index access method give us the relevance, latency, and operational simplicity we need?

For PostgreSQL-first RAG systems, that is a very practical question.

Caveats

Hybrid retrieval is powerful, but it is not a substitute for relevance evaluation.

Things to validate:

query distribution: short keyword queries behave differently from long natural-language questions;
corpus shape: documentation, support tickets, product catalogues, and code search have different lexical patterns;
embedding model: dense branch quality depends heavily on the model;
language configuration: english, simple, and custom text search configurations produce different lexemes;
candidate budgets: too low can miss good results; too high can waste latency;
operational profile: cache behaviour, concurrent backends, and memory settings matter.

Also remember that BM25 scoring is bag-of-words. Phrase-like tsquery input contributes terms to BM25 scoring, but BM25 itself is not a positional phrase scorer. If exact phrase semantics are critical, validate that separately.

Summary

Hybrid retrieval works because semantic and lexical signals complement each other.

Dense search finds meaning:

"payment callback failed" ≈ "checkout webhook error"

BM25 finds exact evidence:

"ERR-8492" means ERR-8492

RRF combines the two ranked lists without pretending their raw scores are naturally comparable.

pgturbohybrid’s contribution is architectural: dense candidate generation, BM25 candidate generation, and fusion happen inside one PostgreSQL index access method. The result is a hybrid retrieval path that still looks like normal SQL:

ORDER BY embedding <~> turbohybrid_query(...)
LIMIT 10

Dense + BM25 hybrid retrieval covers many practical RAG and search workloads. But some retrieval models do not produce one vector per document at all. ColBERT-style late interaction keeps many token vectors per row and scores documents with MaxSim.

That is the next step.

P.S.

Gefällt dir, wie wir über KI denken?

Diese drei Produkte sind genau aus diesem Denken heraus entstanden. Jedes davon aus einem direkten Kunden-Need heraus, nicht aus einer Pitch-Deck-Session. Wenn eines davon zu deiner aktuellen Baustelle passt: Ein Gespräch. Kostenlos. Kein Commitment.

→ Foundation

mAIstack

KI-Fundament für Unternehmen. On-prem.
Einsatzbereit in Wochen, nicht Quartalen.

→ Platform

Amicable

Citizen Developer bauen Apps, IT hält die Kontrolle.
So wird aus Schatten-IT eine sanktionierte Plattform.

→ Voice

Enterprise VoiceAI

Realtime S2S, keine SaaS-Pipeline. Integriert in alle gängigen Telefonanlagen.

– Björn, Johann & die Crew

Dein Kontakt zu uns

Johann-Peter Hartmann

Johann bezeichnet sich als „Chief Tailwind Officer“ und sieht Rückenwind für Kollegen, Teams und das Unternehmen als seine oberste Aufgabe. Er mag tatsächlich seine Arbeit mit den Kollegen sehr und ist dankbar dafür, mit ihnen zusammenarbeiten zu dürfen.

Dein Thema?

Das Thema interessiert dich? Wenn Du fragen hast, dann melde dich ganz unverbindlich bei uns!

Kommentare

Schreibe einen Kommentar Antwort abbrechen

Newsletter

Aktuelle Artikel per Mail

Verpasse keinen Artikel mehr und lass Dich von uns benachrichtigen, sobald es etwas Neues im Blog gibt.

Für das Handling unseres Newsletters nutzen wir den Dienst HubSpot. Mehr Informationen, insbesondere auch zu Deinem Widerrufsrecht, kannst Du jederzeit unserer Datenschutzerklärung entnehmen.