Search Engineering

How to Modernize Elasticsearch with Semantic Search and Embeddings

Your users search with intent, not keywords. Here is how to add semantic understanding to Elasticsearch without ripping out your existing infrastructure.

Elasticsearch powers search for millions of applications. But keyword-based BM25 search has a fundamental limitation: it matches words, not meaning. A user searching for "comfortable shoes for running" won't find documents that mention "cushioned athletic footwear." Semantic search solves this by matching on meaning through vector embeddings. The good news: you don't need to replace Elasticsearch. Since version 8.0, Elasticsearch has supported native vector search alongside traditional BM25, giving you the best of both worlds.
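A toy sketch of why embeddings capture meaning: semantically related texts map to vectors pointing in similar directions, and cosine similarity measures that closeness regardless of word overlap. The 3-dimensional vectors below are made up for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity = dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the two "running shoes" phrasings point in
# nearly the same direction despite sharing no keywords.
comfortable_running_shoes = [0.9, 0.4, 0.1]
cushioned_athletic_footwear = [0.8, 0.5, 0.2]
database_error_codes = [0.1, 0.2, 0.9]

print(cosine_similarity(comfortable_running_shoes, cushioned_athletic_footwear))
print(cosine_similarity(comfortable_running_shoes, database_error_codes))
```

This is exactly the `"similarity": "cosine"` comparison that Elasticsearch performs internally on the `dense_vector` field configured below.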

Why Hybrid Search Beats Pure Approaches

Pure keyword search misses synonyms and intent. Pure semantic search misses exact terms and entity names (product IDs, error codes, brand names). Hybrid search combines both and consistently outperforms either approach alone.

Search Type       | Strengths                          | Weaknesses
BM25 (Keyword)    | Exact matches, entity names, codes | No synonym understanding
Vector (Semantic) | Intent understanding, synonyms     | Misses exact terms, higher latency
Hybrid            | Both strengths combined            | More complex to tune

Step 1: Add a Dense Vector Field to Your Index

# Add embedding field to existing index mapping
PUT /products/_mapping
{
  "properties": {
    "description_embedding": {
      "type": "dense_vector",
      "dims": 1536,
      "index": true,
      "similarity": "cosine"
    }
  }
}

Step 2: Generate and Index Embeddings

Use an embedding model (OpenAI text-embedding-3-small, Cohere embed-v3, or a self-hosted model) to generate vector representations of your documents. Index these alongside your existing text fields.

# Backfill embeddings for existing documents
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")
openai_client = OpenAI()

def backfill_embeddings(index_name: str):
    # scan() streams every document without loading the index into memory
    for hit in scan(es, index=index_name, query={"query": {"match_all": {}}}):
        embedding = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=hit["_source"]["description"],
        ).data[0].embedding
        es.update(
            index=index_name,
            id=hit["_id"],
            doc={"description_embedding": embedding},
        )

Step 3: Implement Hybrid Search

# Hybrid query combining BM25 + kNN vector search
query = {
    "query": {
        "match": {
            "description": "comfortable shoes for running"
        }
    },
    "knn": {
        "field": "description_embedding",
        "query_vector": query_embedding,  # the query text, embedded with the same model
        "k": 10,
        "num_candidates": 100
    },
    "rank": {
        "rrf": {}  # Reciprocal Rank Fusion to combine scores
    }
}

Reciprocal Rank Fusion (RRF) is the recommended scoring strategy. It combines the BM25 and vector result lists by rank position rather than raw scores, producing a blended ranking without manual weight tuning or score normalization.
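Elasticsearch applies RRF server-side, but the formula is simple enough to sketch: each result list contributes 1 / (k + rank) per document (k defaults to 60), and documents are re-sorted by the summed score. The doc IDs below are made up for illustration.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Sum 1 / (k + rank) across every list a document appears in,
    # then sort by the fused score, highest first.
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_7", "doc_2", "doc_9"]  # keyword ranking
knn_hits = ["doc_2", "doc_5", "doc_7"]   # vector ranking
print(rrf_fuse([bm25_hits, knn_hits]))
```

Note how doc_2 rises to the top: it places well in both lists, which RRF rewards even though neither list ranked it first with certainty of raw score.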

Step 4: Add Re-Ranking for Quality

For the highest search quality, add a cross-encoder re-ranker as a final stage. Elasticsearch retrieves candidates using hybrid search, then a cross-encoder (like a Cohere reranker or a custom BERT model) re-scores the top 20-50 results for precise relevance ranking.
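The pipeline shape can be sketched with the scorer injected as a parameter, so any cross-encoder can be plugged in later. The `toy_scorer` below just counts overlapping words to keep the example self-contained; it is a stand-in, not a real relevance model.

```python
from typing import Callable

def rerank(query: str, candidates: list[str],
           scorer: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    # Score each (query, document) pair and keep the top_n best
    scored = sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)
    return scored[:top_n]

def toy_scorer(query: str, doc: str) -> float:
    # Placeholder for a cross-encoder: counts shared words
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [
    "cushioned athletic footwear for marathons",
    "comfortable running shoes with arch support",
    "leather dress shoes",
]
print(rerank("comfortable shoes for running", candidates, toy_scorer, top_n=2))
```

In production, `scorer` would wrap a call to a real cross-encoder; the key design point is that re-ranking only touches the small candidate set from hybrid retrieval, so the expensive model runs on 20-50 documents, not the whole index.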

Migration Strategy: Zero Downtime

You don't need to reindex everything at once. Use this phased approach:

  1. Add the vector field to your mapping (non-breaking change).
  2. Backfill embeddings in batches during off-peak hours.
  3. Run hybrid search in shadow mode: log both BM25-only and hybrid results, and compare quality.
  4. Switch to hybrid search once quality metrics confirm improvement.
  5. Add re-ranking as a second optimization pass after hybrid is stable.
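One simple way to quantify the shadow-mode comparison in step 3 is Jaccard overlap of the top-k result IDs from the two queries: low overlap means hybrid search is surfacing different documents that deserve manual review. The metric choice and doc IDs here are illustrative, not prescriptive.

```python
def topk_jaccard(bm25_ids: list[str], hybrid_ids: list[str], k: int = 10) -> float:
    # Jaccard similarity of the two top-k result sets:
    # |intersection| / |union|, in [0, 1]
    a, b = set(bm25_ids[:k]), set(hybrid_ids[:k])
    return len(a & b) / len(a | b) if (a | b) else 1.0

bm25_top = ["p1", "p2", "p3", "p4"]
hybrid_top = ["p2", "p1", "p9", "p4"]
print(topk_jaccard(bm25_top, hybrid_top))
```

Logging this per query makes it easy to spot the query segments (e.g., natural-language questions vs. exact SKU lookups) where hybrid diverges most from BM25.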

For a comparison of dedicated vector databases when you outgrow Elasticsearch's vector capabilities, see our vector DB comparison. For using semantic search as part of a RAG pipeline, read about fixing RAG failures with agentic AI.

Frequently Asked Questions

Should I replace Elasticsearch with a vector database?

Not necessarily. If you need full-text search, faceting, aggregations, and vector search, Elasticsearch handles all of them. Replace only if vector search is your primary use case and you need better performance at scale, in which case look at Pinecone or Weaviate.

What embedding model should I use?

For English text: OpenAI text-embedding-3-small (good balance of quality and cost), Cohere embed-v3 (strong multilingual), or sentence-transformers/all-MiniLM-L6-v2 (free, self-hosted, fast). Match the model's dimension count to your dense_vector field configuration: text-embedding-3-small produces 1536-dimensional vectors, matching the mapping in Step 1.

How much does this add to my Elasticsearch costs?

Embeddings increase storage by approximately 6 KB per document (for 1536-dimensional vectors). kNN search adds 20-50ms per query. The embedding API costs (if using OpenAI) are approximately $0.02 per 1M tokens for the backfill, then per-query embedding costs. See our guide on reducing OpenAI costs for optimization strategies.
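The ~6 KB figure follows directly from the vector layout: 1536 dimensions stored as 4-byte float32 values, before any index overhead (quantization, where supported, can shrink this).

```python
# Storage footprint of one 1536-dimensional float32 embedding
dims = 1536
bytes_per_float32 = 4
vector_bytes = dims * bytes_per_float32
print(vector_bytes)         # bytes per document
print(vector_bytes / 1024)  # kilobytes per document
```

Multiply by your document count to budget the backfill: 10 million documents adds roughly 60 GB of vector storage before index structures.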

Upgrade Your Search

We modernize search infrastructure with semantic capabilities, from Elasticsearch optimization to full RAG pipelines.

© 2026 EkaivaKriti. All rights reserved.