Semantic Search - The enterprise reality

About:

We often hear people tell us,

“Oh! Semantic search is just storing embeddings in a vector DB. That can be done in a month, right?”

Well, if it were just that, it could be done quickly. But getting it to production and making it usable for enterprise users requires much more!

This blog covers some of the problems you’ll have to solve to take a semantic search implementation to production for enterprise users.


Eval choices:

Eval is an important part of any AI problem, and semantic search is no exception. There is a wide variety of eval choices, and the first call you have to make is choosing the metric that meets your business users’ needs.

For example, if your users (or use case) demand that the most relevant result stays on top, ahead of results that are relatively less relevant, then nDCG (normalized Discounted Cumulative Gain) is probably what you need.

If you mainly care that all relevant results rank above irrelevant ones (without strict ordering among relevant results), then MAP (Mean Average Precision) should serve you better.
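To make the nDCG vs. MAP trade-off concrete, here is a minimal Python sketch of both metrics (linear-gain nDCG and binary-relevance average precision). The function names and toy document ids are illustrative only; in practice you would likely reach for an established evaluation library rather than hand-rolled code.

```python
import math


def dcg(gains):
    # Linear-gain DCG: each graded gain is discounted by log2(rank + 1), ranks starting at 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, relevance, k):
    """relevance maps doc id -> graded relevance (0 or missing = irrelevant)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_dcg = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(ranked_ids, relevant_ids):
    """Binary relevance: average of precision@rank at every rank holding a relevant doc."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs: one (ranked_ids, relevant_ids) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: "a" is highly relevant (grade 3), "b" only marginally (grade 1).
relevance = {"a": 3, "b": 1}
best_first, swapped = ["a", "b", "x"], ["b", "a", "x"]
print(ndcg_at_k(best_first, relevance, 3), ndcg_at_k(swapped, relevance, 3))  # 1.0 ~0.797
print(average_precision(best_first, {"a", "b"}), average_precision(swapped, {"a", "b"}))  # 1.0 1.0
```

In this toy case, swapping the top two results drops nDCG to about 0.8 while AP stays at 1.0, because AP only checks that relevant results sit above irrelevant ones.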

The point is, you have to make a call and align your feedback loop with it, and that call must reflect your business needs.

Ground truth prep:

Once you’ve settled on your eval metric, you’ll have to start preparing the ground truth for your evaluation. This involves ensuring that:

Your documents are representative of your production data.
Your queries are representative of what users actually search for.
You have a reliable way to label the data, ideally combining:
1. Human feedback from your system.
2. Automated labelling using LLMs.
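A minimal sketch of the labelling step, assuming a 3-point graded relevance scale and treating the LLM as any callable that takes a prompt and returns text. `label_pairs`, `PROMPT_TEMPLATE`, and `fake_llm` are hypothetical names; plug in your own LLM client and labelling guidelines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical 3-point grading scale; replace with your own labelling guidelines.
PROMPT_TEMPLATE = (
    "Rate how relevant the document is to the search query.\n"
    "Reply with one digit: 0 = irrelevant, 1 = partially relevant, 2 = highly relevant.\n\n"
    "Query: {query}\nDocument: {document}\nGrade:"
)


@dataclass
class Judgement:
    query: str
    doc_id: str
    grade: int
    source: str  # "human" or "llm"


def parse_grade(reply: str) -> Optional[int]:
    """Use the first digit in the reply; reject anything outside the 0-2 scale."""
    for ch in reply:
        if ch.isdigit():
            return int(ch) if ch in "012" else None
    return None


def label_pairs(
    pairs: List[Tuple[str, str]],
    docs: Dict[str, str],
    llm: Callable[[str], str],
    human_labels: Dict[Tuple[str, str], int],
) -> List[Judgement]:
    judgements = []
    for query, doc_id in pairs:
        if (query, doc_id) in human_labels:
            # Human feedback from the live system wins over automated labels.
            judgements.append(Judgement(query, doc_id, human_labels[(query, doc_id)], "human"))
            continue
        reply = llm(PROMPT_TEMPLATE.format(query=query, document=docs[doc_id]))
        grade = parse_grade(reply)
        if grade is not None:  # skip unparseable replies rather than guess a label
            judgements.append(Judgement(query, doc_id, grade, "llm"))
    return judgements


# Stand-in for a real LLM client (hypothetical): grades by a simple keyword check.
def fake_llm(prompt: str) -> str:
    return "2" if "request a refund" in prompt else "0"


docs = {"d1": "How to request a refund", "d2": "Office holiday calendar", "d3": "Refund policy"}
pairs = [("refund process", d) for d in ("d1", "d2", "d3")]
for j in label_pairs(pairs, docs, fake_llm, human_labels={("refund process", "d3"): 1}):
    print(j.doc_id, j.grade, j.source)
```

Letting human labels override LLM labels, and dropping replies that don’t parse, is one reasonable design choice; spot-checking a sample of LLM labels against human ones before trusting them at scale is also worth considering.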

Custom Chunking:

How you split your data (chunking) can make or break retrieval quality, and the right chunking strategy depends on the data you’re handling. A system that serves a variety of data through a unified search will typically need custom chunking.

For example, LangChain’s “RecursiveCharacterTextSplitter” might work well for simple text such as names and short descriptions, whereas subtitles need a custom chunker of their own. Though all of these are text, there is no silver bullet that handles them all.
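To illustrate why subtitles need their own chunker: SRT files consist of short, timed cues, and a plain character splitter would ignore those boundaries and drop the timestamps. The sketch below assumes SRT input, groups whole cues into roughly 30-second windows, and keeps each chunk’s time range. The window size and helper names are assumptions for illustration, not a recommendation.

```python
import re
from dataclasses import dataclass
from typing import List

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str


def to_seconds(stamp: str) -> float:
    h, m, s, ms = (int(x) for x in TIMESTAMP.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # Expected SRT layout: index line, "start --> end" line, one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


def chunk_cues(cues: List[Cue], window_seconds: float = 30.0) -> List[dict]:
    """Group consecutive cues into chunks spanning roughly window_seconds.
    Cues are never split, and each chunk keeps its time range so a search
    hit can link back to a playback position."""
    groups, current = [], []
    for cue in cues:
        if current and cue.end - current[0].start > window_seconds:
            groups.append(current)
            current = []
        current.append(cue)
    if current:
        groups.append(current)
    return [
        {"start": g[0].start, "end": g[-1].end, "text": " ".join(c.text for c in g)}
        for g in groups
    ]


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the quarterly review.

2
00:00:05,000 --> 00:00:09,500
Revenue grew in every region.

3
00:00:40,000 --> 00:00:44,000
Next, the hiring plan.
"""

for chunk in chunk_cues(parse_srt(SAMPLE_SRT)):
    print(chunk)
```

The first two cues fall within one window and become a single chunk; the third starts a new one. Keeping the start and end times lets a result point users to the exact moment in the video.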

Hard negatives:

Hard negatives are irrelevant results that your system scores as more relevant than the actually relevant results.

Tracking them down needs deeper analysis, with fixes ranging from handling stop-word patterns and adding re-rankers to query rewrites, switching models, and (as a last resort) fine-tuning.
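A first step in that analysis is simply finding the hard negatives in your eval runs. This sketch assumes each eval query comes with its ranked ids, their similarity scores, and the ground-truth relevant ids, and it treats any unlabelled document as irrelevant (which can misflag relevant documents that were never judged). The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def find_hard_negatives(
    ranked_ids: List[str], scores: Dict[str, float], relevant_ids: Set[str]
) -> List[Tuple[str, float]]:
    """Irrelevant docs that outscore at least one ground-truth relevant doc."""
    relevant_scores = [scores[d] for d in ranked_ids if d in relevant_ids]
    if not relevant_scores:
        return []  # nothing relevant retrieved: a recall problem, not a ranking one
    threshold = min(relevant_scores)
    return [(d, scores[d]) for d in ranked_ids if d not in relevant_ids and scores[d] > threshold]


def recurring_hard_negatives(eval_runs, top_n=5):
    """eval_runs: one (ranked_ids, scores, relevant_ids) triple per eval query.
    Docs that act as hard negatives for many queries are good starting points
    for deeper analysis (boilerplate text, stop-word-heavy titles, and so on)."""
    counts = Counter()
    for ranked_ids, scores, relevant_ids in eval_runs:
        counts.update(doc_id for doc_id, _ in find_hard_negatives(ranked_ids, scores, relevant_ids))
    return counts.most_common(top_n)


eval_runs = [
    (["faq", "d1", "d2"], {"faq": 0.82, "d1": 0.80, "d2": 0.40}, {"d1"}),
    (["faq", "d3", "d5"], {"faq": 0.79, "d3": 0.75, "d5": 0.70}, {"d3", "d5"}),
]
print(recurring_hard_negatives(eval_runs))  # [('faq', 2)]
```

This only surfaces candidates; deciding whether the fix is a stop-word rule, a re-ranker, a query rewrite, or a different model still takes a human look at why those documents score so high.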

Modality gap:

When you’re implementing systems that have to support multi-modal semantic search, you’ll have to solve the problem of disparity among different modalities.

Fact: the similarity score range differs across modalities.

For example, if two similar texts have a similarity score of around 0.9, a text and its corresponding image would typically score only in the 0.3 to 0.4 range. You’ll need to resolve this disparity, either through normalization or through custom training techniques.
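One normalization option (picked here for illustration, not necessarily what you should ship) is z-scoring each modality pair’s scores against a calibration sample, so that a strong text-to-image match and a strong text-to-text match land on a comparable scale. The calibration numbers below are made up to match the ranges above; in practice you would fit on real score distributions from your own data.

```python
import statistics
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (query modality, document modality)


class ModalityCalibrator:
    """Per-modality-pair z-score normalization, fitted on a calibration sample."""

    def __init__(self) -> None:
        self.stats: Dict[Pair, Tuple[float, float]] = {}

    def fit(self, pair: Pair, sample_scores: Iterable[float]) -> None:
        scores = list(sample_scores)
        mean = statistics.fmean(scores)
        stdev = statistics.pstdev(scores) or 1.0  # avoid dividing by a zero spread
        self.stats[pair] = (mean, stdev)

    def normalize(self, pair: Pair, score: float) -> float:
        mean, stdev = self.stats[pair]
        return (score - mean) / stdev


# Made-up calibration scores in the ranges described above.
calibrator = ModalityCalibrator()
calibrator.fit(("text", "text"), [0.80, 0.85, 0.90, 0.95])
calibrator.fit(("text", "image"), [0.25, 0.30, 0.35, 0.40])

raw = {"doc_text": (("text", "text"), 0.93), "doc_image": (("text", "image"), 0.38)}
for doc_id, (pair, score) in raw.items():
    print(doc_id, score, round(calibrator.normalize(pair, score), 2))
```

The raw scores (0.93 vs. 0.38) would always rank the text result first; after normalization both come out at about 0.98. Z-scoring assumes the per-pair score distributions have roughly similar shapes, which is worth checking before relying on it.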

Inherent limitations of embeddings:

Embeddings (specifically bi-encoders) can feel like magic, but they work well only for semantically rich content. They struggle with terse or structured data, especially when:

Context is missing: e.g., “files updated yesterday” depends on today’s date, which the embedding never sees.
Data is non-semantic: Booleans, dates, or ID numbers.
Negation/Nuance is key: as shown below, the model struggles to tell an enabled asset from a disabled one, because the vector representations of “Enabled:true” and “Enabled:false” are nearly identical.

Text 1 (object data)	Text 2 (query data)	Cosine similarity
Enabled:true	enabled asset	0.95408654
Enabled:false	enabled asset	0.9494107

As you can see, the cosine similarities are almost identical for semantically opposite content.
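For the boolean and relative-date cases, one workaround is a query rewrite that moves the non-semantic parts of the query into structured filters and sends only the remainder to the embedding search. The regex rules and the field names `enabled` and `updated_on` below are hypothetical; a production system might use an LLM or a proper query parser for the extraction.

```python
import re
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical rules: phrase pattern -> (field name, value builder taking today's date).
RULES = [
    (r"\bdisabled\b", "enabled", lambda today: False),
    (r"\benabled\b", "enabled", lambda today: True),
    (r"\bupdated yesterday\b", "updated_on", lambda today: today - timedelta(days=1)),
]


def rewrite_query(query: str, today: date) -> Tuple[str, Dict[str, object]]:
    """Move non-semantic parts of the query into structured filters and
    leave only the semantic remainder for embedding search."""
    text, filters = query.lower(), {}
    for pattern, field, value in RULES:
        if field not in filters and re.search(pattern, text):
            filters[field] = value(today)
            text = re.sub(pattern, " ", text)
    return " ".join(text.split()), filters


def apply_filters(records: List[dict], filters: Dict[str, object]) -> List[dict]:
    """Keep only records whose fields match every extracted filter exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


assets = [
    {"id": "a1", "enabled": True, "updated_on": date(2024, 5, 9)},
    {"id": "a2", "enabled": False, "updated_on": date(2024, 5, 1)},
]
semantic_text, filters = rewrite_query("Enabled asset", today=date(2024, 5, 10))
print(semantic_text, filters)  # asset {'enabled': True}
print([a["id"] for a in apply_filters(assets, filters)])  # ['a1']
```

The filtered candidates would then be ranked by embedding similarity on the remaining text (“asset”), so the embedding never has to tell “true” from “false”.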

Google has published a research paper and an accompanying dataset that demonstrate this (specifically the third point above, on negation and nuance).

These limitations can often be mitigated with techniques such as query rewrites combined with hybrid (embeddings + lexical) search, or late-interaction models (ColBERT); each comes with its own trade-offs.
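Hybrid search has to merge a lexical ranking and an embedding ranking whose raw scores aren’t on the same scale. Reciprocal rank fusion (RRF) is one common way to do this using only ranks; it is shown here as an illustrative choice, since the approach above doesn’t prescribe a particular fusion method. The document ids are toy values.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists whose raw scores are not comparable (e.g. BM25 and
    cosine similarity). Each doc earns 1 / (k + rank) from every list it
    appears in; k = 60 is a commonly used default."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = ["a2", "a1", "a7"]  # ids ranked by a lexical engine such as BM25
vector = ["a1", "a9", "a2"]   # ids ranked by embedding similarity
print(reciprocal_rank_fusion([lexical, vector]))  # ['a1', 'a2', 'a9', 'a7']
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still gets a chance to appear, which is what lets exact lexical matches compensate for the embedding’s blind spots.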

Summary

It’s true that a semantic search can be put together quickly. But getting it right requires meticulous effort. Do talk to us if you’re in the process of implementing semantic search for your enterprise.