Semantic vs vector vs hybrid search: the most confusable topic on the AI-103 exam
A technical explainer distinguishing semantic, vector, and hybrid search: what each does under the hood, when to use which, and how Azure AI Search implements all three for RAG grounding on the AI-103 exam.
Ask ten AI-103 candidates to define semantic, vector, and hybrid search and you will get ten overlapping, half-right answers. This is the single most confusable topic on the AI-103 exam, and it shows up across both the generative/agentic and information extraction domains because it underpins RAG grounding. Here is the clear version.
The three approaches, in one paragraph each
Keyword (lexical / BM25) search: the baseline
Classic keyword search ranks documents by term overlap, weighted by term frequency and rarity (the BM25 algorithm). It is exact and fast, and it excels at precise tokens: product IDs, error codes, names, SKUs. Its weakness: it does not understand meaning. Search "car" and it will not match "automobile" unless you have synonyms configured. It is the foundation the other approaches build on.
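To make the "term frequency and rarity" weighting concrete, here is a minimal sketch of one common BM25 formulation. The whitespace tokenizer, the sample documents, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, not a description of how Azure AI Search is configured internally.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # no term overlap means no contribution
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample documents.
docs = [
    "used car for sale",
    "automobile repair manual",
    "error code E1042 troubleshooting",
]
car_scores = bm25_scores("car", docs)
code_scores = bm25_scores("E1042", docs)
```

In this sketch the query "car" scores zero against the "automobile" document, while the exact code "E1042" matches only the document that contains it, which mirrors the strength and the weakness described above.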
Vector search: meaning by embeddings
Vector search converts both your query and your documents into embeddings (high-dimensional numeric vectors) and ranks by similarity in that space (typically cosine similarity). Because embeddings capture meaning, "car" and "automobile" land near each other, so vector search handles paraphrases and synonyms well. Its weakness is the mirror image of keyword search: it can miss exact tokens, because a specific invoice number or a rare product code may not be well represented in embedding space. For the foundations, see our embeddings & vector search guide.
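A small sketch of similarity ranking in embedding space. The three-dimensional vectors below are hand-made, hypothetical stand-ins; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" chosen by hand for illustration only.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "invoice": [0.0, 0.2, 0.95],
}


def rank_by_similarity(query_term):
    """Rank every other term by cosine similarity to the query term."""
    query_vec = toy_embeddings[query_term]
    others = [term for term in toy_embeddings if term != query_term]
    return sorted(
        others,
        key=lambda term: cosine_similarity(query_vec, toy_embeddings[term]),
        reverse=True,
    )


ranking = rank_by_similarity("car")
```

With these toy vectors, "automobile" ranks ahead of "invoice" for the query "car" even though the two words share no characters, which is the behavior keyword search cannot provide on its own.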
Semantic search: reranking with language understanding
Semantic search (in Azure AI Search, the semantic ranker) is a reranking step applied on top of an initial result set. It uses a language model to re-score and reorder candidate results by how well they actually answer the query, and it can surface captions and answers. Crucially, semantic ranking does not replace retrieval; it improves the ordering of whatever candidates the retrieval step returned. Think of it as a quality pass, not a retrieval strategy by itself.
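The shape of a reranking pass can be sketched as follows. The `toy_relevance` function is a hypothetical stand-in for a language-model scorer (it only counts shared words), and the sample documents are invented; the point is the control flow, in which reranking only reorders candidates that retrieval already returned.

```python
def rerank(candidates, relevance_fn, query, top_n=3):
    """Reorder already-retrieved candidates by a relevance score.

    No new documents are fetched: the output is a reordering (and
    optional truncation) of the input list.
    """
    scored = [(relevance_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def toy_relevance(query, doc):
    """Hypothetical scorer: fraction of query words present in the doc.

    A real semantic ranker would use a language model here instead.
    """
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Pretend these came back from a keyword, vector, or hybrid retrieval step.
retrieved = [
    "reset your router password",
    "how to reset a forgotten account password",
    "password policy overview",
]
reranked = rerank(retrieved, toy_relevance, "reset account password")
```

If the best document never made it into `retrieved`, no amount of reranking can surface it, which is why the retrieval strategy still matters.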
Hybrid search: keyword and vector, fused
Hybrid search runs both keyword (BM25) and vector retrieval and fuses their results (Azure AI Search uses Reciprocal Rank Fusion to merge the two ranked lists). You get keyword precision (exact IDs, rare terms) and meaning-based recall (paraphrases, synonyms) in one query. Add the semantic ranker on top and you have the strongest default for RAG grounding.
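Reciprocal Rank Fusion itself is simple enough to sketch in a few lines: each document earns 1 / (k + rank) from every list it appears in, and the sums decide the fused order. The constant `k=60` is a commonly cited default and is an assumption here, as are the document IDs.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher positions (smaller rank) contribute more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by their summed score, best first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrieval strategies.
keyword_hits = ["doc_sku", "doc_both", "doc_misc"]
vector_hits = ["doc_both", "doc_paraphrase", "doc_sku"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists accumulate two contributions, so they tend to rise above documents found by only one strategy. RRF uses ranks rather than raw scores, which sidesteps the problem that BM25 scores and vector similarities are on different scales.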
Under the hood: what actually happens
| Approach | Retrieval signal | Handles synonyms? | Handles exact IDs/codes? | Extra step |
|---|---|---|---|---|
| Keyword (BM25) | Term overlap + frequency | No (without synonyms) | Yes | None |
| Vector | Embedding similarity | Yes | Weak | Embed query + docs |
| Semantic | Reranks existing candidates | Improves ordering | Improves ordering | LM reranking pass |
| Hybrid | BM25 and vector, fused | Yes | Yes | Rank fusion (RRF) |
The key mental model: keyword and vector are retrieval strategies; semantic is a reranking layer; hybrid is the combination of the two retrieval strategies. Semantic ranking can be layered on any of them, most powerfully on hybrid.
When to use which
Exact-match heavy (codes, IDs, legal citations, SKUs): keyword, or hybrid to also catch meaning.
Natural-language, paraphrase-heavy questions: vector, or hybrid to also catch exact tokens.
You need the best ordering of results: add semantic reranking.
General RAG grounding for an agent or app: hybrid + semantic reranking is the reliable default.
The exam tell: a scenario describing "combining keyword precision with meaning-based relevance" is hybrid, not "semantic alone." Candidates who pick semantic there are conflating the reranking layer with the retrieval combination. Similarly, if a scenario says "users search with synonyms and paraphrases and miss exact-keyword documents," the fix is usually to move from keyword-only to hybrid (or add vector), not to crank up chunk size.
How Azure AI Search implements all three
Azure AI Search is the documented retrieval engine for AI-103, and it supports the whole stack in one service:
Index your content with both a searchable text field (for BM25) and a vector field holding embeddings.
Issue a keyword, vector, or hybrid query. Hybrid queries run both and fuse results with Reciprocal Rank Fusion.
Optionally enable the semantic ranker to rerank the fused results and return semantic captions/answers.
Enrich during indexing with built-in skills (native) or custom skills (a hosted function/API you call in the skillset) for chunking, embedding generation, OCR, and more.
This ties directly into RAG ingestion: chunk documents, generate embeddings, attach grounding metadata, and index β then ground your Foundry app or agent on hybrid + semantic results. For the ingestion side, see our RAG pipeline design guide.
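The query side of the steps above can be sketched as a request body for a hybrid query with semantic reranking. The property names follow the Azure AI Search "Search Documents" REST request shape as best understood at the time of writing; the field name `contentVector`, the semantic configuration name `default`, and the three-element query vector are hypothetical placeholders. Check the current API version on Microsoft Learn before relying on any of it.

```python
import json


def build_hybrid_semantic_query(
    search_text,
    query_vector,
    vector_field="contentVector",  # hypothetical vector field name
    semantic_config="default",     # hypothetical semantic configuration name
    k=50,
    top=5,
):
    """Build a request body that combines keyword, vector, and semantic ranking."""
    return {
        # Keyword (BM25) part of the hybrid query.
        "search": search_text,
        # Vector part: the query embedding and the field to compare against.
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,
                "fields": vector_field,
                "k": k,
            }
        ],
        # Ask the service to rerank the fused results semantically.
        "queryType": "semantic",
        "semanticConfiguration": semantic_config,
        "top": top,
    }


# The vector would normally come from the same embedding model used at indexing time.
body = build_hybrid_semantic_query("reset account password", [0.1, 0.2, 0.3])
payload = json.dumps(body, indent=2)
```

Dropping `vectorQueries` leaves a keyword query, dropping `search` leaves a pure vector query, and dropping `queryType` and `semanticConfiguration` turns off the reranking layer, which matches the mental model of two retrieval strategies plus an optional reranker.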
Common mistakes to avoid on the exam
Calling semantic a retrieval strategy. It is a reranking step on top of retrieval.
Picking semantic when the scenario says "keyword precision + meaning." That is hybrid.
Assuming vector search handles exact IDs well. It often does not, and that is where hybrid earns its keep.
Confusing built-in vs custom skills in the enrichment pipeline. Custom skills require a hosted function/API.
Reaching for fine-tuning when a retrieval upgrade (keyword to hybrid, or adding semantic rerank) is the real fix.
Bottom line
Vector = meaning by embeddings. Semantic = a language-model reranking pass. Hybrid = keyword and vector fused, and the best RAG default when you add semantic reranking. Azure AI Search does all of it in one engine. Nail this distinction and you neutralize the exam's favorite retrieval trap. Keep going with the full AI-103 exam guide, the certification study guide, and the learning pathway.
Behavior described reflects Azure AI Search as of early 2026; verify current features on Microsoft Learn. explainx.ai is not affiliated with Microsoft.