Summary: This post explores how to implement semantic search capabilities in .NET applications using Azure Cognitive Search. Learn how to leverage vector embeddings, semantic ranking, and hybrid search to create more intelligent search experiences that understand user intent beyond simple keyword matching.
Introduction
Search functionality is a critical component of many applications, but traditional keyword-based search often falls short of user expectations. Users increasingly expect search systems to understand the meaning behind their queries, not just match keywords. This is where semantic search comes in.
Semantic search aims to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms to generate more relevant results. In May 2023, Azure Cognitive Search introduced semantic search capabilities, including vector search and semantic ranking, making it easier for .NET developers to implement these advanced search features.
In this post, we’ll explore how to implement semantic search in .NET applications using Azure Cognitive Search. We’ll cover vector embeddings, semantic ranking, and hybrid search approaches to create more intelligent search experiences that truly understand user intent.
Understanding Semantic Search
Before diving into implementation, let’s understand what makes semantic search different from traditional search approaches.
Traditional Keyword Search vs. Semantic Search
Traditional Keyword Search:
- Matches exact keywords or their variants
- Uses techniques like stemming, lemmatization, and synonym matching
- Ranks results based on term frequency, inverse document frequency (TF-IDF), and other statistical measures
- Struggles with understanding context, intent, and meaning
Semantic Search:
- Understands the meaning and intent behind queries
- Captures semantic relationships between words and phrases
- Can find relevant results even when keywords don’t match exactly
- Handles natural language queries more effectively
- Provides more contextually relevant results
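To make the contrast concrete, here is a minimal sketch of TF-IDF style keyword scoring. `KeywordScorer` is a hypothetical helper written for this post, not part of any Azure SDK, and its tokenizer and smoothed IDF formula are simplifying assumptions rather than what a production search engine does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class KeywordScorer
{
    // Lowercase the text and split it on spaces and basic punctuation.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', '!', '?', ';', ':' },
                   StringSplitOptions.RemoveEmptyEntries);

    // Score every document against the query with a basic TF-IDF sum.
    public static double[] Score(string query, IReadOnlyList<string> documents)
    {
        var docTerms = documents.Select(Tokenize).ToList();
        var queryTerms = Tokenize(query).Distinct().ToList();
        var scores = new double[documents.Count];

        foreach (var term in queryTerms)
        {
            // Document frequency: how many documents contain the term.
            int df = docTerms.Count(terms => terms.Contains(term));
            if (df == 0) continue; // the term appears nowhere, so it adds nothing

            // Smoothed inverse document frequency (an assumed variant).
            double idf = Math.Log(1.0 + (double)documents.Count / df);

            for (int i = 0; i < docTerms.Count; i++)
            {
                int tf = docTerms[i].Count(t => t == term);
                scores[i] += tf * idf;
            }
        }

        return scores;
    }
}
```

Scoring the query "car repair" against "How to repair a car engine" and "Automobile maintenance guide" gives the first document a positive score and the second a score of zero, even though both are about the same topic. That gap is what semantic search tries to close.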
Key Components of Semantic Search
- Vector Embeddings: Numerical representations of text that capture semantic meaning
- Vector Search: Finding similar content by measuring the distance between vectors
- Semantic Ranking: Re-ranking search results based on semantic relevance
- Hybrid Search: Combining traditional keyword search with semantic approaches
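The last item, hybrid search, needs some way to merge a keyword ranking with a vector ranking. Azure Cognitive Search documents Reciprocal Rank Fusion (RRF) as the method it uses for this. The sketch below shows the idea in isolation; `RankFusion` is a hypothetical helper, and the constant `k = 60` is a commonly used value that is assumed here rather than taken from this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class RankFusion
{
    // Combine several ranked lists of document ids into one ranking.
    // Each document earns 1 / (k + rank) from every list it appears in,
    // where rank is 1-based. Ties are broken by id for a stable order.
    public static List<KeyValuePair<string, double>> Fuse(
        IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (var ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                string id = ranking[i];
                scores.TryGetValue(id, out double current); // 0 if not seen yet
                scores[id] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Fusing a keyword ranking of `a, b, c` with a vector ranking of `b, d, a` yields the order `b, a, d, c`: documents that rank well in both lists rise above documents that appear in only one.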
Setting Up Azure Cognitive Search
Let’s start by setting up Azure Cognitive Search with semantic search capabilities.
Prerequisites
- An Azure subscription
- An Azure Cognitive Search service (Standard tier or above for semantic search)
- An Azure OpenAI Service resource (for generating embeddings)
- A .NET 6 or later project
Creating the Search Service
You can create an Azure Cognitive Search service through the Azure portal, Azure CLI, or Azure PowerShell. Here’s how to do it with Azure CLI:
bash
# Create a resource group
az group create --name semantic-search-rg --location eastus

# Create a search service (Standard tier or above is required for semantic search)
az search service create \
  --name your-search-service-name \
  --resource-group semantic-search-rg \
  --sku Standard \
  --partition-count 1 \
  --replica-count 1
Setting Up Your .NET Project
Create a new .NET project and add the necessary packages:
bash
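dotnet new console -n SemanticSearchDemo
cd SemanticSearchDemo

# The search code in this post needs Azure.Search.Documents 11.5 or later
dotnet add package Azure.Search.Documents

# The embedding code in this post uses OpenAIClient from the 1.0.0-beta releases of Azure.AI.OpenAI
dotnet add package Azure.AI.OpenAI --version 1.0.0-beta.17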
Implementing Vector Search
Vector search is a key component of semantic search. It involves converting text into vector embeddings and then finding similar content by measuring the distance between these vectors.
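Before wiring up the service, it helps to see what "measuring the distance between vectors" means in code. The sketch below uses cosine similarity and an exhaustive scan over a small in-memory corpus. `VectorMath` is a hypothetical helper for illustration; a search service would typically use an approximate nearest-neighbor index such as HNSW instead of comparing against every vector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class VectorMath
{
    // Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
    public static double CosineSimilarity(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0; // a zero vector has no direction
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Exhaustive k-nearest-neighbor search: score every vector, keep the best k.
    public static List<(string Id, double Score)> NearestNeighbors(
        IReadOnlyList<float> query,
        IReadOnlyDictionary<string, float[]> corpus,
        int k)
    {
        return corpus
            .Select(entry => (Id: entry.Key, Score: CosineSimilarity(query, entry.Value)))
            .OrderByDescending(result => result.Score)
            .Take(k)
            .ToList();
    }
}
```

With a toy corpus of `cats = [1, 0]`, `dogs = [0.9, 0.1]`, and `tax = [0, 1]`, a query vector of `[1, 0]` with `k = 2` returns `cats` followed by `dogs`. Real embeddings behave the same way, just with 1536 dimensions instead of two.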
Generating Vector Embeddings
First, let’s create a service to generate embeddings using Azure OpenAI:
csharp
using Azure;
using Azure.AI.OpenAI;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class EmbeddingService
{
    private readonly OpenAIClient _client;
    private readonly string _deploymentName;

    // deploymentName is the name of your embedding model deployment in Azure OpenAI
    public EmbeddingService(string endpoint, string apiKey, string deploymentName)
    {
        _client = new OpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
        _deploymentName = deploymentName;
    }

    // Returns the embedding vector for a single piece of text
    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var response = await _client.GetEmbeddingsAsync(
            new EmbeddingsOptions(_deploymentName, new List<string> { text }));
        return response.Value.Data[0].Embedding.ToArray();
    }

    // Returns one embedding per input text
    public async Task<List<float[]>> GenerateBatchEmbeddingsAsync(List<string> texts)
    {
        var embeddings = new List<float[]>();

        // Process in batches to avoid rate limits
        int batchSize = 20;
        for (int i = 0; i < texts.Count; i += batchSize)
        {
            var batch = texts.Skip(i).Take(batchSize).ToList();
            var response = await _client.GetEmbeddingsAsync(
                new EmbeddingsOptions(_deploymentName, batch));

            foreach (var embedding in response.Value.Data)
            {
                embeddings.Add(embedding.Embedding.ToArray());
            }
        }

        return embeddings;
    }
}
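Batching reduces the number of requests, but a large indexing job can still be throttled. One common way to cope is to retry with exponential backoff. The sketch below is a hypothetical `RetryHelper`, not part of the Azure SDK; the caller supplies the `isTransient` predicate, and the default of five attempts with a one-second starting delay is an assumption to tune for your own quota.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class RetryHelper
{
    // Runs the operation, retrying on transient failures with a delay that
    // doubles after every failed attempt. Non-transient exceptions, and the
    // failure of the final attempt, propagate to the caller.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait, then double the delay for the next attempt.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With the service above, a call might look like `await RetryHelper.WithBackoffAsync(() => embeddingService.GenerateEmbeddingAsync(text), ex => ex is RequestFailedException { Status: 429 })`. The Azure SDK clients also apply their own retry policy, so a wrapper like this sits on top of that and is mainly useful for long-running bulk jobs.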
Creating a Search Index with Vector Fields
Now, let’s create a search index that supports vector search:
csharp
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class SearchIndexService
{
    // Vector fields point at a profile, and the profile points at an algorithm
    private const string VectorProfileName = "my-profile";
    private const string VectorAlgorithmName = "my-algorithm";

    // Must match the output size of your embedding model
    // (1536 for text-embedding-ada-002)
    private const int EmbeddingDimensions = 1536;

    private readonly SearchIndexClient _indexClient;
    private readonly string _indexName;

    public SearchIndexService(string searchServiceEndpoint, string adminApiKey, string indexName)
    {
        _indexClient = new SearchIndexClient(
            new Uri(searchServiceEndpoint),
            new AzureKeyCredential(adminApiKey));
        _indexName = indexName;
    }

    public async Task CreateIndexAsync()
    {
        // Define the fields for the index
        var fields = new List<SearchField>
        {
            new SearchField("id", SearchFieldDataType.String) { IsKey = true, IsFilterable = true },
            new SearchField("title", SearchFieldDataType.String) { IsSearchable = true, IsFilterable = true },
            new SearchField("content", SearchFieldDataType.String) { IsSearchable = true },
            new SearchField("category", SearchFieldDataType.String) { IsFilterable = true, IsFacetable = true },
            new SearchField("titleVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
            {
                IsSearchable = true,
                VectorSearchDimensions = EmbeddingDimensions,
                VectorSearchProfileName = VectorProfileName
            },
            new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
            {
                IsSearchable = true,
                VectorSearchDimensions = EmbeddingDimensions,
                VectorSearchProfileName = VectorProfileName
            }
        };

        // Create the index with vector search and semantic ranking configured
        var index = new SearchIndex(_indexName, fields)
        {
            VectorSearch = new VectorSearch
            {
                // HNSW is the approximate nearest-neighbor algorithm used for vector queries
                Algorithms = { new HnswAlgorithmConfiguration(VectorAlgorithmName) },
                Profiles = { new VectorSearchProfile(VectorProfileName, VectorAlgorithmName) }
            },
            SemanticSearch = new SemanticSearch
            {
                Configurations =
                {
                    new SemanticConfiguration("default", new SemanticPrioritizedFields
                    {
                        TitleField = new SemanticField("title"),
                        ContentFields = { new SemanticField("content") }
                    })
                }
            }
        };

        await _indexClient.CreateOrUpdateIndexAsync(index);
    }
}
Indexing Documents with Vector Embeddings
Next, let’s create a service to index documents with their vector embeddings:
csharp
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class DocumentIndexingService
{
    private readonly SearchClient _searchClient;
    private readonly EmbeddingService _embeddingService;

    public DocumentIndexingService(
        string searchServiceEndpoint,
        string adminApiKey,
        string indexName,
        EmbeddingService embeddingService)
    {
        _searchClient = new SearchClient(
            new Uri(searchServiceEndpoint),
            indexName,
            new AzureKeyCredential(adminApiKey));
        _embeddingService = embeddingService;
    }

    public async Task IndexDocumentsAsync(List<Document> documents)
    {
        // Generate embeddings for titles and content
        var titles = documents.Select(d => d.Title).ToList();
        var contents = documents.Select(d => d.Content).ToList();
        var titleEmbeddings = await _embeddingService.GenerateBatchEmbeddingsAsync(titles);
        var contentEmbeddings = await _embeddingService.GenerateBatchEmbeddingsAsync(contents);

        // Create search documents with embeddings; field names must match the index definition
        var searchDocuments = new List<SearchDocument>();
        for (int i = 0; i < documents.Count; i++)
        {
            var document = documents[i];
            var searchDocument = new SearchDocument
            {
                ["id"] = document.Id,
                ["title"] = document.Title,
                ["content"] = document.Content,
                ["category"] = document.Category,
                ["titleVector"] = titleEmbeddings[i],
                ["contentVector"] = contentEmbeddings[i]
            };
            searchDocuments.Add(searchDocument);
        }

        // Index the documents
        await _searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(searchDocuments));
    }
}

public class Document
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public string Category { get; set; }
}
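The indexing method above assumes that every document gets exactly one embedding of the right size and that all documents fit in a single upload. A few guard checks make those assumptions explicit. The sketch below is a hypothetical `IndexingGuards` helper, not part of the Azure SDK; the default batch size of 1000 reflects the documented per-request limit for indexing batches, and you may want a smaller value when documents carry large vectors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class IndexingGuards
{
    // Throws if the embeddings do not line up one-to-one with the documents
    // or if any vector has an unexpected number of dimensions.
    public static void EnsureEmbeddingsMatch(
        int documentCount, IReadOnlyList<float[]> embeddings, int expectedDimensions)
    {
        if (embeddings.Count != documentCount)
            throw new InvalidOperationException(
                $"Expected {documentCount} embeddings but got {embeddings.Count}.");

        for (int i = 0; i < embeddings.Count; i++)
        {
            if (embeddings[i] == null || embeddings[i].Length != expectedDimensions)
                throw new InvalidOperationException(
                    $"Embedding {i} does not have {expectedDimensions} dimensions.");
        }
    }

    // Splits items into consecutive batches of at most maxBatchSize.
    public static IEnumerable<List<T>> Batch<T>(IReadOnlyList<T> items, int maxBatchSize = 1000)
    {
        if (maxBatchSize < 1) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

        for (int i = 0; i < items.Count; i += maxBatchSize)
        {
            yield return items.Skip(i).Take(maxBatchSize).ToList();
        }
    }
}
```

In `IndexDocumentsAsync`, you could call `EnsureEmbeddingsMatch(documents.Count, contentEmbeddings, 1536)` before building the search documents, then loop over `Batch(searchDocuments)` and upload each batch separately.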
Performing Vector Search
Now, let’s implement vector search to find semantically similar documents:
csharp
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class SemanticSearchService
{
    private readonly SearchClient _searchClient;
    private readonly EmbeddingService _embeddingService;

    public SemanticSearchService(
        string searchServiceEndpoint,
        string queryApiKey,
        string indexName,
        EmbeddingService embeddingService)
    {
        _searchClient = new SearchClient(
            new Uri(searchServiceEndpoint),
            indexName,
            new AzureKeyCredential(queryApiKey));
        _embeddingService = embeddingService;
    }

    public async Task<List<SearchResult>> VectorSearchAsync(string query, int top = 10)
    {
        // Generate embedding for the query
        float[] queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);

        // Create vector query against the content embeddings
        var vectorQuery = new VectorizedQuery(queryEmbedding)
        {
            KNearestNeighborsCount = top,
            Fields = { "contentVector" }
        };

        // Set up search options
        var searchOptions = new SearchOptions
        {
            VectorSearch = new VectorSearchOptions
            {
                Queries = { vectorQuery }
            },
            Size = top,
            Select = { "id", "title", "content", "category" }
        };

        // Execute search; passing null as the search text makes this a pure vector query.
        // Declaring the type unwraps Response<T> so GetResultsAsync is available.
        SearchResults<SearchDocument> response =
            await _searchClient.SearchAsync<SearchDocument>(null, searchOptions);

        // Process results
        var results = new List<SearchResult>();
        await foreach (var result in response.GetResultsAsync())
        {
            results.Add(new SearchResult
            {
                Id = result.Document["id"].ToString(),
                Title = result.Document["title"].ToString(),
                Content = result.Document["content"].ToString(),
                Category = result.Document["category"].ToString(),
                Score = result.Score ?? 0
            });
        }

        return results;
    }
}

public class SearchResult
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public string Category { get; set; }
    public double Score { get; set; }
}
Implementing Semantic Ranking
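Semantic ranking improves result quality by re-scoring the top results of an initial query with a language model, so that documents matching the meaning of the query move up the list. Azure Cognitive Search provides this as a built-in capability: the index declares a semantic configuration that names the title and content fields, and each query opts in by using the semantic query type.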
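The re-ranking idea itself is independent of any service. The sketch below shows the two-stage pattern on an in-memory list: take an initial ranking, re-order only its first few entries using a second scoring function, and leave the rest in place. `TwoStageRanking` and its `semanticScore` delegate are hypothetical stand-ins for illustration; in the hosted service the second-stage score comes from Microsoft’s ranking models, not from code you write.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class TwoStageRanking
{
    // Re-orders the first rerankDepth items by a second, more expensive score
    // (highest first) and appends the remaining items in their original order.
    public static List<T> Rerank<T>(
        IReadOnlyList<T> initialRanking, Func<T, double> semanticScore, int rerankDepth)
    {
        var head = initialRanking
            .Take(rerankDepth)
            .OrderByDescending(semanticScore)
            .ToList();

        var tail = initialRanking.Skip(rerankDepth);

        return head.Concat(tail).ToList();
    }
}
```

Given an initial ranking of `a, b, c, d`, second-stage scores of 0.1, 0.9, 0.5, and 1.0, and a depth of 3, the result is `b, c, a, d`. Document `d` stays last despite its high score because it fell outside the re-ranked window, which is why the quality of the first-stage ranking still matters.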
Performing Semantic Search
Let’s implement semantic search with ranking:
csharp
// Add this method to the SemanticSearchService class shown earlier
public async Task<List<SearchResult>> SemanticSearchAsync(string query, string semanticConfigurationName = "default", int top = 10)
{
    // Set up search options; the configuration name must match one defined on the index
    var searchOptions = new SearchOptions
    {
        QueryType = SearchQueryType.Semantic,
        SemanticSearch = new SemanticSearchOptions
        {
            SemanticConfigurationName = semanticConfigurationName
        },
        Size = top,
        Select = { "id", "title", "content", "category" }
    };

    // Execute search; declaring the type unwraps Response<T> so GetResultsAsync is available
    SearchResults<SearchDocument> response =
        await _searchClient.SearchAsync<SearchDocument>(query, searchOptions);

    // Process results
    var results = new List<SearchResult>();
    await foreach (var result in response.GetResultsAsync())
    {
        results.Add(new SearchResult
        {
            Id = result.Document["id"].ToString(),
            Title = result.Document["title"].ToString(),
            Content = result.Document["content"].ToString(),
            Category = result.Document["category"].ToString(),
            // Prefer the semantic reranker score; fall back to the initial search score
            Score = result.SemanticSearch?.RerankerScore ?? result.Score ?? 0
        });
    }

    return results;
}