Implementing Semantic Search with Azure Cognitive Search and .NET

Summary: This post explores how to implement semantic search capabilities in .NET applications using Azure Cognitive Search. Learn how to leverage vector embeddings, semantic ranking, and hybrid search to create more intelligent search experiences that understand user intent beyond simple keyword matching.

Introduction

Search functionality is a critical component of many applications, but traditional keyword-based search often falls short of user expectations. Users increasingly expect search systems to understand the meaning behind their queries, not just match keywords. This is where semantic search comes in.

Semantic search aims to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms, so that results are more relevant. In 2023, Azure Cognitive Search (since renamed Azure AI Search) added vector search alongside its semantic ranking capability, making it easier for .NET developers to implement these advanced search features.

In this post, we’ll explore how to implement semantic search in .NET applications using Azure Cognitive Search. We’ll cover vector embeddings, semantic ranking, and hybrid search approaches to create more intelligent search experiences that truly understand user intent.

Understanding Semantic Search

Before diving into implementation, let’s understand what makes semantic search different from traditional search approaches.

Traditional Keyword Search vs. Semantic Search

Traditional Keyword Search:

  • Matches exact keywords or their variants
  • Uses techniques like stemming, lemmatization, and synonym matching
  • Ranks results based on term frequency, inverse document frequency (TF-IDF), and other statistical measures
  • Struggles with understanding context, intent, and meaning

Semantic Search:

  • Understands the meaning and intent behind queries
  • Captures semantic relationships between words and phrases
  • Can find relevant results even when keywords don’t match exactly
  • Handles natural language queries more effectively
  • Provides more contextually relevant results

Key Components of Semantic Search

  1. Vector Embeddings: Numerical representations of text that capture semantic meaning
  2. Vector Search: Finding similar content by measuring the distance between vectors
  3. Semantic Ranking: Re-ranking search results based on semantic relevance
  4. Hybrid Search: Combining traditional keyword search with semantic approaches
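
To make the hybrid idea more concrete: Azure’s hybrid queries merge the keyword and vector result lists using Reciprocal Rank Fusion (RRF). The block below is a minimal sketch of that fusion step, not the service’s actual implementation. The RankFusion class, the Fuse method, and the default constant k = 60 are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of any Azure SDK.
public static class RankFusion
{
    // Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each
    // document it contains (rank is 1-based), and contributions are summed.
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int rank = 0; rank < list.Count; rank++)
            {
                // TryGetValue leaves current at 0 for documents not seen yet.
                scores.TryGetValue(list[rank], out double current);
                scores[list[rank]] = current + 1.0 / (k + rank + 1);
            }
        }

        return scores
            .OrderByDescending(s => s.Value)
            .Select(s => (Id: s.Key, Score: s.Value))
            .ToList();
    }
}
```

A document that appears near the top of both lists ends up ahead of one that ranks first in only a single list, which is the behavior hybrid search relies on.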

Setting Up Azure Cognitive Search

Let’s start by setting up Azure Cognitive Search with semantic search capabilities.

Prerequisites

  1. An Azure subscription
  2. An Azure Cognitive Search service (Standard tier or above for semantic search)
  3. An Azure OpenAI Service resource (for generating embeddings)
  4. A .NET 6 or later project

Creating the Search Service

You can create an Azure Cognitive Search service through the Azure portal, Azure CLI, or Azure PowerShell. Here’s how to do it with Azure CLI:

bash

# Create a resource group
az group create --name semantic-search-rg --location eastus

# Create a search service (Standard tier or above is required for semantic search)
az search service create \
  --name your-search-service-name \
  --resource-group semantic-search-rg \
  --sku Standard \
  --partition-count 1 \
  --replica-count 1

Setting Up Your .NET Project

Create a new .NET project and add the necessary packages:

bash

dotnet new console -n SemanticSearchDemo
cd SemanticSearchDemo
dotnet add package Azure.Search.Documents
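# The samples below use the OpenAIClient API from the 1.0.0-beta releases;
# the 2.x packages expose a different client, so pin a beta version
dotnet add package Azure.AI.OpenAI --version 1.0.0-beta.12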

Implementing Vector Search

Vector search is a key component of semantic search. It involves converting text into vector embeddings and then finding similar content by measuring the distance between these vectors.
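
As a rough illustration of what “measuring the distance between vectors” means, the sketch below computes cosine similarity and runs an exhaustive nearest-neighbor scan over a small in-memory set. The VectorMath class and its method names are hypothetical, and a search service typically uses an approximate index (such as HNSW) rather than comparing the query against every stored vector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of any Azure SDK.
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // 1 means same direction, 0 means orthogonal (unrelated).
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as having no similarity to anything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Exhaustive k-nearest-neighbor search: score every candidate, keep the top k.
    public static List<(string Id, double Score)> NearestNeighbors(
        float[] query, IReadOnlyDictionary<string, float[]> candidates, int k)
    {
        return candidates
            .Select(c => (Id: c.Key, Score: CosineSimilarity(query, c.Value)))
            .OrderByDescending(x => x.Score)
            .Take(k)
            .ToList();
    }
}
```

With real embeddings the vectors have hundreds or thousands of dimensions, but the comparison works the same way: texts with similar meaning produce vectors that point in similar directions.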

Generating Vector Embeddings

First, let’s create a service to generate embeddings using Azure OpenAI:

csharp

using Azure;
using Azure.AI.OpenAI;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class EmbeddingService
{
    private readonly OpenAIClient _client;
    private readonly string _deploymentName;

    public EmbeddingService(string endpoint, string apiKey, string deploymentName)
    {
        _client = new OpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
        _deploymentName = deploymentName;
    }

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var response = await _client.GetEmbeddingsAsync(new EmbeddingsOptions(_deploymentName, new List<string> { text }));
        return response.Value.Data[0].Embedding.ToArray();
    }

    public async Task<List<float[]>> GenerateBatchEmbeddingsAsync(List<string> texts)
    {
        var embeddings = new List<float[]>();
        
        // Process in batches to avoid rate limits
        int batchSize = 20;
        for (int i = 0; i < texts.Count; i += batchSize)
        {
            var batch = texts.Skip(i).Take(batchSize).ToList();
            var response = await _client.GetEmbeddingsAsync(new EmbeddingsOptions(_deploymentName, batch));
            
            foreach (var embedding in response.Value.Data)
            {
                embeddings.Add(embedding.Embedding.ToArray());
            }
        }
        
        return embeddings;
    }
}

Creating a Search Index with Vector Fields

Now, let’s create a search index that supports vector search:

csharp

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class SearchIndexService
{
    private readonly SearchIndexClient _indexClient;
    private readonly string _indexName;

    public SearchIndexService(string searchServiceEndpoint, string adminApiKey, string indexName)
    {
        _indexClient = new SearchIndexClient(
            new Uri(searchServiceEndpoint),
            new AzureKeyCredential(adminApiKey));
        _indexName = indexName;
    }

    public async Task CreateIndexAsync()
    {
        // Define the vector search configuration: an HNSW algorithm plus a profile
        // that vector fields reference by name (Azure.Search.Documents 11.5.0 or later)
        var vectorSearch = new VectorSearch
        {
            Algorithms = { new HnswAlgorithmConfiguration("my-algorithm") },
            Profiles = { new VectorSearchProfile("my-profile", "my-algorithm") }
        };

        // Define the fields for the index
        var fields = new List<SearchField>
        {
            new SearchField("id", SearchFieldDataType.String) { IsKey = true, IsFilterable = true },
            new SearchField("title", SearchFieldDataType.String) { IsSearchable = true, IsFilterable = true },
            new SearchField("content", SearchFieldDataType.String) { IsSearchable = true },
            new SearchField("category", SearchFieldDataType.String) { IsFilterable = true, IsFacetable = true },
            // The dimension count must match the embedding model's output size
            // (1536 assumes a model such as text-embedding-ada-002)
            new SearchField("titleVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
            {
                IsSearchable = true,
                VectorSearchDimensions = 1536,
                VectorSearchProfileName = "my-profile"
            },
            new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
            {
                IsSearchable = true,
                VectorSearchDimensions = 1536,
                VectorSearchProfileName = "my-profile"
            }
        };

        // Create the index with both vector and semantic configurations
        var index = new SearchIndex(_indexName, fields)
        {
            VectorSearch = vectorSearch,
            SemanticSearch = new SemanticSearch
            {
                Configurations =
                {
                    new SemanticConfiguration("default", new SemanticPrioritizedFields
                    {
                        TitleField = new SemanticField("title"),
                        ContentFields = { new SemanticField("content") }
                    })
                }
            }
        };

        await _indexClient.CreateOrUpdateIndexAsync(index);
    }
}

Indexing Documents with Vector Embeddings

Next, let’s create a service to index documents with their vector embeddings:

csharp

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class DocumentIndexingService
{
    private readonly SearchClient _searchClient;
    private readonly EmbeddingService _embeddingService;

    public DocumentIndexingService(
        string searchServiceEndpoint,
        string adminApiKey,
        string indexName,
        EmbeddingService embeddingService)
    {
        _searchClient = new SearchClient(
            new Uri(searchServiceEndpoint),
            indexName,
            new AzureKeyCredential(adminApiKey));
        _embeddingService = embeddingService;
    }

    public async Task IndexDocumentsAsync(List<Document> documents)
    {
        // Generate embeddings for titles and content
        var titles = documents.Select(d => d.Title).ToList();
        var contents = documents.Select(d => d.Content).ToList();
        
        var titleEmbeddings = await _embeddingService.GenerateBatchEmbeddingsAsync(titles);
        var contentEmbeddings = await _embeddingService.GenerateBatchEmbeddingsAsync(contents);
        
        // Create search documents with embeddings
        var searchDocuments = new List<SearchDocument>();
        for (int i = 0; i < documents.Count; i++)
        {
            var document = documents[i];
            var searchDocument = new SearchDocument
            {
                ["id"] = document.Id,
                ["title"] = document.Title,
                ["content"] = document.Content,
                ["category"] = document.Category,
                ["titleVector"] = titleEmbeddings[i],
                ["contentVector"] = contentEmbeddings[i]
            };
            
            searchDocuments.Add(searchDocument);
        }
        
        // Index the documents
        await _searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(searchDocuments));
    }
}

public class Document
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public string Category { get; set; }
}
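
Because each vector field declares a fixed number of dimensions, an embedding of any other length should be rejected when the batch is uploaded. A small pre-flight check can surface that earlier. This is only a sketch: the EmbeddingValidator class and FindDimensionMismatches method are hypothetical names, and the expected dimension count is whatever the index declares (1536 in this post).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any Azure SDK.
public static class EmbeddingValidator
{
    // Returns the positions of embeddings that are missing or whose length
    // differs from the dimension count declared on the index's vector fields.
    public static List<int> FindDimensionMismatches(
        IReadOnlyList<float[]> embeddings, int expectedDimensions)
    {
        var mismatches = new List<int>();
        for (int i = 0; i < embeddings.Count; i++)
        {
            if (embeddings[i] == null || embeddings[i].Length != expectedDimensions)
                mismatches.Add(i);
        }
        return mismatches;
    }
}
```

In IndexDocumentsAsync this could be called as EmbeddingValidator.FindDimensionMismatches(contentEmbeddings, 1536) before building the search documents, so that a mismatched embedding deployment is reported against specific documents instead of failing the whole upload.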

Performing Vector Search

Now, let’s implement vector search to find semantically similar documents:

csharp

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class SemanticSearchService
{
    private readonly SearchClient _searchClient;
    private readonly EmbeddingService _embeddingService;

    public SemanticSearchService(
        string searchServiceEndpoint,
        string queryApiKey,
        string indexName,
        EmbeddingService embeddingService)
    {
        _searchClient = new SearchClient(
            new Uri(searchServiceEndpoint),
            indexName,
            new AzureKeyCredential(queryApiKey));
        _embeddingService = embeddingService;
    }

    public async Task<List<SearchResult>> VectorSearchAsync(string query, int top = 10)
    {
        // Generate embedding for the query
        float[] queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);
        
        // Create vector query
        var vectorQuery = new VectorizedQuery(queryEmbedding)
        {
            KNearestNeighborsCount = top,
            Fields = { "contentVector" }
        };
        
        // Set up search options
        var searchOptions = new SearchOptions
        {
            VectorSearch = new VectorSearchOptions
            {
                Queries = { vectorQuery }
            },
            Size = top,
            Select = { "id", "title", "content", "category" }
        };
        
        // Execute search
        var response = await _searchClient.SearchAsync<SearchDocument>(null, searchOptions);
        
        // Process results
        var results = new List<SearchResult>();
        // SearchAsync returns Response<SearchResults<T>>, so unwrap it with .Value
        await foreach (var result in response.Value.GetResultsAsync())
        {
            results.Add(new SearchResult
            {
                Id = result.Document["id"].ToString(),
                Title = result.Document["title"].ToString(),
                Content = result.Document["content"].ToString(),
                Category = result.Document["category"].ToString(),
                Score = result.Score ?? 0
            });
        }
        
        return results;
    }
}

public class SearchResult
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public string Category { get; set; }
    public double Score { get; set; }
}

Implementing Semantic Ranking

Semantic ranking enhances search results by re-ranking them based on semantic relevance. Azure Cognitive Search provides built-in semantic ranking capabilities.
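
Conceptually, semantic ranking is a second stage: a first-stage query picks candidates, and a more expensive model reorders only the top of that list. The sketch below shows the two-stage shape with a pluggable scoring function. The Reranker class, the semanticScore delegate, and the candidateCount default are illustrative assumptions, and the scoring function here stands in for the language model that the service runs internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of any Azure SDK.
public static class Reranker
{
    // Two-stage ranking: the first-stage order is used only to choose the
    // candidates, which are then reordered by the second-stage score.
    public static List<T> Rerank<T>(
        IEnumerable<T> firstStageResults,
        Func<T, double> semanticScore,
        int candidateCount = 50)
    {
        return firstStageResults
            .Take(candidateCount)              // results beyond this cutoff are never rescored
            .OrderByDescending(semanticScore)  // stable sort: ties keep first-stage order
            .ToList();
    }
}
```

One consequence of this design is that re-ranking cannot recover a document the first stage failed to return, so the quality of the initial keyword or vector query still matters.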

Performing Semantic Search

Let’s add a method to the SemanticSearchService class that runs a semantically ranked query:

csharp

public async Task<List<SearchResult>> SemanticSearchAsync(string query, string semanticConfigurationName = "default", int top = 10)
{
    // Set up search options
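    var searchOptions = new SearchOptions
    {
        QueryType = SearchQueryType.Semantic,
        // In Azure.Search.Documents 11.5.0 and later, semantic settings are
        // grouped under SemanticSearchOptions
        SemanticSearch = new SemanticSearchOptions
        {
            SemanticConfigurationName = semanticConfigurationName
        },
        Size = top,
        Select = { "id", "title", "content", "category" }
    };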
    
    // Execute search
    var response = await _searchClient.SearchAsync<SearchDocument>(query, searchOptions);
    
    // Process results
    var results = new List<SearchResult>();
    // SearchAsync returns Response<SearchResults<T>>, so unwrap it with .Value
    await foreach (var result in response.Value.GetResultsAsync())
    {
        results.Add(new SearchResult
        {
            Id = result.Document["id"].ToString(),
            Title = result.Document["title"].ToString(),
            Content = result.Document["content"].ToString(),
            Category = result.Document["category"].ToString(),
            Score = result.Score ?? 0
        });
    }
    
    return results;
}

Semantic