Integrating OLLAMA with .NET: Running Local LLMs in Your Applications

Summary: This post explores how to integrate OLLAMA, an open-source tool for running large language models locally, with .NET applications. Learn how to set up OLLAMA, create a .NET client, and build applications that leverage local LLMs for privacy-focused AI capabilities.

Introduction

The landscape of AI development is rapidly evolving, with large language models (LLMs) becoming increasingly accessible to developers. While cloud-based AI services like Azure OpenAI Service offer powerful capabilities, there are compelling reasons to run LLMs locally:

  1. Data privacy: Keep sensitive data within your infrastructure
  2. Cost control: Eliminate per-token charges for high-volume applications
  3. Offline operation: Run AI features without internet connectivity
  4. Customization: Fine-tune models for specific domains or use cases
  5. Reduced latency: Eliminate network round-trips for faster responses

OLLAMA, which gained popularity in mid-2023, is an open-source tool that makes it remarkably easy to run LLMs locally. It provides a simple interface for downloading, running, and interacting with various open-source models like Llama 2, Mistral, and Vicuna.

In this post, we’ll explore how to integrate OLLAMA with .NET applications, enabling you to build AI-powered features that run entirely on your local infrastructure. We’ll cover setting up OLLAMA, creating a .NET client, and building practical applications that leverage local LLMs.

Understanding OLLAMA

OLLAMA is a lightweight runtime for running LLMs locally. It simplifies the process of downloading and running models, handling the complex infrastructure requirements behind the scenes. Here’s what makes OLLAMA particularly valuable:

  • Simple API: RESTful API for easy integration with any programming language
  • Model management: Easy downloading and switching between different models
  • Optimized performance: Efficient inference on consumer hardware
  • Customization: Support for creating custom models with “Modelfiles”
  • Cross-platform: Available for Windows, macOS, and Linux

OLLAMA supports a wide range of open-source models, including:

  • Llama 2 (7B, 13B, 70B parameters)
  • Mistral (7B)
  • Vicuna
  • CodeLlama
  • Phi-2
  • And many others

These models vary in size, capabilities, and resource requirements, allowing you to choose the right balance for your specific needs.

Setting Up OLLAMA

Before we can integrate OLLAMA with .NET, we need to install and set up OLLAMA on our development machine.

Installation

OLLAMA can be installed on Windows, macOS, or Linux. Here’s how to install it on each platform:

Windows:

  1. Download the installer from ollama.ai
  2. Run the installer and follow the prompts
  3. OLLAMA will be installed as a service that runs in the background

macOS:

bash

brew install ollama

Linux:

bash

curl -fsSL https://ollama.ai/install.sh | sh

Starting OLLAMA

After installation, you can start the OLLAMA service:

Windows: The service should start automatically after installation.

macOS/Linux:

bash

ollama serve

This will start the OLLAMA server, which listens on localhost:11434 by default.
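
If you'd like to confirm the server is reachable from code rather than the command line, a single GET request against the root endpoint is enough. Here's a minimal sketch, peeking ahead to .NET; it assumes the default port and that nothing else is bound to 11434:

csharp

using System;
using System.Net.Http;
using System.Threading.Tasks;

class HealthCheck
{
    static async Task Main()
    {
        using var http = new HttpClient();
        try
        {
            // The root endpoint returns a short status string ("Ollama is running")
            // when the server is up on the default port.
            var body = await http.GetStringAsync("http://localhost:11434/");
            Console.WriteLine(body);
        }
        catch (HttpRequestException ex)
        {
            Console.WriteLine($"OLLAMA is not reachable: {ex.Message}");
        }
    }
}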

Pulling Your First Model

Before we can use a model, we need to download it. Let’s pull the Llama 2 7B model, which is a good balance of capability and resource requirements:

bash

ollama pull llama2

This will download the model, which may take some time depending on your internet connection. The Llama 2 7B model is approximately 4GB in size.

You can also pull other models:

bash

# Pull the Mistral 7B model
ollama pull mistral

# Pull the CodeLlama model optimized for code generation
ollama pull codellama

# Pull a smaller, faster model
ollama pull phi

Testing OLLAMA from the Command Line

Before integrating with .NET, let’s make sure OLLAMA is working correctly by testing it from the command line:

bash

ollama run llama2 "Explain the concept of dependency injection in .NET"

You should see the model generate a response explaining dependency injection. If this works, OLLAMA is set up correctly, and we can proceed with .NET integration.

Creating a .NET Client for OLLAMA

Now that OLLAMA is set up, let’s create a .NET client to interact with it. We’ll start by creating a new .NET project and implementing a client class.

Creating a New Project

bash

dotnet new console -n OllamaDemo
cd OllamaDemo
dotnet add package System.Net.Http.Json

Implementing the OLLAMA Client

Let’s create a client class that encapsulates the OLLAMA API:

csharp

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

namespace OllamaDemo
{
    public class OllamaClient
    {
        private readonly HttpClient _httpClient;
        private readonly string _baseUrl;

        public OllamaClient(string baseUrl = "http://localhost:11434")
        {
            _baseUrl = baseUrl;
            _httpClient = new HttpClient();
        }

        public async Task<GenerateResponse> GenerateAsync(GenerateRequest request)
        {
            var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/generate", request );
            response.EnsureSuccessStatusCode();
            
            var content = await response.Content.ReadAsStringAsync();
            return JsonSerializer.Deserialize<GenerateResponse>(content);
        }

        public async Task<ChatResponse> ChatAsync(ChatRequest request)
        {
            var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/chat", request );
            response.EnsureSuccessStatusCode();
            
            var content = await response.Content.ReadAsStringAsync();
            return JsonSerializer.Deserialize<ChatResponse>(content);
        }

        public async Task<EmbeddingResponse> EmbeddingAsync(EmbeddingRequest request)
        {
            var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/embeddings", request );
            response.EnsureSuccessStatusCode();
            
            var content = await response.Content.ReadAsStringAsync();
            return JsonSerializer.Deserialize<EmbeddingResponse>(content);
        }

        public async Task<ListModelsResponse> ListModelsAsync()
        {
            var response = await _httpClient.GetAsync($"{_baseUrl}/api/tags" );
            response.EnsureSuccessStatusCode();
            
            var content = await response.Content.ReadAsStringAsync();
            return JsonSerializer.Deserialize<ListModelsResponse>(content);
        }
    }

    // Request and response models for the OLLAMA API

    public class GenerateRequest
    {
        [JsonPropertyName("model")]
        public string Model { get; set; }

        [JsonPropertyName("prompt")]
        public string Prompt { get; set; }

        [JsonPropertyName("system")]
        public string System { get; set; }

        [JsonPropertyName("template")]
        public string Template { get; set; }

        [JsonPropertyName("context")]
        public List<int> Context { get; set; }

        [JsonPropertyName("options")]
        public Dictionary<string, object> Options { get; set; }
    }

    public class GenerateResponse
    {
        [JsonPropertyName("model")]
        public string Model { get; set; }

        [JsonPropertyName("response")]
        public string Response { get; set; }

        [JsonPropertyName("context")]
        public List<int> Context { get; set; }

        [JsonPropertyName("total_duration")]
        public long TotalDuration { get; set; }

        [JsonPropertyName("load_duration")]
        public long LoadDuration { get; set; }

        [JsonPropertyName("prompt_eval_count")]
        public int PromptEvalCount { get; set; }

        [JsonPropertyName("prompt_eval_duration")]
        public long PromptEvalDuration { get; set; }

        [JsonPropertyName("eval_count")]
        public int EvalCount { get; set; }

        [JsonPropertyName("eval_duration")]
        public long EvalDuration { get; set; }
    }

    public class ChatRequest
    {
        [JsonPropertyName("model")]
        public string Model { get; set; }

        [JsonPropertyName("messages")]
        public List<ChatMessage> Messages { get; set; }

        [JsonPropertyName("stream")]
        public bool Stream { get; set; }

        [JsonPropertyName("options")]
        public Dictionary<string, object> Options { get; set; }
    }

    public class ChatMessage
    {
        [JsonPropertyName("role")]
        public string Role { get; set; }

        [JsonPropertyName("content")]
        public string Content { get; set; }
    }

    public class ChatResponse
    {
        [JsonPropertyName("model")]
        public string Model { get; set; }

        [JsonPropertyName("message")]
        public ChatMessage Message { get; set; }

        [JsonPropertyName("total_duration")]
        public long TotalDuration { get; set; }

        [JsonPropertyName("load_duration")]
        public long LoadDuration { get; set; }

        [JsonPropertyName("prompt_eval_count")]
        public int PromptEvalCount { get; set; }

        [JsonPropertyName("prompt_eval_duration")]
        public long PromptEvalDuration { get; set; }

        [JsonPropertyName("eval_count")]
        public int EvalCount { get; set; }

        [JsonPropertyName("eval_duration")]
        public long EvalDuration { get; set; }
    }

    public class EmbeddingRequest
    {
        [JsonPropertyName("model")]
        public string Model { get; set; }

        [JsonPropertyName("prompt")]
        public string Prompt { get; set; }
    }

    public class EmbeddingResponse
    {
        [JsonPropertyName("embedding")]
        public List<float> Embedding { get; set; }
    }

    public class ListModelsResponse
    {
        [JsonPropertyName("models")]
        public List<ModelInfo> Models { get; set; }
    }

    public class ModelInfo
    {
        [JsonPropertyName("name")]
        public string Name { get; set; }

        [JsonPropertyName("modified_at")]
        public string ModifiedAt { get; set; }

        [JsonPropertyName("size")]
        public long Size { get; set; }
    }
}
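
One practical note before using the client: HttpClient requests time out after 100 seconds by default, and a long generation on modest hardware can easily exceed that. Here's a sketch of a variant of the constructor above that raises the limit (the five-minute value is an arbitrary choice for illustration, not a recommendation from the OLLAMA docs):

csharp

// Variant constructor with a configurable timeout; local inference can
// run past HttpClient's default 100-second limit.
public OllamaClient(string baseUrl = "http://localhost:11434", TimeSpan? timeout = null)
{
    _baseUrl = baseUrl;
    _httpClient = new HttpClient
    {
        Timeout = timeout ?? TimeSpan.FromMinutes(5)
    };
}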

Basic Usage Example

Now, let’s use our client to interact with OLLAMA:

csharp

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace OllamaDemo
{
    class Program
    {
        static async Task Main(string[] args)
        {
            Console.WriteLine("OLLAMA .NET Client Demo");
            Console.WriteLine("======================");

            var client = new OllamaClient();

            // List available models
            Console.WriteLine("Available models:");
            var models = await client.ListModelsAsync();
            foreach (var model in models.Models)
            {
                Console.WriteLine($"- {model.Name} ({model.Size / 1024 / 1024} MB)");
            }
            Console.WriteLine();

            // Generate text
            Console.WriteLine("Generating text with Llama 2...");
            var generateRequest = new GenerateRequest
            {
                Model = "llama2",
                Prompt = "Explain the concept of dependency injection in .NET",
                System = "You are a helpful assistant that provides concise explanations about .NET concepts."
            };

            var generateResponse = await client.GenerateAsync(generateRequest);
            Console.WriteLine($"Response: {generateResponse.Response}");
            Console.WriteLine($"Generation time: {generateResponse.TotalDuration / 1000000.0:F2} seconds");
            Console.WriteLine();

            // Chat conversation
            Console.WriteLine("Starting a chat conversation...");
            var chatRequest = new ChatRequest
            {
                Model = "llama2",
                Messages = new List<ChatMessage>
                {
                    new ChatMessage { Role = "system", Content = "You are a helpful assistant that provides concise explanations about .NET concepts." },
                    new ChatMessage { Role = "user", Content = "What are the new features in .NET 7?" }
                }
            };

            var chatResponse = await client.ChatAsync(chatRequest);
            Console.WriteLine($"Assistant: {chatResponse.Message.Content}");
            Console.WriteLine($"Chat response time: {chatResponse.TotalDuration / 1000000.0:F2} seconds");
        }
    }
}
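
Two parts of the client we haven't exercised yet are the Options dictionary and the embeddings endpoint. The sketch below continues inside the Main method above; the option names temperature and num_predict follow the OLLAMA API documentation, and the specific values are arbitrary choices for illustration:

csharp

// Generation with sampling options
var tunedRequest = new GenerateRequest
{
    Model = "llama2",
    Prompt = "Summarize dependency injection in one sentence.",
    Options = new Dictionary<string, object>
    {
        ["temperature"] = 0.2,  // lower = more deterministic output
        ["num_predict"] = 128   // cap the number of generated tokens
    }
};

var tunedResponse = await client.GenerateAsync(tunedRequest);
Console.WriteLine($"Tuned response: {tunedResponse.Response}");

// Embeddings, useful for semantic search over local documents
var embeddingResponse = await client.EmbeddingAsync(new EmbeddingRequest
{
    Model = "llama2",
    Prompt = "Dependency injection decouples object construction from object use."
});
Console.WriteLine($"Embedding dimensions: {embeddingResponse.Embedding.Count}");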

Implementing Streaming Responses

One important feature for a good user experience is streaming responses: displaying the model’s output as it’s being generated rather than waiting for the complete response. Let’s add streaming support. The two methods below belong inside the OllamaClient class; they read the server’s newline-delimited JSON stream with a StreamReader, so add using System.IO to the top of the file:

csharp

public async Task StreamGenerateAsync(GenerateRequest request, Action<string> onToken, Action onComplete = null)
{
    // Ensure streaming is enabled so the server sends incremental NDJSON chunks
    request.Stream = true;

    var requestJson = JsonSerializer.Serialize(request);
    var content = new StringContent(requestJson, System.Text.Encoding.UTF8, "application/json");
    
    var streamRequest = new HttpRequestMessage(HttpMethod.Post, $"{_baseUrl}/api/generate");
    streamRequest.Content = content;
    streamRequest.Headers.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/x-ndjson"));
    
    using var response = await _httpClient.SendAsync(streamRequest, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();
    
    using var stream = await response.Content.ReadAsStreamAsync();
    using var reader = new StreamReader(stream);
    
    while (!reader.EndOfStream)
    {
        var line = await reader.ReadLineAsync();
        if (string.IsNullOrEmpty(line)) continue;
        
        try
        {
            var streamResponse = JsonSerializer.Deserialize<GenerateResponse>(line);
            if (streamResponse != null && !string.IsNullOrEmpty(streamResponse.Response))
            {
                onToken(streamResponse.Response);
            }
        }
        catch (JsonException)
        {
            // Skip malformed JSON
        }
    }
    
    onComplete?.Invoke();
}

public async Task StreamChatAsync(ChatRequest request, Action<string> onToken, Action onComplete = null)
{
    // Ensure streaming is enabled
    request.Stream = true;
    
    var requestJson = JsonSerializer.Serialize(request);
    var content = new StringContent(requestJson, System.Text.Encoding.UTF8, "application/json");
    
    var streamRequest = new HttpRequestMessage(HttpMethod.Post, $"{_baseUrl}/api/chat");
    streamRequest.Content = content;
    streamRequest.Headers.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/x-ndjson"));
    
    using var response = await _httpClient.SendAsync(streamRequest, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();
    
    using var stream = await response.Content.ReadAsStreamAsync();
    using var reader = new StreamReader(stream);
    
    while (!reader.EndOfStream)
    {
        var line = await reader.ReadLineAsync();
        if (string.IsNullOrEmpty(line)) continue;
        
        try
        {
            var streamResponse = JsonSerializer.Deserialize<ChatResponse>(line);
            if (streamResponse != null && streamResponse.Message != null && !string.IsNullOrEmpty(streamResponse.Message.Content))
            {
                onToken(streamResponse.Message.Content);
            }
        }
        catch (JsonException)
        {
            // Skip malformed JSON
        }
    }
    
    onComplete?.Invoke();
}
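
With these methods in place, wiring streaming into the console demo is straightforward. A small usage sketch, printing each token as it arrives:

csharp

var streamingRequest = new ChatRequest
{
    Model = "llama2",
    Messages = new List<ChatMessage>
    {
        new ChatMessage { Role = "user", Content = "Explain async/await in C# in two sentences." }
    }
};

// Write each chunk to the console as soon as the model produces it
await client.StreamChatAsync(
    streamingRequest,
    onToken: token => Console.Write(token),
    onComplete: () => Console.WriteLine("\n[done]"));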