Summary: This post explores how to integrate OLLAMA, an open-source tool for running large language models locally, with .NET applications. Learn how to set up OLLAMA, create a .NET client, and build applications that leverage local LLMs for privacy-focused AI capabilities.
Introduction
The landscape of AI development is rapidly evolving, with large language models (LLMs) becoming increasingly accessible to developers. While cloud-based AI services like Azure OpenAI Service offer powerful capabilities, there are compelling reasons to run LLMs locally:
- Data privacy: Keep sensitive data within your infrastructure
- Cost control: Eliminate per-token charges for high-volume applications
- Offline operation: Run AI features without internet connectivity
- Customization: Fine-tune models for specific domains or use cases
- Reduced latency: Eliminate network round-trips for faster responses
OLLAMA, which gained popularity in mid-2023, is an open-source tool that makes it remarkably easy to run LLMs locally. It provides a simple interface for downloading, running, and interacting with various open-source models like Llama 2, Mistral, and Vicuna.
In this post, we’ll explore how to integrate OLLAMA with .NET applications, enabling you to build AI-powered features that run entirely on your local infrastructure. We’ll cover setting up OLLAMA, creating a .NET client, and building practical applications that leverage local LLMs.
Understanding OLLAMA
OLLAMA is a lightweight runtime for running LLMs locally. It simplifies the process of downloading and running models, handling the complex infrastructure requirements behind the scenes. Here’s what makes OLLAMA particularly valuable:
- Simple API: RESTful API for easy integration with any programming language
- Model management: Easy downloading and switching between different models
- Optimized performance: Efficient inference on consumer hardware
- Customization: Support for creating custom models with “Modelfiles”
- Cross-platform: Available for Windows, macOS, and Linux
OLLAMA supports a wide range of open-source models, including:
- Llama 2 (7B, 13B, 70B parameters)
- Mistral (7B)
- Vicuna
- CodeLlama
- Phi-2
- And many others
These models vary in size, capabilities, and resource requirements, allowing you to choose the right balance for your specific needs.
Setting Up OLLAMA
Before we can integrate OLLAMA with .NET, we need to install and set up OLLAMA on our development machine.
Installation
OLLAMA can be installed on Windows, macOS, or Linux. Here’s how to install it on each platform:
Windows:
- Download the installer from ollama.ai
- Run the installer and follow the prompts
- OLLAMA will be installed as a service that runs in the background
macOS:
bash
brew install ollama
Linux:
bash
curl -fsSL https://ollama.ai/install.sh | sh
Starting OLLAMA
After installation, you can start the OLLAMA service:
Windows: The service should start automatically after installation.
macOS/Linux:
bash
ollama serve
This will start the OLLAMA server, which listens on localhost:11434 by default.
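If you want to quickly confirm from .NET that the server is reachable, a one-off console check like the following works. This is a minimal sketch that assumes the default port; the /api/tags endpoint simply lists the models installed locally.
csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class OllamaHealthCheck
{
    static async Task Main()
    {
        using var http = new HttpClient();
        // Any successful response from /api/tags means the OLLAMA server is up.
        var json = await http.GetStringAsync("http://localhost:11434/api/tags");
        Console.WriteLine(json);
    }
}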
Pulling Your First Model
Before we can use a model, we need to download it. Let’s pull the Llama 2 7B model, which offers a good balance between capability and resource requirements:
bash
ollama pull llama2
This will download the model, which may take some time depending on your internet connection. The Llama 2 7B model is approximately 4GB in size.
You can also pull other models:
bash
# Pull the Mistral 7B model
ollama pull mistral
# Pull the CodeLlama model optimized for code generation
ollama pull codellama
# Pull a smaller, faster model
ollama pull phi
Testing OLLAMA from the Command Line
Before integrating with .NET, let’s make sure OLLAMA is working correctly by testing it from the command line:
bash
ollama run llama2 "Explain the concept of dependency injection in .NET"
You should see the model generate a response explaining dependency injection. If this works, OLLAMA is set up correctly, and we can proceed with .NET integration.
Creating a .NET Client for OLLAMA
Now that OLLAMA is set up, let’s create a .NET client to interact with it. We’ll start by creating a new .NET project and implementing a client class.
Creating a New Project
bash
dotnet new console -n OllamaDemo
cd OllamaDemo
dotnet add package System.Net.Http.Json
Implementing the OLLAMA Client
Let’s create a client class that encapsulates the OLLAMA API:
csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
namespace OllamaDemo
{
public class OllamaClient
{
private readonly HttpClient _httpClient;
private readonly string _baseUrl;
public OllamaClient(string baseUrl = "http://localhost:11434")
{
_baseUrl = baseUrl;
_httpClient = new HttpClient();
}
public async Task<GenerateResponse> GenerateAsync(GenerateRequest request)
{
// Request a single, non-streamed response so the body is one JSON object
request.Stream = false;
var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/generate", request);
response.EnsureSuccessStatusCode();
var content = await response.Content.ReadAsStringAsync();
return JsonSerializer.Deserialize<GenerateResponse>(content);
}
public async Task<ChatResponse> ChatAsync(ChatRequest request)
{
var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/chat", request );
response.EnsureSuccessStatusCode();
var content = await response.Content.ReadAsStringAsync();
return JsonSerializer.Deserialize<ChatResponse>(content);
}
public async Task<EmbeddingResponse> EmbeddingAsync(EmbeddingRequest request)
{
var response = await _httpClient.PostAsJsonAsync($"{_baseUrl}/api/embeddings", request );
response.EnsureSuccessStatusCode();
var content = await response.Content.ReadAsStringAsync();
return JsonSerializer.Deserialize<EmbeddingResponse>(content);
}
public async Task<ListModelsResponse> ListModelsAsync()
{
var response = await _httpClient.GetAsync($"{_baseUrl}/api/tags");
response.EnsureSuccessStatusCode();
var content = await response.Content.ReadAsStringAsync();
return JsonSerializer.Deserialize<ListModelsResponse>(content);
}
}
// Request and response models for the OLLAMA API
public class GenerateRequest
{
[JsonPropertyName("model")]
public string Model { get; set; }
[JsonPropertyName("prompt")]
public string Prompt { get; set; }
[JsonPropertyName("stream")]
public bool Stream { get; set; }
[JsonPropertyName("system")]
public string System { get; set; }
[JsonPropertyName("template")]
public string Template { get; set; }
[JsonPropertyName("context")]
public List<int> Context { get; set; }
[JsonPropertyName("options")]
public Dictionary<string, object> Options { get; set; }
}
public class GenerateResponse
{
[JsonPropertyName("model")]
public string Model { get; set; }
[JsonPropertyName("response")]
public string Response { get; set; }
[JsonPropertyName("context")]
public List<int> Context { get; set; }
[JsonPropertyName("total_duration")]
public long TotalDuration { get; set; }
[JsonPropertyName("load_duration")]
public long LoadDuration { get; set; }
[JsonPropertyName("prompt_eval_count")]
public int PromptEvalCount { get; set; }
[JsonPropertyName("prompt_eval_duration")]
public long PromptEvalDuration { get; set; }
[JsonPropertyName("eval_count")]
public int EvalCount { get; set; }
[JsonPropertyName("eval_duration")]
public long EvalDuration { get; set; }
}
public class ChatRequest
{
[JsonPropertyName("model")]
public string Model { get; set; }
[JsonPropertyName("messages")]
public List<ChatMessage> Messages { get; set; }
[JsonPropertyName("stream")]
public bool Stream { get; set; }
[JsonPropertyName("options")]
public Dictionary<string, object> Options { get; set; }
}
public class ChatMessage
{
[JsonPropertyName("role")]
public string Role { get; set; }
[JsonPropertyName("content")]
public string Content { get; set; }
}
public class ChatResponse
{
[JsonPropertyName("model")]
public string Model { get; set; }
[JsonPropertyName("message")]
public ChatMessage Message { get; set; }
[JsonPropertyName("total_duration")]
public long TotalDuration { get; set; }
[JsonPropertyName("load_duration")]
public long LoadDuration { get; set; }
[JsonPropertyName("prompt_eval_count")]
public int PromptEvalCount { get; set; }
[JsonPropertyName("prompt_eval_duration")]
public long PromptEvalDuration { get; set; }
[JsonPropertyName("eval_count")]
public int EvalCount { get; set; }
[JsonPropertyName("eval_duration")]
public long EvalDuration { get; set; }
}
public class EmbeddingRequest
{
[JsonPropertyName("model")]
public string Model { get; set; }
[JsonPropertyName("prompt")]
public string Prompt { get; set; }
}
public class EmbeddingResponse
{
[JsonPropertyName("embedding")]
public List<float> Embedding { get; set; }
}
public class ListModelsResponse
{
[JsonPropertyName("models")]
public List<ModelInfo> Models { get; set; }
}
public class ModelInfo
{
[JsonPropertyName("name")]
public string Name { get; set; }
[JsonPropertyName("modified_at")]
public string ModifiedAt { get; set; }
[JsonPropertyName("size")]
public long Size { get; set; }
}
}
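In a real application you would typically let dependency injection own the client rather than constructing it inline. As a rough sketch of how the class above could be wired into an ASP.NET Core minimal API (this assumes a web project template with implicit usings; the /ask endpoint and its parameter names are illustrative, not part of OLLAMA):
csharp
// Program.cs of an ASP.NET Core app that references the OllamaDemo types (illustrative sketch)
using OllamaDemo;

var builder = WebApplication.CreateBuilder(args);

// One OllamaClient (and its HttpClient) shared for the application's lifetime.
builder.Services.AddSingleton(new OllamaClient("http://localhost:11434"));

var app = builder.Build();

// Minimal endpoint that forwards a prompt to the local model and returns the reply.
app.MapGet("/ask", async (string prompt, OllamaClient ollama) =>
{
    var result = await ollama.GenerateAsync(new GenerateRequest
    {
        Model = "llama2",
        Prompt = prompt
    });
    return result.Response;
});

app.Run();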
Basic Usage Example
Now, let’s use our client to interact with OLLAMA:
csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace OllamaDemo
{
class Program
{
static async Task Main(string[] args)
{
Console.WriteLine("OLLAMA .NET Client Demo");
Console.WriteLine("======================");
var client = new OllamaClient();
// List available models
Console.WriteLine("Available models:");
var models = await client.ListModelsAsync();
foreach (var model in models.Models)
{
Console.WriteLine($"- {model.Name} ({model.Size / 1024 / 1024} MB)");
}
Console.WriteLine();
// Generate text
Console.WriteLine("Generating text with Llama 2...");
var generateRequest = new GenerateRequest
{
Model = "llama2",
Prompt = "Explain the concept of dependency injection in .NET",
System = "You are a helpful assistant that provides concise explanations about .NET concepts."
};
var generateResponse = await client.GenerateAsync(generateRequest);
Console.WriteLine($"Response: {generateResponse.Response}");
Console.WriteLine($"Generation time: {generateResponse.TotalDuration / 1000000.0:F2} seconds");
Console.WriteLine();
// Chat conversation
Console.WriteLine("Starting a chat conversation...");
var chatRequest = new ChatRequest
{
Model = "llama2",
Messages = new List<ChatMessage>
{
new ChatMessage { Role = "system", Content = "You are a helpful assistant that provides concise explanations about .NET concepts." },
new ChatMessage { Role = "user", Content = "What are the new features in .NET 7?" }
}
};
var chatResponse = await client.ChatAsync(chatRequest);
Console.WriteLine($"Assistant: {chatResponse.Message.Content}");
Console.WriteLine($"Chat response time: {chatResponse.TotalDuration / 1000000.0:F2} seconds");
}
}
}
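The client also exposes EmbeddingAsync, which the demo above doesn’t call. Embeddings turn text into a numeric vector you can store and compare for semantic search. A small sketch you could append to the Main method (the vector length depends on the model you use):
csharp
// Request an embedding for a piece of text using the same local model.
var embeddingResponse = await client.EmbeddingAsync(new EmbeddingRequest
{
    Model = "llama2",
    Prompt = "Dependency injection decouples object creation from object use."
});

Console.WriteLine($"Embedding dimensions: {embeddingResponse.Embedding.Count}");
Console.WriteLine($"First value: {embeddingResponse.Embedding[0]}");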
Implementing Streaming Responses
One important feature for a good user experience is streaming responses. This allows you to display the model’s output as it’s being generated, rather than waiting for the complete response. Let’s implement streaming support:
csharp
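// These methods belong inside the OllamaClient class from earlier;
// StreamReader requires a "using System.IO;" directive in that file.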
public async Task StreamGenerateAsync(GenerateRequest request, Action<string> onToken, Action onComplete = null)
{
// Ensure streaming is enabled so the server sends incremental NDJSON chunks
request.Stream = true;
var requestJson = JsonSerializer.Serialize(request);
var content = new StringContent(requestJson, System.Text.Encoding.UTF8, "application/json");
var streamRequest = new HttpRequestMessage(HttpMethod.Post, $"{_baseUrl}/api/generate");
streamRequest.Content = content;
streamRequest.Headers.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/x-ndjson"));
using var response = await _httpClient.SendAsync(streamRequest, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);
while (!reader.EndOfStream)
{
var line = await reader.ReadLineAsync();
if (string.IsNullOrEmpty(line)) continue;
try
{
var streamResponse = JsonSerializer.Deserialize<GenerateResponse>(line);
if (streamResponse != null && !string.IsNullOrEmpty(streamResponse.Response))
{
onToken(streamResponse.Response);
}
}
catch (JsonException)
{
// Skip malformed JSON
}
}
onComplete?.Invoke();
}
public async Task StreamChatAsync(ChatRequest request, Action<string> onToken, Action onComplete = null)
{
// Ensure streaming is enabled
request.Stream = true;
var requestJson = JsonSerializer.Serialize(request);
var content = new StringContent(requestJson, System.Text.Encoding.UTF8, "application/json");
var streamRequest = new HttpRequestMessage(HttpMethod.Post, $"{_baseUrl}/api/chat");
streamRequest.Content = content;
streamRequest.Headers.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/x-ndjson"));
using var response = await _httpClient.SendAsync(streamRequest, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);
while (!reader.EndOfStream)
{
var line = await reader.ReadLineAsync();
if (string.IsNullOrEmpty(line)) continue;
try
{
var streamResponse = JsonSerializer.Deserialize<ChatResponse>(line);
if (streamResponse != null && streamResponse.Message != null && !string.IsNullOrEmpty(streamResponse.Message.Content))
{
onToken(streamResponse.Message.Content);
}
}
catch (JsonException)
{
// Skip malformed JSON
}
}
onComplete?.Invoke();
}
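With these methods added to OllamaClient, the console demo can print tokens as they arrive instead of waiting for the full reply. A short usage sketch, continuing the Main method from the earlier example:
csharp
// Stream a chat reply token by token; Console.Write keeps the output flowing on one line.
Console.WriteLine("Streaming a response...");
await client.StreamChatAsync(
    new ChatRequest
    {
        Model = "llama2",
        Messages = new List<ChatMessage>
        {
            new ChatMessage { Role = "user", Content = "Summarize what OLLAMA does in two sentences." }
        }
    },
    onToken: token => Console.Write(token),
    onComplete: () => Console.WriteLine());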