The Ollama plugin connects Genkit to an Ollama server running on your machine or a private host. Inference then runs entirely on infrastructure you control, so prompts and responses never leave your network. Typical use cases:
  • Development and testing without API costs or rate limits
  • Privacy-sensitive applications where data cannot leave the premises
  • Air-gapped environments
  • Comparing open-source models against hosted ones within the same flow

Prerequisites

1. Install Ollama

Download and install Ollama from ollama.com. The Ollama server starts automatically and listens on http://localhost:11434.
2. Pull a model

# Pull Llama 3.2 (3B, good for most tasks)
ollama pull llama3.2

# Pull Gemma 3 (Google's open-weight model)
ollama pull gemma3:4b

# Pull a multimodal model
ollama pull llava

# Pull an embedding model
ollama pull nomic-embed-text
Run ollama list to see what you have installed.
3. Verify the server is running

curl http://localhost:11434/api/tags
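
The same check can be scripted from Node.js. The following is a minimal sketch, assuming Node 18+ (for the built-in fetch) and assuming that /api/tags returns a JSON object with a models array whose entries carry a name field. The helper names missingModels and checkOllama are illustrative and not part of Genkit or Ollama.

```typescript
// Shape of the /api/tags response as assumed here: only `name` is used.
interface TagsResponse {
  models: { name: string }[];
}

// Returns the required models that are not installed. A requirement without
// an explicit tag (e.g. "llama3.2") is matched against "llama3.2:latest".
function missingModels(tags: TagsResponse, required: string[]): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  return required.filter((name) => {
    const withTag = name.includes(':') ? name : `${name}:latest`;
    return !installed.has(withTag);
  });
}

// Hypothetical helper: queries the server and reports missing models.
async function checkOllama(
  required: string[],
  serverAddress = 'http://localhost:11434',
): Promise<string[]> {
  const res = await fetch(`${serverAddress}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  return missingModels((await res.json()) as TagsResponse, required);
}
```

Calling checkOllama(['llama3.2', 'nomic-embed-text']) at startup would let an application fail early with a clear message instead of failing on the first generate call.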

Installation

npm install @genkit-ai/ollama

Configuration

import { genkit } from 'genkit';
import { ollama } from '@genkit-ai/ollama';

const ai = genkit({
  plugins: [
    ollama({
      // Defaults to http://localhost:11434
      serverAddress: 'http://localhost:11434',

      // Optional: pre-declare specific models
      models: [
        { name: 'llama3.2' },
        { name: 'gemma3:4b' },
        // Multimodal model
        { name: 'llava' },
        // Model that supports tool calling
        { name: 'mistral', supports: { tools: true } },
      ],

      // Optional: pre-declare embedding models
      embedders: [
        { name: 'nomic-embed-text', dimensions: 768 },
      ],
    }),
  ],
});

Plugin options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| serverAddress | string | http://localhost:11434 | URL of the Ollama server. |
| models | ModelDefinition[] | [] | Models to pre-register. If omitted, any locally-pulled model is auto-discovered. |
| embedders | EmbeddingModelDefinition[] | [] | Embedding models to register. Must specify dimensions as the Ollama API does not expose this. |
| requestHeaders | Record<string, string> \| function | | Static or dynamic headers added to every request. Useful for authentication when Ollama runs behind a proxy. |

Generating text

// Reference a model by name — it must be pulled in Ollama first
const response = await ai.generate({
  model: ollama.model('llama3.2'),
  prompt: 'Write a haiku about compilers.',
});

console.log(response.text);

Dynamic model resolution

If you don’t pre-declare models in the plugin config, Genkit will ask Ollama to resolve the model on first use — provided it is already pulled:
// Works as long as `ollama pull mistral` has been run
const response = await ai.generate({
  model: ollama.model('mistral'),
  prompt: 'Explain the halting problem.',
});
Embedding models must be declared upfront with their dimensions value because the Ollama API does not expose dimensionality metadata. Generative models, by contrast, can be resolved dynamically as shown above.

Model configuration

Pass generation options directly:
const response = await ai.generate({
  model: ollama.model('llama3.2', {
    temperature: 0.6,
    topK: 40,
    topP: 0.9,
    maxOutputTokens: 512,
  }),
  prompt: 'Summarise the following article in two sentences.',
});
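
For orientation, Ollama's native REST API uses different names for these options (temperature, top_k, top_p, num_predict). The sketch below shows one plausible translation between the two naming schemes. The function name toOllamaOptions is hypothetical, and the exact mapping the plugin performs internally is an assumption here, not something confirmed by this page.

```typescript
// Genkit-style generation config, limited to the fields used above.
interface GenerationConfig {
  temperature?: number;
  topK?: number;
  topP?: number;
  maxOutputTokens?: number;
}

// Translates to the option names used by Ollama's native API (assumed mapping).
// Undefined fields are left out so that Ollama applies its own defaults.
function toOllamaOptions(config: GenerationConfig): Record<string, number> {
  const mapping: [keyof GenerationConfig, string][] = [
    ['temperature', 'temperature'],
    ['topK', 'top_k'],
    ['topP', 'top_p'],
    ['maxOutputTokens', 'num_predict'],
  ];
  const options: Record<string, number> = {};
  for (const [from, to] of mapping) {
    const value = config[from];
    if (value !== undefined) options[to] = value;
  }
  return options;
}
```

Leaving unset fields out, rather than sending explicit defaults, keeps the server-side model defaults in effect.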

Embeddings

const embeddings = await ai.embed({
  embedder: ollama.embedder('nomic-embed-text'),
  content: 'Store this document for later retrieval.',
});
// embeddings[0].embedding → number[] of length 768
Combine the Ollama embedder with @genkit-ai/dev-local-vectorstore for a fully local RAG pipeline with no external dependencies.
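
Conceptually, retrieval over such embeddings comes down to ranking stored vectors by their similarity to the query vector. The following is a minimal sketch, assuming cosine similarity as the metric; the vector store's actual implementation may differ. The names cosineSimilarity, StoredDoc, and mostSimilar are illustrative.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredDoc {
  text: string;
  embedding: number[];
}

// Returns the k stored documents most similar to the query embedding.
function mostSimilar(query: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}
```

The dimension check is where a wrongly declared dimensions value would surface: vectors produced by different embedding models cannot be compared.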

Multi-turn chat

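// Note: depending on your Genkit version, the session and chat APIs may be
// beta features that require importing `genkit` from 'genkit/beta'.
const session = ai.createSession();
const chat = session.chat({ model: ollama.model('llama3.2') });

const r1 = await chat.send('Hello! My name is Alice.');
const r2 = await chat.send('What is my name?');
console.log(r2.text); // e.g. "Your name is Alice."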

Function calling (tools)

Several Ollama models support OpenAI-compatible tool calling. The plugin automatically enables tools for known-compatible models (Llama 3.1/3.2, Mistral, Qwen 2.5, Phi-4, and others):
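import { z } from 'genkit';

const calculator = ai.defineTool(
  {
    name: 'calculate',
    description: 'Evaluates a mathematical expression.',
    inputSchema: z.object({ expression: z.string() }),
    outputSchema: z.number(),
  },
  // eval() keeps the demo short, but it runs arbitrary JavaScript:
  // never pass model-generated input to it in production code.
  async ({ expression }) => eval(expression) as number
);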

const response = await ai.generate({
  model: ollama.model('llama3.2'),
  prompt: 'What is 1234 × 5678?',
  tools: [calculator],
});
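
Because the tool input is produced by the model, calling eval on it executes arbitrary JavaScript. Outside of a demo, a restricted evaluator is a safer tool body. The following sketch, which is not taken from Genkit, is a small recursive-descent parser supporting numbers, + - * /, unary minus, and parentheses. The name evaluateArithmetic is hypothetical.

```typescript
// Evaluates an arithmetic expression without using eval.
// Grammar: expr   := term (('+' | '-') term)*
//          term   := factor (('*' | '/') factor)*
//          factor := '-' factor | '(' expr ')' | number
// Error positions refer to the input with whitespace removed.
function evaluateArithmetic(input: string): number {
  const src = input.replace(/\s+/g, '');
  let pos = 0;

  function parseExpr(): number {
    let value = parseTerm();
    while (src[pos] === '+' || src[pos] === '-') {
      const op = src[pos++];
      const rhs = parseTerm();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  function parseTerm(): number {
    let value = parseFactor();
    while (src[pos] === '*' || src[pos] === '/') {
      const op = src[pos++];
      const rhs = parseFactor();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  function parseFactor(): number {
    if (src[pos] === '-') {
      pos++;
      return -parseFactor();
    }
    if (src[pos] === '(') {
      pos++;
      const value = parseExpr();
      if (src[pos] !== ')') throw new Error(`Expected ')' at position ${pos}`);
      pos++;
      return value;
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`Unexpected input at position ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  const result = parseExpr();
  if (pos !== src.length) throw new Error(`Unexpected input at position ${pos}`);
  return result;
}
```

With this in place, the tool handler could return evaluateArithmetic(expression) instead of calling eval; anything other than plain arithmetic is rejected with an error.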

Remote Ollama server

Point the plugin at any reachable Ollama server (e.g. a GPU workstation on your LAN or a private VM):
const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'http://192.168.1.100:11434',
      // Attach an auth header if the server is behind a proxy
      requestHeaders: async ({ serverAddress }) => ({
        Authorization: `Bearer ${await getToken(serverAddress)}`,
      }),
    }),
  ],
});
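
The getToken function in the snippet above is not defined by the plugin; it stands for whatever your proxy's authentication requires. Since requestHeaders applies to every request, caching the token is usually worthwhile. The following is one possible sketch, assuming tokens come with an expiry timestamp; createTokenProvider, Token, and the fetchToken callback are all hypothetical names.

```typescript
// A token together with its expiry time (milliseconds since the epoch).
interface Token {
  value: string;
  expiresAt: number;
}

// Wraps a token-fetching function with a per-server cache so that a new
// token is requested only when the cached one is missing or about to expire.
function createTokenProvider(
  fetchToken: (serverAddress: string) => Promise<Token>,
  now: () => number = Date.now,
  refreshMarginMs = 30_000,
): (serverAddress: string) => Promise<string> {
  const cache = new Map<string, Token>();
  return async (serverAddress) => {
    const cached = cache.get(serverAddress);
    if (cached && cached.expiresAt - refreshMarginMs > now()) {
      return cached.value;
    }
    const fresh = await fetchToken(serverAddress);
    cache.set(serverAddress, fresh);
    return fresh.value;
  };
}
```

A getToken suitable for the configuration above could then be created once with createTokenProvider, passing a callback that contacts your identity provider. The injectable now parameter exists only to make expiry behaviour easy to test.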

Multimodal input

Models like llava and bakllava accept images alongside text. Send them as data URIs or base64 strings:
const response = await ai.generate({
  model: ollama.model('llava'),
  messages: [
    {
      role: 'user',
      content: [
        { text: 'Describe what you see in this image.' },
        { media: { url: 'data:image/jpeg;base64,...', contentType: 'image/jpeg' } },
      ],
    },
  ],
});
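
The data URI in the example is elided. A minimal sketch of building one from a local file, assuming a Node.js runtime, is shown below; toDataUri and imagePart are hypothetical helper names, not part of the plugin.

```typescript
import { readFile } from 'node:fs/promises';

// Encodes raw bytes as a data URI suitable for a media part.
function toDataUri(bytes: Uint8Array, contentType: string): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${contentType};base64,${base64}`;
}

// Hypothetical helper: reads an image from disk and builds the media part
// in the same shape as the message content used above.
async function imagePart(path: string, contentType: string) {
  const bytes = await readFile(path);
  return { media: { url: toDataUri(bytes, contentType), contentType } };
}
```

The result of imagePart can be placed directly in the content array next to the text part. The content type is passed explicitly rather than guessed from the file extension.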

See also
  • Google AI plugin: Hosted Gemini models with more capabilities.
  • Sessions: Maintain conversation history across multiple turns.
  • RAG guide: Build retrieval pipelines with local embeddings.
  • Tools: Let models call functions during generation.