Streaming lets your application display or process model output progressively — useful for chat interfaces, long-form generation, and real-time feedback.

Basic streaming

Use ai.generateStream() instead of ai.generate(). It returns an object with two properties:
  • stream — an async iterable of GenerateResponseChunk objects.
  • response — a Promise<GenerateResponse> that resolves when generation is complete.
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()], model: 'googleai/gemini-2.0-flash' });

const { stream, response } = ai.generateStream({
  prompt: 'Write a short story about a robot learning to paint.',
});

// Consume chunks as they arrive
for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

// Await the final response for metadata (usage, finish reason, etc.)
const finalResponse = await response;
console.log('\n\nFinish reason:', finalResponse.finishReason);
console.log('Total tokens:', finalResponse.usage.totalTokens);
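
To exercise code that consumes a streaming result without calling a model, it can help to substitute a fake with the same two-part shape. The sketch below is not part of Genkit: fakeGenerateStream, FakeChunk, FakeStreamResult, and collect are hypothetical names, and the fake mimics only the stream and response properties described above. Note that this fake settles response only once the stream has been fully iterated, which a real implementation need not require.

```typescript
// Hypothetical stand-in for a streaming result; not a Genkit API.
interface FakeChunk {
  text: string;
}

interface FakeStreamResult {
  stream: AsyncIterable<FakeChunk>;
  response: Promise<{ text: string }>;
}

function fakeGenerateStream(pieces: string[]): FakeStreamResult {
  let resolveResponse!: (value: { text: string }) => void;
  const response = new Promise<{ text: string }>((resolve) => {
    resolveResponse = resolve;
  });

  async function* produce(): AsyncGenerator<FakeChunk> {
    for (const text of pieces) {
      yield { text };
    }
    // Settle the final response after the last chunk has been delivered.
    resolveResponse({ text: pieces.join('') });
  }

  return { stream: produce(), response };
}

// Same consumption pattern as the example above: iterate, then await.
async function collect(result: FakeStreamResult): Promise<string[]> {
  const seen: string[] = [];
  for await (const chunk of result.stream) {
    seen.push(chunk.text);
  }
  return seen;
}
```

A test can then call collect(fakeGenerateStream(['Hel', 'lo'])) and compare the collected pieces against the text of the awaited response.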

The GenerateResponseChunk object

Each chunk delivered by the stream is a GenerateResponseChunk with the following properties:
Property               Type               Description
chunk.text             string             Text content in this chunk only.
chunk.accumulatedText  string             All text received up to and including this chunk.
chunk.content          Part[]             Raw content parts in this chunk.
chunk.toolRequests     ToolRequestPart[]  Tool call requests included in this chunk.
chunk.output           T | null           Partial structured output (when using an output schema).
chunk.index            number             Message index this chunk belongs to (starts at 0).
const { stream } = ai.generateStream({
  prompt: 'Summarize the French Revolution in three bullet points.',
});

for await (const chunk of stream) {
  // chunk.text — new text in this chunk
  // chunk.accumulatedText — everything so far
  process.stdout.write(chunk.text);
}
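
The relationship between text and accumulatedText can be illustrated without a model. The sketch below builds plain objects with those two fields from a list of text deltas. SketchChunk and toChunks are hypothetical names used for illustration only; they are not Genkit types.

```typescript
// Illustrative shape only; the real GenerateResponseChunk has more fields.
interface SketchChunk {
  text: string;            // delta carried by this chunk
  accumulatedText: string; // all deltas up to and including this one
}

function toChunks(deltas: string[]): SketchChunk[] {
  let soFar = '';
  return deltas.map((text) => {
    soFar += text;
    return { text, accumulatedText: soFar };
  });
}

const sketchChunks = toChunks(['Liberty, ', 'equality, ', 'fraternity']);
for (const chunk of sketchChunks) {
  console.log(JSON.stringify(chunk.text), '->', JSON.stringify(chunk.accumulatedText));
}
```

A UI that re-renders the whole message on every update can read accumulatedText directly, while one that appends to existing output would use text.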

Using onChunk callback

As an alternative to async iteration, you can pass an onChunk callback directly to generate(). This is useful when you want the final response object but also want to react to chunks:
const response = await ai.generate({
  prompt: 'Explain quantum entanglement simply.',
  onChunk: (chunk) => {
    // called for each chunk as it arrives
    process.stdout.write(chunk.text);
  },
});

// response is the fully assembled GenerateResponse
console.log('\n\nTotal tokens:', response.usage.totalTokens);
onChunk and generateStream() both stream the same underlying chunks. Use generateStream() when you want async iteration syntax; use onChunk when you only need a side effect and still want the final Promise<GenerateResponse>.
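
One way to see that the two styles carry the same information is that a callback-based producer can be adapted into an async iterable. The following is a minimal sketch of that idea, not a description of how Genkit implements generateStream(): fakeGenerate and toStream are hypothetical names, and the producer is a stand-in that emits fixed pieces of text.

```typescript
type SketchCallback = (chunk: { text: string }) => void;

// Hypothetical callback-style producer; stands in for a call that accepts onChunk.
async function fakeGenerate(
  pieces: string[],
  onChunk?: SketchCallback
): Promise<{ text: string }> {
  for (const text of pieces) {
    await Promise.resolve(); // chunks arrive asynchronously
    onChunk?.({ text });
  }
  return { text: pieces.join('') };
}

// Adapter: expose the same chunks as an async iterable plus a final promise.
function toStream(pieces: string[]) {
  const queue: { text: string }[] = [];
  let wake: (() => void) | null = null;
  let done = false;

  const response = fakeGenerate(pieces, (chunk) => {
    queue.push(chunk);
    wake?.(); // wake the consumer if it is waiting
  }).then((result) => {
    done = true;
    wake?.();
    return result;
  });

  async function* stream(): AsyncGenerator<{ text: string }> {
    while (true) {
      if (queue.length > 0) {
        yield queue.shift()!;
        continue;
      }
      if (done) return;
      // Nothing buffered yet: wait until the producer pushes or finishes.
      await new Promise<void>((resolve) => {
        wake = resolve;
      });
      wake = null;
    }
  }

  return { stream: stream(), response };
}
```

The queue buffers chunks that arrive before the consumer asks for them, so no chunk is lost if iteration starts late.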

Streaming within a flow

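Streaming works inside flows. Call generateStream() as you would outside a flow, declare the type of each streamed chunk with streamSchema, and forward chunks to the flow's caller with sendChunk:
import { z } from 'genkit';

// Assumes the `ai` instance created in the basic streaming example
const storyFlow = ai.defineFlow(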
  {
    name: 'storyFlow',
    inputSchema: z.string(),
    outputSchema: z.string(),
    streamSchema: z.string(), // type of each streaming chunk
  },
  async (topic, { sendChunk }) => {
    const { stream, response } = ai.generateStream({
      prompt: `Write a short story about: ${topic}`,
    });

    for await (const chunk of stream) {
      sendChunk(chunk.text); // forward chunk to flow's stream
    }

    return (await response).text;
  }
);

// Streaming a flow
const { stream, output } = storyFlow.stream('a robot painter');
for await (const chunk of stream) {
  process.stdout.write(chunk);
}
console.log('\n\nFinal output:', await output);

Streaming structured output

You can stream structured output by combining generateStream() with an output.schema. Each chunk’s output property contains the partial JSON parsed so far:
import { z } from 'genkit';

const ItemSchema = z.object({
  title: z.string(),
  description: z.string(),
  price: z.number(),
});

const { stream, response } = ai.generateStream({
  prompt: 'Generate a product listing for a mechanical keyboard.',
  output: { schema: ItemSchema },
});

for await (const chunk of stream) {
  // chunk.output is the partial object built from JSON received so far
  if (chunk.output) {
    console.log('Partial output:', chunk.output);
  }
}

const finalResponse = await response;
console.log('Final product:', finalResponse.output);
With the jsonl output format, the model emits one complete JSON object per line, so each streamed item can be handled as soon as its line is complete. This is useful for streaming arrays of items one element at a time.
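
The line-per-object idea can be sketched independently of any model call. The helper below buffers raw text and returns an object each time a full line has arrived. JsonlBuffer is a hypothetical class written for this illustration; it is not how Genkit parses jsonl output internally.

```typescript
// Hypothetical helper: buffers streamed text and returns one parsed object per complete line.
class JsonlBuffer<T = unknown> {
  private pending = '';

  // Feed raw text as it arrives; returns the objects completed by this piece of text.
  push(text: string): T[] {
    this.pending += text;
    const lines = this.pending.split('\n');
    // The last element is an incomplete line (or '' if the text ended with a newline).
    this.pending = lines.pop() ?? '';
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as T);
  }

  // Call once the stream ends to parse a final line that has no trailing newline.
  flush(): T[] {
    const rest = this.pending.trim();
    this.pending = '';
    return rest.length > 0 ? [JSON.parse(rest) as T] : [];
  }
}

const jsonlBuffer = new JsonlBuffer<{ title: string }>();
const firstBatch = jsonlBuffer.push('{"title":"Keyboard A"}\n{"title":"Key');
const secondBatch = jsonlBuffer.push('board B"}\n');
console.log(firstBatch, secondBatch);
```

Splitting on raw newlines is safe here because a newline inside a JSON string value is always escaped, so a literal line break can only separate objects.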

Streaming in a web server

When deploying flows as HTTP endpoints, you can stream the response to the client using server-sent events (SSE) or chunked transfer encoding. The Genkit flow server handles this automatically when a client sends a streaming request. To stream from a custom route instead, write each chunk as an SSE event yourself, as in this Express handler:
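import express from 'express';
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({ plugins: [googleAI()], model: 'googleai/gemini-2.0-flash' });
const app = express();

app.get('/stream', async (req, res) => {
  // Reject requests without a usable prompt before opening the stream
  const prompt = req.query.prompt;
  if (typeof prompt !== 'string' || prompt.length === 0) {
    res.status(400).json({ error: 'Missing "prompt" query parameter' });
    return;
  }

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  try {
    const { stream, response } = ai.generateStream({ prompt });

    // Each SSE event is a "data: ..." line followed by a blank line
    for await (const chunk of stream) {
      res.write(`data: ${JSON.stringify({ text: chunk.text })}\n\n`);
    }

    await response;
    res.write('data: [DONE]\n\n');
  } catch (err) {
    // Headers are already sent, so report the failure as an event
    res.write(`data: ${JSON.stringify({ error: String(err) })}\n\n`);
  } finally {
    res.end();
  }
});

app.listen(3000); // port chosen arbitrarily for this example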
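
On the receiving side, a client has to split the byte stream back into events and stop at the [DONE] sentinel. The sketch below parses only the narrow format written by the route above (one "data: " line per event, events separated by a blank line). parseSseBuffer and ParsedEvents are hypothetical names, and the parser deliberately ignores other parts of the SSE format such as multi-line data, event and id fields, and CRLF line endings.

```typescript
// Sketch of a parser for "data: <json>\n\n" events ending with "data: [DONE]\n\n".
interface ParsedEvents {
  texts: string[]; // text payloads found in complete events
  done: boolean;   // true once the [DONE] sentinel was seen
  rest: string;    // trailing partial event to prepend to the next read
}

function parseSseBuffer(buffer: string): ParsedEvents {
  const events = buffer.split('\n\n');
  // The last piece is incomplete unless the buffer ended on a blank line.
  const rest = events.pop() ?? '';
  const texts: string[] = [];
  let done = false;

  for (const event of events) {
    if (!event.startsWith('data: ')) continue; // skip anything that is not a data line
    const payload = event.slice('data: '.length);
    if (payload === '[DONE]') {
      // The sentinel is not JSON, so check for it before parsing.
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { text?: string };
    if (typeof parsed.text === 'string') texts.push(parsed.text);
  }

  return { texts, done, rest };
}

const sampleSse = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: [DONE]\n\n';
console.log(parseSseBuffer(sampleSse));
```

A client reading the response body incrementally would append each new piece of text to the rest value returned by the previous call before parsing again, so that an event split across two reads is not lost.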

Prompt streaming

Prompts defined with ai.definePrompt() or loaded from .prompt files also support streaming via the .stream() method:
const storyPrompt = ai.definePrompt(
  { name: 'story', input: { schema: z.object({ topic: z.string() }) } },
  async ({ topic }) => ({
    messages: [{ role: 'user', content: [{ text: `Write a story about ${topic}` }] }],
  })
);

const { stream, response } = storyPrompt.stream({ topic: 'a lost astronaut' });
for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Related topics
  • Structured output: stream typed JSON output chunk by chunk.
  • Flows: define streamable flows with typed stream schemas.
  • Agents: stream multi-turn agent loops.
  • Deployment: deploy streaming flows to production.