OrkaJS

Streaming

Real-time token streaming for responsive AI applications.

OrkaJS supports real-time streaming of LLM responses, enabling you to display tokens as they are generated. This dramatically improves user experience by reducing perceived latency.

Key Features

  • Token-by-token streaming with callbacks
  • Event-based architecture for fine-grained control
  • Time to First Token (TTFT) tracking
  • Support for all LLM providers (OpenAI, Anthropic, Mistral, Ollama)
  • Cancellation support via AbortController
  • Extended thinking support for Claude models

Quick Start

The simplest way to use streaming is with the onToken callback:

import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

// Stream with onToken callback
const result = await llm.streamGenerate('Explain quantum computing', {
  onToken: (token, index) => {
    process.stdout.write(token); // Print each token as it arrives
  },
  onEvent: (event) => {
    if (event.type === 'done') {
      console.log('\nStream complete!');
    }
  },
});

console.log('Total tokens:', result.usage.totalTokens);
console.log('Time to first token:', result.ttft, 'ms');

Using the stream() Method

For more control, use the async iterator pattern:

import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

// Use async iterator for full control
for await (const event of llm.stream('Write a poem about AI')) {
  switch (event.type) {
    case 'token':
      process.stdout.write(event.token);
      break;
    case 'thinking':
      console.log('[Thinking]:', event.delta);
      break;
    case 'usage':
      console.log('Usage:', event.usage);
      break;
    case 'done':
      console.log('\nFinished:', event.finishReason);
      break;
    case 'error':
      console.error('Error:', event.message);
      break;
  }
}

Event Types

The streaming system emits various event types:

  • token: Individual token received
  • content: Accumulated content with delta
  • thinking: Model reasoning (Claude extended thinking)
  • tool_call: Tool/function call started
  • usage: Token usage statistics
  • done: Stream completed
  • error: Error occurred

import type { LLMStreamEvent } from '@orka-js/core';

// Event type definitions
type StreamEventType =
  | 'token' // Individual token received
  | 'content' // Content chunk with accumulated text
  | 'tool_call' // Tool/function call started
  | 'thinking' // Model reasoning (Claude)
  | 'usage' // Token usage update
  | 'done' // Stream completed
  | 'error'; // Error occurred

// Token event structure
interface TokenEvent {
  type: 'token';
  token: string; // The token text
  index: number; // Token position
  timestamp: number; // Event timestamp
}

// Done event structure
interface DoneEvent {
  type: 'done';
  content: string; // Full response content
  finishReason: 'stop' | 'length' | 'tool_calls' | 'error';
  usage?: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
}

Stream Result

The streamGenerate() method returns a StreamResult with additional metrics:

interface StreamResult {
  content: string; // Full response content
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  model: string; // Model used
  finishReason: 'stop' | 'length' | 'tool_calls' | 'error';
  ttft?: number; // Time to first token (ms)
  durationMs: number; // Total stream duration (ms)
}

// Example usage
const result = await llm.streamGenerate('Hello');
console.log('TTFT:', result.ttft, 'ms');
console.log('Duration:', result.durationMs, 'ms');
console.log('Tokens/sec:', result.usage.completionTokens / (result.durationMs / 1000));

Checking Streaming Support

You can check if an adapter supports streaming:

import { isStreamingAdapter } from '@orka-js/core';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

if (isStreamingAdapter(llm)) {
  // TypeScript knows llm has stream() and streamGenerate()
  const result = await llm.streamGenerate('Hello');
  console.log(result.content);
} else {
  // Fallback to regular generate
  const result = await llm.generate('Hello');
  console.log(result.content);
}

// Check property directly
console.log('Supports streaming:', llm.supportsStreaming); // true

Cancellation

Use AbortController to cancel a stream:

import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

// Create abort controller
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  const result = await llm.streamGenerate('Write a very long essay...', {
    signal: controller.signal,
    onToken: (token) => process.stdout.write(token),
  });
} catch (error) {
  // catch variables are `unknown` in strict TypeScript, so narrow first
  if ((error as Error).name === 'AbortError') {
    console.log('Stream was cancelled');
  }
}

Provider Support

All LLM adapters support streaming:

  • OpenAI (GPT-4o / GPT-4 Turbo): Full streaming with usage stats
  • Anthropic (Claude 3.5 Sonnet / Opus): Streaming with extended thinking support
  • Mistral (Mistral Large / Codestral): OpenAI-compatible streaming
  • Ollama (Local Llama 3 / Mistral / Phi): NDJSON streaming format
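
To illustrate the NDJSON format Ollama uses, here is a standalone sketch of a line-delimited JSON parser (this is not the OrkaJS adapter implementation, just the general technique). The key detail is that network chunks do not align with line boundaries, so a partial line must be carried over to the next chunk:

```typescript
// Parse an NDJSON stream chunk-by-chunk: each complete line is one JSON event.
function parseNdjson(chunks: string[]): Array<{ response?: string; done?: boolean }> {
  const events: Array<{ response?: string; done?: boolean }> = [];
  let pending = ''; // carries a partial line across chunk boundaries
  for (const chunk of chunks) {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop() ?? ''; // last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim()) events.push(JSON.parse(line));
    }
  }
  return events;
}

// A line split across two chunks is still parsed as one event
const events = parseNdjson(['{"response":"Hel"}\n{"resp', 'onse":"lo"}\n{"done":true}\n']);
```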

Best Practices

  • Always handle errors in your onEvent callback
  • Use TTFT metrics to monitor performance
  • Implement cancellation for long-running streams
  • Buffer tokens for smoother UI updates
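
The buffering tip above can be sketched as a small helper that batches tokens before each UI update (the `TokenBuffer` name and shape are illustrative, not part of the OrkaJS API):

```typescript
// Minimal token buffer: flushes to the render callback every `size` tokens
// instead of re-rendering on every single token.
class TokenBuffer {
  private buf: string[] = [];
  constructor(
    private size: number,
    private onFlush: (chunk: string) => void,
  ) {}

  push(token: string): void {
    this.buf.push(token);
    if (this.buf.length >= this.size) this.flush();
  }

  flush(): void {
    if (this.buf.length === 0) return;
    this.onFlush(this.buf.join(''));
    this.buf = [];
  }
}

// Usage: batch every 4 tokens; call flush() once the stream ends
const chunks: string[] = [];
const buffer = new TokenBuffer(4, (chunk) => chunks.push(chunk));
for (const t of ['Hel', 'lo', ', ', 'wor', 'ld', '!']) buffer.push(t);
buffer.flush(); // flush the remainder when the stream completes
```

An interval-based flush (e.g. every 50 ms) works equally well; the point is to decouple render frequency from token arrival rate.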

Integration with RAG

Streaming works seamlessly with RAG pipelines:

import { createOrka, OpenAIAdapter, MemoryVectorAdapter } from 'orkajs';
import { isStreamingAdapter } from '@orka-js/core';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new MemoryVectorAdapter(),
});

// Create knowledge base
await orka.knowledge.create({
  name: 'docs',
  source: ['OrkaJS is a TypeScript framework for LLM systems.'],
});

// Retrieve context first
const context = await orka.knowledge.search('docs', 'What is OrkaJS?', { topK: 3 });

// Then stream the response with context
if (isStreamingAdapter(orka.llm)) {
  const result = await orka.llm.streamGenerate(
    `Context: ${context.map(c => c.content).join('\n')}\n\nQuestion: What is OrkaJS?`,
    {
      systemPrompt: 'Answer based on the provided context.',
      onToken: (token) => process.stdout.write(token),
    }
  );
  console.log('\nAnswer generated with', result.usage.totalTokens, 'tokens');
}
}

Streaming Tool Calls

StreamingToolAgent streams tokens in real time while executing tools in parallel. Users see the model "thinking" as tools are invoked, tool_result events are emitted mid-stream, and conversational memory is preserved automatically across requests.

streaming-tool-agent.ts
import { StreamingToolAgent } from '@orka-js/agent';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

const agent = new StreamingToolAgent({
  goal: 'Answer questions using available tools',
  tools: [
    {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: [{ name: 'location', type: 'string', description: 'City name', required: true }],
      execute: async ({ location }) => ({ output: `Weather in ${location}: Sunny, 22°C` }),
    },
  ],
}, llm);

// Stream tokens + tool execution in real time
for await (const event of agent.runStream('What is the weather in Paris?')) {
  switch (event.type) {
    case 'token':
      process.stdout.write(event.token); // LLM "thinking" as tokens arrive
      break;
    case 'tool_result':
      console.log('\n[Tool result]:', event.result); // Tool output mid-stream
      break;
    case 'done':
      console.log('\n[Final answer]:', event.content);
      break;
  }
}

// Or use run() for a simple non-streaming result
const result = await agent.run('What is the weather in Lyon?');
console.log(result.output); // Final answer
console.log(result.steps); // Tool execution steps with observations

💡 Conversational memory

Pass a Memory instance to the constructor to maintain conversation context. History is loaded before each runStream() call and saved after completion — the agent never loses context between turns.
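
The load-before/save-after cycle described above can be sketched in plain TypeScript. The `Turn` and `InMemoryHistory` shapes here are hypothetical stand-ins for illustration, not the actual OrkaJS Memory interface:

```typescript
// Hypothetical memory shape: load history before a run, save it after.
interface Turn { role: 'user' | 'assistant'; content: string }

class InMemoryHistory {
  private turns: Turn[] = [];
  load(): Turn[] { return [...this.turns]; }        // read before each run
  save(turns: Turn[]): void { this.turns = turns; } // persist after completion
}

async function runWithMemory(
  memory: InMemoryHistory,
  input: string,
  answer: (history: Turn[], input: string) => Promise<string>,
): Promise<string> {
  const history = memory.load();               // 1. load prior turns
  const output = await answer(history, input); // 2. generate with full context
  memory.save([                                // 3. save the updated history
    ...history,
    { role: 'user', content: input },
    { role: 'assistant', content: output },
  ]);
  return output;
}

// Each call sees the turns accumulated by earlier calls
const memory = new InMemoryHistory();
const first = await runWithMemory(memory, 'hi', async (h) => `seen ${h.length} turns`);
const second = await runWithMemory(memory, 'again', async (h) => `seen ${h.length} turns`);
```

Because the history is reloaded on every call, context survives across turns even if each request is handled independently.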