OrkaJS

Conversation Memory

Manage conversation history with single- and multi-session memory architectures, ensuring contextual persistence in OrkaJS.

Why Conversation Memory?

LLMs are stateless — they don't remember previous messages. Conversation memory solves this by storing chat history and injecting it into each new request. Orka provides automatic memory management with configurable trimming strategies to stay within token limits.
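To make this concrete, here is a minimal sketch in plain TypeScript (not the Orka API) of what "injecting history into each new request" amounts to; the `ChatMessage` shape and `buildRequestMessages` helper are assumptions for illustration only:

```typescript
// A chat message shape similar to the one Orka uses (assumed for illustration).
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Since the model is stateless, each request must carry the prior turns
// along with the new question.
function buildRequestMessages(history: ChatMessage[], question: string): ChatMessage[] {
  return [...history, { role: 'user', content: question }];
}

const history: ChatMessage[] = [
  { role: 'user', content: 'My name is Alice.' },
  { role: 'assistant', content: 'Hello Alice! How can I help you today?' },
];

// The model now sees all three turns, so it can answer "Alice".
const messages = buildRequestMessages(history, 'What is my name?');
```

A memory store automates exactly this bookkeeping, plus the trimming described below.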

# Single-Session Memory

Use single-session memory for simple chatbots or when you only need to track one conversation at a time.

```typescript
import { Memory } from '@orka-js/memory-store';

const memory = new Memory({
  maxMessages: 50, // Keep at most 50 messages
  strategy: 'sliding_window', // Trimming strategy
});

// Add messages to the conversation
memory.addMessage({ role: 'user', content: 'My name is Alice.' });
memory.addMessage({ role: 'assistant', content: 'Hello Alice! How can I help you today?' });
memory.addMessage({ role: 'user', content: 'What is my name?' });

// Get the full history
const history = memory.getHistory();
/*
[
  { role: 'user', content: 'My name is Alice.', timestamp: 1708000000000 },
  { role: 'assistant', content: 'Hello Alice! How can I help you today?', timestamp: 1708000001000 },
  { role: 'user', content: 'What is my name?', timestamp: 1708000002000 }
]
*/

// Clear the conversation
memory.clear();
```

## Memory Methods

| Method | Type | Description |
| --- | --- | --- |
| `addMessage(message)` | `message: ChatMessage` | Appends a message and triggers the auto-trimming logic (e.g. sliding window). |
| `getHistory()` | returns `ChatMessage[]` | Returns the stored history as a structured array, ready to inject into a model request. |
| `getMessageCount()` | returns `number` | Returns the current number of stored messages, useful for monitoring context usage. |
| `clear()` | returns `void` | Resets the memory buffer for a fresh session. |
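The sliding-window trimming that `addMessage()` triggers can be sketched as follows. This is an illustration of the documented behavior (preserve system messages, then keep the most recent N others), not the library's actual implementation:

```typescript
type Msg = { role: string; content: string };

// Keep all system messages, then fill the remaining budget with the
// most recent non-system messages.
function slidingWindowTrim(messages: Msg[], maxMessages: number): Msg[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  const budget = Math.max(0, maxMessages - system.length);
  const recent = budget > 0 ? rest.slice(-budget) : [];
  return [...system, ...recent];
}
```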

# Multi-Session Memory

For multi-user applications (APIs, chatbots serving multiple users), use SessionMemory to manage separate conversations per user/session. Each session has its own isolated memory with automatic TTL-based cleanup.

```typescript
import { SessionMemory } from '@orka-js/memory-store';

const sessions = new SessionMemory({
  maxMessages: 50, // Per-session message limit
  strategy: 'sliding_window',
  ttlMs: 3600_000, // Sessions expire after 1 hour of inactivity
});

// User A's conversation
sessions.addMessage('user-alice', { role: 'user', content: 'Hello!' });
sessions.addMessage('user-alice', { role: 'assistant', content: 'Hi Alice!' });

// User B's conversation (completely separate)
sessions.addMessage('user-bob', { role: 'user', content: 'Help me with my order.' });
sessions.addMessage('user-bob', { role: 'assistant', content: 'Of course! What is your order ID?' });

// Retrieve history by session ID
const aliceHistory = sessions.getHistory('user-alice');
const bobHistory = sessions.getHistory('user-bob');

// List all active sessions
console.log(sessions.getActiveSessions()); // ['user-alice', 'user-bob']

// Clear a specific session
sessions.clearSession('user-alice');

// Clear all sessions
sessions.clearAll();
```

## SessionMemory Methods

| Method | Params | Description |
| --- | --- | --- |
| `addMessage(sessionId, message)` | `sessionId`, `message` | Adds a message to the given session, creating the session lazily if it does not exist yet. |
| `getHistory(sessionId)` | `sessionId` | Returns the isolated history for the given session ID. |
| `getActiveSessions()` | none | Returns the IDs of all currently managed sessions. |
| `clearSession(sessionId)` | `sessionId` | Removes a specific session and frees its resources. |
| `clearAll()` | none | Removes all sessions. |

# Trimming Strategies

When the conversation grows too long, Orka automatically trims old messages using one of three strategies:

| Strategy | Description |
| --- | --- |
| `sliding_window` | Keeps the N most recent messages, preserving system messages |
| `buffer` | Keeps messages that fit within an estimated token budget |
| `summary` | Compresses old messages into a summary, preserving context while reducing size |
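As a rough sketch of how the `buffer` strategy can work, the snippet below walks from newest to oldest and keeps messages while an estimated budget allows. The ~4-characters-per-token estimate is a common heuristic assumed here for illustration; Orka's actual estimator may differ:

```typescript
type Msg = { role: string; content: string };

// Crude token estimate: ~4 characters per token (assumption, not Orka's exact math).
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

function bufferTrim(messages: Msg[], maxTokensEstimate: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  // Walk from newest to oldest, keeping messages while the budget allows.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (total + cost > maxTokensEstimate) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```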

💡 Summary Strategy

The summary strategy is ideal for long conversations where you want to preserve context without keeping all messages. When the message count exceeds the threshold, old messages are compressed into a system message summary.

```typescript
const memory = new Memory({
  strategy: 'summary',
  maxMessages: 20,
  summaryThreshold: 10, // Summarize once 10+ messages overflow the limit
  llm: orka.getLLM(), // LLM used to generate summaries
});

// After 30 messages, the first 10 are compressed into a summary:
// [{ role: 'system', content: 'Summary: User Alice discussed...' }, ...recent 20 messages]
```

# Configuration

Configure memory when creating your Orka instance:

```typescript
import { createOrka, OpenAIAdapter, MemoryVectorAdapter } from 'orkajs';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new MemoryVectorAdapter(),
  memory: {
    maxMessages: 50, // Maximum messages to keep
    maxTokensEstimate: 4000, // For the 'buffer' strategy
    strategy: 'sliding_window', // 'sliding_window' | 'buffer' | 'summary'
  },
});

// Access memory directly
const memory = orka.memory();
memory.addMessage({ role: 'user', content: 'Hello!' });

// Or use with ask() - memory is automatically managed
const response = await orka.ask({
  question: 'What did I say earlier?',
  useMemory: true, // Injects conversation history into the prompt
});
```

## Configuration Options

| Option | Type | Description |
| --- | --- | --- |
| `strategy` | `'sliding_window' \| 'buffer' \| 'summary'` | Trimming strategy: sliding window (most recent N), buffer (token budget), or summary (condense old messages). |
| `maxMessages` | `number` | Hard limit on the number of stored messages. |
| `maxTokensEstimate` | `number` | Token budget for the `buffer` strategy; trimming triggers when the estimate exceeds this threshold. |
| `ttlMs` | `number` | Time-to-live for session persistence; inactive sessions are cleaned up automatically. |
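The `ttlMs` cleanup can be pictured with the sketch below. These are assumed mechanics, not the library's internals: each session records its last activity timestamp, and a sweep removes sessions idle longer than `ttlMs`:

```typescript
// Hypothetical session record for illustration.
type Session = { messages: unknown[]; lastActiveAt: number };

// Remove sessions idle longer than ttlMs; return the IDs that were dropped.
function sweepExpired(sessions: Map<string, Session>, ttlMs: number, now: number): string[] {
  const expired: string[] = [];
  for (const [id, session] of sessions) {
    if (now - session.lastActiveAt > ttlMs) {
      sessions.delete(id); // safe: Map iteration tolerates deleting the current entry
      expired.push(id);
    }
  }
  return expired;
}
```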

# Autonomous Context Compression

Let your agents decide when to compress context instead of relying on fixed token thresholds. Agents can proactively call the compact_conversation tool at clean task boundaries.

- 🎯 **Agent-Initiated**: agents decide when to compress based on task context, not arbitrary limits.
- 📊 **Token Savings**: returns detailed metrics: messages compressed, tokens saved, and the generated summary.

## Using `SummaryMemory.compress()`

The `SummaryMemory` class now includes a `compress()` method that can be called on demand to compress the conversation history:

```typescript
import { SummaryMemory } from '@orka-js/memory-store';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

const memory = new SummaryMemory({
  llm,
  maxMessages: 100, // High limit - we'll compress manually
  progressiveCompression: true,
});

// Add messages to the conversation
await memory.addMessage({ role: 'user', content: 'Let me explain the project requirements...' });
await memory.addMessage({ role: 'assistant', content: 'I understand. Please go ahead.' });
// ... many more messages ...

// Compress on-demand when needed
const result = await memory.compress();

console.log(result);
// {
//   success: true,
//   reason: 'Compression completed successfully',
//   summary: 'User explained project requirements including...',
//   messagesCompressed: 15,
//   tokensSaved: 850,
//   compressedAt: 1708000000000
// }

// The history now contains the summary + any preserved system messages
const history = memory.getHistory();
```

## Agent Tool: `compact_conversation`

Give your agents the power to autonomously decide when to compress context using the built-in `compact_conversation` tool:

```typescript
import { ReActAgent, createCompactConversationTool, COMPACT_CONVERSATION_PROMPT_ADDITION } from '@orka-js/agent';
import { SummaryMemory } from '@orka-js/memory-store';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });
const memory = new SummaryMemory({ llm, maxMessages: 100 });

// Create the compact conversation tool
const compactTool = createCompactConversationTool({
  compress: async () => {
    const result = await memory.compress();
    return {
      success: result.success,
      reason: result.reason,
      summary: result.summary,
      messagesCompressed: result.messagesCompressed,
      tokensSaved: result.tokensSaved,
      compressedAt: result.compressedAt,
    };
  },
});

// Create agent with the tool
const agent = new ReActAgent({
  llm,
  goal: 'Help users with complex multi-step tasks',
  tools: [compactTool, ...otherTools], // otherTools: your remaining tool set
  // Add guidance for when to use the tool
  systemPrompt: `You are a helpful assistant.
${COMPACT_CONVERSATION_PROMPT_ADDITION}`,
});

// The agent can now autonomously decide to compress context
// Example: "I notice we've been discussing multiple topics. Let me compact
// the conversation to preserve the key points before we continue."
const result = await agent.run('Help me plan my project');
```

## CompressResult Interface

| Field | Type | Description |
| --- | --- | --- |
| `success` | `boolean` | Whether the compression operation completed successfully. |
| `reason` | `string` | Human-readable status message for the operation. |
| `summary` | `string` | The generated summary of the compressed messages, preserving key context. |
| `messagesCompressed` | `number` | Number of messages that were compressed into the summary. |
| `tokensSaved` | `number` | Estimated tokens saved by the compression (original minus summary). |
| `compressedAt` | `number` | Timestamp (ms) at which the compression completed. |
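As an illustration of how `tokensSaved` might be derived, the helper below (hypothetical, using the same chars/4 heuristic as earlier; the library's estimator may differ) subtracts the summary's estimated size from that of the original messages:

```typescript
// Crude token estimate (assumption for illustration).
const estimate = (text: string): number => Math.ceil(text.length / 4);

// Estimated size of the compressed messages minus the size of the summary.
function tokensSavedFor(compressed: { content: string }[], summary: string): number {
  const original = compressed.reduce((sum, m) => sum + estimate(m.content), 0);
  return original - estimate(summary);
}
```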

💡 When to Use Autonomous Compression

- Transitioning between major tasks or topics
- After completing a complex multi-step workflow
- When the agent notices repetitive context
- Before starting a new conversation thread

## Tree-shaking Imports

```typescript
// ✅ Import only what you need
import { Memory } from '@orka-js/memory-store';
import { SessionMemory } from '@orka-js/memory-store';
import { SummaryMemory, type CompressResult } from '@orka-js/memory-store';

// ✅ Import the compact conversation tool
import { createCompactConversationTool, COMPACT_CONVERSATION_PROMPT_ADDITION } from '@orka-js/agent';

// ✅ Or import from the package index
import { Memory, SessionMemory, SummaryMemory } from '@orka-js/memory-store';
```