Conversation Memory
Manage conversation history with single and multi-session memory architectures, ensuring contextual persistence in Orka JS.
Why Conversation Memory?
LLMs are stateless — they don't remember previous messages. Conversation memory solves this by storing chat history and injecting it into each new request. Orka provides automatic memory management with configurable trimming strategies to stay within token limits.
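The mechanics can be sketched in a few lines without any library: every request resends the stored history plus the new message, which is what makes a stateless model appear to "remember". This is an illustrative sketch only; `buildRequest` is a hypothetical helper, not part of Orka's API.

```typescript
// Minimal illustration of history injection - no Orka APIs involved.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const history: ChatMessage[] = [];

// Hypothetical stand-in for preparing any chat-completion request.
function buildRequest(userInput: string): ChatMessage[] {
  history.push({ role: 'user', content: userInput });
  // The full history is sent on every call - this is the whole trick.
  return [...history];
}

buildRequest('My name is Alice.');
history.push({ role: 'assistant', content: 'Hello Alice!' });
const request2 = buildRequest('What is my name?');
// request2 now carries all three messages, so the model can answer "Alice".
```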
# Single-Session Memory
Use single-session memory for simple chatbots or when you only need to track one conversation at a time.
```typescript
import { Memory } from '@orka-js/memory-store';

const memory = new Memory({
  maxMessages: 50,           // Keep at most 50 messages
  strategy: 'sliding_window' // Trimming strategy
});

// Add messages to the conversation
memory.addMessage({ role: 'user', content: 'My name is Alice.' });
memory.addMessage({ role: 'assistant', content: 'Hello Alice! How can I help you today?' });
memory.addMessage({ role: 'user', content: 'What is my name?' });

// Get the full history
const history = memory.getHistory();
/*
[
  { role: 'user', content: 'My name is Alice.', timestamp: 1708000000000 },
  { role: 'assistant', content: 'Hello Alice! How can I help you today?', timestamp: 1708000001000 },
  { role: 'user', content: 'What is my name?', timestamp: 1708000002000 }
]
*/

// Clear the conversation
memory.clear();
```
# Memory Methods
Memory management interface: stateful context control and message rotation.

| Method | Role | Type | Description |
|---|---|---|---|
| `addMessage()` | Ingestion | `ChatMessage` | Appends a message and triggers the auto-trimming logic (sliding window). |
| `getHistory()` | Retrieval | `ChatMessage[]` | Retrieves the stored history as a structured array for model injection. |
| `getMessageCount()` | Telemetry | `number` | Returns the current message count, useful for monitoring context usage. |
| `clear()` | Maintenance | `void` | Resets the memory buffer for a fresh session state. |
# Multi-Session Memory
For multi-user applications (APIs, chatbots serving multiple users), use SessionMemory to manage separate conversations per user/session. Each session has its own isolated memory with automatic TTL-based cleanup.
```typescript
import { SessionMemory } from '@orka-js/memory-store';

const sessions = new SessionMemory({
  maxMessages: 50,            // Per-session message limit
  strategy: 'sliding_window',
  ttlMs: 3600_000,            // Sessions expire after 1 hour of inactivity
});

// User A's conversation
sessions.addMessage('user-alice', { role: 'user', content: 'Hello!' });
sessions.addMessage('user-alice', { role: 'assistant', content: 'Hi Alice!' });

// User B's conversation (completely separate)
sessions.addMessage('user-bob', { role: 'user', content: 'Help me with my order.' });
sessions.addMessage('user-bob', { role: 'assistant', content: 'Of course! What is your order ID?' });

// Retrieve history by session ID
const aliceHistory = sessions.getHistory('user-alice');
const bobHistory = sessions.getHistory('user-bob');

// List all active sessions
console.log(sessions.getActiveSessions()); // ['user-alice', 'user-bob']

// Clear a specific session
sessions.clearSession('user-alice');

// Clear all sessions
sessions.clearAll();
```
# SessionMemory Methods
Multi-session controller: session isolation and context routing.

| Method | Role | Params | Description |
|---|---|---|---|
| `addMessage()` | Write | `sessionId`, `message` | Ingests a message into a specific thread; initializes the session lazily if needed. |
| `getHistory()` | Read | `sessionId` | Retrieves the isolated state for a given session ID to restore context. |
| `getActiveSessions()` | Audit | none | Returns all currently managed session identifiers. |
| `clearSession()` | Purge | `sessionId` | Purges a specific session thread to free up resources. |
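The TTL-based cleanup described above can be sketched independently of Orka's internals. This is an illustrative implementation under the documented behavior (any write refreshes the clock; idle sessions past `ttlMs` are swept), not the library's actual code:

```typescript
// Sketch of TTL-based session expiry (illustrative, not Orka's internals).
type Session = { messages: string[]; lastActivity: number };

const ttlMs = 3600_000; // 1 hour of inactivity
const sessions = new Map<string, Session>();

function touch(sessionId: string, message: string, now = Date.now()): void {
  const session = sessions.get(sessionId) ?? { messages: [], lastActivity: now };
  session.messages.push(message);
  session.lastActivity = now; // Any write refreshes the TTL clock
  sessions.set(sessionId, session);
}

function sweepExpired(now = Date.now()): void {
  for (const [id, session] of sessions) {
    if (now - session.lastActivity > ttlMs) sessions.delete(id);
  }
}

touch('user-alice', 'Hello!', 0);
touch('user-bob', 'Help me.', 3000_000);
sweepExpired(4000_000); // alice has been idle > 1h, bob for only ~17 min
```

A real implementation would run the sweep on a timer or lazily on access; the expiry rule is the same either way.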
# Trimming Strategies
When the conversation grows too long, Orka automatically trims old messages using one of three strategies:
| Strategy | Description |
|---|---|
| `sliding_window` | Keeps the N most recent messages, preserving system messages |
| `buffer` | Keeps messages that fit within an estimated token budget |
| `summary` | Compresses old messages into a summary, preserving context while reducing size |
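As a rough sketch of what `sliding_window` does (system messages pinned, oldest non-system messages dropped first), assuming the behavior documented above; this is illustrative, not Orka's actual implementation:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep system messages, plus the most recent non-system messages
// that fit within maxMessages.
function slidingWindowTrim(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  const budget = Math.max(0, maxMessages - system.length);
  return [...system, ...rest.slice(-budget)];
}

const trimmed = slidingWindowTrim(
  [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'turn 1' },
    { role: 'assistant', content: 'turn 2' },
    { role: 'user', content: 'turn 3' },
  ],
  3
);
// The system message survives; only the 2 most recent turns are kept.
```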
💡 Summary Strategy
The summary strategy is ideal for long conversations where you want to preserve context without keeping all messages. When the message count exceeds the threshold, old messages are compressed into a system message summary.
```typescript
const memory = new Memory({
  strategy: 'summary',
  maxMessages: 20,
  summaryThreshold: 10, // Summarize when 10+ messages overflow
  llm: orka.getLLM(),   // LLM used to generate summaries
});

// After 30 messages, the first 10 are compressed into a summary:
// [{ role: 'system', content: 'Summary: User Alice discussed...' }, ...recent 20 messages]
```
# Configuration
Configure memory when creating your Orka instance:
```typescript
import { createOrka, OpenAIAdapter, MemoryVectorAdapter } from 'orkajs';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new MemoryVectorAdapter(),
  memory: {
    maxMessages: 50,            // Maximum messages to keep
    maxTokensEstimate: 4000,    // For 'buffer' strategy
    strategy: 'sliding_window', // 'sliding_window' | 'buffer' | 'summary'
  },
});

// Access memory directly
const memory = orka.memory();
memory.addMessage({ role: 'user', content: 'Hello!' });

// Or use with ask() - memory is automatically managed
const response = await orka.ask({
  question: 'What did I say earlier?',
  useMemory: true, // Injects conversation history into the prompt
});
```
# Configuration Options
Memory policy configuration: token optimization and session lifecycle.

| Option | Role | Type | Description |
|---|---|---|---|
| `strategy` | Logic | union | Defines the pruning logic: `sliding_window` (FIFO), `buffer` (token-based), or `summary` (recursive condensation). |
| `maxMessages` | Cap | `number` | Hard limit on the number of stored messages. Essential for fixed-window architectures. |
| `maxTokensEstimate` | Quota | `number` | Sets a token budget; triggers compression or removal when the estimated payload exceeds it. |
| `ttlMs` | Lifecycle | `number` | Time-to-live for session persistence. Automates garbage collection for inactive conversation threads. |
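As the name suggests, `maxTokensEstimate` works from an estimate rather than an exact tokenizer count. A common heuristic is roughly 4 characters per token for English text; the sketch below uses that heuristic to show how a `buffer`-style trim could work. Both the heuristic and the trim loop are assumptions for illustration, not necessarily what Orka uses internally:

```typescript
type ChatMessage = { role: string; content: string };

// Rough heuristic: ~4 characters per token for English text.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the most recent messages that fit within the token budget,
// walking backwards from the newest message.
function bufferTrim(messages: ChatMessage[], maxTokensEstimate: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > maxTokensEstimate) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const kept = bufferTrim(
  [
    { role: 'user', content: 'x'.repeat(400) },      // ~100 tokens
    { role: 'assistant', content: 'x'.repeat(400) }, // ~100 tokens
    { role: 'user', content: 'x'.repeat(400) },      // ~100 tokens
  ],
  250
);
// Only the 2 most recent messages (~200 tokens) fit the 250-token budget.
```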
# Autonomous Context Compression
Let your agents decide when to compress context instead of relying on fixed token thresholds. Agents can proactively call the compact_conversation tool at clean task boundaries.
- 🎯 **Agent-Initiated**: Agents decide when to compress based on task context, not arbitrary limits.
- 📊 **Token Savings**: Returns detailed metrics: messages compressed, tokens saved, and the generated summary.
# Using SummaryMemory.compress()
The `SummaryMemory` class includes a `compress()` method that can be called on demand to compress the conversation history:
```typescript
import { SummaryMemory } from '@orka-js/memory-store';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

const memory = new SummaryMemory({
  llm,
  maxMessages: 100, // High limit - we'll compress manually
  progressiveCompression: true,
});

// Add messages to the conversation
await memory.addMessage({ role: 'user', content: 'Let me explain the project requirements...' });
await memory.addMessage({ role: 'assistant', content: 'I understand. Please go ahead.' });
// ... many more messages ...

// Compress on demand when needed
const result = await memory.compress();

console.log(result);
// {
//   success: true,
//   reason: 'Compression completed successfully',
//   summary: 'User explained project requirements including...',
//   messagesCompressed: 15,
//   tokensSaved: 850,
//   compressedAt: 1708000000000
// }

// The history now contains the summary + any preserved system messages
const history = memory.getHistory();
```
# Agent Tool: compact_conversation
Give your agents the power to autonomously decide when to compress context using the built-in `compact_conversation` tool:
```typescript
import { ReActAgent, createCompactConversationTool, COMPACT_CONVERSATION_PROMPT_ADDITION } from '@orka-js/agent';
import { SummaryMemory } from '@orka-js/memory-store';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });
const memory = new SummaryMemory({ llm, maxMessages: 100 });

// Create the compact conversation tool
const compactTool = createCompactConversationTool({
  compress: async () => {
    const result = await memory.compress();
    return {
      success: result.success,
      reason: result.reason,
      summary: result.summary,
      messagesCompressed: result.messagesCompressed,
      tokensSaved: result.tokensSaved,
      compressedAt: result.compressedAt,
    };
  },
});

// Create agent with the tool
const agent = new ReActAgent({
  llm,
  goal: 'Help users with complex multi-step tasks',
  tools: [compactTool, ...otherTools],
  // Add guidance for when to use the tool
  systemPrompt: `You are a helpful assistant.${COMPACT_CONVERSATION_PROMPT_ADDITION}`,
});

// The agent can now autonomously decide to compress context
// Example: "I notice we've been discussing multiple topics. Let me compact
// the conversation to preserve the key points before we continue."
const result = await agent.run('Help me plan my project');
```
# CompressResult Interface
Compression result schema: autonomous context compaction metrics.

| Field | Role | Type | Description |
|---|---|---|---|
| `success` | Status | `boolean` | Whether the compression operation completed successfully. |
| `summary` | Output | `string` | The generated summary of compressed messages, preserving key context. |
| `messagesCompressed` | Metric | `number` | Count of messages that were compressed into the summary. |
| `tokensSaved` | Savings | `number` | Estimated tokens saved by the compression (original minus summary). |
💡 When to Use Autonomous Compression
- Transitioning between major tasks or topics
- After completing a complex multi-step workflow
- When the agent notices repetitive context
- Before starting a new conversation thread
# Tree-shaking Imports
```typescript
// ✅ Import only what you need
import { Memory } from '@orka-js/memory-store';
import { SessionMemory } from '@orka-js/memory-store';
import { SummaryMemory, type CompressResult } from '@orka-js/memory-store';

// ✅ Import the compact conversation tool
import { createCompactConversationTool, COMPACT_CONVERSATION_PROMPT_ADDITION } from '@orka-js/agent';

// ✅ Or import from the index
import { Memory, SessionMemory, SummaryMemory } from '@orka-js/memory-store';
```