Advanced Retrievers
Improve retrieval quality with multi-query expansion, contextual compression, and ensemble methods.
# Why Advanced Retrievers?
Basic vector search can miss relevant documents due to query phrasing or semantic gaps. Advanced retrievers improve recall and precision through query expansion, result compression, and fusion techniques.
# MultiQueryRetriever
Generates multiple alternative queries using an LLM, retrieves results for each, and deduplicates to improve recall.
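Conceptually, the multi-query pattern runs the same vector search once per query variant, then merges the result sets, keeping the best score for each unique document. A minimal sketch of that merge step (illustrative only, not the library's internals):

```typescript
interface Result { content: string; score: number; }

// Merge result sets from multiple query variants, deduplicating by content
// and keeping the highest score seen for each unique document.
function mergeAndDeduplicate(resultSets: Result[][]): Result[] {
  const byContent = new Map<string, Result>();
  for (const results of resultSets) {
    for (const r of results) {
      const existing = byContent.get(r.content);
      if (!existing || r.score > existing.score) byContent.set(r.content, r);
    }
  }
  // Return the unique documents, best score first
  return [...byContent.values()].sort((a, b) => b.score - a.score);
}
```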
```typescript
import { MultiQueryRetriever } from '@orka-js/tools';
import { createOrka } from '@orka-js/core';
import { OpenAIAdapter } from '@orka-js/openai';
import { PineconeAdapter } from '@orka-js/pinecone';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

const retriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,             // Generate 3 alternative queries
  topK: 5,                   // Return top 5 results per query
  deduplicateByContent: true // Remove duplicate results
});

const results = await retriever.retrieve(
  'How do I configure Orka JS?',
  'my-knowledge-base'
);
// Returns deduplicated results from all query variations
```

**Retrieval Augmentation Logic**: solving semantic mismatch via LLM diversification: Origin Query → LLM Variation Engine → Fusion & Ranking.
# ContextualCompressionRetriever
Retrieves more documents than needed, then uses an LLM to extract only the relevant parts, improving precision and reducing context size.
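The flow is two-stage: over-retrieve, then ask an LLM to extract only the query-relevant parts of each document. A sketch of the pipeline shape (synchronous for clarity, with the LLM call stubbed out as a `compress` callback; these are hypothetical signatures, not the library's API):

```typescript
interface Doc { content: string; score: number; }

// Stage 1: retrieve more documents than needed.
// Stage 2: replace each document's content with a query-relevant extract.
function compressionRetrieve(
  search: (query: string, topK: number) => Doc[],
  compress: (query: string, content: string, maxLength: number) => string,
  query: string,
  topK: number,
  maxCompressedLength: number
): Doc[] {
  const docs = search(query, topK);
  return docs.map(d => ({
    ...d,
    content: compress(query, d.content, maxCompressedLength),
  }));
}
```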
```typescript
import { ContextualCompressionRetriever } from '@orka-js/tools';

const retriever = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10,                // Retrieve 10 documents
  maxCompressedLength: 500 // Compress each to ~500 chars
});

const results = await retriever.retrieve(
  'What are the benefits of RAG?',
  'my-knowledge-base'
);
// Each result contains only the relevant extract, not the full document
```

- **Standard Retrieval (Full Document Bloat):** wastes 70-80% of the context window on noise (headers, irrelevant paragraphs, footers).
- **Compressed Output (High Signal Density):** returns only the "golden nuggets", reducing token cost and dramatically improving LLM accuracy.
# EnsembleRetriever
Combines multiple retrievers using Reciprocal Rank Fusion (RRF) for improved results. Useful for combining different retrieval strategies.
```typescript
import { EnsembleRetriever } from '@orka-js/tools';
import { VectorRetriever } from '@orka-js/tools';
import { MultiQueryRetriever } from '@orka-js/tools';

// Create individual retrievers
const vectorRetriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

const multiQueryRetriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Combine with weighted fusion
const ensemble = new EnsembleRetriever({
  retrievers: [vectorRetriever, multiQueryRetriever],
  weights: [0.4, 0.6], // 40% vector, 60% multi-query
  topK: 5              // Return top 5 fused results
});

const results = await ensemble.retrieve(
  'Explain RAG architecture',
  'my-knowledge-base'
);
```

**🔬 Reciprocal Rank Fusion (RRF)**
RRF combines rankings from multiple sources by giving higher scores to documents that appear in top positions across multiple retrievers.
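Weighted RRF itself is only a few lines. The sketch below assumes 1-based ranks and the same k = 60 constant; it is illustrative, not the library's implementation:

```typescript
// Fuse ranked ID lists from multiple retrievers with weighted RRF:
// each occurrence contributes weight * (1 / (rank + k)), k = 60.
function rrfFuse(
  rankings: string[][], // each retriever's result IDs, best first
  weights: number[],
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  rankings.forEach((ids, i) => {
    ids.forEach((id, idx) => {
      const rank = idx + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + weights[i] * (1 / (rank + k)));
    });
  });
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With weights `[0.4, 0.6]`, a document at rank 1 in the first retriever and rank 3 in the second scores 0.4 × (1/61) + 0.6 × (1/63) ≈ 0.0161.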
```typescript
// Formula: score = weight * (1 / (rank + 60))
// Document at rank 1 in Retriever A: 0.4 * (1/61) = 0.0066
// Same document at rank 3 in Retriever B: 0.6 * (1/63) = 0.0095
// Final fusion score: 0.0066 + 0.0095 = 0.0161
```

# VectorRetriever
Basic vector search wrapper that implements the Retriever interface. Useful as a building block for ensemble retrievers.
```typescript
import { VectorRetriever } from '@orka-js/tools';

const retriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  minScore: 0.7 // Filter results below 0.7 similarity
});

const results = await retriever.retrieve(
  'What is RAG?',
  'my-knowledge-base'
);
```

# ParentDocumentRetriever
Searches on small child chunks for precision, then returns the full parent document for context. This solves the classic trade-off: small chunks are better for search accuracy, but large chunks provide more context for the LLM.
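The key step is consolidating child-chunk hits into parent documents, ranked by the best child score. A standalone sketch of that grouping logic (hypothetical shapes, not the library's internals):

```typescript
interface ChildHit { parentId: string; score: number; }

// Group child-chunk hits by parent ID, rank parents by their best
// child's score, and keep the top parentTopK parents.
function groupByParent(hits: ChildHit[], parentTopK: number) {
  const byParent = new Map<string, { bestScore: number; childCount: number }>();
  for (const h of hits) {
    const e = byParent.get(h.parentId) ?? { bestScore: -Infinity, childCount: 0 };
    e.bestScore = Math.max(e.bestScore, h.score);
    e.childCount += 1;
    byParent.set(h.parentId, e);
  }
  return [...byParent.entries()]
    .map(([parentId, e]) => ({ parentId, ...e }))
    .sort((a, b) => b.bestScore - a.bestScore)
    .slice(0, parentTopK);
}
```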
```typescript
import { ParentDocumentRetriever } from '@orka-js/tools';

const retriever = new ParentDocumentRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  childTopK: 10, // Search top 10 child chunks
  parentTopK: 3, // Return top 3 parent documents
  minScore: 0.6
});

const results = await retriever.retrieve(
  'How does authentication work?',
  'documentation'
);
// Returns full parent documents, ranked by best child chunk score
// Each result includes metadata: { childCount, parentContent, ... }
```

1. **Index (Granular Indexing):** split documents into small snippets; each snippet stores its parent ID in metadata for later reconstruction.
2. **Search (Vector Retrieval):** perform semantic search on the small child chunks to find exact matches without noise or dilution.
3. **Group (Parent Association):** consolidate the found child chunks by their parent ID, grouping fragmented insights into logical units.
4. **Expand (Context Expansion):** return the full parent content of the best-ranked child, providing the LLM with complete context.
# SelfQueryRetriever
Uses an LLM to automatically extract metadata filters from natural language queries. Instead of just semantic search, it combines meaning-based search with structured metadata filtering for more precise results.
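The hard part, extracting `{ semanticQuery, filter }` from the user's query, is done by the LLM; applying the resulting filter is plain metadata matching. A sketch of that final step (illustrative, with hypothetical result shapes):

```typescript
interface Doc { content: string; metadata: Record<string, string | number>; }

// Keep only documents whose metadata matches every key/value in the
// LLM-extracted filter; the semantic query is handled by vector search.
function applyFilter(docs: Doc[], filter: Record<string, string | number>): Doc[] {
  return docs.filter(d =>
    Object.entries(filter).every(([key, value]) => d.metadata[key] === value)
  );
}
```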
```typescript
import { SelfQueryRetriever } from '@orka-js/tools';

const retriever = new SelfQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  metadataFields: [
    {
      name: 'category',
      type: 'string',
      description: 'The document category',
      enumValues: ['tutorial', 'api-reference', 'guide', 'changelog']
    },
    {
      name: 'language',
      type: 'string',
      description: 'Programming language',
      enumValues: ['typescript', 'python', 'javascript']
    },
    {
      name: 'version',
      type: 'number',
      description: 'The version number of the documentation'
    }
  ]
});

// Natural language query with implicit filters
const results = await retriever.retrieve(
  'Show me TypeScript tutorials about authentication in version 3',
  'documentation'
);
// The LLM extracts:
// semanticQuery: "authentication"
// filter: { language: "typescript", category: "tutorial", version: 3 }
```

**Query Decomposition Example**
The LLM automatically separates the semantic meaning from the structured filters:
```typescript
// User query: "Find Python guides about deployment from 2024"
// LLM extracts:
{
  "semanticQuery": "deployment",
  "filter": { "language": "python", "category": "guide" }
}
```

# BM25Retriever
A keyword-based retriever using the BM25 (Best Matching 25) algorithm. Unlike vector search which relies on semantic similarity, BM25 uses term frequency and inverse document frequency for exact keyword matching. Perfect for combining with vector search in an EnsembleRetriever.
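The BM25 score of a document is a sum over query terms of IDF(term) multiplied by a saturated term-frequency factor controlled by `k1` and `b`. A self-contained sketch of the scoring (using one common smoothed IDF variant; illustrative, not the library's implementation):

```typescript
// Score each document in a tiny corpus against a query with BM25.
// k1 controls term-frequency saturation; b controls length normalization.
function bm25Scores(docs: string[], query: string, k1 = 1.5, b = 0.75): number[] {
  const tokenize = (s: string) => s.toLowerCase().split(/\W+/).filter(Boolean);
  const tokenDocs = docs.map(tokenize);
  const avgLen = tokenDocs.reduce((sum, d) => sum + d.length, 0) / tokenDocs.length;
  const N = docs.length;
  return tokenDocs.map(doc => {
    let score = 0;
    for (const term of tokenize(query)) {
      const tf = doc.filter(t => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = tokenDocs.filter(d => d.includes(term)).length; // document frequency
      const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);     // smoothed IDF
      score += (idf * tf * (k1 + 1)) /
               (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}
```

Rare terms get a high IDF, so a document matching "runtime" outranks one matching only the common term "JavaScript".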
```typescript
import { BM25Retriever } from '@orka-js/tools';

const bm25 = new BM25Retriever({
  documents: [
    { id: '1', content: 'TypeScript is a typed superset of JavaScript...', metadata: { source: 'docs' } },
    { id: '2', content: 'React hooks allow you to use state in functional components...', metadata: { source: 'blog' } },
    { id: '3', content: 'Node.js is a JavaScript runtime built on Chrome V8...', metadata: { source: 'docs' } },
  ],
  topK: 5,
  k1: 1.5, // Term frequency saturation (default: 1.5)
  b: 0.75  // Document length normalization (default: 0.75)
});

const results = await bm25.retrieve('JavaScript runtime', 'any');
// Finds documents with exact keyword matches for "JavaScript" and "runtime"

// Add more documents dynamically
bm25.addDocuments([
  { id: '4', content: 'Deno is a modern JavaScript/TypeScript runtime...' }
]);
```

# BM25 + Vector Search (Hybrid)
The most powerful retrieval strategy combines BM25 (keyword matching) with vector search (semantic understanding) using the EnsembleRetriever:
```typescript
import { EnsembleRetriever } from '@orka-js/tools';
import { VectorRetriever } from '@orka-js/tools';
import { BM25Retriever } from '@orka-js/tools';

// Keyword-based retrieval
const bm25 = new BM25Retriever({
  documents: myDocuments,
  topK: 10
});

// Semantic retrieval
const vector = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

// Hybrid: combine both with Reciprocal Rank Fusion
const hybrid = new EnsembleRetriever({
  retrievers: [bm25, vector],
  weights: [0.3, 0.7], // 30% keyword, 70% semantic
  topK: 5
});

const results = await hybrid.retrieve('authentication middleware', 'docs');
// Finds docs matching keywords AND semantically similar content
```

# Comparison
| Retriever Strategy | Core Strength | Architectural Trade-off |
|---|---|---|
| MultiQuery (creative) | Recall max | High latency / tokens |
| Compression (clean) | Ultra-precision | LLM overhead |
| Ensemble (robust) | Hybrid power | Multi-pass processing |
| Vector (baseline) | Latency < 50 ms | Semantic drift |
| ParentDoc (context) | Rich context | Storage complexity |
| SelfQuery (logic) | Smart filters | Schema dependency |
| BM25 (classic) | Exact match | No semantics |
# Complete Example
```typescript
import { createOrka } from '@orka-js/core';
import { OpenAIAdapter } from '@orka-js/adapters';
import { PineconeAdapter } from '@orka-js/adapters';
import {
  MultiQueryRetriever,
  ContextualCompressionRetriever,
  EnsembleRetriever
} from '@orka-js/tools';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

// Strategy 1: Multi-query for better recall
const multiQuery = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Strategy 2: Compression for better precision
const compression = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 15,
  maxCompressedLength: 400
});

// Combine both strategies
const ensemble = new EnsembleRetriever({
  retrievers: [multiQuery, compression],
  weights: [0.5, 0.5],
  topK: 5
});

// Retrieve with best of both worlds
const results = await ensemble.retrieve(
  'How does RAG improve LLM responses?',
  'documentation'
);

console.log(`Found ${results.length} highly relevant results`);
results.forEach(r => {
  console.log(`Score: ${r.score.toFixed(3)}`);
  console.log(`Content: ${r.content?.slice(0, 100)}...`);
});
```

# Best Practices
1. **Start Simple.** Begin with VectorRetriever. Add MultiQuery if recall is low; add Compression if precision is low.
2. **Monitor Costs.** MultiQuery and Compression make extra LLM calls. Use caching or limit the query count in production.
3. **Tune Weights.** Experiment with ensemble weights for your use case; a higher weight gives that retriever more influence on the final ranking.
# Tree-shaking Imports
```typescript
// ✅ Import only what you need
import { MultiQueryRetriever } from '@orka-js/tools';
import { EnsembleRetriever } from '@orka-js/tools';

// ✅ Or import from index
import {
  MultiQueryRetriever,
  ContextualCompressionRetriever
} from '@orka-js/tools';
```