Chunking
Understand how Orka JS splits documents into optimally sized chunks for embedding and retrieval.
How Chunking Works
Orka JS uses a recursive text splitter that cuts text at natural boundaries, trying each separator in order and falling back to the next one only when a piece is still too long:
Step 1: Paragraph breaks (`\n\n`). Preserves natural structure first.
Step 2: Line breaks (`\n`). Splits by lines if paragraphs are too long.
Step 3: Sentences (`. `). Targets logical thought boundaries.
Step 4: Words. Falls back to word boundaries to avoid cutting a word in half.
Step 5: Characters. Strict limit enforcement (emergency fallback).
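The fallback order above can be sketched in a few lines of JavaScript. This is a minimal illustration of the general recursive-splitting technique, not Orka JS's actual implementation: the function name `recursiveSplit`, the `SEPARATORS` list, and the choice to drop the separator at chunk boundaries are all assumptions made for the example, and overlap is deliberately left out.

```javascript
// Hypothetical sketch of a recursive text splitter (not the Orka JS source).
// Separators are tried in order: paragraphs, lines, sentences, words, characters.
const SEPARATORS = ['\n\n', '\n', '. ', ' ', ''];

function recursiveSplit(text, chunkSize, separators = SEPARATORS) {
  // Text that already fits needs no further splitting.
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];

  const [sep, ...rest] = separators;

  // Last step: hard cut every chunkSize characters.
  if (sep === '' || sep === undefined) {
    const out = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      out.push(text.slice(i, i + chunkSize));
    }
    return out;
  }

  const chunks = [];
  let current = '';
  for (const piece of text.split(sep)) {
    // Greedily merge neighbouring pieces while they still fit in one chunk.
    const candidate = current === '' ? piece : current + sep + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    if (current !== '') chunks.push(current);
    if (piece.length > chunkSize) {
      // This piece alone is too long: retry it with the next, finer separator.
      chunks.push(...recursiveSplit(piece, chunkSize, rest));
      current = '';
    } else {
      current = piece;
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}
```

Calling `recursiveSplit(text, 1000)` returns an array of strings, none longer than 1000 characters. Because this sketch discards the separator wherever it cuts, a chunk that ends at a sentence boundary loses its trailing period; a production splitter would typically keep it.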
Configuration
    await orka.knowledge.create({
      name: 'docs',
      source: myContent,
      chunkSize: 1000,   // Max characters per chunk
      chunkOverlap: 200, // Overlap between consecutive chunks
    });

Recommended Sizes
| Content Profile | Size / Overlap (characters) | Strategic Goal |
|---|---|---|
| FAQ / Q&A | 300 / 50 | Atomic precision |
| Technical Docs | 1000 / 200 | Block integrity |
| Long Articles | 1200 / 250 | Narrative flow |
| Legal / Contracts | 600 / 200 | Clause context |
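One way to keep these recommendations in a single place is a small preset map. The sketch below is only an illustration: `CHUNK_PRESETS`, `chunkOptionsFor`, and the profile keys are hypothetical names, not part of Orka JS. The numbers are the ones from the table.

```javascript
// Hypothetical preset map built from the recommended sizes (not an Orka JS API).
const CHUNK_PRESETS = {
  faq: { chunkSize: 300, chunkOverlap: 50 },
  technicalDocs: { chunkSize: 1000, chunkOverlap: 200 },
  longArticles: { chunkSize: 1200, chunkOverlap: 250 },
  legal: { chunkSize: 600, chunkOverlap: 200 },
};

// Returns a copy of the preset so callers can tweak it without mutating the map.
function chunkOptionsFor(profile) {
  const preset = CHUNK_PRESETS[profile];
  if (!preset) {
    throw new Error(`Unknown content profile: ${profile}`);
  }
  return { ...preset };
}
```

The result can then be spread into the options passed to `orka.knowledge.create`, for example `{ name: 'faq', source: myContent, ...chunkOptionsFor('faq') }`.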
💡 Why Overlap?
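Overlap ensures information at chunk boundaries isn't lost. When a chunk ends mid-sentence, the next chunk starts `chunkOverlap` characters before that cut (200 in the configuration above), so text split by the boundary usually appears with its surrounding context in the next chunk.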
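To make the effect of overlap concrete, here is a minimal fixed-window sketch. It ignores natural boundaries entirely and only shows how consecutive chunks share `chunkOverlap` characters; the function name `chunkWithOverlap` is hypothetical, and Orka JS may apply overlap differently.

```javascript
// Hypothetical sliding-window chunker illustrating overlap (not the Orka JS source).
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  // Each new chunk starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With `chunkWithOverlap('abcdefghij', 4, 2)` the chunks are `'abcd'`, `'cdef'`, `'efgh'`, and `'ghij'`: each chunk repeats the last two characters of the one before it.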