{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidztkswg54txlsm4kad7gpveuz4mlaxyy6veg5sodoiqn4gozi47q",
    "uri": "at://did:plc:5sgu76a53rz3n6unbykmovqy/app.bsky.feed.post/3mlodgmv2q6v2"
  },
  "description": "Chunking is the process of splitting documents into smaller passages before embedding them for retrieval. Chunk size and boundaries directly determine what a retrieval system can find: a chunk that is too large blurs the meaning of its embedding, and a chunk that is too small lacks the context to answer most questions.\n\n\nCommon strategies\n\n * Fixed-size character or token splitting. Cuts every N characters or tokens. Simple but ignores semantic boundaries.\n * Recursive character splitting. Tries",
  "path": "/engineering-glossary/chunking-document-splitting/",
  "publishedAt": "2026-05-12T17:39:34.000Z",
  "site": "https://sahilkapoor.com",
  "tags": [
    "RAG",
    "Embeddings",
    "Vector Database",
    "Reranker",
    "Context Window"
  ],
  "textContent": "**Chunking** is the process of splitting documents into smaller passages before embedding them for retrieval. Chunk size and boundaries directly determine what a retrieval system can find: a chunk that is too large blurs the meaning of its embedding, and a chunk that is too small lacks the context to answer most questions.\n\n## Common strategies\n\n  * **Fixed-size character or token splitting.** Cuts every N characters or tokens. Simple but ignores semantic boundaries.\n  * **Recursive character splitting.** Tries to split on paragraph, then sentence, then word boundaries. The common baseline in LangChain and LlamaIndex.\n  * **Structural chunking.** Splits on headings, sections, code blocks, or table rows. Often suited to technical documentation.\n  * **Semantic chunking.** Splits on shifts in embedding similarity between adjacent sentences.\n  * **Overlap.** Adjacent chunks share a small tail and head (10 to 20 percent) so context isn't lost at boundaries.\n\n\n\nšŸ”—\n\n**Related Terms**\nRAG, Embeddings, Vector Database, Reranker, Context Window.",
  "title": "Chunking",
  "updatedAt": "2026-05-13T19:15:21.726Z"
}