
chunking-strategy

giuseppe-trisciuoglio/developer-kit · updated Apr 8, 2026

$ npx skills add https://github.com/giuseppe-trisciuoglio/developer-kit --skill chunking-strategy
summary

Optimal chunking strategies for RAG systems and document processing pipelines.

  • Five strategy levels from fixed-size to advanced methods (late chunking, contextual retrieval), each suited to different document types and complexity
  • Includes recursive character chunking with hierarchical separators, structure-aware chunking for Markdown/code/PDFs, and embedding-based semantic chunking with configurable thresholds
  • Provides an evaluation framework covering retrieval precision, recall, and end-to-end quality
skill.md

Chunking Strategy for RAG Systems

Overview

Provides chunking strategies for RAG systems, vector databases, and document processing. Recommends chunk sizes, overlap percentages, and boundary detection methods; validates semantic coherence; evaluates retrieval metrics.

When to Use

Use when building or optimizing RAG systems, vector search pipelines, document chunking workflows, or performance-tuning existing systems with poor retrieval quality.

Instructions

Choose Chunking Strategy

Select based on document type and use case:

  1. Fixed-Size Chunking (Level 1)

    • Use for simple documents without clear structure
    • Start with 512 tokens and 10-20% overlap
    • Adjust: 256 tokens for factoid queries, 1024 for analytical queries
  2. Recursive Character Chunking (Level 2)

    • Use for documents with structural boundaries
    • Hierarchical separators: paragraphs → sentences → words
    • Customize separators for specific document types (HTML, Markdown, JSON); see the sketch after this list
  3. Structure-Aware Chunking (Level 3)

    • Use for structured content (Markdown, code, tables, PDFs)
    • Preserve semantic units: functions, sections, table blocks
    • Validate structure preservation post-split
  4. Semantic Chunking (Level 4)

    • Use for complex documents with thematic shifts
    • Embedding-based boundary detection with 0.8 similarity threshold
    • Buffer size: 3-5 sentences
  5. Advanced Methods (Level 5)

    • Late Chunking for long-context models
    • Contextual Retrieval for high-precision requirements
    • Monitor computational cost vs. retrieval gain
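
A minimal Level 2 sketch using LangChain's RecursiveCharacterTextSplitter with a custom separator hierarchy for Markdown; the separator list and sizes here are illustrative assumptions, not canonical settings:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Separators tried in order: headings, paragraphs, sentences, words
# (illustrative hierarchy for Markdown input)
markdown_splitter = RecursiveCharacterTextSplitter(
    separators=["\n## ", "\n\n", ". ", " "],
    chunk_size=512,    # characters, with the default length_function
    chunk_overlap=64,  # ~12%, within the 10-20% guideline
)
chunks = markdown_splitter.split_text(markdown_text)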

Reference: references/strategies.md.

Implement Chunking Pipeline

  1. Pre-process documents

    • Analyze structure, content types, information density
    • Identify multi-modal content (tables, images, code)
  2. Select parameters

    • Chunk size: embedding model context window / 4 (e.g., a 2048-token window → ~512-token chunks)
    • Overlap: 10-20% for most cases
    • Strategy-specific settings
  3. Process and validate

    • Apply chunking strategy
    • Validate coherence: run evaluate_chunks.py --coherence (see below)
    • Test with representative documents
  4. Evaluate and iterate

    • Measure precision and recall, then apply the tuning rules below (sketched after this list)
    • If precision < 0.7: reduce chunk_size by 25% and re-evaluate
    • If recall < 0.6: increase overlap by 10% and re-evaluate
    • Monitor latency and memory usage
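
A minimal sketch of the tuning rules above; evaluate is a hypothetical callback that runs retrieval over a test query set and returns (precision, recall) for the given parameters:

def tune_parameters(evaluate, chunk_size=512, overlap=0.15, max_rounds=5):
    for _ in range(max_rounds):
        precision, recall = evaluate(chunk_size, overlap)
        if precision >= 0.7 and recall >= 0.6:
            break  # both targets met
        if precision < 0.7:
            chunk_size = int(chunk_size * 0.75)  # reduce chunk_size by 25%
        if recall < 0.6:
            overlap = min(overlap + 0.10, 0.5)   # add 10 points of overlap, capped
    return chunk_size, overlap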

Reference: references/implementation.md.

Validate Chunk Quality

Run validation commands to assess chunk quality:

# Check semantic coherence (requires sentence-transformers)
python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
chunks = [...]  # your chunks
embeddings = model.encode(chunks, normalize_embeddings=True)  # unit-norm so dot products are cosine similarities
similarity = (embeddings @ embeddings.T).mean()  # note: the 1.0 diagonal inflates the mean slightly
print(f'Cohesion: {similarity:.3f}')  # target: 0.3-0.7
"

# Measure retrieval precision
python -c "
retrieved = [...]        # chunks returned for a query
relevant_chunks = {...}  # ground-truth relevant chunks
relevant = sum(1 for c in retrieved if c in relevant_chunks)
precision = relevant / len(retrieved) if retrieved else 0.0
print(f'Precision: {precision:.2f}')  # target: >= 0.7
"

# Check chunk size distribution
python -c "
import numpy as np
sizes = [len(c.split()) for c in chunks]
print(f'Mean: {np.mean(sizes):.0f}, Std: {np.std(sizes):.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')
"

Reference: references/evaluation.md.

Examples

Fixed-Size Chunking

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=25,     # ~10% of chunk_size
    length_function=len   # counts characters; see the token-based variant below
)
chunks = splitter.split_documents(documents)
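
The sizing guidance above is expressed in tokens, while length_function=len counts characters. A sketch of a token-based variant using tiktoken; the encoding name is an assumption, so match it to your embedding model's tokenizer:

import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,    # now measured in tokens
    chunk_overlap=77,  # ~15% overlap
    length_function=lambda text: len(enc.encode(text)),
)
chunks = splitter.split_documents(documents)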

Structure-Aware Code Chunking

import ast

def chunk_python_code(code):
    tree = ast.parse(code)
    chunks = []
    # Iterate top-level nodes only; ast.walk would also visit methods
    # nested inside classes, duplicating them in class and method chunks
    for node in ast.iter_child_nodes(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(ast.get_source_segment(code, node))
    return chunks

Semantic Chunking

import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_chunk(text, similarity_threshold=0.8):
    # Naive sentence splitter; use nltk or spaCy in production
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity (embeddings are unit-normalized)
        sim = float(np.dot(embeddings[i - 1], embeddings[i]))
        if sim < similarity_threshold:  # thematic shift: close the chunk
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
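
The 0.8 threshold mirrors the Level 4 guidance above; lowering it merges more sentences into fewer, larger chunks.

Contextual Retrieval

A minimal sketch of Level 5 contextual retrieval: each chunk is prefixed with a short LLM-generated note situating it within the full document before embedding. llm_complete is a hypothetical stand-in for your LLM client, and the 4000-character document excerpt is an arbitrary cap:

def contextualize_chunks(document, chunks, llm_complete):
    contextualized = []
    for chunk in chunks:
        # Ask the LLM for a brief note placing this chunk in context
        prompt = (
            "Document:\n" + document[:4000] + "\n\n"
            "Chunk:\n" + chunk + "\n\n"
            "Write 1-2 sentences situating this chunk in the document."
        )
        context = llm_complete(prompt)  # hypothetical LLM call
        contextualized.append(context + "\n" + chunk)  # embed this string
    return contextualized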

Best Practices

Core Principles

  • Balance context preservation with retrieval precision
  • Maintain semantic coherence within chunks
  • Optimize for embedding model context window constraints

Implementation

  • Start with fixed-size (512 tokens, 15% overlap)
  • Iterate based on document characteristics
  • Test with domain-specific documents before deployment

Pitfalls to Avoid

  • Over-chunking: context-poor small chunks
  • Under-chunking: missing information in oversized chunks
  • Ignoring semantic boundaries and document structure
  • One-size-fits-all for diverse content types

Constraints and Warnings

Resource Considerations

  • Semantic methods require significant compute resources
  • Late chunking needs long-context embedding models
  • Complex strategies increase processing latency
  • Monitor memory for large document batches

Quality Requirements

  • Validate semantic coherence post-processing
  • Test with representative documents before deployment
  • Ensure chunks maintain standalone meaning
  • Implement error handling for malformed content

References

  • references/strategies.md — chunking strategy details
  • references/implementation.md — pipeline implementation guidance
  • references/evaluation.md — evaluation framework and metrics