Getting appropriate responses from a RAG-empowered LLM is an art. The three primary knobs to adjust are Chunk Size, Document Return Count, and the RAG System Prompt.

Use this guide to fine-tune the responses of your RAG-empowered LLM.

Chunk Size

Chunk size is the length of the pieces your documents are split into before embedding and retrieval (typically measured in tokens or characters). The goal is to make chunks large enough to contain complete, useful context, but small enough that retrieval stays precise and you don’t hit token limits when multiple chunks are added to the prompt.
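
The token-limit side of that trade-off can be sketched as simple budget math (the function and parameter names below are illustrative, not from any library): the number of chunks you can retrieve is bounded by the context window minus whatever the system prompt, the question, and the model's answer reserve.

```python
def max_chunks(context_window: int, reserved_tokens: int, chunk_tokens: int) -> int:
    """Rough upper bound on how many chunks of a given size fit in the prompt.

    reserved_tokens covers the system prompt, the user question, and the
    space left for the model's answer.
    """
    return max(0, (context_window - reserved_tokens) // chunk_tokens)

# An 8,192-token window with 2,192 tokens reserved leaves room for
# at most 12 chunks of 500 tokens each.
budget = max_chunks(8192, 2192, 500)
```

Doubling the chunk size halves this budget, which is one reason chunk size and document return count have to be tuned together.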

Choose chunk sizes based on the structure of the documents and the kind of questions users ask:

No matter what chunk size you pick, add chunk overlap so important context isn’t split across boundaries. Overlap helps preserve continuity (e.g., a definition at the end of one chunk and its usage at the start of the next).
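
A minimal character-based splitter with overlap might look like the sketch below (the function name and default values are illustrative; production pipelines usually split on tokens or sentence boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks, where each chunk repeats the
    final `overlap` characters of the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("some long document text ...", chunk_size=200, overlap=40)
```

Because each chunk begins 40 characters before the previous one ended, a sentence cut at a chunk boundary still appears whole in the neighboring chunk.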

Default Recommendation

A general-purpose default that works across many document types is:

When to Adjust