Guide: Token & context budget
What is this tool?
A free LLM context window calculator and prompt token budget helper. Split your prompt into labeled blocks (system, user, RAG, extra), count tokens and characters per block and in total, then compare against a model context limit with an optional reserved completion budget. See whether you are under, near, or over budget — and which section is largest. Client-side in the browser; use your provider's tokenizer for billing.
Prompt sections
- System / instructions — Policies, persona, tool rules.
- User message — The main user query or task.
- RAG / retrieved context — Pasted chunks, citations, knowledge base text.
- Extra — Tool JSON, function definitions, or anything else you bill as input.
The UI highlights the largest block so you know where to trim first when you are over the limit.
Tokenizer modes
Same encodings as the token calculator:
- Approximate — UTF-8 bytes ÷ 4 (fast planning heuristic, not BPE-exact).
- cl100k_base — Tiktoken; common for GPT-3.5 / GPT-4–class chat models.
- o200k_base — Tiktoken; closer for GPT-4o / newer o-series–style vocabularies.
Other vendors (Anthropic, Gemini, etc.) use different tokenizers — treat counts as planning, not guaranteed billable tokens.
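The Approximate mode can be sketched in a few lines. This is an illustrative TypeScript version of the bytes ÷ 4 heuristic, not the tool's exact code:

```typescript
// Rough token estimate: UTF-8 byte length divided by 4.
// Illustrative only -- real BPE tokenizers (cl100k_base, o200k_base) differ.
function approxTokens(text: string): number {
  const bytes = new TextEncoder().encode(text).length; // UTF-8 byte count
  return Math.ceil(bytes / 4);
}
```

English prose averages roughly four UTF-8 bytes per token, which is why the heuristic is serviceable for planning but not for billing.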
Context limit & completion reserve
Pick a context limit (presets from 4K up to 1M tokens, or a custom value) and reserve tokens for the completion — headroom for the model's reply, tool-call payloads, or follow-up turns. The effective budget is limit − reserved. Totals above that show as over budget; high utilization triggers a warning so you can avoid truncation or API errors.
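The budget math above can be sketched directly. The `warnRatio` threshold here is an assumption for illustration; the guide does not state the tool's exact warning cutoff:

```typescript
type BudgetState = "ok" | "warn" | "over";

// effective budget = context limit minus reserved completion tokens.
// warnRatio (assumed 0.9) marks "near budget" -- the tool's real cutoff may differ.
function budgetStatus(
  totalTokens: number,
  contextLimit: number,
  reserve: number,
  warnRatio = 0.9
): BudgetState {
  const effective = contextLimit - reserve;
  if (totalTokens > effective) return "over";
  if (totalTokens >= effective * warnRatio) return "warn";
  return "ok";
}
```

For example, with an 8,192-token limit and 1,024 reserved, the effective budget is 7,168, so a 7,000-token prompt lands in the warning zone.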
Features
- Four text areas — Independent token/char counts per section.
- Summary bar — Total tokens/chars, progress vs budget, over/warn/ok states.
- Largest block — Called out in the summary for quick trimming.
- Share — Copy a URL encoding your inputs (for collaboration or bookmarks).
How to use
- Paste system, user, RAG, and extra text into the matching panels.
- Choose tokenizer — Match the mode you use for estimates (approx vs cl100k vs o200k).
- Set context limit — Preset or custom; align with your model's window.
- Reserve completion tokens — Room for the answer (and tools if needed).
- Adjust — If over budget, shrink the largest section or raise the limit.
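The steps above can be sketched end-to-end. The section names mirror the four panels, and `approxTokens` is the bytes ÷ 4 heuristic described earlier; both are illustrative, not the tool's internals:

```typescript
// Bytes / 4 heuristic (illustrative, not BPE-exact).
function approxTokens(text: string): number {
  return Math.ceil(new TextEncoder().encode(text).length / 4);
}

interface Sections {
  system: string;
  user: string;
  rag: string;
  extra: string;
}

// Count each panel, total them, and name the largest block to trim first.
function summarize(sections: Sections, contextLimit: number, reserve: number) {
  const counts = Object.entries(sections).map(([name, text]) => ({
    name,
    tokens: approxTokens(text),
  }));
  const total = counts.reduce((sum, c) => sum + c.tokens, 0);
  const largest = counts.reduce((a, b) => (b.tokens > a.tokens ? b : a));
  const effective = contextLimit - reserve;
  return { total, largest: largest.name, overBudget: total > effective };
}
```

If `overBudget` is true, the `largest` field tells you which panel to compress first, which is the same trimming workflow the summary bar supports.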
Use cases
| Scenario | How this helps |
|---|---|
| RAG pipelines | See whether retrieved chunks + system + user fit before the request hits the API. |
| Long system prompts | Isolate system tokens vs user message to decide what to compress. |
| Agents & tools | Park tool schemas and JSON in Extra and watch total vs reserved completion space. |
| Teaching & docs | Demonstrate context windows and why “prompt length” matters. |
Limits
- Not billing-accurate for non–OpenAI-compatible tokenizers.
- API wrappers may add hidden tokens (chat templates); counts here are raw pasted text only.
- Share URLs encode state in the link — avoid sharing secrets.
Related terms
People search for context window calculator, LLM prompt token counter, RAG token budget, system prompt token count, max context tokens, tiktoken context budget, cl100k prompt size, and how many tokens is my prompt. This tool answers that with split sections and a clear limit line.
FAQ
Is Token & context budget free?
Yes. Counting runs in your browser.
What is the difference vs the token calculator?
This page splits the prompt into system / user / RAG / extra sections and compares the total to a context limit with reserved completion space. The token calculator focuses on a single blob plus an illustrative cost estimate.
Why am I “over budget” but my API still works?
Your provider may use a different tokenizer, template, or a larger effective window. Use this as a planning signal, then confirm in their dashboard or logs.
What does Share do?
It copies a URL that encodes your sections and settings so you can reopen the same scenario later.
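URL-encoded state like this can be illustrated with standard query parameters. The tool's actual parameter names and encoding scheme are unspecified here, so treat this as a generic sketch:

```typescript
// Encode section text and settings into a shareable URL (illustrative scheme;
// the real tool's parameter names and encoding may differ).
function buildShareUrl(
  base: string,
  state: { system: string; user: string; limit: number; reserve: number }
): string {
  const params = new URLSearchParams({
    system: state.system,
    user: state.user,
    limit: String(state.limit),
    reserve: String(state.reserve),
  });
  return `${base}?${params.toString()}`;
}
```

Because the state lives in the link itself, anything pasted into the sections travels with the URL — the same reason the Limits section warns against sharing secrets.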
Similar tools
Other Spoold utilities that pair with context budgeting:
Conclusion
Use Token & context budget to split prompts and stay inside your model window. For single-block counts and cost hints, use Token calculator; for VRAM, use LLM RAM / VRAM; for latency what-ifs, use Token generation speed.