Guide: Token generation speed
What is this tool?
A free LLM inference speed calculator and token generation latency estimator. Enter prefill (prompt) and decode throughput in tokens per second, plus how many prompt tokens and how many new tokens to generate. The tool shows prefill time, decode time, total latency, and an effective average tok/s over the whole request. Use it for what-if planning — not as a benchmark. Runs in your browser.
Prefill vs decode
Prefill processes the full prompt in parallel across the sequence and is typically compute-bound: large matrix multiplies over many tokens at once. Decode generates one token at a time (autoregressive) and is typically memory-bandwidth-bound, since the weights and KV cache must be read on every step. Real stacks report different tok/s for each phase — this tool keeps them separate so you can match numbers from your profiler or vendor docs.
- High prefill, low decode — Long prompts feel slow to "start" even if streaming is steady afterward.
- Low prefill, high decode — Short prompts start fast; long answers still take time.
How timing works
With prompt token count P, new tokens N, prefill tok/s Tp, and decode tok/s Td:
- Prefill time ≈ P / Tp
- Decode time ≈ N / Td
- Total ≈ prefill time + decode time (no overlap modeled)
- Effective average tok/s ≈ (P + N) / total seconds
Production systems pipeline, batch, and use speculative decoding — this model is intentionally simple.
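The model above can be sketched in a few lines of Python (function and variable names are mine, not the tool's; the clamp mirrors the divide-by-zero guard described in the FAQ):

```python
def estimate_latency(prompt_tokens, new_tokens, prefill_tps, decode_tps):
    # Clamp throughputs to at least 1 tok/s to avoid division by zero.
    prefill_tps = max(prefill_tps, 1.0)
    decode_tps = max(decode_tps, 1.0)
    prefill_s = prompt_tokens / prefill_tps
    decode_s = new_tokens / decode_tps
    total_s = prefill_s + decode_s  # no overlap modeled
    effective_tps = (prompt_tokens + new_tokens) / total_s if total_s else 0.0
    return prefill_s, decode_s, total_s, effective_tps

# 2000-token prompt at 800 tok/s prefill, 256 new tokens at 40 tok/s decode:
# prefill 2.5 s, decode 6.4 s, total 8.9 s, effective ~253.5 tok/s
```

The numbers in the comment are illustrative, not measurements.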
Features
- Presets — CPU-ish prefill; laptop / desktop / strong GPU (7B Q4, 13B Q4); server 70B multi-GPU (illustrative tok/s pairs).
- Custom throughput — Edit prefill and decode tok/s directly.
- Token counts — Prompt tokens and new tokens to generate.
- Results — Prefill s, decode s, total s, effective average tok/s.
How to use
- Pick a preset or type your measured prefill and decode tok/s.
- Set prompt tokens — From a tokenizer or rough estimate.
- Set new tokens — Expected completion length (max output cap, typical reply, etc.).
- Read times — See which phase dominates for your scenario.
Use cases
| Scenario | How this helps |
|---|---|
| RAG / long system prompt | Large P inflates prefill time — compare before trimming context. |
| Chat UI SLA | Estimate time to first token vs time to finish a 256-token reply. |
| Hardware comparison | Plug in tok/s from two GPUs and compare total latency for the same P and N. |
| Education | See why prefill and decode are not interchangeable metrics. |
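For the chat-SLA row, time to first token can be approximated as prefill time plus one decode step (the tool itself reports prefill, decode, and total; this split is a hand calculation with illustrative numbers):

```python
P, N = 1500, 256          # prompt tokens, reply tokens (illustrative)
Tp, Td = 600.0, 45.0      # prefill and decode tok/s (illustrative)
ttft = P / Tp + 1 / Td    # time to first token: prefill plus one decode step
finish = P / Tp + N / Td  # time to finish the full 256-token reply
# ttft ~2.52 s, finish ~8.19 s; trimming the prompt mainly improves ttft
```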
Limits & disclaimers
- Presets are fictional ballparks — Not measured on your machine or model.
- No batching, speculation, or overlap — Real servers hide latency with concurrency.
- Quantization & context affect tok/s — Use numbers from your actual run.
- Benchmark with your stack — vLLM, llama.cpp, TensorRT-LLM, etc. each report differently.
Related terms
People often search for LLM tokens per second calculator, inference latency calculator, prefill vs decode time, time to first token estimate, generation speed calculator, LLM throughput estimator, decode tokens per second, and prompt processing speed. This page splits prefill and decode and sums simple seconds for planning.
FAQ
Is this token speed simulator free?
Yes. It runs client-side in your browser.
Why is my effective tok/s lower than decode tok/s?
Effective average divides total tokens (prompt + output) by total time. When prefill throughput is lower than decode throughput, the prefill phase pulls the average below the decode rate even if decode is fast.
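A worked example with made-up numbers, where prefill tok/s is lower than decode tok/s (e.g., CPU-heavy prompt processing):

```python
P, N = 1000, 100
Tp, Td = 100.0, 200.0        # prefill slower than decode (illustrative)
total = P / Tp + N / Td      # 10.0 s + 0.5 s = 10.5 s
effective = (P + N) / total  # ~104.8 tok/s, well below the 200 tok/s decode rate
```

Note that with this definition, when prefill tok/s exceeds decode tok/s the effective average can land above the decode rate, since prompt tokens count toward the numerator.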
Can I use this as a benchmark?
No — use your framework’s benchmarks and hardware measurements. This is a calculator from numbers you supply.
What if prefill or decode is zero tokens?
Zero prompt tokens makes prefill time 0; zero new tokens makes decode time 0. Throughput inputs are clamped to at least 1 tok/s to avoid divide-by-zero.
Conclusion
Use Token generation speed to reason about prefill vs decode latency from tok/s and token counts. For VRAM planning, use LLM RAM / VRAM; for token counts, use Token calculator or Token & context budget; for wall-clock conversions, see Unix time.