Guide: Vision token estimator
What is this tool?
A free multimodal image token calculator and vision token estimator for planning API calls. Enter image width and height (or upload / paste a file to read dimensions locally), pick a provider rule set — OpenAI-style tiling, Anthropic pixel math, Gemini tiles, or a custom grid — and see per-image and total image tokens. Optionally add a rough text prompt token count (from our token calculator) for a combined planning number. Not billing — real counts depend on model revision, API version, and server-side resizing; always confirm in your provider's dashboard or official token counter.
What does 10,000 tokens look like?
A visual scale for multimodal budgeting — when someone says "stay under 10k" or you are weighing text vs image in one request. Figures are rules of thumb for English-like text unless noted; code and JSON differ. Framing aligns with public cheatsheets such as the OpenAI token cheatsheet. For the same table on the text-focused tool, see Token calculator → What 10K tokens looks like.
| Scale | ≈ 10,000 tokens |
|---|---|
| Words & characters | ≈ 7,500 words · ≈ 40,000 characters Rule of thumb: 1 token ≈ ¾ word ≈ 4 chars (English prose) |
| Printed pages | 15 pages single-spaced · 30 pages double-spaced — about one dense book chapter. |
| Conversation | ≈ 45–50 minutes of two-way chat (rough), depending on turns and verbosity — useful when planning agent or summarizer context. |
| Code footprint | On the order of ~2,300 lines of well-commented Apex, or a full Lightning Web Component library — language and style change the ratio a lot. |
| JSON / data | ~350 KB raw JSON; ballpark ~4,000 trimmed Case-style records — handy for vector-chunk and ingestion planning. |
| Images (vision) | A 1024 × 1024 photo: detail:"low" ≈ 85 tokens · detail:"high" ≈ 765 tokens (OpenAI-style tiling). Crop, resize, or use caption + URL to stay lean — use the estimator above for your exact pixels. |
| Docs / slides | A 15-slide deck at ~75 words/slide ≈ 1,500 tokens of slide text. OCR'd scans → chunk → embed for RAG. |
| Customer / cases | Ballpark: 150 multi-note support-style cases (e.g. Service Cloud–scale threads) ≈ 10k tokens total — enough for root-cause clustering and agent-style actions over a corpus. |
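The words-and-characters row above can be turned into quick conversion helpers. This is a rough sketch of the same rule of thumb (1 token ≈ ¾ word ≈ 4 characters for English prose), not a tokenizer:

```python
def tokens_to_words(tokens: int) -> int:
    """~0.75 words per token (English-prose rule of thumb)."""
    return round(tokens * 0.75)

def tokens_to_chars(tokens: int) -> int:
    """~4 characters per token (English-prose rule of thumb)."""
    return tokens * 4

print(tokens_to_words(10_000))  # 7500 words
print(tokens_to_chars(10_000))  # 40000 characters
```

Code, JSON, and non-English text diverge from these ratios, as the table notes; treat the outputs as planning figures only.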
Understanding token usage
What is a token?
Tokens are the basic units models process. They are not always whole words — pieces of words, characters, or larger chunks depending on the tokenizer. Languages tokenize differently; English often lands near ~0.75 words per token on average, with wide variation.
Why tokens matter
Token counts drive context limits, cost, and latency. For large applications, efficient use affects budget and responsiveness; multimodal requests mix text tokens with image tokens in the same window.
Optimizing token usage
Prefer concise prompts and structured formats when they help; choose lower image detail when full resolution is not needed; chunk large documents for RAG; monitor patterns in your provider dashboard and adjust.
Providers & formulas
The app implements published-style rules of thumb aligned with common docs (see breakdown lines in the tool). Summaries:
- OpenAI (GPT-4o / GPT-4-class vision) — For high detail: fit the image inside a 2048 × 2048 box, scale the shortest side to ~768 px, count 512 × 512 tiles, then `85 + 170 × tiles` per image. Low detail uses a fixed 85 tokens per image.
- Anthropic (Claude vision) — Approximately `ceil((width × height) / 750)` tokens per image. Very large inputs may be downscaled by the API.
- Google Gemini — If both sides are ≤ 384 px, 258 tokens; otherwise 768 × 768 tiles × 258 tokens per image. Use `countTokens` in the Gemini API for exact multimodal counts.
- Custom grid — Set patch size (px), tokens per patch, and base tokens; the tool uses `base + ceil(w/p) × ceil(h/p) × tokensPerPatch` per image.
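The four rules above can be sketched in a few lines of Python. This mirrors the rules of thumb in this guide, not official provider code — real APIs may resize or crop differently:

```python
import math

def openai_tokens(w: int, h: int, detail: str = "high") -> int:
    """OpenAI-style tiling: fit in 2048², shortest side to 768, 512px tiles."""
    if detail == "low":
        return 85  # low detail is a flat 85 tokens per image
    s = min(1.0, 2048 / max(w, h))   # fit inside a 2048 x 2048 box
    w, h = w * s, h * s
    s = min(1.0, 768 / min(w, h))    # scale shortest side down to 768 px
    w, h = w * s, h * s
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

def anthropic_tokens(w: int, h: int) -> int:
    """Claude-style pixel math: ceil(width * height / 750)."""
    return math.ceil(w * h / 750)

def gemini_tokens(w: int, h: int) -> int:
    """Gemini-style: 258 flat if both sides <= 384 px, else 768px tiles x 258."""
    if w <= 384 and h <= 384:
        return 258
    return math.ceil(w / 768) * math.ceil(h / 768) * 258

def custom_grid_tokens(w: int, h: int, patch: int,
                       tokens_per_patch: int, base: int = 0) -> int:
    """Custom grid: base + ceil(w/p) * ceil(h/p) * tokensPerPatch."""
    return base + math.ceil(w / patch) * math.ceil(h / patch) * tokens_per_patch

print(openai_tokens(1024, 1024, "high"))  # 765
print(openai_tokens(1024, 1024, "low"))   # 85
print(anthropic_tokens(1024, 1024))       # 1399
print(gemini_tokens(1024, 1024))          # 1032
```

For a 1024 × 1024 image this reproduces the cheatsheet numbers quoted elsewhere on this page (85 low, 765 high for OpenAI-style tiling).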
OpenAI detail levels & model chips
Low, high, and auto match the usual Chat Completions / Responses detail semantics: auto is a planning range (floor at 85×images through the high-detail upper bound). The GPT-4o / GPT-4 Turbo / … chips are labels only — the same tiling formula is used in-app; verify usage lines for your exact model ID.
Cheatsheet-style example: a 1024 × 1024 photo is often quoted as ~85 tokens at detail:"low" vs ~765 tokens at detail:"high" — crop, caption+URL, or use low detail to save budget. A broader 10,000-token scale (words, pages, code, cases) is in What 10K tokens looks like above; the same framing appears on this token cheatsheet and in the Token calculator guide.
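The auto planning range described above can be sketched as a pair of bounds: the 85 × images floor through the high-detail ceiling. A minimal sketch, reusing the high-detail tiling rule from the providers section:

```python
import math

def openai_high_tokens(w: int, h: int) -> int:
    """High-detail tiling as in the providers section (sketch)."""
    s = min(1.0, 2048 / max(w, h))
    w, h = w * s, h * s
    s = min(1.0, 768 / min(w, h))
    w, h = w * s, h * s
    return 85 + 170 * math.ceil(w / 512) * math.ceil(h / 512)

def auto_detail_range(w: int, h: int, images: int = 1) -> tuple[int, int]:
    """detail:'auto' as a planning range: (low floor, high-detail ceiling)."""
    return 85 * images, openai_high_tokens(w, h) * images

print(auto_detail_range(1024, 1024, images=3))  # (255, 2295)
```

Because the server decides low vs high at request time, budgeting against the upper bound is the safe choice.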
Features
- Local image load — Drag-and-drop, file picker, or paste from clipboard; dimensions never leave your device for counting.
- Size presets — Quick chips (384² through 2048², common photo sizes) plus manual width/height.
- Multiple images — 1–32 images; the total multiplies the per-image token count by the image count.
- Breakdown — Expandable list explaining the calculation for the selected provider.
- Text add-on — Manual token field to combine with image total for end-to-end planning.
How to use
- Choose provider — OpenAI, Anthropic, Gemini, or Custom grid.
- Set dimensions — Upload/paste an image, or enter width and height; use presets if helpful.
- Configure options — OpenAI: model label + detail. Custom: patch size and token rates.
- Set image count — How many images in one request.
- Optional text tokens — Add prompt size from the token calculator.
- Read the estimate — Per image, total image tokens, combined line if text > 0; open Breakdown for details.
Use cases
| Scenario | How this helps |
|---|---|
| Vision + long system prompt | See image tokens plus a manual text count against your context window. |
| Comparing providers | Switch OpenAI vs Claude vs Gemini on the same pixel size to compare rules of thumb. |
| Screenshots & UI mockups | Paste captures, read dimensions, estimate high-detail OpenAI tiles or Gemini tiles. |
| Teaching & docs | Explain why resolution changes multimodal cost before students hit the API. |
Limits
- Images — Max 16 MB per file; PNG, JPEG, GIF, WebP, etc.
- Dimensions — Width and height clamp to 1–16384 px.
- Count — 1–32 images per scenario.
- No text tokenizer — Text tokens are a number you supply; use Token calculator or your provider for exact text counts.
- Estimates only — APIs may resize, crop, or change tokenization; this is planning, not an invoice.
Related terms
People search for GPT-4o image tokens, OpenAI vision token cost, how many tokens is my image, Claude vision tokens per image, Gemini image tokenization, multimodal context budget, detail high vs low tokens, 512px tile vision, and LLM image token calculator online. This page helps you approximate those quantities before you send the request.
FAQ
Is the vision token estimator free?
Yes. Estimates run in your browser; images are not uploaded to Spoold for processing.
Why doesn’t my API usage match this number?
Providers may resize images, use different tokenizers, or charge bundled modalities differently. Treat this as a planning range and verify in official tools or billing.
Do GPT-4o and GPT-4 Turbo use different image token formulas here?
No — model chips are labels; the same OpenAI-style tiling is applied. Always confirm against your model’s documentation and usage logs.
How do I add my prompt size?
Use the optional text prompt tokens field with a count from the token calculator or your API’s tokenizer.
What is Custom grid for?
Experiment with patch size and tokens per patch when you are modeling a proprietary or research stack that behaves like a fixed grid.
Similar tools
Pair vision estimates with text and hardware planning: Token calculator, Token & context budget, and LLM RAM / VRAM.
Conclusion
Use Vision token estimator for quick multimodal token math across OpenAI-, Claude-, and Gemini-style rules. Combine with Token calculator for text, Token & context budget for full prompt budgeting, and LLM RAM / VRAM when you also care about model memory.