Question 1

How do I count tokens for an LLM prompt?

Accepted Answer

Paste your text into Toklen, pick the model you're sending it to (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, or one of seven others), and the token count updates in real time as you type. The count is computed in your browser via js-tiktoken — no server round-trip and no sign-up. For OpenAI models the count is exact; for Claude and Gemini it's an approximation via cl100k_base, typically within 5-15% for English text.

Question 2

How do I count tokens for a Claude prompt?

Accepted Answer

Pick Claude 3.5 Sonnet, Claude 3 Opus, or Claude 3 Haiku in Toklen's model selector and paste the prompt. Anthropic doesn't publish the tokenizer Claude uses in production, so Toklen approximates the count using OpenAI's cl100k_base encoding — usually within 5-15% of the true count for English text. The result shows an approximation badge so you know it's an estimate. For exact counts call Anthropic's count_tokens API; for prompt planning and cost estimation the approximation is normally close enough.

Question 3

How do I count tokens for a GPT-4 or GPT-5 prompt?

Accepted Answer

Pick GPT-4o, GPT-4o Mini, GPT-4 Turbo, or GPT-3.5 Turbo in Toklen and paste your prompt. The count is exact because Toklen runs OpenAI's real tokenizer (tiktoken) ported to JavaScript. Toklen auto-selects the right encoding per model: o200k_base for the GPT-4o family and cl100k_base for GPT-4 Turbo and GPT-3.5 Turbo. When OpenAI ships a GPT-5 family model, the same exact-count behaviour will apply once its encoding is added.
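To illustrate the per-model selection described above, the lookup could be sketched in Python roughly like this. Only the encoding names come from the answer; the model identifier strings and the `encoding_for` helper are hypothetical and are not Toklen's actual code.

```python
from typing import Tuple

# Encoding names as stated in the answer; the identifier strings are illustrative.
ENCODING_BY_MODEL = {
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4-turbo": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def encoding_for(model: str) -> Tuple[str, bool]:
    """Return (encoding_name, is_exact) for a model identifier."""
    key = model.strip().lower()
    if key in ENCODING_BY_MODEL:
        return ENCODING_BY_MODEL[key], True
    # Any other model (for example Claude or Gemini) is treated as approximated
    # via cl100k_base, matching the fallback the FAQ describes.
    return "cl100k_base", False


print(encoding_for("GPT-4o"))             # ('o200k_base', True)
print(encoding_for("claude-3-5-sonnet"))  # ('cl100k_base', False)
```

The boolean is what a UI could use to decide whether to show an approximation badge.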

Question 4

How do I estimate the cost of an LLM API call?

Accepted Answer

Toklen multiplies the input token count by each model's published per-million-token input price and shows the dollar cost to six decimal places, updating live as you type. Pricing is current as of each release and is shown alongside the cost so you can verify. For OpenAI models the cost is exact on the input side; for Claude and Gemini it's an approximation because the input token count itself is approximated. Output cost is not included — Toklen only counts what you provide, not the model's response.
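The arithmetic behind that figure is simple enough to check by hand. Here is a minimal Python sketch, assuming the price is given in USD per million input tokens; the function name `input_cost_usd` is made up for illustration, and `Decimal` is used only to keep the six-decimal rounding predictable.

```python
from decimal import Decimal


def input_cost_usd(token_count: int, price_per_million_usd: str) -> Decimal:
    """Input-side cost: tokens * (price per 1M tokens) / 1,000,000."""
    cost = Decimal(token_count) * Decimal(price_per_million_usd) / Decimal(1_000_000)
    # Round to six decimal places, the precision mentioned in the answer.
    return cost.quantize(Decimal("0.000001"))


# 1,234 input tokens at $2.50 per 1M tokens
print(input_cost_usd(1_234, "2.50"))  # 0.003085
```

Output tokens are deliberately left out, since the response length is not known in advance.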

Question 5

How much of an LLM context window does my prompt use?

Accepted Answer

Toklen shows a visual context window bar under every prompt: it fills as a percentage of the selected model's context window (for example 128K tokens for GPT-4o, 200K for Claude 3.5 Sonnet, 2M for Gemini 1.5 Pro) and shifts color from green to amber to red as you approach the limit. You can see at a glance whether you have headroom for system prompts, retrieved documents, or a longer response — no math required.
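If you want the same number without the bar, the calculation might look like the Python sketch below. The 50% and 80% color thresholds are assumptions chosen for illustration; the answer does not state where Toklen switches colors.

```python
from typing import Tuple


def context_usage(token_count: int, context_window: int,
                  amber_at: float = 50.0, red_at: float = 80.0) -> Tuple[float, int, str]:
    """Return (percent_used, tokens_remaining, color) for a prompt."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    percent = 100 * token_count / context_window
    remaining = max(context_window - token_count, 0)
    # Hypothetical thresholds: green below amber_at, red at or above red_at.
    if percent >= red_at:
        color = "red"
    elif percent >= amber_at:
        color = "amber"
    else:
        color = "green"
    return round(percent, 1), remaining, color


print(context_usage(32_000, 128_000))   # (25.0, 96000, 'green')
print(context_usage(180_000, 200_000))  # (90.0, 20000, 'red')
```

The remaining-token figure is the headroom left for system prompts, retrieved documents, and the response.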

Question 6

Why do Claude and Gemini show an approximation badge?

Accepted Answer

Anthropic and Google don't publish the tokenizers their models use in production. Toklen uses OpenAI's cl100k_base as a proxy for Claude and Gemini because it's open source, well-tested, and produces token counts that are typically within 5-15% of the true count for English text. The approximation gets less accurate for code (which tokenizers handle differently), non-English languages, and very short inputs. If you need exact counts for Claude, use Anthropic's official count_tokens API endpoint; for Gemini, use Google's countTokens method. For prompt planning and cost estimation, the cl100k_base approximation is usually close enough.
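One way to work with an approximate count is to turn it into a range. The sketch below assumes the estimate is off by at most a given fraction of the true count (15% here, the upper end of the figure quoted above) and solves for the true counts consistent with that. The helper name is hypothetical, and the bound is only as good as that assumption; code and non-English text may fall outside it.

```python
import math
from typing import Tuple


def true_count_range(approx_count: int, max_error: float = 0.15) -> Tuple[int, int]:
    """Integer true counts t satisfying |approx - t| <= max_error * t."""
    if not 0 <= max_error < 1:
        raise ValueError("max_error must be in [0, 1)")
    low = math.ceil(approx_count / (1 + max_error))
    high = math.floor(approx_count / (1 - max_error))
    return low, high


print(true_count_range(1_000))  # (870, 1176)
```

For budgeting, planning against the upper bound is the conservative choice.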

Question 7

Why does the same text produce different counts per model?

Accepted Answer

Each model family trains its own tokenizer with a different vocabulary size and merge strategy. GPT-4o uses o200k_base (~200K tokens in the vocabulary), GPT-4 Turbo and GPT-3.5 use cl100k_base (~100K tokens), and older models like Codex used p50k_base. A word like "tokenization" might be one token in o200k_base and three tokens in cl100k_base because the newer tokenizer learned that merge during training. Toklen auto-selects the correct tokenizer when you pick an OpenAI model and falls back to cl100k_base for Claude and Gemini.
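A toy example makes the effect of the merge table concrete. The Python sketch below is a simplified character-level BPE encoder with two made-up merge lists; it is not tiktoken, which works on bytes and pre-splits text with a regex, and the merges shown are invented for illustration rather than taken from any real vocabulary.

```python
from typing import List, Tuple

Merge = Tuple[str, str]


def bpe_encode(word: str, merges: List[Merge]) -> List[str]:
    """Greedy BPE: repeatedly apply the earliest-learned merge that is present."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge applies any more
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


# Two invented merge tables: the larger one has learned two extra merges.
small_vocab = [("l", "o"), ("lo", "w")]
large_vocab = small_vocab + [("e", "r"), ("low", "er")]

print(bpe_encode("lower", small_vocab))  # ['low', 'e', 'r']
print(bpe_encode("lower", large_vocab))  # ['lower']
```

The same string costs three tokens under the smaller table and one under the larger, which is the kind of gap you see between encodings with different vocabulary sizes.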

Question 8

How accurate are the cost estimates?

Accepted Answer

For OpenAI models, cost estimates are exact down to the token because Toklen uses the real tokenizer plus OpenAI's published per-million-token pricing for input. Output costs are harder to predict because you don't know how long the response will be in advance — Toklen only counts your input. For Claude and Gemini, the input token count is approximated via cl100k_base (typically 5-15% variance), so cost estimates are approximations too. Pricing tiers are updated when the providers change them; check the model's official pricing page if you're making budget decisions for a large run.

Question 9

Does Toklen send my text to any server?

Accepted Answer

No. Tokenization runs entirely in your browser via js-tiktoken, a pure JavaScript port of OpenAI's tiktoken library. Your prompt, code, or document never leaves your device. There's no backend, no API call, no database. You can open dev tools and watch the network tab — the only requests are loading the page itself and optional consent-gated PostHog analytics, which never include your text content. This matters if you're working with proprietary prompts, customer data, or anything under NDA.

Question 10

Which models are supported?

Accepted Answer

Ten models across three providers. OpenAI: GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-3.5 Turbo (exact counts via tiktoken). Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku (approximated via cl100k_base). Google: Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro (approximated via cl100k_base). Each has its context window and pricing baked in, so switching models instantly updates the context gauge and cost estimate.

Question 11

How is this different from OpenAI's tokenizer page?

Accepted Answer

OpenAI's tokenizer only counts OpenAI models and doesn't estimate cost or show context window usage. Toklen covers GPT, Claude, and Gemini in one interface, shows a color-coded context window bar so you can see at a glance how much of the model's capacity you're using, calculates per-model cost so you can compare what the same prompt would cost on each, and runs without a sign-in or usage cap. If you bounce between providers during development, it's one tab instead of three.

Provider	Tokenizer	Vocab Size	Algorithm	Publicly Released?	Used By
OpenAI	`cl100k_base`	100,277	BPE	Yes (tiktoken)	GPT-3.5-turbo, GPT-4, GPT-4-turbo
OpenAI	`o200k_base`	199,997	BPE	Yes (tiktoken)	GPT-4o, GPT-4o-mini, o1, o3
Anthropic	`Claude BPE`	undisclosed	BPE (per docs)	No	Claude 3 Haiku, Sonnet, Opus; Claude 3.5 Sonnet
Google	`SentencePiece`	undisclosed (est. 256k)	Unigram LM	No (Gemini API exposes countTokens)	Gemini 1.5 Flash, Pro; Gemini 2.0
Meta	`Llama tokenizer`	128,000	BPE	Yes (HuggingFace)	Llama 3, 3.1, 3.3
Mistral	`Mistral tokenizer`	32,000	BPE	Yes (mistral-common)	Mistral 7B, Mixtral, Mistral Large

Model	Context Window (tokens)	Input Pricing (per 1M tokens)	Output Pricing (per 1M tokens)
GPT-4o	128,000	$2.50 / 1M	$10.00 / 1M
GPT-4o-mini	128,000	$0.15 / 1M	$0.60 / 1M
Claude 3.5 Sonnet	200,000	$3.00 / 1M	$15.00 / 1M
Claude 3 Opus	200,000	$15.00 / 1M	$75.00 / 1M
Gemini 1.5 Pro	2,000,000	$1.25 / 1M (<=128k)	$5.00 / 1M (<=128k)
Gemini 1.5 Flash	1,000,000	$0.075 / 1M	$0.30 / 1M
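The table above is enough to compare input cost for the same prompt across models. This Python sketch copies the context windows and input prices from the table; the `compare_input_cost` helper is hypothetical, and it reuses one token count for every model even though counts for Claude and Gemini are approximations. Because the Gemini 1.5 Pro price listed is the <=128k tier, the sketch refuses larger prompts rather than guess at other tiers.

```python
from typing import List, Tuple

# (context window in tokens, input price in USD per 1M tokens), from the table.
MODELS = {
    "GPT-4o": (128_000, 2.50),
    "GPT-4o-mini": (128_000, 0.15),
    "Claude 3.5 Sonnet": (200_000, 3.00),
    "Claude 3 Opus": (200_000, 15.00),
    "Gemini 1.5 Pro": (2_000_000, 1.25),   # <=128k tier only
    "Gemini 1.5 Flash": (1_000_000, 0.075),
}


def compare_input_cost(token_count: int) -> List[Tuple[str, float]]:
    """Return (model, input cost in USD) pairs, cheapest first."""
    if token_count > 128_000:
        raise ValueError("sketch only covers prompts up to 128k tokens")
    rows = [
        (name, round(token_count * price / 1_000_000, 6))
        for name, (_window, price) in MODELS.items()
    ]
    return sorted(rows, key=lambda row: row[1])


for name, cost in compare_input_cost(10_000):
    print(f"{name:<18} ${cost:.6f}")
```

Prices change, so treat the figures as a snapshot and check each provider's pricing page before budgeting a large run.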

How LLM token counting works

Byte Pair Encoding and the Vocabulary Problem

Tokenizer Specifications by Provider

Context Window Limits across GPT, Claude, and Gemini

Where cl100k_base Diverges from Proprietary Tokenizers

FAQ

How do I count tokens for an LLM prompt?

How do I count tokens for a Claude prompt?

How do I count tokens for a GPT-4 or GPT-5 prompt?

How do I estimate the cost of an LLM API call?

How much of an LLM context window does my prompt use?

Why do Claude and Gemini show an approximation badge?

Why does the same text produce different counts per model?

How accurate are the cost estimates?

Does Toklen send my text to any server?

Which models are supported?

How is this different from OpenAI's tokenizer page?

Your text stays on your device

Ready to count tokens?
