Toklen — token counter for GPT, Claude, and Gemini

Paste any prompt to count its tokens, estimate the API cost, and see how much of the model's context window it uses. Pick from ten GPT, Claude, and Gemini models; everything runs in your browser with no sign-up and no text leaving the page.

Prices as of March 2026 — 100% client-side, nothing leaves your browser

Token counts are approximations based on each model's tokenizer. Actual usage and costs may differ. Pricing reflects published rates at time of last update.

Reference

Token counting code reference: count GPT, Claude, and Gemini tokens in any language

GPT, Claude, and Gemini each tokenize text with a different vocabulary, so counting tokens in code means picking the right library for the right model. What follows is the programmatic version of that work: provider SDKs, language ports, raw HTTP calls, and the model and encoding details you need when wiring token accounting into a pipeline or test harness.

Python: count tokens per provider

Task	Snippet
GPT (any), exact and local	import tiktoken; enc = tiktoken.encoding_for_model("gpt-4o"); len(enc.encode(text))
Claude, exact via API	client.messages.count_tokens(model="claude-sonnet-4-20250514", messages=[{"role":"user","content":text}]).input_tokens
Claude, approximate and local	import tiktoken; len(tiktoken.get_encoding("cl100k_base").encode(text))
Gemini, exact via API	model = genai.GenerativeModel("gemini-2.5-pro"); model.count_tokens(text).total_tokens
Pick encoding by name	tiktoken.get_encoding("o200k_base")
Decode tokens back to text	enc.decode([1234, 5678])
Inspect token boundaries	[enc.decode_single_token_bytes(t) for t in enc.encode(text)]

The tiktoken Python package tokenizes in a Rust core and downloads each BPE vocabulary on first use, caching it on disk (TIKTOKEN_CACHE_DIR sets the location). Anthropic's and Google's exact endpoints are network round trips that count against your rate limits, so cache their results keyed by a hash of the input rather than re-counting repeated text.
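A minimal sketch of caching by hash, assuming `count_fn` is your own function that calls an exact endpoint (for example, one wrapping `client.messages.count_tokens`). `CachedCounter` is a hypothetical helper, and the in-memory dict stands in for whatever persistent store a real pipeline would use.

```python
import hashlib
from typing import Callable, Dict


class CachedCounter:
    """Memoize an exact, network-backed token counter by content hash.

    count_fn(model, text) -> int is assumed to be your own wrapper around a
    provider's count endpoint; this class never calls a provider itself.
    """

    def __init__(self, count_fn: Callable[[str, str], int]) -> None:
        self._count_fn = count_fn
        self._cache: Dict[str, int] = {}
        self.misses = 0  # how many times count_fn actually ran

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Key on the model too: the same text tokenizes differently per model.
        # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{model}\0{text}".encode("utf-8")).hexdigest()

    def count(self, model: str, text: str) -> int:
        key = self._key(model, text)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._count_fn(model, text)
        return self._cache[key]


# Demo with a stand-in counter (whitespace split, not a real tokenizer).
counter = CachedCounter(lambda model, text: len(text.split()))
counter.count("example-model", "hello cached world")
counter.count("example-model", "hello cached world")
print(counter.misses)  # 1: the second call was served from the cache
```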

JavaScript and TypeScript

Browser, Node, and edge runtimes. The browser path is what Toklen uses internally.

Task	Snippet
Browser, js-tiktoken (no WASM)	import { encodingForModel } from "js-tiktoken"; encodingForModel("gpt-4o").encode(text).length
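Browser or Node, @dqbd/tiktoken (WASM)	import { Tiktoken } from "@dqbd/tiktoken/lite"; import model from "@dqbd/tiktoken/encoders/o200k_base.json"; const enc = new Tiktoken(model.bpe_ranks, model.special_tokens, model.pat_str); enc.encode(text).length; enc.free();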
Edge runtimes, gpt-tokenizer	import { encode } from "gpt-tokenizer/model/gpt-4o"; encode(text).length
Claude via Anthropic SDK	const { input_tokens } = await client.messages.countTokens({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: text }] });
Gemini via Google SDK	const { totalTokens } = await ai.models.countTokens({ model: "gemini-2.5-pro", contents: text });

The WASM build (@dqbd/tiktoken) is faster on long inputs but ships ~1.4MB of WASM and needs 'wasm-unsafe-eval' (or 'unsafe-eval' in older browsers) in your CSP. The pure-JS port (js-tiktoken) is ~700KB and works under a stricter CSP, which makes it a better fit for static sites.

Always call .free() on @dqbd/tiktoken encoders. The WASM heap leaks otherwise.

Go

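import "github.com/pkoukk/tiktoken-go"

enc, err := tiktoken.EncodingForModel("gpt-4o")
if err != nil {
	return err // unknown model name, or the encoding download failed
}
n := len(enc.Encode(text, nil, nil)) // nil, nil: no special-token allow/disallow lists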

For Claude, use anthropic-sdk-go's Messages.CountTokens. For Gemini, use google.golang.org/genai's Models.CountTokens.

The tiktoken-go package downloads encoding files on first call. In a container, pre-warm at build time: set TIKTOKEN_CACHE_DIR to a directory inside the image and run a one-off encode so the files are already cached at runtime.

Rust

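use tiktoken_rs::cl100k_base;

// cl100k_base() returns anyhow::Result, so the caller needs the anyhow crate.
fn count_tokens(text: &str) -> anyhow::Result<usize> {
    let bpe = cl100k_base()?; // building the BPE is slow; reuse it in hot paths
    Ok(bpe.encode_with_special_tokens(text).len())
}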

tiktoken-rs is the de-facto Rust port of tiktoken, maintained by the community rather than OpenAI. For exact Claude and Gemini counts, call the HTTP count endpoints shown below with reqwest.

Shell and curl

Task	Snippet
Claude count via HTTP	curl https://api.anthropic.com/v1/messages/count_tokens -H "x-api-key: $KEY" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"..."}]}'
Gemini count via HTTP	curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:countTokens?key=$KEY" -H "content-type: application/json" -d '{"contents":[{"parts":[{"text":"..."}]}]}'
Tokenize stdin (Python one-liner)	python -c "import sys,tiktoken; print(len(tiktoken.encoding_for_model('gpt-4o').encode(sys.stdin.read())))" < prompt.txt
Wordcount-style pipe	cat prompt.txt | python -m tiktoken count gpt-4o (requires pip install tiktoken-cli)

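OpenAI returns token usage on every non-streaming completion response; streamed Chat Completions include it only when you set stream_options: {"include_usage": true}. For tracking spend in production, prefer response.usage.prompt_tokens over recomputing.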

Encoding reference

The four main encodings tiktoken provides, and the models that use each.

Encoding	Vocabulary	Pattern style	Used by
o200k_base	200,019	Unicode + word-segmented	GPT-4o, GPT-4o-mini, o1, o1-mini, o3, o3-mini
cl100k_base	100,277	Unicode + word-segmented	GPT-4, GPT-4-turbo, GPT-3.5-turbo, text-embedding-3-*
p50k_base	50,281	GPT-2 style	text-davinci-003, code-davinci-002
r50k_base	50,257	GPT-2 style	GPT-3 (davinci, curie, babbage, ada)

Use o200k_base for new GPT work and cl100k_base as a rough local stand-in for Claude; the older two matter only when targeting legacy OpenAI models.
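Where tiktoken.encoding_for_model is not available, the same table can drive a small resolver. A minimal sketch, assuming only the models listed above matter; `encoding_name_for` is a hypothetical helper. Exact names are checked first and family prefixes end in a dash, so "gpt-4o-mini" can never fall into the "gpt-4" family.

```python
from typing import Dict

# Exact model names and dash-terminated family prefixes, transcribed from the
# encoding table above. Deliberately incomplete: unknown models raise KeyError.
EXACT: Dict[str, str] = {
    "gpt-4o": "o200k_base",
    "o1": "o200k_base",
    "o3": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
    "davinci": "r50k_base",
    "curie": "r50k_base",
    "babbage": "r50k_base",
    "ada": "r50k_base",
}
PREFIXES: Dict[str, str] = {
    "gpt-4o-": "o200k_base",   # gpt-4o-mini, dated gpt-4o snapshots
    "o1-": "o200k_base",       # o1-mini
    "o3-": "o200k_base",       # o3-mini
    "gpt-4-": "cl100k_base",   # gpt-4-turbo, gpt-4-0613
    "gpt-3.5-turbo-": "cl100k_base",
    "text-embedding-3-": "cl100k_base",
}


def encoding_name_for(model: str) -> str:
    """Resolve a model name to its tiktoken encoding name."""
    if model in EXACT:
        return EXACT[model]
    # Longest matching prefix wins, so a more specific family is never shadowed.
    matches = [prefix for prefix in PREFIXES if model.startswith(prefix)]
    if not matches:
        raise KeyError(f"no known encoding for model {model!r}")
    return PREFIXES[max(matches, key=len)]
```

Raising on unknown names is deliberate: silently defaulting to an older encoding is exactly the mismatch described in the pitfalls table.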

Model reference: tokenizer, context, pricing per 1M tokens

Prices as of May 2026. Provider price lists change without any change in SDK behavior, so verify them against current docs before relying on them in billing logic.

Model	Tokenizer	Context	Input	Output
GPT-4o	o200k_base	128,000	$2.50	$10.00
GPT-4o-mini	o200k_base	128,000	$0.15	$0.60
GPT-4-turbo	cl100k_base	128,000	$10.00	$30.00
GPT-4	cl100k_base	8,192	$30.00	$60.00
GPT-3.5-turbo	cl100k_base	16,385	$0.50	$1.50
o1	o200k_base	200,000	$15.00	$60.00
o1-mini	o200k_base	128,000	$1.10	$4.40
o3	o200k_base	200,000	$2.00	$8.00
o3-mini	o200k_base	200,000	$1.10	$4.40
Claude Opus 4	Claude BPE	200,000	$15.00	$75.00
Claude Sonnet 4	Claude BPE	200,000	$3.00	$15.00
Claude 3.7 Sonnet	Claude BPE	200,000	$3.00	$15.00
Claude 3.5 Sonnet	Claude BPE	200,000	$3.00	$15.00
Claude 3.5 Haiku	Claude BPE	200,000	$0.80	$4.00
Claude 3 Haiku	Claude BPE	200,000	$0.25	$1.25
Gemini 2.5 Pro	SentencePiece	1,000,000	$1.25 / $2.50	$10.00 / $15.00
Gemini 2.5 Flash	SentencePiece	1,000,000	$0.30	$2.50
Gemini 2.0 Flash	SentencePiece	1,000,000	$0.10	$0.40
Gemini 1.5 Pro	SentencePiece	2,000,000	$1.25 / $2.50	$5.00 / $10.00
Gemini 1.5 Flash	SentencePiece	1,000,000	$0.075 / $0.15	$0.30 / $0.60

Split Gemini prices are prompt-size tiers: the second number applies when the prompt exceeds 200,000 tokens for Gemini 2.5 Pro, or 128,000 tokens for the 1.5 models. Anthropic's batch API halves the Claude prices listed here.
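A sketch of tiered pricing, using Gemini 1.5 Pro's rates from the table ($1.25 / $2.50 input, $5.00 / $10.00 output, split at 128,000 prompt tokens). It assumes the prompt size alone selects the tier and that the chosen rates then apply to every input and output token in the request, not just the tokens past the threshold. `gemini_tiered_cost` is a hypothetical helper; pass other rows' prices and thresholds as arguments.

```python
from typing import Tuple


def gemini_tiered_cost(
    input_tokens: int,
    output_tokens: int,
    threshold: int = 128_000,
    input_prices: Tuple[float, float] = (1.25, 2.50),   # USD per 1M tokens
    output_prices: Tuple[float, float] = (5.00, 10.00),  # (at/below, above) threshold
) -> float:
    """USD cost of one request under prompt-size tiered pricing.

    Defaults are the Gemini 1.5 Pro rates. The prompt size picks the tier,
    and that tier's rates apply to the whole request.
    """
    tier = 1 if input_tokens > threshold else 0
    return (input_tokens * input_prices[tier]
            + output_tokens * output_prices[tier]) / 1_000_000


print(f"${gemini_tiered_cost(100_000, 1_000):.2f}")  # $0.13
print(f"${gemini_tiered_cost(200_000, 1_000):.2f}")  # $0.51
```

Crossing the threshold by one token roughly doubles the cost of the whole request, which is why trimming a prompt just under the tier boundary can pay off.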

Cost math, in one place

cost_usd = ((input_tokens  / 1_000_000) * input_price_per_million
            + (output_tokens / 1_000_000) * output_price_per_million)
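A worked version of the formula, as a sketch: the price dictionary copies three rows of the model table and is keyed by display name (a hypothetical convention, not API model IDs). For GPT-4o, 10,000 input and 1,000 output tokens come to 0.01 * $2.50 + 0.001 * $10.00 = $0.025 + $0.010 = $0.035.

```python
from typing import Dict, Tuple

# (input, output) USD per 1M tokens, copied from the model table.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached, non-batch cost of one request in USD."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return ((input_tokens / 1_000_000) * input_price
            + (output_tokens / 1_000_000) * output_price)


print(f"${cost_usd('GPT-4o', 10_000, 1_000):.3f}")  # $0.035
```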

Anthropic and OpenAI both offer prompt caching, and cached input tokens bill at a fraction of the normal rate; the exact multipliers differ by provider. For Anthropic, the cache-aware formula is:

cost = (cache_create * 1.25 * input_price
      + cache_read   * 0.10 * input_price
      + uncached            * input_price) / 1_000_000
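The same formula as a function over an Anthropic usage object, as a minimal sketch. It assumes the Messages API usage field names (input_tokens for the uncached part, plus cache_creation_input_tokens and cache_read_input_tokens) and the default 5-minute cache multipliers used in the formula; longer cache lifetimes bill writes at a higher multiplier. `anthropic_input_cost` is a hypothetical helper.

```python
from typing import Mapping, Optional


def anthropic_input_cost(usage: Mapping[str, Optional[int]],
                         input_price_per_million: float) -> float:
    """Cache-aware input cost in USD for one Anthropic response.

    The cache fields may be absent or null; both cases count as zero.
    """
    cache_create = usage.get("cache_creation_input_tokens") or 0
    cache_read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    return (cache_create * 1.25 * input_price_per_million   # cache write: 125%
            + cache_read * 0.10 * input_price_per_million   # cache hit: 10%
            + uncached * input_price_per_million) / 1_000_000


# Claude Sonnet 4 ($3.00 per 1M input): the first call writes a 10,000-token
# prefix to the cache, and the repeat call reads it back.
first = {"input_tokens": 500, "cache_creation_input_tokens": 10_000,
         "cache_read_input_tokens": 0}
repeat = {"input_tokens": 500, "cache_creation_input_tokens": 0,
          "cache_read_input_tokens": 10_000}
print(round(anthropic_input_cost(first, 3.00), 6))   # 0.039
print(round(anthropic_input_cost(repeat, 3.00), 6))  # 0.0045
```

The write premium is recovered on the first cache hit here: $0.039 for the writing call versus $0.0315 uncached, then $0.0045 per repeat.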

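OpenAI's automatic cache applies to prompts of 1,024 tokens or more, bills cached input at 50 percent of the base rate (25 percent on some models), and needs no explicit cache_control block. The API reports the cached portion as usage.prompt_tokens_details.cached_tokens, which is included in prompt_tokens.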
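A matching sketch for OpenAI, assuming the Chat Completions usage shape, where prompt_tokens already includes the cached tokens reported under prompt_tokens_details. The discount is a parameter because it differs by model; `openai_input_cost` is a hypothetical helper.

```python
from typing import Any, Mapping


def openai_input_cost(usage: Mapping[str, Any], input_price_per_million: float,
                      cached_rate: float = 0.5) -> float:
    """Input cost in USD for one Chat Completions response.

    cached_rate is the fraction of the base price charged for cached tokens
    (0.5 by default; use 0.25 for models with the deeper discount).
    """
    prompt = usage["prompt_tokens"]
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens") or 0
    # cached_tokens is a subset of prompt_tokens; the remainder bills at full price.
    return ((prompt - cached) * input_price_per_million
            + cached * cached_rate * input_price_per_million) / 1_000_000


# GPT-4o ($2.50 per 1M input) with 1,024 of 2,000 prompt tokens served from cache.
usage = {"prompt_tokens": 2_000, "prompt_tokens_details": {"cached_tokens": 1_024}}
print(round(openai_input_cost(usage, 2.50), 6))  # 0.00372
```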

Common pitfalls

Symptom	Cause	Fix
Token count for the same string differs across libraries	Library defaults to an older encoding (e.g., p50k_base) for a GPT-4o model	Pin o200k_base explicitly with tiktoken.get_encoding("o200k_base")
Off by ~5 tokens vs API usage	Chat messages add per-message overhead (role, optional name, and separator tokens)	For current chat models add ~3 tokens per message (+1 when a name is set) and 3 for assistant reply priming, or trust usage from the API response
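Same-looking text (e.g., accented letters) counts differently depending on its source	Text arrives in decomposed form (NFD), such as e followed by a combining accent	Run text = unicodedata.normalize("NFC", text) before both counting and sending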
Counts wildly off for code-heavy prompts	Wrong encoding family for the model	Look up encoding by model name, not family ("gpt-4" → cl100k_base, "gpt-4o" → o200k_base)
First encode call is slow in the browser	@dqbd/tiktoken WASM cold start; vocabulary loads on first call	Pre-warm on idle, or use js-tiktoken, which skips the WASM init
Memory leak in long-running Node process	Not calling .free() on WASM encoders	Wrap encoder use in try/finally with enc.free() in the finally
Claude count_tokens returns 400	anthropic-version header missing	Send anthropic-version: 2023-06-01 on every request (a 401 instead means x-api-key is missing or invalid)
Gemini countTokens returns 0	Sending text in the wrong shape	Use contents: [{ parts: [{ text: "..." }] }], not bare {text}
Request rejected even though the context bar shows under 100%	The context window covers input plus output, and max_tokens reserves the output share up front	Subtract max_tokens from the context window before computing the input percentage
Tool definitions inflate token count	Function/tool specs count against input budget	Include the rendered tool spec when measuring, not just user text
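The per-message overhead row can be folded into a local estimate. A minimal sketch following the approach in OpenAI's "How to count tokens with tiktoken" cookbook recipe: 3 tokens per message for current chat models (4 on the old gpt-3.5-turbo-0301), 1 more when a name field is set, and 3 to prime the assistant reply. `count_text` is assumed to be something like `lambda s: len(enc.encode(s))` with a tiktoken encoding; tool definitions are not included, so measure those separately as the last row of the table advises.

```python
from typing import Callable, Dict, List


def count_chat_tokens(
    messages: List[Dict[str, str]],
    count_text: Callable[[str], int],
    tokens_per_message: int = 3,
    tokens_per_name: int = 1,
    reply_priming: int = 3,
) -> int:
    """Estimate prompt tokens for a text-only chat request."""
    total = 0
    for message in messages:
        total += tokens_per_message  # framing tokens around each message
        for key, value in message.items():
            total += count_text(value)  # role and name strings are tokenized too
            if key == "name":
                total += tokens_per_name
    return total + reply_priming  # the reply is primed with assistant header tokens


def words(s: str) -> int:
    """Stand-in counter for the demo: one "token" per whitespace-separated word."""
    return len(s.split())


msgs = [{"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Hi there"}]
print(count_chat_tokens(msgs, words))  # 16
```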

Tokens per request: rough estimates by content type

Useful when you do not have the text yet but need to size a prompt.

Content	Approximate tokens
English prose	1 token per ~4 characters; ~1.3 tokens per word (about 0.75 words per token)
Code (Python, TypeScript)	1 token per ~3.5 characters
Code (Java, C#, long camelCase)	1 token per ~3 characters
Compact JSON	1 token per ~2.5 characters
Pretty-printed JSON	1 token per ~3.5 characters
Chinese, Japanese, Korean	1.5 to 2 tokens per character
Emoji	2 to 4 tokens per emoji
Base64 blob	1 token per ~3 characters
Markdown table	30 to 50 percent overhead vs the raw cell text

These are calibration medians, not guarantees. For billing logic, count the real tokens.
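For pre-sizing before the text exists, or as a cheap first pass, the character-based rows translate directly into an estimator. A sketch that covers only those rows (the CJK, emoji, and Markdown rows use different units); `estimate_tokens` and the content-type keys are hypothetical names, and rounding up keeps budgets on the safe side.

```python
import math
from typing import Dict

# Characters per token, from the calibration table (medians, not guarantees).
CHARS_PER_TOKEN: Dict[str, float] = {
    "english": 4.0,
    "code": 3.5,          # Python, TypeScript
    "code_verbose": 3.0,  # Java, C#, long camelCase
    "json_compact": 2.5,
    "json_pretty": 3.5,
    "base64": 3.0,
}


def estimate_tokens(text: str, content_type: str = "english") -> int:
    """Rough token estimate from character count; never use it for billing."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[content_type])


print(estimate_tokens('{"id":1,"ok":true}', "json_compact"))  # 8 (18 chars / 2.5, rounded up)
```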

Related concepts

Prompt caching: OpenAI's automatic prompt cache and Anthropic's cache_control blocks both cut the cost of repeated input. Anthropic charges a 25 percent premium on the first cache write; OpenAI charges nothing extra.
Batch API: OpenAI and Anthropic offer 50 percent off for async batch processing with a 24-hour SLA.
Tool use overhead: tool and function definitions count against your input budget. A typical 5-tool spec adds 500 to 1,500 tokens before any user input.
System prompts: Claude bills system prompts as input tokens. OpenAI bills system messages the same way.
Max tokens vs context window: most providers' "context window" is the combined input + output. max_tokens is the output reservation that reduces input headroom.
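Treating the window as combined input plus output, the usable input budget is the context window minus max_tokens. A minimal sketch of the check a context bar could run; the helper names are hypothetical, and a provider that sizes its window differently would need its own rule.

```python
def input_headroom(context_window: int, max_tokens: int) -> int:
    """Input tokens still available once max_tokens is reserved for output."""
    if not 0 < max_tokens < context_window:
        raise ValueError("max_tokens must be positive and smaller than the window")
    return context_window - max_tokens


def context_percent(input_tokens: int, context_window: int, max_tokens: int) -> float:
    """Share of the usable input budget (not the raw window) the prompt fills."""
    return 100.0 * input_tokens / input_headroom(context_window, max_tokens)


def fits(input_tokens: int, context_window: int, max_tokens: int) -> bool:
    """True if the prompt plus the output reservation fits in the window."""
    return input_tokens <= input_headroom(context_window, max_tokens)


# GPT-4o: 128,000-token window with 4,096 tokens reserved for the reply.
print(input_headroom(128_000, 4_096))  # 123904
print(fits(125_000, 128_000, 4_096))   # False, although 125,000 < 128,000
```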