Prices as of March 2026 — 100% client-side, nothing leaves your browser

Token counts are approximations based on each model's tokenizer. Actual usage and costs may differ. Pricing reflects published rates at time of last update.

Reference

Token counting code reference: count GPT, Claude, and Gemini tokens in any language

Toklen counts tokens for GPT, Claude, and Gemini in your browser. The reference below covers how to do the same thing programmatically: provider SDKs, language ports, raw HTTP calls, and the model and encoding details you need when wiring token accounting into a pipeline or test harness.

Python: count tokens per provider

| Task | Snippet |
| --- | --- |
| GPT (any), exact and local | `import tiktoken; enc = tiktoken.encoding_for_model("gpt-4o"); len(enc.encode(text))` |
| Claude, exact via API | `client.messages.count_tokens(model="claude-sonnet-4-20250514", messages=[{"role":"user","content":text}]).input_tokens` |
| Claude, approximate and local | `import tiktoken; len(tiktoken.get_encoding("cl100k_base").encode(text))` |
| Gemini, exact via API | `model = genai.GenerativeModel("gemini-2.5-pro"); model.count_tokens(text).total_tokens` |
| Pick encoding by name | `tiktoken.get_encoding("o200k_base")` |
| Decode tokens back to text | `enc.decode([1234, 5678])` |
| Inspect token boundaries | `[enc.decode_single_token_bytes(t) for t in enc.encode(text)]` |

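The tiktoken Python package tokenizes in a Rust core. It does not bundle the BPE vocabularies: each one is downloaded on first use and cached on disk (set TIKTOKEN_CACHE_DIR to control where). Anthropic's and Google's exact-count endpoints are network calls that count against your rate limits, so for inputs over a few hundred tokens cache the result by a hash of the text.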
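
The caching advice above can be sketched as a small wrapper. This is a minimal sketch, not part of any SDK: `CachedTokenCounter` and `count_fn` are hypothetical names, and `count_fn` stands in for whichever exact counter you call (for example a function that wraps `client.messages.count_tokens`).

```python
import hashlib
from typing import Callable, Dict


class CachedTokenCounter:
    """Memoize token counts by a hash of (model, text) so repeats skip the API call."""

    def __init__(self, count_fn: Callable[[str, str], int]) -> None:
        # count_fn(model, text) -> token count; hypothetical hook for a provider call.
        self._count_fn = count_fn
        self._cache: Dict[str, int] = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Hash model and text together: the same text tokenizes differently per model.
        digest = hashlib.sha256()
        digest.update(model.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
        digest.update(text.encode("utf-8"))
        return digest.hexdigest()

    def count(self, model: str, text: str) -> int:
        key = self._key(model, text)
        if key not in self._cache:
            self._cache[key] = self._count_fn(model, text)
        return self._cache[key]
```

The cache here is an in-process dict; a shared store keyed by the same hash would serve a multi-worker pipeline, at the cost of a lookup round trip.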

JavaScript and TypeScript

Browser, Node, and edge runtimes. The browser path is what Toklen uses internally.

| Task | Snippet |
| --- | --- |
| Browser, js-tiktoken (no WASM) | `import { encodingForModel } from "js-tiktoken"; encodingForModel("gpt-4o").encode(text).length` |
| Browser or Node, @dqbd/tiktoken (WASM) | `const enc = new Tiktoken(model.bpe_ranks, model.special_tokens, model.pat_str); enc.encode(text).length; enc.free();` |
| Edge runtimes, gpt-tokenizer | `import { encode } from "gpt-tokenizer/model/gpt-4o"; encode(text).length` |
| Claude via Anthropic SDK | `const { input_tokens } = await client.messages.countTokens({ model: "claude-sonnet-4-20250514", messages: [{ role: "user", content: text }] });` |
| Gemini via Google SDK | `const { totalTokens } = await ai.models.countTokens({ model: "gemini-2.5-pro", contents: text });` |

The WASM build (@dqbd/tiktoken) is faster on long inputs but ships ~1.4 MB of WASM and needs 'wasm-unsafe-eval' (or 'unsafe-eval' on older browsers) in your CSP script-src. The pure-JS port (js-tiktoken) is ~700 KB and works under a stricter CSP, which makes it the right choice for static sites.

Always call .free() on @dqbd/tiktoken encoders. The WASM heap leaks otherwise.

Go

import "github.com/pkoukk/tiktoken-go"

enc, _ := tiktoken.EncodingForModel("gpt-4o")
n := len(enc.Encode(text, nil, nil))

For Claude, use anthropic-sdk-go's Messages.CountTokens. For Gemini, google.golang.org/genai's Models.CountTokens.

The tiktoken-go package downloads encoding files on first call. In a container, pre-warm by pinning TIKTOKEN_CACHE_DIR and running a one-off encode at build time.

Rust

use tiktoken_rs::cl100k_base;

let bpe = cl100k_base()?;
let n = bpe.encode_with_special_tokens(text).len();

tiktoken-rs is the most widely used Rust port; it also exposes o200k_base() for GPT-4o-family models. For Claude and Gemini there is no official exact-count Rust client, so route through the HTTP endpoints with reqwest.

Shell and curl

| Task | Snippet |
| --- | --- |
| Claude count via HTTP | `curl https://api.anthropic.com/v1/messages/count_tokens -H "x-api-key: $KEY" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"..."}]}'` |
| Gemini count via HTTP | `curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:countTokens?key=$KEY" -H "content-type: application/json" -d '{"contents":[{"parts":[{"text":"..."}]}]}'` |
| Tokenize stdin (Python one-liner) | `python -c "import sys,tiktoken; print(len(tiktoken.encoding_for_model('gpt-4o').encode(sys.stdin.read())))" < prompt.txt` |
| Wordcount-style pipe | `cat prompt.txt \| python -m tiktoken count gpt-4o` (requires pip install tiktoken-cli) |

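OpenAI returns token usage on every non-streaming completion response; streamed responses include it only when you set stream_options={"include_usage": true}. For tracking spend in production, prefer response.usage.prompt_tokens and response.usage.completion_tokens over recomputing.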
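
One way to act on this is to accumulate the reported usage per request instead of re-tokenizing. The sketch below assumes the usage object has already been converted to a plain dict with `prompt_tokens` and `completion_tokens` keys; `UsageLedger` is a hypothetical helper, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class UsageLedger:
    """Running totals of provider-reported token usage."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, usage: dict) -> None:
        # `usage` mirrors the usage object as a plain dict (an assumption);
        # missing keys count as zero so partial payloads do not raise.
        self.prompt_tokens += int(usage.get("prompt_tokens", 0))
        self.completion_tokens += int(usage.get("completion_tokens", 0))

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Keeping input and output totals separate matters because the two are priced differently.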

Encoding reference

The four main tiktoken encodings and what each is for.

| Encoding | Vocabulary size | Pattern style | Used by |
| --- | --- | --- | --- |
| o200k_base | 200,019 | Unicode + word-segmented | GPT-4o, GPT-4o-mini, o1, o1-mini, o3, o3-mini |
| cl100k_base | 100,277 | Unicode + word-segmented | GPT-4, GPT-4-turbo, GPT-3.5-turbo, text-embedding-3-* |
| p50k_base | 50,281 | GPT-2 style | text-davinci-003, code-davinci-002 |
| r50k_base | 50,257 | GPT-2 style | GPT-3 (davinci, curie, babbage, ada) |

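Pick o200k_base for new GPT work, cl100k_base as a rough local approximation for Claude (Anthropic does not publish the current Claude tokenizer, so treat those counts as estimates), and the older two only when targeting legacy OpenAI APIs.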
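
The "Used by" column can be turned into a lookup when tiktoken itself is not available, for example in a routing layer. This is a minimal sketch built only from the table above; `encoding_name_for` and the prefix list are illustrative names, and the list is not exhaustive.

```python
# Prefix -> encoding, taken from the encoding table. More specific prefixes
# come first so "gpt-4o" is matched before "gpt-4".
_PREFIX_TO_ENCODING = [
    ("gpt-4o", "o200k_base"),
    ("o1", "o200k_base"),
    ("o3", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
    ("text-embedding-3-", "cl100k_base"),
    ("text-davinci-003", "p50k_base"),
    ("code-davinci-002", "p50k_base"),
]


def encoding_name_for(model: str) -> str:
    """Return the encoding name for a model ID, or raise if the model is unknown."""
    for prefix, encoding in _PREFIX_TO_ENCODING:
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"no known encoding for model {model!r}")
```

Raising on unknown models is deliberate: silently falling back to a default encoding can produce counts that are wrong without any visible error.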

Model reference: tokenizer, context, pricing per 1M tokens

Prices as of May 2026. Provider listings change without breaking SDK behavior, so verify against current docs before writing billing logic.

| Model | Tokenizer | Context (tokens) | Input ($ per 1M) | Output ($ per 1M) |
| --- | --- | --- | --- | --- |
| GPT-4o | o200k_base | 128,000 | $2.50 | $10.00 |
| GPT-4o-mini | o200k_base | 128,000 | $0.15 | $0.60 |
| GPT-4-turbo | cl100k_base | 128,000 | $10.00 | $30.00 |
| GPT-4 | cl100k_base | 8,192 | $30.00 | $60.00 |
| GPT-3.5-turbo | cl100k_base | 16,385 | $0.50 | $1.50 |
| o1 | o200k_base | 200,000 | $15.00 | $60.00 |
| o1-mini | o200k_base | 128,000 | $1.10 | $4.40 |
| o3 | o200k_base | 200,000 | $10.00 | $40.00 |
| o3-mini | o200k_base | 200,000 | $1.10 | $4.40 |
| Claude Opus 4 | Claude BPE | 200,000 | $15.00 | $75.00 |
| Claude Sonnet 4 | Claude BPE | 200,000 | $3.00 | $15.00 |
| Claude 3.7 Sonnet | Claude BPE | 200,000 | $3.00 | $15.00 |
| Claude 3.5 Sonnet | Claude BPE | 200,000 | $3.00 | $15.00 |
| Claude 3.5 Haiku | Claude BPE | 200,000 | $0.80 | $4.00 |
| Claude 3 Haiku | Claude BPE | 200,000 | $0.25 | $1.25 |
| Gemini 2.5 Pro | SentencePiece | 2,000,000 | $1.25 / $2.50 | $5.00 / $10.00 |
| Gemini 2.5 Flash | SentencePiece | 1,000,000 | $0.075 / $0.15 | $0.30 / $0.60 |
| Gemini 2.0 Flash | SentencePiece | 1,000,000 | $0.075 | $0.30 |
| Gemini 1.5 Pro | SentencePiece | 2,000,000 | $1.25 / $2.50 | $5.00 / $10.00 |
| Gemini 1.5 Flash | SentencePiece | 1,000,000 | $0.075 | $0.30 |

Gemini's split pricing reflects the under/over 128k tier: the first number applies to prompts of up to 128,000 tokens, the second to prompts above that. Anthropic's batch API halves the Claude prices listed here.
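
The tiered input price can be sketched as below. The function name and parameters are illustrative, and the code assumes that once a prompt crosses the threshold the higher rate applies to the whole prompt rather than only to the tokens past it; confirm that against the provider's pricing page before relying on it.

```python
def tiered_input_cost(prompt_tokens: int, low_price: float, high_price: float,
                      threshold: int = 128_000) -> float:
    """Input cost in USD when the per-1M price depends on prompt length."""
    # Assumption: above the threshold, the higher rate applies to every
    # input token, not only the tokens past the threshold.
    price = high_price if prompt_tokens > threshold else low_price
    return prompt_tokens / 1_000_000 * price
```

With the $1.25 / $2.50 row from the table, a 100,000-token prompt costs about $0.125 of input and a 200,000-token prompt about $0.50.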

Cost math, in one place

cost_usd = (input_tokens  / 1_000_000) * input_price_per_million
         + (output_tokens / 1_000_000) * output_price_per_million
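
The same formula as a small Python function, for dropping into a script or test harness. The name `cost_usd` and its parameter names follow the formula above; the sample prices in the usage note come from the GPT-4o row of the model table.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_million: float,
             output_price_per_million: float) -> float:
    """Request cost in USD from token counts and per-1M-token prices."""
    return ((input_tokens / 1_000_000) * input_price_per_million
            + (output_tokens / 1_000_000) * output_price_per_million)
```

For example, `cost_usd(1_000, 500, 2.50, 10.00)` works out to 0.0025 + 0.005, about $0.0075.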

Anthropic and OpenAI both offer prompt caching. Cached input tokens bill at 10 to 50 percent of the normal rate depending on provider and model. For Anthropic, the cache-aware formula is:

cost = (cache_create * 1.25 * input_price
      + cache_read   * 0.10 * input_price
      + uncached            * input_price) / 1_000_000

OpenAI's automatic cache discounts cached input to 50 percent of the normal rate (25 percent on some models) with no explicit cache_control block. The API reports the cached portion as usage.prompt_tokens_details.cached_tokens.
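
Both cache-aware calculations can be sketched side by side. The function names are illustrative. The OpenAI variant assumes the reported cached tokens are a subset of the prompt tokens, and takes the discount as a parameter because it differs by model.

```python
def anthropic_cached_input_cost(cache_create: int, cache_read: int,
                                uncached: int, input_price: float) -> float:
    """Input cost in USD using the 1.25x write and 0.10x read multipliers."""
    weighted = cache_create * 1.25 + cache_read * 0.10 + uncached
    return weighted * input_price / 1_000_000


def openai_cached_input_cost(prompt_tokens: int, cached_tokens: int,
                             input_price: float,
                             cached_fraction: float = 0.50) -> float:
    """Input cost in USD when cached tokens bill at a fraction of the rate."""
    # Assumption: cached_tokens is already included in prompt_tokens.
    uncached = prompt_tokens - cached_tokens
    return (uncached + cached_tokens * cached_fraction) * input_price / 1_000_000
```

At a $3.00 input price, 1,000 cache-write tokens, 10,000 cache-read tokens, and 500 uncached tokens come to (1,250 + 1,000 + 500) * 3.00 / 1,000,000, about $0.00825.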

Common pitfalls

| Symptom | Cause | Fix |
| --- | --- | --- |
| Token count for the same string differs across libraries | Library defaults to an older encoding (e.g., p50k_base) for a GPT-4o model | Pin o200k_base explicitly with `tiktoken.get_encoding("o200k_base")` |
| Off by ~5 tokens vs API usage | Chat messages add per-message overhead (role + name + separator tokens) | Add ~4 tokens per message and ~3 tokens for assistant priming, or trust usage from the API response |
| Counts double on Unicode emoji | Encoding text without normalizing form | Run `text = unicodedata.normalize("NFC", text)` before encoding |
| Counts wildly off for code-heavy prompts | Wrong encoding family for the model | Look up encoding by model name, not family ("gpt-4" → cl100k_base, "gpt-4o" → o200k_base) |
| Tiktoken.encode is slow in browser | @dqbd/tiktoken WASM cold start; vocabulary loads on first call | Pre-warm on idle, or use js-tiktoken which skips the WASM init |
| Memory leak in long-running Node process | Not calling .free() on WASM encoders | Wrap encoder use in try/finally with enc.free() in the finally |
| Claude count_tokens returns 400 or 401 | anthropic-version header missing (400), or x-api-key missing or invalid (401) | Send x-api-key and anthropic-version: 2023-06-01 on every request |
| Gemini countTokens returns 0 | Sending text in the wrong shape | Use `contents: [{ parts: [{ text: "..." }] }]`, not bare `{text}` |
| Context bar shows >100% but request still succeeds | Provider counts input only; output reserved separately | Subtract max_tokens from context window before computing input percentage |
| Tool definitions inflate token count | Function/tool specs count against input budget | Include the rendered tool spec when measuring, not just user text |
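
The per-message overhead row can be sketched as a small estimator. It takes the token count of each message's content (from whatever tokenizer you use) and adds the approximate overheads quoted in the table; the function name and defaults are illustrative, and the real overhead varies by model.

```python
from typing import Sequence


def estimate_chat_tokens(content_token_counts: Sequence[int],
                         per_message_overhead: int = 4,
                         reply_priming: int = 3) -> int:
    """Estimate prompt tokens for a chat request from per-message content counts."""
    if not content_token_counts:
        return 0
    return (sum(content_token_counts)
            + per_message_overhead * len(content_token_counts)
            + reply_priming)
```

Two messages of 10 and 25 content tokens come to 35 + 8 + 3 = 46 estimated prompt tokens. Treat the result as a pre-flight estimate and reconcile it against the usage the API reports.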

Tokens per request: rough estimates by content type

Useful when you do not have the text yet but need to size a prompt.

| Content | Approximate tokens |
| --- | --- |
| English prose | 1 token per ~4 characters; about 1.33 tokens per word (0.75 words per token) |
| Code (Python, TypeScript) | 1 token per ~3.5 characters |
| Code (Java, C#, long camelCase) | 1 token per ~3 characters |
| Compact JSON | 1 token per ~2.5 characters |
| Pretty-printed JSON | 1 token per ~3.5 characters |
| Chinese, Japanese, Korean | 1.5 to 2 tokens per character |
| Emoji | 2 to 4 tokens per emoji |
| Base64 blob | 1 token per ~3 characters |
| Markdown table | 30 to 50 percent overhead vs the raw cell text |

These are calibration medians, not guarantees. For billing logic, count the real tokens.
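
For sizing a prompt before the text exists, the character ratios above can be wrapped in a lookup. This is a rough sketch: the dictionary keys are made-up labels for the table rows, and only the rows expressed as characters per token are included.

```python
import math

# Characters per token, copied from the table above (keys are illustrative labels).
CHARS_PER_TOKEN = {
    "english": 4.0,
    "code_python_ts": 3.5,
    "code_java_csharp": 3.0,
    "json_compact": 2.5,
    "json_pretty": 3.5,
    "base64": 3.0,
}


def estimate_tokens(n_chars: int, content_type: str) -> int:
    """Rough token estimate from a character count; rounds up to stay conservative."""
    return math.ceil(n_chars / CHARS_PER_TOKEN[content_type])
```

Rounding up keeps the estimate on the conservative side when checking against a context limit.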

Related concepts

  • Prompt caching: OpenAI's automatic prompt cache and Anthropic's cache_control blocks both reduce the cost of repeated input. Anthropic charges a 25 percent premium on cache writes; OpenAI's cache has no write premium.
  • Batch API: OpenAI and Anthropic offer 50 percent off for asynchronous batch processing, with results returned within a 24-hour window.
  • Tool use overhead: tool and function definitions count against your input budget. A typical 5-tool spec adds 500 to 1,500 tokens before any user input.
  • System prompts: Claude bills system prompts as input tokens. OpenAI bills system messages the same way.
  • Max tokens vs context window: most providers' "context window" is the combined input + output. max_tokens is the output reservation that reduces input headroom.
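
The last point is simple arithmetic, sketched here with illustrative names. It assumes the provider's context window covers input and output combined, as described above.

```python
from typing import Tuple


def input_headroom(context_window: int, max_tokens: int,
                   input_tokens: int) -> Tuple[int, float]:
    """Return (tokens left for input, percent of the input budget already used)."""
    # The output reservation comes off the top of the shared window.
    input_budget = context_window - max_tokens
    if input_budget <= 0:
        raise ValueError("max_tokens leaves no room for input")
    remaining = input_budget - input_tokens
    percent_used = 100.0 * input_tokens / input_budget
    return remaining, percent_used
```

With a 128,000-token window and max_tokens of 4,000, a 62,000-token prompt uses 50 percent of the 124,000-token input budget and leaves 62,000 tokens of headroom.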