Technique · Updated 2026-04

Tokenizer

Definition

A tokenizer is the algorithm that splits text into tokens, the elementary units an LLM actually reads and generates, before the text is processed by the model.
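As a sketch of the splitting step, here is a toy greedy longest-match tokenizer over a tiny hand-made vocabulary. Real tokenizers (BPE, WordPiece, and similar) learn their vocabulary from data; the vocabulary and the matching strategy below are invented purely for illustration.

```python
# Toy greedy longest-match tokenizer. The vocabulary is an illustrative
# assumption, not a real model's vocabulary.
VOCAB = ["token", "izer", "ing", "the", " ",
         "a", "t", "o", "k", "e", "n", "i", "z", "r", "g", "h"]

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # Unknown character: emit it as its own single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("tokenizing the tokenizer"))
# → ['token', 'i', 'z', 'ing', ' ', 'the', ' ', 'token', 'izer']
```

Note how "tokenizer" compresses to two tokens because both pieces are in the vocabulary, while "tokenizing" needs four: vocabulary coverage directly determines token count.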

Frequently Asked Questions

Why is the tokenizer important?
It determines how many tokens a given text consumes, which drives both the API cost (billed per token) and whether the text fits in the model's context window. An inefficient tokenizer wastes context space and money on the same content.
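The token-to-cost relationship can be sketched as follows. The ~4-characters-per-token heuristic for English text and the price used here are illustrative assumptions, not real pricing; for exact counts you would run the model's actual tokenizer.

```python
# Rough cost estimate from an approximate token count.
# Both the 4-chars-per-token heuristic and the price are assumptions.
def estimate_cost(text: str, usd_per_1k_tokens: float = 0.01) -> tuple[int, float]:
    approx_tokens = max(1, len(text) // 4)  # crude heuristic for English text
    cost = approx_tokens / 1000 * usd_per_1k_tokens
    return approx_tokens, cost

tokens, cost = estimate_cost("a" * 4000)
print(tokens, cost)  # → 1000 0.01
```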
Do all LLMs use the same tokenizer?
No. OpenAI's models use tiktoken encodings, while Anthropic and Google train their own tokenizers. The same text can therefore tokenize to, say, 100 tokens on GPT-4 and 120 on Claude, so token counts and costs are not directly comparable across providers.
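The effect can be demonstrated with two toy tokenizers applied to the same text. The vocabularies below are invented for illustration (real ones are learned from data and hold tens of thousands of entries), but the mechanism is the same: different vocabularies yield different token counts for identical input.

```python
# Same text, two toy vocabularies, different token counts.
# Both vocabularies are illustrative assumptions.
def count_tokens(text: str, vocab: list[str]) -> int:
    pieces = sorted(vocab, key=len, reverse=True)
    n, i = 0, 0
    while i < len(text):
        # Greedy longest match; unknown characters count as one token each.
        match = next((p for p in pieces if text.startswith(p, i)), text[i])
        n += 1
        i += len(match)
    return n

text = "context window"
vocab_a = ["context", " window"]               # merges the phrase aggressively
vocab_b = ["con", "text", " ", "win", "dow"]   # splits it into smaller pieces
print(count_tokens(text, vocab_a), count_tokens(text, vocab_b))  # → 2 5
```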