One doc tagged with "Token Generation"

How LLM Inference Works: From Prompt to Generated Tokens

Understand how LLM inference works, including tokenization, the prefill and decode phases, the KV cache, attention, and sampling, and see how transformer models generate text token by token in production systems.