2 docs tagged with "AI Systems"

How LLM Inference Works: From Prompt to Generated Tokens

Understand how LLM inference works: tokenization, the prefill and decode phases, the KV cache, attention, and sampling, and how transformer models generate text token by token in production systems.

LLM Architecture Overview: How Modern Language Models Are Built

A system-level overview of LLM architecture, covering transformer blocks, encoder versus decoder models, attention mechanisms, the training and inference flow, and how these components fit together in modern AI systems.