6 docs tagged with "Generative AI"

How LLM Inference Works: From Prompt to Generated Tokens

Understand how LLM inference works, from tokenization through the prefill and decode phases, the KV cache, attention, and sampling, and see how transformer models generate text token by token in production systems.

LLM Architecture Overview: How Modern Language Models Are Built

A system-level overview of LLM architecture, including transformer blocks, encoder vs decoder models, attention mechanisms, training and inference flow, and how all components fit together in modern AI systems.

LLM Attention Mechanism: How Models Focus on What Matters

Understand the attention mechanism that powers Large Language Models (LLMs), including self-attention, query-key-value vectors, attention scores, multi-head attention, and why attention replaced recurrent neural networks.

LLM Embeddings Explained: How AI Understands Meaning and Similarity

Learn what embeddings are, how they convert text into vectors, why they are essential for semantic search and RAG systems, and how modern LLMs represent meaning in high-dimensional space.

LLM Model Parameters Explained: What 7B, 70B, and 405B Really Mean

Learn what LLM parameters are, how they are trained, and why 7B, 70B, and 405B models differ in capability, memory footprint, and cost.

LLM Tokens Explained: The Building Blocks of Large Language Models

Learn what tokens are in Large Language Models, how tokenization works, why token counts matter for context windows and pricing, and how modern LLMs process text.