Long Context

What It Is

Long context refers to a language model’s ability to process and reliably reason over very long input sequences — typically hundreds of thousands to millions of tokens. A model has genuine long-context capability when its recall quality stays high across the full window, not just near the start and end.

Why It Matters

Many real-world tasks require more context than a 4k or 8k window allows: full codebases, legal documents, research papers, multi-hour conversations, entire books. Long-context models can take the whole input at once rather than relying on external retrieval pipelines to select relevant chunks.

How It Works

Vanilla self-attention costs O(n²) time and memory in sequence length n, which makes million-token contexts impractical on a single device. Approaches to extend context include: (1) efficient exact-attention implementations such as FlashAttention, which avoids materializing the full n × n score matrix in GPU memory, and ring attention, which shards the sequence across devices; (2) sparse attention patterns that avoid full pairwise comparison; (3) architectural changes like MoE that reduce per-token compute so more tokens can be processed in the same FLOP budget. Training regime matters as much as architecture: models must be trained on genuinely long sequences with dependencies that span the full window, otherwise they learn to ignore distant context even when technically able to attend to it.
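To make the quadratic cost concrete, here is a minimal back-of-the-envelope sketch in plain Python. It counts score-matrix entries for full attention and for a causal sliding-window pattern, which stands in for one simple kind of sparse attention. The function names, the 2-byte (fp16) entry size, and the window size are illustrative assumptions, and the figures cover a single head in a single layer. Kernels such as FlashAttention never store this matrix, but they still perform the same O(n²) amount of computation.

```python
def attention_score_entries(n_tokens):
    """Entries in the full n x n attention score matrix (one head, one layer)."""
    return n_tokens * n_tokens


def score_matrix_gib(n_tokens, bytes_per_entry=2):
    """Memory in GiB to materialize the full score matrix (assumes fp16 entries)."""
    return attention_score_entries(n_tokens) * bytes_per_entry / 2**30


def sliding_window_entries(n_tokens, window):
    """Entries scored when each token attends causally to at most `window`
    positions: itself plus the previous window - 1 tokens."""
    if n_tokens <= window:
        # Every token sees its whole prefix: 1 + 2 + ... + n.
        return n_tokens * (n_tokens + 1) // 2
    # The first `window` tokens see a growing prefix; the rest see exactly `window`.
    return window * (window + 1) // 2 + (n_tokens - window) * window


if __name__ == "__main__":
    for n in (8 * 1024, 128 * 1024, 1024 * 1024):
        full = attention_score_entries(n)
        sparse = sliding_window_entries(n, window=4096)
        print(f"n={n:>9,}  full={full:.3e} entries "
              f"({score_matrix_gib(n):,.3f} GiB)  window-4096={sparse:.3e} entries")
```

Growing the sequence from 8K to 1M tokens multiplies the full-attention count by 128² = 16,384, while the sliding-window count grows roughly linearly. The cost of that saving is that tokens outside the window are not attended to directly.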

Key Sources

attention-is-all-you-need — introduces Transformer self-attention, whose O(n²) cost in sequence length makes long context expensive
gemini-1-5-multimodal-long-context — sparse MoE model with a 1M-token context; near-perfect needle-in-a-haystack recall reported up to 10M tokens
alibi-train-short-test-long — ALiBi’s recency bias enables inference at 2–3× training length; train on 1K tokens, deploy at 2K
self-rag-learning-to-retrieve-generate-critique — adaptive retrieval as an alternative to long context: retrieves only when needed, filtering irrelevant passages before they enter the window
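The ALiBi entry above refers to a recency bias. As a rough illustration, the sketch below builds the linear distance penalty that ALiBi adds to attention scores before the softmax. The helper names are hypothetical, the slope schedule assumes a power-of-two head count, and the causal mask is folded into the same matrix for brevity. This is a sketch of the idea, not the reference implementation.

```python
def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8 / n_heads).
    Assumes n_heads is a power of two."""
    start = 2 ** (-8 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to the score of query i against key j.

    Past and current keys (j <= i) get slope * (j - i), which is zero on the
    diagonal and more negative the further back the key is.
    Future keys (j > i) get -inf, which acts as the causal mask.
    """
    neg_inf = float("-inf")
    return [
        [slope * (j - i) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print("slopes:", slopes)
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

Because the penalty depends only on the distance i - j and not on absolute position, the same function can be called with a larger seq_len at inference than was used in training, which is what makes extrapolation possible at all.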
