A memory palace for frontier ML. Every page is written so you can re-derive the idea from first principles, six months from now, without re-reading the source.

120 concepts · 98 sources · 16 entities

How to use this site. Skim Overview for scope. For a structured way in, follow a Learning Path. To track what’s still unknown, read Open Threads. To see what’s queued, read Reading Queue. The graph view (top-right) is the recall map — pages cluster by neighborhood, not by directory.


Anchor Pages — if you only remember twelve

If these twelve pages stay sharp, the rest can be reconstructed. Each links into a dense neighborhood; together they cover architecture, training, alignment, inference, scale, and reasoning.

  1. Attention — the routing primitive that replaced recurrence.
  2. Transformer — the block that scales.
  3. Scaling Laws — the map: loss as a power law in compute, data, params.
  4. Pre-training — next-token prediction at scale; what makes a base model.
  5. SFT — turning a base model into a follower of instructions.
  6. RLHF — a reward model fit to human preferences, then a policy optimized against it with PPO.
  7. DPO — RLHF without RL: the policy is the reward model.
  8. In-Context Learning — why prompting works at all.
  9. Chain-of-Thought — exposing reasoning to the model itself.
  10. KV Cache — why inference is fast and why it costs memory.
  11. MoE — sparse activation; parameters without proportional compute.
  12. Emergent Abilities — and the active debate over whether the phase change is real.

Recently Ingested


Open Threads

Forward-looking research questions tracked across sources. Each thread accumulates evidence and updates its current best understanding as new work lands.

Browse all threads.


Learning Paths

Sequences, not browsing. Each path is ordered so each step makes the next legible.


Concepts by Theme

Architecture. Transformer · Attention · FlashAttention · Mamba · Positional Encoding · Vision Transformer · Early Fusion · Patch Embeddings · Encoder-Decoder · Cross-Attention · Sliding-Window Attention · Grouped-Query Attention · Residual Connections

Training. Pre-training · Scaling Laws · Compute-Optimal Training · Grokking · Distillation · SFT · RLHF · DPO · PPO · GRPO · Alignment · Constitutional AI · Contrastive Learning · Emergent Abilities · Phase Transition · Reward Model · LoRA · Fine-Tuning

Inference. KV Cache · Speculative Decoding · Quantization · Continuous Batching · Inference Efficiency · Memory Efficiency · Sampling · Dynamic Computation

Reasoning & Agents. In-Context Learning · Chain-of-Thought · Self-Consistency · Self-Critique · Reasoning RL · Tool Use · Instruction Following · RAG

Multimodal & Vision. Multimodal Embeddings · Multimodal Instruction Tuning · Vision-Language Models · Visual Grounding · Open-Vocab Segmentation · Promptable Segmentation · Diffusion Models · Video Generation · Zero-Shot Transfer · Latent Space · VAE

Foundations. Optimization · SGD · Momentum · Adaptive LR · Bias Correction · Batch Norm · Vanishing Gradients · Inductive Bias · Tokenization · Subword Units · Vocabulary · Temperature · Uncertainty

Browse all concepts.


Entities

Labs. OpenAI · Anthropic · Google DeepMind · Google Brain · Google Research · FAIR · Microsoft Research · DeepSeek · TII UAE · UC Berkeley Sky Lab · Stanford Hazy Research

Authors. Ashish Vaswani · Noam Shazeer · Jason Wei · Tri Dao · Albert Gu

Browse all entities.


Wiki Health

  • Stub backlog: recall-audit tracks pages still on the old What It Is / Why It Matters / How It Works schema. Upgrading them to recall-first is the standing chore.
  • Validate: run python3 check_broken_links.py before each commit.
  • Refresh: python3 update_index.py regenerates stats and Recently Ingested.