What to ingest next, and why. Each entry includes which concept pages it would touch and which open thread (if any) it would advance. Delete entries when ingested.

Format

- **<Title or arXiv ID>** — *Why it matters: <one line>.* Touches: [[concepts/foo]], [[threads/bar]].

Frontier model reports

  • Llama 3 Technical Report ("The Llama 3 Herd of Models", arXiv:2407.21783) — Why it matters: the post-Llama 2 frontier open-weight reference; documents a complete modern open-weight training stack. Touches: meta-ai-fair, pre-training, rlhf, post-rlhf-alignment-stack.
  • DeepSeek-V3 Technical Report — Why it matters: open-weight MoE with 671B total parameters (37B active per token) and auxiliary-loss-free load balancing; advances the MoE-at-scale story. Touches: mixture-of-experts, deepseek.
  • OLMo / OLMo 2 — Why it matters: fully open training data and recipe for a competent base model, which is rare and important for reproducibility. Touches: pre-training, scaling-laws.

Reasoning & RL

Long context & inference

  • RULER: What’s the Real Context Size of Your Long-Context Language Models? — Why it matters: a synthetic benchmark that separates nominal (claimed) context length from effective context length; a directly testable take on the lost-in-the-middle question. Touches: long-context, when-does-long-context-fail.
  • YaRN / NTK-aware RoPE scaling — Why it matters: how RoPE context windows are extended after pre-training with little or no extra fine-tuning; a common mechanism behind many 128K+ models. Touches: positional-encoding, long-context.
  • PagedAttention follow-ups (vLLM v1) — Why it matters: production inference at scale; vLLM is among the most widely used open-source serving engines. Touches: kv-cache, inference-efficiency, continuous-batching.

Alignment

Architectures

  • Hybrid SSM+Attention models (Jamba, Zamba) — Why it matters: tests whether attention is replaceable for some layers; relevant if quadratic attention is the long-context bottleneck. Touches: ssm-mamba, transformer, long-context.
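  • Mamba-2 — Why it matters: introduces structured state-space duality (SSD), which frames selective SSMs and a form of linear attention as the same computation and lets SSM layers use matmul-friendly algorithms; narrows the gap with attention. Touches: ssm-mamba, albert-gu, tri-dao.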

Multimodal

  • Sora technical report — Why it matters: scale-up of video diffusion; the canonical reference for video generation at the frontier. Touches: video-generation, diffusion-models.
  • Gemini 2 technical reports (Gemini Ultra is covered in the earlier Gemini 1.0 report) — Why it matters: multimodal frontier; fills the gap after Gemini 1.5. Touches: google-deepmind, long-context.