What to ingest next, and why. Each entry includes which concept pages it would touch and which open thread (if any) it would advance. Delete entries when ingested.
Format
- **<Title or arXiv ID>** — *Why it matters: <one line>.* Touches: [[concepts/foo]], [[threads/bar]].
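
If entries are ever processed by a script (for example, to list which pages an ingest would touch), the format line above can be parsed with a regular expression. This is a minimal sketch, assuming entries follow the format exactly as written, including the bold title, italic rationale, and `[[...]]` links. The names `QueueEntry` and `parse_entry` are hypothetical and not part of any existing tooling.

```python
import re
from typing import List, NamedTuple, Optional


class QueueEntry(NamedTuple):
    """One queue entry (hypothetical structure, for illustration only)."""
    title: str
    why: str
    touches: List[str]


# Mirrors the format line:
# - **<Title>** — *Why it matters: <one line>.* Touches: [[a]], [[b]].
ENTRY_RE = re.compile(
    r"^- \*\*(?P<title>.+?)\*\* — "
    r"\*Why it matters: (?P<why>.+?)\.?\* "
    r"Touches: (?P<touches>.+?)\.?$"
)
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")


def parse_entry(line: str) -> Optional[QueueEntry]:
    """Return the parsed entry, or None if the line is not in entry format."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    # Collect the link targets inside [[...]] in the order they appear.
    touches = LINK_RE.findall(match["touches"])
    return QueueEntry(match["title"], match["why"], touches)
```

Lines that do not match (section titles, or entries written without the bold, italic, and link markup) return `None`, so a script can flag them for cleanup instead of guessing.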
Frontier model reports
- Llama 3 Technical Report (arXiv:2407.21783) — Why it matters: post-LLaMA-2 frontier open-weight reference; documents the open-weight training stack end to end. Touches: meta-ai-fair, pre-training, rlhf, post-rlhf-alignment-stack.
- DeepSeek-V3 Technical Report — Why it matters: open MoE at 671B with novel auxiliary-loss-free balancing; advances the MoE-at-scale story. Touches: mixture-of-experts, deepseek.
- OLMo / OLMo 2 — Why it matters: fully open training data and recipe for a competent base model; rare, and important for reproducibility. Touches: pre-training, scaling-laws.
Reasoning & RL
- DeepSeek-R1-Zero analysis / replications — Why it matters: tests whether pure-RL reasoning is reproducible without distillation. Touches: reasoning-rl, where-does-rl-on-verifiable-rewards-stop.
- Process Reward Models (PRMs) (e.g. “Let’s Verify Step by Step”, OpenAI 2023) — Why it matters: step-level supervision is the alternative to outcome-only RL; tradeoff is unresolved. Touches: reasoning-rl, reward-model, where-does-rl-on-verifiable-rewards-stop.
- STaR / Quiet-STaR — Why it matters: bootstraps reasoning from the model's own generated rationales; a precursor to RL on chain-of-thought. Touches: chain-of-thought, self-consistency.
Long context & inference
- RULER: What’s the Real Context Size of Your Long-Context Language Models? — Why it matters: distinguishes nominal from effective context length; gives a directly testable answer to lost-in-the-middle. Touches: long-context, when-does-long-context-fail.
- YaRN / NTK-aware RoPE scaling — Why it matters: how context windows are extended after training; the practical mechanism behind 128K+ models. Touches: positional-encoding, long-context.
- PagedAttention follow-ups (vLLM v1) — Why it matters: production inference at scale; paged KV-cache management underlies vLLM, a widely used open-source serving stack. Touches: kv-cache, inference-efficiency, continuous-batching.
Alignment
- Iterative / Online DPO papers and related variants (rejection-sampling DPO, IPO, i.e. Identity Preference Optimization) — Why it matters: on-policy sampling closes the static-offline-data gap that may explain DPO’s scaling limits. Touches: dpo, does-dpo-scale, post-rlhf-alignment-stack.
- RLAIF at scale (Anthropic, Google) — Why it matters: how preference data scales when humans become the bottleneck. Touches: constitutional-ai, rlhf, post-rlhf-alignment-stack.
Architectures
- Hybrid SSM+Attention models (Jamba, Zamba) — Why it matters: tests whether attention is replaceable for some layers; relevant if quadratic attention is the long-context bottleneck. Touches: ssm-mamba, transformer, long-context.
- Mamba-2 — Why it matters: state-space duality (SSD) relates structured SSMs to a form of attention, narrowing the algorithmic gap with Transformers. Touches: ssm-mamba, albert-gu, tri-dao.
Multimodal
- Sora technical report — Why it matters: scale-up of video diffusion; the canonical reference for video generation at the frontier. Touches: video-generation, diffusion-models.
- Gemini 2 / Gemini Ultra technical reports — Why it matters: multimodal frontier; fills the coverage gap on either side of Gemini 1.5 (Ultra is the largest tier in the Gemini 1.0 report). Touches: google-deepmind, long-context.