Reading Queue

What to ingest next, and why. Each entry includes which concept pages it would touch and which open thread (if any) it would advance. Delete entries when ingested.

Format

- **<Title or arXiv ID>** — *Why it matters: <one line>.* Touches: [[concepts/foo]], [[threads/bar]].

Frontier model reports

Llama 3 Technical Report (arXiv:2407.21783) — Why it matters: post-LLaMA-2 frontier open-weight reference; documents the current open-source training stack. Touches: meta-ai-fair, pre-training, rlhf, post-rlhf-alignment-stack.
DeepSeek-V3 Technical Report — Why it matters: open MoE at 671B total parameters (37B activated per token) with novel auxiliary-loss-free load balancing; advances the MoE-at-scale story. Touches: mixture-of-experts, deepseek.
OLMo / OLMo 2 — Why it matters: fully open training data and recipe for a competent base model, which is rare and important for reproducibility. Touches: pre-training, scaling-laws.

Reasoning & RL

DeepSeek-R1-Zero analysis / replications — Why it matters: tests whether pure-RL reasoning is reproducible without distillation. Touches: reasoning-rl, where-does-rl-on-verifiable-rewards-stop.
Process Reward Models (PRMs) (e.g. “Let’s Verify Step by Step”, OpenAI 2023) — Why it matters: step-level supervision is the alternative to outcome-only RL; tradeoff is unresolved. Touches: reasoning-rl, reward-model, where-does-rl-on-verifiable-rewards-stop.
STaR / Quiet-STaR — Why it matters: bootstraps reasoning from the model's own generated rationales; a predecessor of RL-on-CoT. Touches: chain-of-thought, self-consistency.

Long context & inference

RULER: What’s the Real Context Size of Your Long-Context Language Models? — Why it matters: distinguishes nominal context length from effective context length; gives a directly testable answer to the lost-in-the-middle question. Touches: long-context, when-does-long-context-fail.
YaRN / NTK-aware RoPE scaling — Why it matters: how context windows are extended after training; the practical mechanism behind 128K+ models. Touches: positional-encoding, long-context.
PagedAttention follow-ups (vLLM v1) — Why it matters: production inference at scale; the dominant serving architecture. Touches: kv-cache, inference-efficiency, continuous-batching.

Alignment

Iterative / Online DPO papers (rejection-sampling DPO, IPO, i.e. Identity Preference Optimization) — Why it matters: closes the static-offline-data gap that may explain DPO’s scaling limits. Touches: dpo, does-dpo-scale, post-rlhf-alignment-stack.
RLAIF at scale (Anthropic, Google) — Why it matters: how preference data scales when humans become the bottleneck. Touches: constitutional-ai, rlhf, post-rlhf-alignment-stack.

Architectures

Hybrid SSM+Attention models (Jamba, Zamba) — Why it matters: tests whether attention is replaceable for some layers; relevant if quadratic attention is the long-context bottleneck. Touches: ssm-mamba, transformer, long-context.
Mamba-2 — Why it matters: structured state-space duality (SSD) connects SSMs to a form of attention; narrows the algorithmic gap with attention. Touches: ssm-mamba, albert-gu, tri-dao.

Multimodal

Sora technical report — Why it matters: scale-up of video diffusion; the canonical reference for video generation at the frontier. Touches: video-generation, diffusion-models.
Gemini 2 / Gemini Ultra technical reports — Why it matters: multimodal frontier; closes the gap from Gemini 1.5. Touches: google-deepmind, long-context.
