Stub — full ingest pending.

Raposo et al. (2024) introduce Mixture-of-Depths (MoD): a learned routing mechanism that decides, per token per layer, whether to process it through the full transformer block or skip it via a residual pass-through. A top-k (expert-choice) router selects which tokens each routed block processes, so the per-block capacity is fixed in advance and the compute graph stays static, but compute is dynamically allocated across tokens — harder tokens pass through more blocks, easier tokens fewer. In the best-performing configuration, routing is applied at every other block with a capacity of 12.5% of the sequence. MoD models match dense baseline quality at equivalent training FLOPs while requiring a fraction of the FLOPs per forward pass, with no dynamic shapes to complicate batched inference.
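A minimal sketch of the routing idea, in numpy. Names (`mod_block`, `router_w`, `block_fn`) are mine, not the paper's, and the router here is a bare linear scorer; the key properties it illustrates are real MoD mechanics: a fixed top-k capacity (static compute graph), residual pass-through for unselected tokens, and the block output scaled by the router score so the routing decision stays on the gradient path.

```python
import numpy as np

def mod_block(x, router_w, block_fn, capacity=0.125):
    """One MoD-style routed block (illustrative sketch, not the paper's code).

    x         : (seq, d) token activations
    router_w  : (d,) linear router weights producing one score per token
    block_fn  : the transformer block applied only to the selected tokens
    capacity  : fraction of tokens the block processes (fixed ahead of time)
    """
    seq, d = x.shape
    k = max(1, int(seq * capacity))        # static top-k budget per block
    scores = x @ router_w                  # (seq,) router score per token
    top = np.argsort(scores)[-k:]          # indices of the k highest scorers
    out = x.copy()                         # unselected tokens: residual pass-through
    # Selected tokens get the block output, weighted by their router score so
    # the routing decision is differentiable in a real (autograd) setting.
    out[top] = x[top] + scores[top, None] * block_fn(x[top])
    return out
```

Note that because k is fixed per block, the FLOPs per forward pass are known at compile time regardless of which tokens are chosen — this is what distinguishes MoD from early-exit schemes with data-dependent compute.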

Key claim: Token-level dynamic depth routing matches dense baseline quality at significantly lower average FLOPs per forward pass, with a fixed total compute budget, a static compute graph, and no added inference latency.