What It Is
Emergent behaviors are capabilities that appear in large models but are absent in smaller ones, and that cannot be predicted by extrapolating from smaller scales. Performance stays flat (near random) across many orders of magnitude of compute, then jumps sharply once a threshold is crossed.
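As a rough illustration of this definition, the sketch below finds the smallest scale at which accuracy clears a random-chance baseline by some margin. The helper name `emergence_threshold`, the `margin` parameter, and every data point are hypothetical, chosen only to show the flat-then-jump shape; none of them are measurements from a real model.

```python
from typing import Optional, Sequence


def emergence_threshold(
    scales: Sequence[float],
    accuracies: Sequence[float],
    chance: float,
    margin: float = 0.05,
) -> Optional[float]:
    """Return the smallest scale whose accuracy exceeds chance + margin.

    Returns None if no point clears the bar. Hypothetical helper, not
    taken from any paper or library.
    """
    if len(scales) != len(accuracies):
        raise ValueError("scales and accuracies must have the same length")
    # Walk the points in order of increasing scale.
    for scale, acc in sorted(zip(scales, accuracies)):
        if acc > chance + margin:
            return scale
    return None


# Synthetic curve: flat near chance (0.25) for several orders of
# magnitude of compute, then a jump. Made-up numbers for illustration.
flops = [1e18, 1e19, 1e20, 1e21, 1e22, 1e23, 1e24]
acc = [0.25, 0.24, 0.26, 0.25, 0.27, 0.55, 0.71]

threshold = emergence_threshold(flops, acc, chance=0.25)
print(threshold)
```

On a curve like this, a trend fitted only to the points below the threshold would stay near chance, which is why extrapolation from small scale fails to anticipate the jump.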
Why It Matters
Emergence means scaling is not always smooth. Some capabilities cannot be bought incrementally: a model either has them or it doesn't. This makes capability prediction hard, and it is central to AI safety debates about whether dangerous capabilities could appear suddenly at scale.
Examples
- 3-digit arithmetic: absent below ~13B params, sharp jump above
- Chain-of-thought prompting: hurts below ~68B params, helps above
- Instruction following: fine-tuning on instructions hurts below ~8B params, helps above
- Multilingual translation: absent in small models, appears at scale
The Controversy
Some researchers argue emergence is a measurement artifact: an all-or-nothing metric such as exact match can hide steady progress, and under a finer-grained metric the improvement looks gradual. Others argue the phase transitions are real. The debate is unresolved.
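The measurement-artifact argument can be made concrete with a little arithmetic. Assume, purely for illustration, that a model gets each answer token right independently with probability p, and that p rises gradually with scale. Exact match on a k-token answer then scores p^k, which stays near zero for a long time and rises steeply late. The independence assumption, the function name `exact_match_rate`, the answer length, and the values of p are all illustrative, not measured.

```python
def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    """Probability that all answer_len tokens are right, assuming
    independent per-token errors (a simplifying assumption)."""
    return per_token_acc ** answer_len


# Per-token accuracy improving gradually (made-up values).
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
ANSWER_LEN = 10  # hypothetical 10-token answer

for p in per_token:
    em = exact_match_rate(p, ANSWER_LEN)
    print(f"per-token {p:.2f} -> exact match {em:.3f}")
```

Under these assumptions the per-token metric climbs steadily, while exact match stays under 3% through p = 0.7 and then reaches roughly 60% by p = 0.95. The same underlying progress looks gradual on one metric and abrupt on the other, which is the core of the artifact claim; it does not by itself settle whether every reported emergent ability is explained this way.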
Key Sources
- emergent-abilities-of-large-language-models — the paper that defined and catalogued emergence across model families
Related Concepts
- scaling-laws
- in-context-learning
- chain-of-thought
- grokking — another form of sudden phase transition: generalization appearing long after training loss converges