Power Laws

What It Is

A power law is a relationship of the form $y = a \cdot x^{k}$, where the coefficient a and the exponent k are constants. On a log-log plot, a power law appears as a straight line with slope k. Power laws appear throughout physics, biology, and economics and, as Kaplan et al. showed, in language model performance.
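The straight-line behavior follows from taking the logarithm of both sides of the definition (assuming x, y, and a are positive so the logarithms exist):

```latex
\log y = \log\left(a \cdot x^{k}\right) = \log a + k \log x
```

In the variables $\log x$ and $\log y$ this is a line with slope k and intercept $\log a$.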

Why It Matters

Power laws are predictive across orders of magnitude. Once you fit the line at small scale, you can extrapolate it to predict values at large scale, provided the same trend continues to hold. This is what makes scaling laws practically useful: small experiments become predictors of large-scale behavior.
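A minimal sketch of this fit-then-extrapolate workflow, using only the Python standard library. The data are synthetic and noise-free, generated from an arbitrary power law chosen for illustration. The function name `fit_power_law` and all numeric values are assumptions, not taken from any paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**k in log-log space; returns (a, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the ordinary least-squares line through the logs.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    k = sxy / sxx
    a = math.exp(mean_y - k * mean_x)
    return a, k

# Hypothetical "small-scale" measurements drawn from y = 3 * x**-0.5.
small_xs = [1e3, 1e4, 1e5]
small_ys = [3.0 * x ** -0.5 for x in small_xs]

a, k = fit_power_law(small_xs, small_ys)

# Extrapolate four orders of magnitude beyond the largest fitted point.
predicted = a * 1e9 ** k
```

With real measurements the points scatter around the line, so the extrapolated value carries uncertainty that grows with the distance from the fitted range.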

How It Works

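In the context of language model scaling, Kaplan et al. model the loss L as a function of the parameter count N, with $N_{c}$ a fitted constant: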

$L(N) = \left(\frac{N_{c}}{N}\right)^{\alpha_{N}}$

The exponent is $\alpha_{N} \approx 0.076$, so the slope on the log-log plot is $-\alpha_{N}$. Doubling N multiplies L by $2^{-0.076} \approx 0.949$, roughly a 5% reduction in loss per doubling. The trend holds from roughly $10^{3}$ to $10^{9}$ parameters across the Kaplan et al. experiments.
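The per-doubling arithmetic can be checked directly. The sketch below uses the exponent quoted above. The helper name `loss_ratio` is hypothetical, and the calculation assumes the power law holds exactly over the range considered.

```python
ALPHA_N = 0.076  # exponent quoted in the text

def loss_ratio(scale_factor, alpha=ALPHA_N):
    """Factor by which L(N) changes when N is multiplied by scale_factor."""
    # L(s*N) / L(N) = (N_c / (s*N))**alpha / (N_c / N)**alpha = s**(-alpha)
    return scale_factor ** (-alpha)

per_doubling = loss_ratio(2.0)    # about 0.949
per_tenfold = loss_ratio(10.0)    # about 0.84

# Halving the loss needs d doublings with 2**(-alpha * d) = 1/2, so d = 1 / alpha.
doublings_to_halve = 1.0 / ALPHA_N
```

Note that the constant $N_{c}$ cancels in the ratio, so the relative improvement depends only on the exponent and the scale factor.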

The practical consequence: smooth, predictable improvement means you should not plateau unexpectedly when scaling. If you do, you have likely hit something the power law was not measuring, such as an emergent capability threshold.

Key Sources

scaling-laws-for-neural-language-models: establishes power law relationships for language model loss across N (model size), D (dataset size), and C (compute)
