What It Is

A residual connection (also called a skip connection) adds a layer’s input directly to its output: H(x) = F(x) + x. The layer learns only the correction F(x) rather than the full mapping.
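The definition above can be sketched directly. This is a minimal toy illustration, not an excerpt from any real library: the layer F is assumed to be a single linear map, and the point is that when F(x) is zero the block reduces to the identity.

```python
import numpy as np

def residual_block(x, W):
    """Toy residual block: H(x) = F(x) + x, with F(x) = x @ W as the learned correction."""
    f_x = x @ W      # the residual F(x) the layer actually learns
    return f_x + x   # skip connection adds the unchanged input

x = np.ones((2, 4))
W = np.zeros((4, 4))          # F(x) = 0 here, so the block should pass x through
out = residual_block(x, W)
print(np.allclose(out, x))    # identity behavior when the residual is zero
```

With a nonzero `W`, the output is simply `x @ W + x`, so the network only has to learn how the input should change, not how to reproduce it.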

Why It Matters

Without residual connections, stacking more layers makes networks harder, not easier, to train: beyond a certain depth, training error gets worse even though the deeper model could in principle represent the shallower one. Residual connections address this degradation problem, making it practical to train networks with 100+ layers while keeping gradient flow healthy through backpropagation.

How It Works

Each block computes a small residual F(x) and adds it to the unchanged input via a shortcut path. If nothing needs to change, the layer can drive F(x) toward 0 and the identity passes through. During backprop, the gradient of H(x) = F(x) + x with respect to x is ∂F/∂x + I, so the identity term carries gradient directly backward even when ∂F/∂x is small, mitigating vanishing gradients. The shortcut costs zero extra parameters when input and output dimensions match; when they differ, a learned projection (e.g., a 1×1 convolution) is used on the shortcut.
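The gradient argument above can be checked numerically. The sketch below stacks 50 toy linear layers with small weights and compares the end-to-end Jacobian with and without the identity shortcut; the layer form and depth are illustrative assumptions, not from the source.

```python
import numpy as np

# Compare gradient flow through 50 stacked layers, plain vs. residual.
# Each toy layer computes F(x) = W x, so per layer:
#   plain:    dH/dx = W
#   residual: dH/dx = W + I   (the identity shortcut's contribution)
np.random.seed(0)
depth, dim = 50, 8
W = 0.1 * np.random.randn(dim, dim)   # small shared weights (assumed for illustration)

plain_jac = np.eye(dim)
resid_jac = np.eye(dim)
for _ in range(depth):
    plain_jac = W @ plain_jac                   # repeated small Jacobians shrink
    resid_jac = (W + np.eye(dim)) @ resid_jac   # identity path keeps gradient alive

print(np.linalg.norm(plain_jac))   # vanishes toward zero
print(np.linalg.norm(resid_jac))   # remains far larger
```

The plain product of 50 small Jacobians collapses toward zero, while the residual product stays well-scaled because each factor contains the identity.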

Key Sources