What It Is
A residual connection (also called a skip connection) adds a layer's input directly to its output: H(x) = F(x) + x. Instead of learning the full mapping H(x), the layer learns only the residual correction F(x).
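A minimal sketch of this definition, using NumPy and a hypothetical two-layer MLP as F(x) (the weight shapes and initialization here are illustrative, not from any particular architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """H(x) = F(x) + x, where F is a small two-layer MLP (illustrative)."""
    f = relu(x @ W1) @ W2   # the learned correction F(x)
    return f + x            # shortcut: add the unchanged input

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((4, 4)) * 0.01  # near-zero weights, so F(x) ≈ 0
W2 = rng.standard_normal((4, 4)) * 0.01

# With F(x) ≈ 0, the whole block is approximately the identity map.
y = residual_block(x, W1, W2)
print(np.allclose(y, x, atol=1e-2))  # → True
```

Note that the block defaults to the identity: a freshly initialized residual layer leaves its input nearly untouched, which is exactly the "learn only the correction" behavior.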
Why It Matters
Without residual connections, stacking more layers can make networks harder, not easier, to train: deeper plain networks often show higher training error than shallower ones (the degradation problem). Residual connections address this, making it practical to train networks with 100+ layers while keeping gradient flow healthy during backpropagation.
How It Works
Each block computes a small residual F(x) and adds it to the unchanged input via a shortcut path. If a block has nothing useful to learn, F(x) can be driven toward 0 and the identity passes through unchanged. During backpropagation, the gradient also flows backward along the shortcut's direct identity path, which mitigates the vanishing-gradient problem in deep stacks. When input and output dimensions match, the shortcut adds zero extra parameters; otherwise a small projection aligns them.
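The gradient argument can be checked with simple arithmetic. A toy sketch, assuming each layer is a scalar map f(x) = w·x with a deliberately tiny w (hypothetical values chosen to exaggerate the effect): in a plain stack the end-to-end derivative is the product of per-layer derivatives, w^depth, while in a residual stack each block contributes a factor of (1 + w) because d/dx [x + f(x)] = 1 + f'(x).

```python
# Compare the end-to-end gradient through 30 stacked scalar layers,
# each computing f(x) = w * x (toy values, purely illustrative).
depth, w = 30, 0.01

# Plain stack: dH/dx = w ** depth -- the gradient vanishes.
plain_grad = w ** depth

# Residual stack: each block is x + f(x), so each factor is (1 + w).
residual_grad = (1 + w) ** depth

print(plain_grad)      # 1e-60, effectively zero
print(residual_grad)   # ~1.35, still a usable gradient
```

The "+1" from the identity path is what keeps the product of factors near 1 instead of collapsing toward 0, which is the scalar version of the direct gradient path described above.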