What It Is

A residual connection (also called a skip connection) adds a layer’s input directly to its output: H(x) = F(x) + x. The layer learns only the correction F(x) rather than the full mapping.
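The definition above can be sketched directly. This is a minimal toy illustration, not an excerpt from any real library: the layer F is assumed to be a single linear map, and the point is that when F(x) is zero the block reduces to the identity.

```python
import numpy as np

def residual_block(x, W):
    """Toy residual block: H(x) = F(x) + x, with F(x) = x @ W as the learned correction."""
    f_x = x @ W      # the residual F(x) the layer actually learns
    return f_x + x   # skip connection adds the unchanged input

x = np.ones((2, 4))
W = np.zeros((4, 4))          # F(x) = 0 here, so the block should pass x through
out = residual_block(x, W)
print(np.allclose(out, x))    # identity behavior when the residual is zero
```

With a nonzero `W`, the output is simply `x @ W + x`, so the network only has to learn how the input should change, not how to reproduce it.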

Why It Matters

Without residual connections, stacking more layers makes networks harder, not easier, to train: beyond a certain depth, training error gets worse even though the deeper model could in principle represent the shallower one. Residual connections address this degradation problem, making it practical to train networks with 100+ layers while keeping gradient flow healthy through backpropagation.

How It Works

Each block computes a small residual F(x) and adds it to the unchanged input via a shortcut path. If nothing needs to change, the layer can drive F(x) toward 0 and the identity passes through. During backprop, the gradient of H(x) = F(x) + x with respect to x is ∂F/∂x + I, so the identity term carries gradient directly backward even when ∂F/∂x is small, mitigating vanishing gradients. The shortcut costs zero extra parameters when input and output dimensions match; when they differ, a learned projection (e.g., a 1×1 convolution) is used on the shortcut.
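The gradient argument above can be checked numerically. The sketch below stacks 50 toy linear layers with small weights and compares the end-to-end Jacobian with and without the identity shortcut; the layer form and depth are illustrative assumptions, not from the source.

```python
import numpy as np

# Compare gradient flow through 50 stacked layers, plain vs. residual.
# Each toy layer computes F(x) = W x, so per layer:
#   plain:    dH/dx = W
#   residual: dH/dx = W + I   (the identity shortcut's contribution)
np.random.seed(0)
depth, dim = 50, 8
W = 0.1 * np.random.randn(dim, dim)   # small shared weights (assumed for illustration)

plain_jac = np.eye(dim)
resid_jac = np.eye(dim)
for _ in range(depth):
    plain_jac = W @ plain_jac                   # repeated small Jacobians shrink
    resid_jac = (W + np.eye(dim)) @ resid_jac   # identity path keeps gradient alive

print(np.linalg.norm(plain_jac))   # vanishes toward zero
print(np.linalg.norm(resid_jac))   # remains far larger
```

The plain product of 50 small Jacobians collapses toward zero, while the residual product stays well-scaled because each factor contains the identity.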

Key Sources