What It Is
Artificially expanding a training dataset by applying label-preserving transformations to existing examples — crops, flips, color shifts, rotations, noise — to improve model generalization and, in self-supervised learning, to define what “same” means.
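A minimal NumPy sketch of two label-preserving transforms (the helper names and toy image are illustrative; real pipelines typically use libraries such as torchvision or albumentations):

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(image):
    """Mirror the image left-to-right; the class label is unchanged."""
    return image[:, ::-1]

def add_gaussian_noise(image, sigma=0.05):
    """Perturb pixel values slightly; a cat with sensor noise is still a cat."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

# One original example yields several training examples, all with the same label.
image = rng.random((32, 32, 3))   # toy 32x32 RGB image with values in [0, 1]
label = 1
augmented = [(horizontal_flip(image), label),
             (add_gaussian_noise(image), label)]
```

Each transform changes the pixels but not what the image depicts, which is what "label-preserving" means in practice.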
Why It Matters
Augmentation is the primary lever for controlling what invariances a model learns. In supervised learning it prevents overfitting. In contrastive self-supervised learning it defines the positive pairs — two augmented views of the same image — and determines which features the model must learn to ignore and which it must preserve.
How It Works
A pipeline of random transformations is applied independently to each sample at every training step, so the model rarely sees the exact same input twice. The key insight from SimCLR: the composition of augmentations matters more than any individual transform. Random cropping combined with color distortion is the critical pair — cropping forces shape learning, color distortion removes the color-matching shortcut. Neither alone is sufficient.
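The crop-plus-color-distortion composition can be sketched as follows (a simplified NumPy stand-in for the SimCLR pipeline; the function names, crop size, and distortion strength are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_crop(image, out_size=24):
    """Take a random out_size x out_size patch; the model must rely on
    shape and parts rather than the full global layout."""
    h, w, _ = image.shape
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    return image[top:top + out_size, left:left + out_size]

def color_distort(image, strength=0.5):
    """Randomly rescale and shift each channel, breaking the shortcut of
    matching two views by their color histograms alone."""
    scale = 1.0 + rng.uniform(-strength, strength, size=3)
    shift = rng.uniform(-strength / 2, strength / 2, size=3)
    return np.clip(image * scale + shift, 0.0, 1.0)

def augment(image):
    """Compose the two transforms — the pairing SimCLR found critical."""
    return color_distort(random_crop(image))

# Two independent augmentations of one image form a positive pair
# for contrastive learning.
image = rng.random((32, 32, 3))
view_a, view_b = augment(image), augment(image)
```

Because `augment` is sampled independently for each view, the two views share content but differ in framing and color, so the encoder must preserve semantic features and discard the rest.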