What It Is

A training paradigm where a model learns representations from unlabeled data by creating its own supervisory signal — predicting one part of the data from another, or recognizing that two views of the same data should agree.

Why It Matters

Labels are expensive; unlabeled data is abundant. Self-supervised learning lets models build rich, transferable representations from raw data at scale, often matching or exceeding supervised pretraining when fine-tuned on small labeled sets.

How It Works

The model is given a pretext task constructed from the data itself: reconstruct masked patches, predict the next token, or produce matching representations for two augmented views of the same image. The labels come from the structure of the data, not from human annotation. After pretraining, the learned representations transfer to downstream tasks via fine-tuning or linear probing.
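The "labels come from the data structure" idea can be made concrete with a minimal sketch of masked-token example construction (the function name and token set here are illustrative, not from any particular library): the target at each masked position is simply the original token, so supervision is generated without any human annotation.

```python
import random

def make_masked_examples(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Turn a raw token sequence into (input, target) training pairs.

    The supervisory signal is self-generated: wherever a token is
    replaced by mask_token, the prediction target is the original
    token at that position. No human labels are involved.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    targets = {}  # position -> original token to predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            inputs[i] = mask_token
            targets[i] = tok  # the "label" comes from the data itself
    return inputs, targets

tokens = "the cat sat on the mat".split()
inputs, targets = make_masked_examples(tokens, mask_rate=0.5)
```

A model pretrained on pairs like these (e.g. with a cross-entropy loss over the vocabulary at each masked position) never sees an annotated example, yet must learn context-sensitive representations to fill the blanks.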

Key Sources