What It Is

A foundation model is a large model trained at scale on broad data that can be adapted to many downstream tasks (via fine-tuning, prompting, or zero-shot transfer) without retraining from scratch for each task.

Why It Matters

Before foundation models, each task required its own dataset, training loop, and architecture. Foundation models collapse this pipeline: one model, trained once, serves as the backbone for dozens of applications. GPT and BERT are foundation models for text; CLIP and SAM are foundation models for vision.

How It Works

The key properties:

  1. Scale: large enough that general capabilities emerge from the training data itself
  2. Breadth: trained on diverse data so the learned representations transfer across domains
  3. Promptability: the interface allows steering the model toward a specific task at inference time, without weight updates
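Promptability in particular can be made concrete with a minimal sketch. The snippet below uses a hypothetical `fake_model` stub in place of a real foundation model's text-completion API (the stub and its canned outputs are assumptions, not any real model); the point is the interface pattern: the task is specified in the prompt at inference time, while the model's weights never change.

```python
def fake_model(prompt: str) -> str:
    # Stand-in for a frozen foundation model. A real model would generate
    # a completion conditioned on the prompt; here we return canned answers
    # purely to illustrate the interface.
    canned = {
        "Translate to French: cat": "chat",
        "Classify sentiment (positive/negative): I loved it": "positive",
    }
    return canned.get(prompt, "")

def run_task(instruction: str, text: str) -> str:
    # Promptability: the task lives in the input string, not in the weights.
    # Switching tasks means switching instructions, not retraining.
    return fake_model(f"{instruction}: {text}")

# One shared "model", two different tasks, zero weight updates.
translation = run_task("Translate to French", "cat")
sentiment = run_task("Classify sentiment (positive/negative)", "I loved it")
```

The same pattern underlies real prompting APIs: fine-tuning would instead update the weights for one task, while prompting keeps a single frozen backbone serving many tasks.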

The “foundation” metaphor is architectural: many applications are built on top of one shared model, rather than each application building its own base.

Key Sources