What It Is

Ensemble methods combine predictions from multiple independently trained models. The simplest form averages their output probabilities. Ensembles almost always outperform any single constituent model because their errors are only partially correlated: where one model is wrong, others are often right, so averaging cancels much of the individual error.
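A minimal sketch of probability averaging, assuming each model exposes a `(n_samples, n_classes)` array of class probabilities (names and toy numbers here are illustrative, not from any specific library):

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average class-probability arrays from several models.

    prob_list: list of (n_samples, n_classes) arrays, one per model.
    Returns the averaged probabilities and the argmax class per sample.
    """
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg, avg.argmax(axis=1)

# Toy illustration: three models, two samples, two classes.
# Model 2 is wrong on sample 0; the other two outvote it.
p1 = np.array([[0.7, 0.3], [0.2, 0.8]])
p2 = np.array([[0.4, 0.6], [0.1, 0.9]])
p3 = np.array([[0.8, 0.2], [0.3, 0.7]])

avg, preds = ensemble_predict([p1, p2, p3])
# preds → [0, 1]: the averaged ensemble recovers the correct
# class on sample 0 despite one model's mistake.
```

Note the averaged output is still a valid probability distribution (rows sum to 1), which is what lets it serve directly as a soft target later.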

Why It Matters

Ensembles set the performance ceiling that distillation tries to match in a single model. In practice, deploying 10 models costs 10× the inference compute, so compressing ensemble knowledge into a single student model is the primary motivation for knowledge distillation.

How It Works

Train N models with different random seeds, data orderings, or hyperparameters. At inference, average their logits or their probabilities (the two schemes are similar but not identical). Averaging reduces the variance contributed by individual models without changing the bias they share. Specialist ensembles (models trained on subsets of confusable classes) can further improve fine-grained accuracy.
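The two averaging choices above can be made concrete. This sketch (function names are my own, not a standard API) averages either raw logits before a single softmax, or per-model probabilities after softmax; with asymmetric logits the results differ slightly:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def average_logits(logit_list):
    """Average raw logits across models, then softmax once."""
    return softmax(np.mean(np.stack(logit_list), axis=0))

def average_probs(logit_list):
    """Softmax each model's logits, then average the probabilities."""
    return np.mean(np.stack([softmax(z) for z in logit_list]), axis=0)

# Two models, one sample, two classes, deliberately asymmetric logits.
logits = [np.array([[3.0, 0.0]]), np.array([[0.0, 1.0]])]
pa = average_logits(logits)   # geometric-mean-like pooling
pb = average_probs(logits)    # arithmetic mean of distributions
```

Both outputs are valid distributions and usually rank classes the same way; probability averaging is the more common choice when models are trained separately, since it weights each model's confidence directly.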

Key Sources

  • distillation — knowledge distillation compresses ensemble knowledge into one model
  • transfer-learning — ensemble outputs can serve as rich transfer targets