What It Is

A generative model that learns to encode data into a continuous latent distribution and decode samples from that distribution back to data — training both encoder and decoder jointly with a reconstruction loss plus a KL divergence regularizer that keeps the latent space smooth and contiguous.

Why It Matters

VAEs are the compression backbone of Latent Diffusion Models. The autoencoder in Stable Diffusion (a KL-regularized VAE; the original LDM paper also evaluates a VQ-regularized variant) downsamples images by a factor of 4-8 per spatial dimension — Stable Diffusion uses f = 8, mapping a 512×512×3 image to a 64×64×4 latent. Diffusion then operates in this compressed space, cutting training and inference cost dramatically while preserving perceptual quality.
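As a rough sense of the savings, assuming the commonly published Stable Diffusion shapes (512×512 RGB input, 64×64×4 latent), the element-count reduction works out to:

```python
# Element counts for a typical Stable Diffusion autoencoder (f = 8).
# Shapes are the commonly published ones, assumed here for illustration.
pixels = 512 * 512 * 3   # input image: H x W x RGB channels
latents = 64 * 64 * 4    # latent: (H/8) x (W/8) x 4 channels
print(pixels // latents) # 48x fewer elements for diffusion to process
```

The spatial downsampling is 8× per dimension, but because the latent keeps only 4 channels, the total element count shrinks by 48×.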

How It Works

The encoder outputs a mean and variance rather than a single point: $q_\phi(z \mid x) = \mathcal{N}(z;\ \mu_\phi(x),\ \mathrm{diag}(\sigma_\phi^2(x)))$. The training objective is the Evidence Lower Bound (ELBO):

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\ \|\ p(z)\big)$$

The reconstruction term pushes the decoder to faithfully reproduce inputs. The KL term regularizes the latent space toward the standard normal prior $p(z) = \mathcal{N}(0, I)$, ensuring smooth interpolation and valid samples from random $z \sim \mathcal{N}(0, I)$. Gradients flow through the sampling step via the reparameterization trick, $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.
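The two pieces the objective needs beyond a plain autoencoder — differentiable sampling and the closed-form KL to a standard normal — can be sketched in NumPy. Function names here are illustrative, not from any library:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, I): sampling becomes a
    # deterministic function of (mu, sigma), so gradients can flow through.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) )
    # = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)          # sigma = 1
print(kl_to_standard_normal(mu, log_var))       # 0.0: q already matches the prior
print(reparameterize(mu, log_var, rng).shape)   # (4,)
```

In training, the KL term is added to the reconstruction loss (e.g. MSE between input and decoder output), so the encoder is penalized for drifting away from the prior.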

Key Sources