What It Is

A neural network architecture split into two parts: an encoder that processes the input sequence into rich representations, and a decoder that generates the output sequence by attending to those representations one token at a time.
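The two-part split can be sketched as a generation loop: encode once, then call the decoder repeatedly, feeding it the encoder states plus everything generated so far. The sketch below is purely structural, not a real model; the function names (`toy_encoder`, `toy_decoder_step`) and the trivial "copy the input" decoding policy are illustrative assumptions, standing in for learned attention layers.

```python
def toy_encoder(input_tokens):
    # Stand-in for the encoder: produce one representation per input
    # token (here, just the token paired with its position).
    return [(tok, pos) for pos, tok in enumerate(input_tokens)]

def toy_decoder_step(encoder_states, generated_so_far):
    # Stand-in for one decoder step: given the encoder states and the
    # output so far, emit the next token. This toy policy simply copies
    # the input, then stops; a real decoder would attend and predict.
    idx = len(generated_so_far)
    if idx < len(encoder_states):
        return encoder_states[idx][0]
    return "<eos>"

def generate(input_tokens):
    # Encode once; decode one token at a time until end-of-sequence.
    states = toy_encoder(input_tokens)
    out = []
    while True:
        nxt = toy_decoder_step(states, out)
        if nxt == "<eos>":
            break
        out.append(nxt)
    return out
```

The point of the skeleton is the asymmetry: the encoder runs once over the whole input, while the decoder is invoked repeatedly, one output token per step.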

Why It Matters

Encoder-decoder architectures are the natural fit for sequence-to-sequence tasks — translation, summarization, question answering — where the input and output may differ in length and structure. The encoder can see the full input bidirectionally; the decoder generates autoregressively.

How It Works

The encoder runs self-attention over the full input sequence, producing a set of context-enriched token representations. The decoder runs causal self-attention over its own outputs so far, then cross-attends to the encoder’s output to decide what to generate next. This cross-attention mechanism is how the decoder “reads” the input while generating the output.
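Cross-attention itself is just scaled dot-product attention where the query comes from a decoder position and the keys and values come from the encoder output. A minimal single-query sketch in plain Python (vector sizes and the example inputs are assumptions for illustration; real implementations batch this with matrix operations and learned projections):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_attention(query, enc_keys, enc_values):
    # One decoder position "reading" the encoder output:
    # 1. score the query against every encoder key (scaled dot product),
    # 2. softmax the scores into attention weights,
    # 3. return the weighted sum of the encoder values.
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in enc_keys]
    weights = softmax(scores)
    dim = len(enc_values[0])
    return [sum(w * v[i] for w, v in zip(weights, enc_values))
            for i in range(dim)]
```

For example, a query closely aligned with the second encoder key yields weights concentrated on the second encoder value, so the decoder effectively "looks at" that input position while producing its next token.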

Key Sources