What It Is
A neural network architecture split into two parts: an encoder that maps the input sequence to context-aware representations, and a decoder that generates the output sequence one token at a time, attending to those representations at each step.
Why It Matters
Encoder-decoder architectures are the natural fit for sequence-to-sequence tasks — translation, summarization, question answering — where the input and output may differ in length and structure. The encoder can see the full input bidirectionally; the decoder generates autoregressively.
How It Works
The encoder runs self-attention over the full input sequence, producing a set of context-enriched token representations. The decoder runs causal self-attention over its own outputs so far, then cross-attends to the encoder’s output to decide what to generate next. This cross-attention mechanism is how the decoder “reads” the input while generating the output.
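The three attention patterns above can be sketched with single-head, unprojected attention in NumPy. This is a minimal illustration, not a full transformer layer: it assumes one head and omits the learned Q/K/V projections, residual connections, layer norm, and feed-forward sublayers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Scaled dot-product attention: each query position takes a
    weighted mix of the values, weighted by query-key similarity."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (n_queries, n_keys)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~0 weight
    return softmax(scores) @ v                 # (n_queries, d)

rng = np.random.default_rng(0)
d = 8
src = rng.normal(size=(5, d))  # 5 source-token embeddings (illustrative)
tgt = rng.normal(size=(3, d))  # 3 target tokens generated so far

# Encoder self-attention: every source token attends to every other (bidirectional).
memory = attention(src, src, src)

# Decoder causal self-attention: target position i attends only to positions <= i.
causal = np.tril(np.ones((3, 3), dtype=bool))
hidden = attention(tgt, tgt, tgt, mask=causal)

# Cross-attention: decoder states as queries, encoder output as keys/values.
# This is the step where the decoder "reads" the full input.
out = attention(hidden, memory, memory)

print(out.shape)  # (3, 8): one input-informed context vector per target position
```

Note the asymmetry in the cross-attention call: queries come from the decoder while keys and values come from the encoder, and no causal mask is applied there, since the full input is already available.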
Key Sources
- attention-is-all-you-need
- bart-denoising-sequence-to-sequence-pre-training
- t5-exploring-the-limits-of-transfer-learning