The Problem
Proteins fold into precise 3D shapes that determine their function. Determining those shapes experimentally (X-ray crystallography, cryo-EM) can take months to years per protein. There are ~200M known sequences, but structures have been solved experimentally for only ~100K unique proteins. The 50-year challenge: predict the 3D structure from the amino acid sequence alone.
The Key Insight
Proteins evolve under structural constraint. If two residues are in contact in 3D space, a mutation at one tends to be compensated by a mutation at the other, so the two positions show correlated substitutions across species. Mining these co-evolutionary signals from multiple sequence alignments (MSAs) of homologous proteins yields indirect 3D contact information: enough, with the right architecture, to reconstruct structures at atomic accuracy.
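To make the co-evolution idea concrete, here is a minimal sketch that scores how strongly two MSA columns covary using mutual information. The function name `column_mutual_information` and the four-sequence toy alignment are made up for illustration. This is a classical hand-computed statistic, not what AlphaFold2 does internally (it consumes the MSA directly and learns its own features), and practical covariation methods also correct for phylogenetic bias and indirect couplings, which this sketch ignores.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    msa is a list of equal-length aligned sequences (hypothetical toy input).
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)              # residue counts in column i
    count_j = Counter(seq[j] for seq in msa)              # residue counts in column j
    count_ij = Counter((seq[i], seq[j]) for seq in msa)   # joint counts

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = count * n / (count_i[a] * count_j[b])
        mi += p_ab * math.log2(ratio)
    return mi

# Toy alignment: column 0 is conserved, columns 1 and 3 covary (K with E, R with D),
# and column 2 varies independently of column 1.
toy_msa = ["AKLE", "ARLD", "AKIE", "ARID"]

covarying = column_mutual_information(toy_msa, 1, 3)
independent = column_mutual_information(toy_msa, 1, 2)
conserved = column_mutual_information(toy_msa, 0, 3)
```

In this toy case the covarying pair scores 1 bit while the independent and conserved pairs score 0, which is the kind of contrast a contact predictor can exploit. A conserved column carries no covariation signal at all, one reason deep and diverse alignments matter.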
What’s Clever
AlphaFold2 cracked this by building geometric consistency into the network itself (triangle multiplicative updates and triangle attention over residue pairs) rather than imposing it in post-processing, and by jointly refining an MSA representation (evolutionary signal per sequence and residue) and a pair representation (spatial relationships between residues) through 48 Evoformer blocks.
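The triangle idea can be sketched in a few lines. The following is a simplified, single-head version loosely following the "triangle attention around starting node" operation described for AlphaFold2: edge (i, j) attends over the other edges (i, m) leaving node i, and the attention logit is biased by the third edge (j, m) that closes the triangle. Layer normalization, multiple heads, gating, and the output projection are omitted, and all weights and names here (`triangle_attention_start`, `wq`, `wk`, `wv`, `wb`) are hypothetical and untrained, so treat it as an illustration of the indexing pattern rather than the actual implementation.

```python
import math
import random

def matvec(vec, w):
    """Project a length-c vector with a c x d weight matrix (nested lists)."""
    return [sum(vec[a] * w[a][b] for a in range(len(vec))) for b in range(len(w[0]))]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triangle_attention_start(z, wq, wk, wv, wb):
    """Single-head triangle attention around the starting node (simplified).

    z          : n x n x c pair representation; z[i][j] describes edge i -> j
    wq, wk, wv : c x d projection matrices (hypothetical, untrained)
    wb         : length-c vector mapping the third edge z[j][m] to a scalar bias
    """
    n, d = len(z), len(wq[0])
    q = [[matvec(z[i][j], wq) for j in range(n)] for i in range(n)]
    k = [[matvec(z[i][j], wk) for j in range(n)] for i in range(n)]
    v = [[matvec(z[i][j], wv) for j in range(n)] for i in range(n)]
    bias = [[sum(x * w for x, w in zip(z[j][m], wb)) for m in range(n)] for j in range(n)]

    out = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Edge (i, j) scores every edge (i, m); the bias from edge (j, m)
            # brings in the third side of the triangle i-j-m.
            logits = [
                sum(a * b for a, b in zip(q[i][j], k[i][m])) / math.sqrt(d) + bias[j][m]
                for m in range(n)
            ]
            weights = softmax(logits)
            out[i][j] = [sum(weights[m] * v[i][m][t] for m in range(n)) for t in range(d)]
    return out

def random_matrix(rows, cols):
    """Random Gaussian weights, standing in for learned parameters."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Usage on a random 4-residue pair representation with 3 channels.
random.seed(0)
n, c, d = 4, 3, 2
z = [[[random.gauss(0.0, 1.0) for _ in range(c)] for _ in range(n)] for _ in range(n)]
updated = triangle_attention_start(
    z, random_matrix(c, d), random_matrix(c, d), random_matrix(c, d),
    [random.gauss(0.0, 1.0) for _ in range(c)],
)
```

Because every update to edge (i, j) is computed from the two other edges of a triangle, information about (i, m) and (j, m) flows into (i, j) at each block, which is how the pair representation is nudged toward mutually consistent geometry during inference instead of being repaired afterwards.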
Key Sources
- alphafold-2-protein-structure-prediction — first computational method achieving atomic accuracy across CASP14 targets (0.96 Å median backbone RMSD vs 2.8 Å next best)
Related Concepts
- evoformer — the architecture introduced to solve protein structure prediction
- attention — Evoformer’s row/column/triangle attention mechanisms
- self-supervised-learning — evolutionary MSA signals from unlabeled sequence databases