Protein Structure Prediction

The Problem

Proteins fold into precise 3D shapes that determine their function. Determining those shapes experimentally (X-ray crystallography, cryo-EM) can take months or longer per protein. Roughly 200M protein sequences are known, but only about 100K structures have been solved experimentally. The 50-year challenge: predict a protein's 3D structure from its amino acid sequence alone.

The Key Insight

Proteins evolve under structural constraint. If two residues are in contact in 3D space, a mutation at one tends to be compensated by a mutation at the other, so the two positions show correlated mutations across species. Mining these co-evolutionary signals from multiple sequence alignments (MSAs) of homologous proteins provides indirect 3D contact information: enough, with the right architecture, to reconstruct structures at atomic accuracy.
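To make the co-evolution idea concrete, here is a minimal sketch that scores how strongly two MSA columns covary using mutual information. This is a classical, simplified proxy and not what AlphaFold2 does internally (it learns from the raw MSA); practical contact-prediction pipelines also add corrections and sequence reweighting that are omitted here. The function name `column_mutual_information` and the toy alignment are made up for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an aligned MSA.

    `msa` is a list of equal-length strings, one aligned sequence per entry.
    High values mean the residues at the two positions vary together.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (counts_i[a] * counts_j[b])
        mi += p_ab * log2(ratio)
    return mi


# Toy alignment (hypothetical): columns 0 and 2 always change together,
# while column 1 is fully conserved and so carries no covariation signal.
toy_msa = ["AKE", "AKE", "DKR", "DKR"]

covarying = column_mutual_information(toy_msa, 0, 2)
conserved = column_mutual_information(toy_msa, 0, 1)
print(covarying, conserved)
```

In this toy case the covarying pair scores higher than the pair involving the conserved column, which is the kind of signal that hints at a 3D contact.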

What’s Clever

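AlphaFold2 cracked this by enforcing geometric consistency inside the network rather than as post-processing: triangle attention updates the representation of each residue pair (i, j) using the pairs that share a third residue k with it, so pairwise predictions stay mutually consistent. It also jointly refines the sequence-level evolutionary (MSA) representation and the pairwise spatial representation through 48 Evoformer blocks.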
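The triangle attention idea can be sketched in a few lines of NumPy. This is a heavily simplified, single-head illustration under stated assumptions, not AlphaFold2's implementation: it omits layer normalization, multiple heads, gating, and the output projection, and the weight matrices are random placeholders. The function name `triangle_attention_starting_node` and all parameter names are hypothetical.

```python
import numpy as np


def triangle_attention_starting_node(z, w_q, w_k, w_v, w_b):
    """Simplified single-head triangle attention over a pair representation.

    z   : (n, n, c) pair representation, z[i, j] describes residues i and j
    w_q, w_k, w_v : (c, c) projection matrices (placeholders)
    w_b : (c,) projection producing a scalar bias per pair

    Edge (i, j) attends over edges (i, k); the third edge of the triangle,
    (j, k), enters as an additive bias on the attention logits.
    """
    c = z.shape[-1]
    q = z @ w_q          # (n, n, c)
    k = z @ w_k          # (n, n, c)
    v = z @ w_v          # (n, n, c)
    bias = z @ w_b       # (n, n), indexed [j, k]

    # logits[i, j, k] = q[i, j] . k[i, k] / sqrt(c) + bias[j, k]
    logits = np.einsum("ijc,ikc->ijk", q, k) / np.sqrt(c) + bias[None, :, :]

    # Numerically stable softmax over k
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # out[i, j] = sum_k weights[i, j, k] * v[i, k]
    return np.einsum("ijk,ikc->ijc", weights, v)


rng = np.random.default_rng(0)
n, c = 5, 8
z = rng.normal(size=(n, n, c))
w_q, w_k, w_v = (rng.normal(size=(c, c)) for _ in range(3))
w_b = rng.normal(size=c)

out = triangle_attention_starting_node(z, w_q, w_k, w_v, w_b)
print(out.shape)
```

The output has the same shape as the input pair representation, so a block like this can be stacked and applied repeatedly, which is how the pair representation gets refined across layers.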

Key Sources

alphafold-2-protein-structure-prediction — first computational method achieving atomic accuracy across CASP14 targets (0.96 Å median backbone RMSD vs 2.8 Å next best)

Related Concepts

evoformer — the architecture introduced to solve protein structure prediction
attention — Evoformer’s row/column/triangle attention mechanisms
self-supervised-learning — evolutionary MSA signals from unlabeled sequence databases
