AI for Biology — Protein, Genome & Cell

Biology is arguably the scientific domain where AI has moved fastest from demonstration to infrastructure. The reason is structural: life encodes itself in sequences — proteins as strings of amino acids, genomes as strings of nucleotides — and decades of high-throughput sequencing have produced billions of these strings that can be learned self-supervised, exactly the regime in which large neural models excel. This module surveys the AI/ML methods that define the 2025–2026 frontier: structure prediction, generative protein design, sequence/genomic foundation models, and the more contested single-cell models. It is deliberately about the methods and systems, not wet-lab biology per se. See ai-for-science for the cross-domain framing and aifs-chemistry for the closely-coupled molecular-chemistry side.

1. Protein structure prediction

The prediction of 3D protein structure from sequence was the field's breakthrough use case. AlphaFold 2 (2021) used an Evoformer that reasoned jointly over a multiple-sequence alignment (MSA) and a pairwise residue representation, with an SE(3)-aware "structure module" emitting coordinates.

AlphaFold 3 (2024) is a substantial architectural redesign. It replaces the hand-built structure module with a diffusion network operating directly on atom coordinates — conceptually similar to image diffusion, starting from a noise cloud of atoms and denoising over many steps to a final structure [1]. The decisive change is scope: AF3 predicts the joint structure of complexes spanning proteins, nucleic acids, small-molecule ligands, ions and modified residues in one unified framework. It reports far greater accuracy than specialized docking tools for protein–ligand interactions, higher accuracy than nucleic-acid-specific predictors for protein–nucleic-acid interactions, and improved antibody–antigen accuracy over AlphaFold-Multimer v2.3 [1].
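
The "noise cloud to structure" loop described above can be illustrated with a toy sampler. This is a minimal sketch under loud assumptions, not AlphaFold 3's actual sampler: the denoiser is a hypothetical oracle that always returns a fixed 4-atom target (a trained network would estimate it instead), and the update is a plain deterministic Euler step across a decreasing noise schedule. All names (`euler_denoise_step`, `oracle_denoiser`, `SIGMAS`, `TARGET`) are invented for illustration.

```python
import random

def euler_denoise_step(coords, sigma, sigma_next, denoiser):
    """Move noisy atom coordinates from noise level sigma to sigma_next."""
    clean_guess = denoiser(coords, sigma)  # estimate of the noise-free atoms
    ratio = (sigma_next - sigma) / sigma   # negative: we are removing noise
    return [
        tuple(x + ratio * (x - x_hat) for x, x_hat in zip(atom, atom_hat))
        for atom, atom_hat in zip(coords, clean_guess)
    ]

def sample_structure(denoiser, n_atoms, sigmas, rng):
    """Start from a Gaussian cloud of atoms and denoise along the schedule."""
    coords = [tuple(rng.gauss(0.0, sigmas[0]) for _ in range(3))
              for _ in range(n_atoms)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        coords = euler_denoise_step(coords, sigma, sigma_next, denoiser)
    return coords

# Hypothetical 4-atom "structure" that the oracle treats as the right answer.
TARGET = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]

def oracle_denoiser(coords, sigma):
    """Stand-in for a trained network: always returns the known clean atoms."""
    return TARGET

# Geometric noise schedule that ends at exactly zero noise.
SIGMAS = [10.0 * 0.5 ** k for k in range(12)] + [0.0]

final_coords = sample_structure(oracle_denoiser, len(TARGET), SIGMAS,
                                random.Random(0))
max_error = max(abs(a - b)
                for atom, ref in zip(final_coords, TARGET)
                for a, b in zip(atom, ref))
print(f"max coordinate error after denoising: {max_error:.2e}")
```

With a perfect denoiser each step shrinks the remaining noise by the ratio of consecutive noise levels, so the cloud contracts onto the target. In a real model the denoiser's estimate changes at every step, which is why many steps are needed.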

ESMFold (Meta AI, 2022–2023) takes a different route: it folds from a single sequence, with no MSA or templates, by reading structure out of the protein language model ESM-2 (variants reported up to 15B parameters) [2]. It is reported to be roughly an order of magnitude faster than AlphaFold 2 at inference, which enabled the >600M-protein ESM Metagenomic Atlas. Accuracy is competitive for sequences the language model represents well (those to which it assigns low perplexity) and weaker for the rest [2].

Current limits. These remain static structure predictors — conformational dynamics, allostery and folding pathways are out of scope [1]. Independent analyses report that AF3 can hallucinate order in intrinsically disordered regions, with one study reporting a sizable fraction of disorder-residue misalignment against DisProt [9]. Chirality and exotic stoichiometries remain weak spots, and accuracy degrades for sequences lacking evolutionary signal.

2. Protein design / de-novo

Where prediction maps a sequence to its structure, design inverts the problem and makes it generative: invent a new protein for a target function.

RFdiffusion (Baker Lab) is a denoising diffusion model built on the RoseTTAFold structure-prediction network. It represents each residue as a rigid frame (a Cα coordinate plus an N–Cα–C orientation) and runs an SE(3)-equivariant diffusion, so that rotating or translating the input rotates or translates the output identically and the distribution of generated structures does not depend on the global pose [3][4]. It generates protein backbones conditioned on motifs, symmetries or binding targets.
ProteinMPNN solves the complementary inverse-folding problem: given a backbone, design an amino-acid sequence that will fold into it. The canonical pipeline is RFdiffusion → ProteinMPNN: geometry engine, then sequence designer [4].
AlphaProteo (DeepMind, 2024) is a family of models targeting the de-novo binder problem directly. On a reported set of seven target proteins it achieved 3-to-300× better binding affinities and higher experimental success than prior methods, generating binders to targets including VEGF-A and a SARS-CoV-2 protein, often after a single round of medium-throughput screening [5].
Flow-matching designers are the emerging alternative to score-based diffusion: continuous-normalizing-flow training (flow matching) on the SE(3) manifold offers straighter probability paths and faster sampling for backbone generation, and is an active 2025–2026 research line [4].
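
To make the "straighter probability paths" remark concrete, here is a minimal sketch of conditional flow matching with a linear interpolation path. It is deliberately simplified: it works on a single point in ordinary 3D space and ignores the rotational part of SE(3), and the velocity field is a hypothetical oracle that knows the data point (a trained network would approximate it). Function names are illustrative and not taken from any published designer.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the network: constant along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def training_example(x1, rng):
    """Sample (x_t, t, u): the input, time, and velocity target for one step."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)

def euler_integrate(x0, velocity_fn, n_steps):
    """Generate a sample by integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

DATA_POINT = [1.5, -0.5, 2.0]  # hypothetical "true" coordinate

def oracle_velocity(x, t):
    """Exact conditional velocity for a known data point (stand-in for a model)."""
    return [(d - xi) / (1.0 - t) for d, xi in zip(DATA_POINT, x)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in DATA_POINT]
endpoints = {n: euler_integrate(noise, oracle_velocity, n) for n in (1, 4, 16)}
x_t, t, u = training_example(DATA_POINT, rng)
```

Because the target velocity is constant along a straight path, even a single Euler step lands on the data point in this idealized setting. That is the intuition behind faster sampling, though learned velocity fields are only approximately straight.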

The unifying idea is equivariant generative modeling on a geometric manifold: proteins live in 3D, so the generative process must respect the symmetries of 3D space rather than learning them from data.
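
The symmetry argument can be checked numerically. The sketch below builds a residue frame from three backbone atoms by Gram–Schmidt orthogonalization (one common convention, assumed here rather than taken from a specific model's code) and shows that a neighbouring atom's coordinates expressed in that frame do not change when the whole structure is rotated and translated. The atom coordinates are made-up illustrative values.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def residue_frame(n, ca, c):
    """Orthonormal axes from CA->C and CA->N, with the origin at CA."""
    e1 = normalize(sub(c, ca))
    v2 = sub(n, ca)
    e2 = normalize(sub(v2, scale(e1, dot(e1, v2))))  # remove the e1 component
    e3 = cross(e1, e2)                               # right-handed third axis
    return (e1, e2, e3), ca

def to_local(point, frame):
    """Express a global point in the residue's own coordinate system."""
    axes, origin = frame
    offset = sub(point, origin)
    return tuple(dot(axis, offset) for axis in axes)

def rigid_move(p, angle=0.7, shift=(5.0, -2.0, 3.0)):
    """Rotate about z, then about x, then translate: a proper rigid motion."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    x, y, z = p
    x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a
    y, z = y * cos_a - z * sin_a, y * sin_a + z * cos_a
    return (x + shift[0], y + shift[1], z + shift[2])

# Hypothetical backbone atoms of one residue plus one neighbouring atom.
N, CA, C = (-0.53, 1.36, 0.0), (0.0, 0.0, 0.0), (1.52, 0.0, 0.0)
NEIGHBOUR = (3.2, 1.1, -0.8)

local_before = to_local(NEIGHBOUR, residue_frame(N, CA, C))
local_after = to_local(
    rigid_move(NEIGHBOUR),
    residue_frame(rigid_move(N), rigid_move(CA), rigid_move(C)),
)
print(local_before, local_after)  # equal up to floating-point rounding
```

The frame itself moves with the structure (equivariance), while quantities measured inside the frame stay fixed (invariance). Frame-based models rely on this pairing to avoid learning rotations from data.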

3. Sequence & genomic foundation models

This is where "foundation model" is most literal — single self-supervised models pretrained on raw sequence and adapted to many tasks.

ESM-2 / ESM3. ESM-2 is a masked protein language model whose internal representations encode structure [2]. ESM3 (EvolutionaryScale, 2024) is a multimodal generative successor that reasons jointly over sequence, structure and function, trained (per the lab) on 2.78B proteins. In a widely cited demonstration it generated a novel fluorescent protein at ~58% identity to known GFPs — characterized as equivalent to simulating ~500M years of evolution [6].
Evo / Evo 2 (Arc Institute + NVIDIA, 2025) are DNA foundation models built on the StripedHyena architecture (a convolution/attention hybrid for very long context). Evo 2 is reported at 7B and 40B parameters, trained on ~9.3 trillion DNA base pairs across >128,000 species, with context windows up to ~1 million nucleotides. That is enough to cover a small bacterial genome or a megabase-scale region of a human chromosome in one pass, though not a whole human chromosome, which runs to tens or hundreds of megabases. It reports state-of-the-art zero-shot variant classification, e.g. on BRCA1 [7].
AlphaGenome (DeepMind, 2025) targets the regulatory genome: it takes up to ~1 Mb of DNA and predicts thousands of functional genomic tracks (expression, chromatin accessibility, histone marks, TF binding, contact maps, splicing) at up to single-base resolution. It is reported to match or exceed the best external models on 24 of 26 variant-effect evaluations and is the only assessed model that jointly predicts all modalities [8].
Nucleotide Transformer (InstaDeep, Nature Methods 2024) is a family of human/multi-species genomics LMs (reported 50M–2.5B parameters), with the multispecies 2.5B variant the strongest of its cohort across promoter and splicing tasks [9b]. A v3 line extends context toward 1 Mb.
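
Zero-shot variant scoring with a sequence model is commonly framed as a log-likelihood ratio: score the alternate sequence, score the reference, and take the difference. The sketch below assumes that framing and substitutes a tiny order-2 Markov model for the genomic language model, so it illustrates the scoring recipe only and says nothing about how Evo 2 or AlphaGenome are implemented. The class, the training strings and the example variant are all hypothetical.

```python
import math

ALPHABET = "ACGT"

class MarkovDNAModel:
    """Order-k Markov model with add-one smoothing (stand-in for a DNA LM)."""

    def __init__(self, k=2):
        self.k = k
        self.counts = {}  # context string -> {next base: count}

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.k, len(seq)):
                ctx = self.counts.setdefault(seq[i - self.k:i], {})
                ctx[seq[i]] = ctx.get(seq[i], 0) + 1
        return self

    def log_prob(self, seq):
        """Sum of log P(base | previous k bases) over the sequence."""
        total = 0.0
        for i in range(self.k, len(seq)):
            ctx = self.counts.get(seq[i - self.k:i], {})
            numerator = ctx.get(seq[i], 0) + 1
            denominator = sum(ctx.values()) + len(ALPHABET)
            total += math.log(numerator / denominator)
        return total

def variant_score(model, ref_seq, pos, alt_base):
    """Log-likelihood ratio of alt vs ref; negative means 'less expected'."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return model.log_prob(alt_seq) - model.log_prob(ref_seq)

# Toy "genome" with a strict repeat, so any substitution breaks the pattern.
TRAIN = ["ACGT" * 10] * 5
REF = "ACGTACGTACGT"

model = MarkovDNAModel(k=2).fit(TRAIN)
disruptive = variant_score(model, REF, 5, "T")   # C -> T at position 5
unchanged = variant_score(model, REF, 5, REF[5])  # reference base itself
print(f"disruptive: {disruptive:.2f}, unchanged: {unchanged:.2f}")
```

No labels are used anywhere, which is what "zero-shot" means here. Whether the ratio tracks pathogenicity depends entirely on how well the underlying model has captured evolutionary constraint.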

Methods table

Model	Architecture	Primary task	Year
AlphaFold 2	Evoformer + structure module (MSA-based)	Single-chain structure	2021
AlphaFold 3	Pairformer + atom diffusion	Multi-molecule complex structure	2024
ESMFold / ESM-2	Protein LM → folding head (single-seq)	Fast structure prediction	2022–23
ESM3	Multimodal masked/generative LM	Seq+struct+function generation	2024
RFdiffusion	SE(3)-equivariant diffusion (RoseTTAFold)	Backbone generation	2023
ProteinMPNN	Message-passing GNN	Inverse folding (sequence design)	2022
AlphaProteo	Generative binder design	De-novo binders	2024
Evo 2	StripedHyena (long-context hybrid)	DNA/RNA/protein generation & prediction	2025
AlphaGenome	Unified DNA sequence model	Regulatory track + variant effect	2025
Nucleotide Transformer	Transformer (k-mer / single-base)	Genomic downstream tasks	2023–24
scGPT / Geneformer	Transformer over gene tokens	Single-cell representation	2023–24

Comparison — structure predictors

Property	AlphaFold 2	AlphaFold 3	ESMFold
Needs MSA?	Yes	Yes (retained)	No (single sequence)
Coordinate generator	Structure module	Diffusion (atom-level)	Folding head
Multi-molecule complexes	Limited (Multimer)	Native (protein/NA/ligand/ion)	Limited
Relative speed	Baseline	Slower (sampling)	~10× faster (reported) [2]
Key weakness	Static; MSA-dependent	Static; disorder hallucination [9]	Lower accuracy on hard seqs

4. Single-cell / omics foundation models — the honest debate

scGPT and Geneformer apply the LM recipe to single-cell transcriptomics, tokenizing genes/expression and pretraining on tens of millions of cells. The promise is a reusable "virtual cell" embedding for clustering, annotation and perturbation prediction.

The candid 2025 finding is that this promise is not yet established. Multiple benchmarks report that in zero-shot settings these models can be matched or beaten by far simpler baselines — highly-variable-gene selection, scVI, or Harmony batch correction — for tasks like batch integration and cell-type clustering [10]. Proposed explanations: masked-LM pretraining may not yield useful cell-level embeddings, or the models may not have truly learned the pretraining task; notably, larger pretraining data did not reliably help [10]. The takeaway is methodological humility — "foundation model" branding does not by itself beat well-tuned, task-specific baselines, and rigorous baselines are mandatory.
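
For a sense of how simple the competing baselines are, here is a minimal sketch of highly-variable-gene selection by dispersion (variance divided by mean). It is a simplification: production toolkits such as Scanpy's `pp.highly_variable_genes` apply additional normalization and mean-binning, and the synthetic expression matrix below is invented purely to exercise the function.

```python
import random
from statistics import fmean, pvariance

def highly_variable_genes(matrix, n_top):
    """Return sorted indices of the n_top genes with highest variance/mean.

    matrix is a list of cells, each a list of per-gene expression values.
    """
    n_genes = len(matrix[0])
    dispersions = []
    for g in range(n_genes):
        values = [cell[g] for cell in matrix]
        mean = fmean(values)
        dispersions.append(pvariance(values) / mean if mean > 0 else 0.0)
    ranked = sorted(range(n_genes), key=lambda g: dispersions[g], reverse=True)
    return sorted(ranked[:n_top])

def synthetic_cells(n_cells, n_genes, marker_genes, rng):
    """Two alternating cell types; marker genes differ sharply between them."""
    cells = []
    for i in range(n_cells):
        cell_type = i % 2
        row = []
        for g in range(n_genes):
            if g in marker_genes:
                centre = 1.0 if cell_type == 0 else 20.0
            else:
                centre = 10.0  # "housekeeping" gene, same in both types
            row.append(max(0.0, rng.gauss(centre, 1.0)))
        cells.append(row)
    return cells

MARKERS = {0, 3}  # hypothetical marker-gene indices
cells = synthetic_cells(200, 6, MARKERS, random.Random(0))
selected = highly_variable_genes(cells, n_top=2)
print("selected genes:", selected)
```

A baseline like this has no learned parameters and runs in seconds. A pretrained cell model therefore has to show a clear, reproducible margin over it before the extra cost is justified.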

General vs specialized

Biology is the domain closest to having genuine foundation models, and the reason is data-shaped. Protein and DNA sequences are (a) enormous in volume, (b) naturally self-supervisable (mask a residue/base and predict it), and (c) carry deep evolutionary signal that correlates with structure and function. This is the same precondition that made language models work, transplanted into biology, which is why sequence models (ESM, Evo, Nucleotide Transformer) generalize across tasks far more convincingly than, say, materials or fluid-dynamics models do.
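
The "mask a residue and predict it" objective can be written down in a few lines. The sketch below is a minimal illustration, assuming a 15% mask rate and two hypothetical stand-in predictors (uniform, and smoothed frequencies of the visible residues) in place of a trained network. The example sequence and the mask symbol are arbitrary.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # arbitrary placeholder symbol for a hidden residue

def mask_sequence(seq, mask_rate, rng):
    """Hide a random subset of positions; return masked string and positions."""
    n_mask = max(1, round(mask_rate * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    masked = list(seq)
    for p in positions:
        masked[p] = MASK
    return "".join(masked), positions

def masked_lm_loss(seq, masked, positions, predict_fn):
    """Mean negative log-likelihood of the true residues at masked positions."""
    nll = [-math.log(predict_fn(masked, p)[seq[p]]) for p in positions]
    return sum(nll) / len(nll)

def uniform_predictor(masked, pos):
    """Knows nothing: every amino acid is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def frequency_predictor(masked, pos):
    """Uses smoothed frequencies of the visible residues as its guess."""
    counts = Counter(ch for ch in masked if ch != MASK)
    total = sum(counts.values()) + len(AMINO_ACIDS)
    return {aa: (counts[aa] + 1) / total for aa in AMINO_ACIDS}

SEQ = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # illustrative sequence
masked_seq, masked_positions = mask_sequence(SEQ, 0.15, random.Random(0))
loss_uniform = masked_lm_loss(SEQ, masked_seq, masked_positions, uniform_predictor)
loss_freq = masked_lm_loss(SEQ, masked_seq, masked_positions, frequency_predictor)
print(masked_seq, f"uniform={loss_uniform:.3f}", f"frequency={loss_freq:.3f}")
```

The uniform predictor's loss is exactly ln 20, the ceiling a model must beat. The supervision comes from the sequence itself, so every unlabeled protein or genome is usable training data.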

But "foundation" is uneven. Sequence space is conquered; the cell and the regulatory genome are not. AlphaGenome and Evo 2 push toward whole-genome context, yet integrating multi-omic, dynamic, spatial cell state into one model remains open — and the single-cell results above show that scale alone is not sufficient. The frontier is moving from one molecule toward systems.

Open problems

Dynamics, not snapshots. Structure predictors output static conformations; ensembles, allostery and folding kinetics are largely unsolved.
Disorder & hallucination. Intrinsically disordered regions are both biologically important and where models confidently hallucinate order [9].
Trustworthy generative biology. De-novo design success rates are rising but experimental validation remains the only ground truth; in-silico metrics can be gamed.
Genuine cell foundation models. Beating simple baselines robustly, and predicting perturbation response, is still open [10].
Multimodal integration. Unifying sequence, structure, function, regulation and cell state in one model — the "virtual cell" goal — is unrealized.
Evaluation rigor. The field needs leak-free, baseline-anchored benchmarks; optimistic zero-shot claims have repeatedly failed independent re-testing.
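
One ingredient of a leak-free benchmark is splitting by similarity cluster rather than by individual sequence, so that near-duplicates never straddle train and test. The sketch below assumes a crude k-mer Jaccard similarity and single-linkage clustering. Real pipelines typically use alignment-based identity from dedicated clustering tools, and the sequences, threshold and test fraction here are illustrative only.

```python
from itertools import combinations

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=3):
    """Shared k-mers divided by all k-mers: a rough similarity in [0, 1]."""
    sa, sb = kmers(a, k), kmers(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cluster_by_similarity(seqs, threshold):
    """Single-linkage clusters: any pair at or above threshold is merged."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def split_by_cluster(seqs, threshold, test_fraction):
    """Assign whole clusters to test until the requested size is reached."""
    train, test = [], []
    budget = test_fraction * len(seqs)
    for cluster in sorted(cluster_by_similarity(seqs, threshold), key=len):
        (test if len(test) + len(cluster) <= budget else train).extend(cluster)
    return sorted(train), sorted(test)

SEQS = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRA",  # near-duplicate of the first
    "GSHMLEDPVDAFKLLQEKGYTV",
    "GSHMLEDPVDAFKLLQEKGYTI",  # near-duplicate of the third
    "MADEEKLPPGWEKRMSRSSGRV",
    "MSTNPKPQRKTKRNTNRRPQDV",
]
THRESHOLD = 0.5
train_idx, test_idx = split_by_cluster(SEQS, THRESHOLD, test_fraction=0.34)
print("train:", train_idx, "test:", test_idx)
```

Single linkage is used because it guarantees that no train/test pair exceeds the threshold. A random per-sequence split of the same data could place a sequence and its near-duplicate on opposite sides and inflate the measured accuracy.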

Sources

[1] AlphaFold 3: Accurate structure prediction of biomolecular interactions with AlphaFold 3, Nature 630:493–500 (2024). https://www.nature.com/articles/s41586-024-07487-w
[2] ESMFold / ESM-2: Evolutionary-scale prediction of atomic-level protein structure with a language model, Science (2023). https://www.science.org/doi/10.1126/science.ade2574
[3] RFdiffusion / SE(3)-equivariant diffusion: background on frame-based equivariant protein diffusion. https://arxiv.org/pdf/2302.02277
[4] RFdiffusion → ProteinMPNN pipeline & flow-matching design: review of generative protein design. https://www.sciencedirect.com/science/article/pii/S0959440X24000216
[5] AlphaProteo: De novo design of high-affinity protein binders with AlphaProteo (2024). https://arxiv.org/abs/2409.08022
[6] ESM3: Simulating 500 million years of evolution with a language model, Science (2024). https://www.science.org/doi/10.1126/science.ads0018
[7] Evo 2: Arc Institute / NVIDIA DNA foundation model announcement (2025). https://arcinstitute.org/news/evo2
[8] AlphaGenome: Advancing regulatory variant effect prediction with AlphaGenome, Nature (2025). https://www.nature.com/articles/s41586-025-10014-0
[9] AF3 disorder hallucination: Hallucinations in AlphaFold3 for Intrinsically Disordered Proteins (2025). https://arxiv.org/pdf/2510.15939
[9b] Nucleotide Transformer: Building and evaluating robust foundation models for human genomics, Nature Methods (2024). https://www.nature.com/articles/s41592-024-02523-z
[10] Single-cell FM limits: Zero-shot evaluation reveals limitations of single-cell foundation models, Genome Biology (2025). https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03574-x