AI for Science — Landscape & Method Spectrum

How AI/ML is changing the practice of basic science — the method families, the flagship systems, and the line dividing "domain models" from general-purpose LLMs.

Sections: aifs-biology · aifs-chemistry · aifs-physics · aifs-mathematics · aifs-ml-self · aifs-operations-research

The top-level shift: from "AI assistance" to "Agentic Science"

Multiple 2025 surveys and position papers converge on one framing: AI for Science is moving from partial assistance toward full scientific agency — systems that handle hypothesis generation → experimental design → execution → analysis → iterative refinement as a closed loop [1]. A NeurIPS 2025 position paper frames the methodological debate as two camps [2]:

Paradigm Enhancement — AI accelerates traditional scientific tasks. This is today's mainstream, and it is overwhelmingly domain-specific / task-specific models.
Paradigm Transition — a general model fundamentally redefines how science is done. This is the frontier bet, and the evidence is still thin: in our 2026-06 review, the claim that "foundation models make materials science a general-purpose, emergent AI system" failed adversarial verification.

So the accurate framing is not "general vs specialized" but "domain foundation model vs task-specific model" — a single cross-science general model essentially does not yet exist.

The method spectrum (what's actually used)

Layer	Dominant method	Representative systems
Solving physical equations	PINNs + Neural Operators (FNO / AFNO / DeepONet)	aifs-physics, NVIDIA PhysicsNeMo
Structure prediction & de-novo design	Generative models (diffusion, flow matching) + GNNs	alphafold3, MatterGen, RFdiffusion
Atomistic simulation	Machine-learned interatomic potentials (MLIP)	mattersim, MACE-MP-0
Discovery loops	GNN + active learning + first-principles (DFT) verification	gnome
Reasoning / proof	LLM + formal verification (Lean) + search	aifs-mathematics
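
To make the first row concrete, the following is a minimal sketch of a physics-residual loss, the objective behind PINNs. It is not a neural network: a hypothetical one-parameter trial function u(x) = exp(-theta * x) stands in for the network, the toy equation u'(x) + 2u(x) = 0 with u(0) = 1 is chosen purely for illustration, and a finite-difference gradient replaces autodiff. The names u, du_dx, residual_loss and fit are all assumptions made for this example. The loss only measures how badly the equation is violated at sample points, so no solution values (labels) are used.

```python
import math

def u(theta, x):
    # Hypothetical one-parameter trial solution; u(0) = 1 holds by construction.
    return math.exp(-theta * x)

def du_dx(theta, x):
    # Analytic derivative of the trial solution (a real PINN would use autodiff).
    return -theta * math.exp(-theta * x)

def residual_loss(theta, xs):
    # Mean squared residual of u'(x) + 2 u(x) = 0 at the collocation points.
    return sum((du_dx(theta, x) + 2.0 * u(theta, x)) ** 2 for x in xs) / len(xs)

def fit(theta=0.5, lr=0.2, steps=500, eps=1e-6):
    xs = [i / 10 for i in range(11)]  # collocation points on [0, 1]
    for _ in range(steps):
        # Central finite-difference gradient of the loss with respect to theta.
        grad = (residual_loss(theta + eps, xs) - residual_loss(theta - eps, xs)) / (2 * eps)
        theta -= lr * grad
    return theta

theta = fit()
print(f"fitted decay rate: {theta:.4f}")
```

The exact solution of the toy equation is exp(-2x), so the fitted parameter should approach 2 even though the training loop never sees a value of the true solution.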

Two flagships that define the playbook

alphafold3 — a unified diffusion model jointly predicting protein / nucleic-acid / small-molecule / ion complexes, greatly outperforming classical docking [3]. Notably it removed AF2's equivariant structure module, replacing it with diffusion over raw atom coordinates — a data-scale win for the "big generative model" side of the equivariance debate.
gnome — graph networks + active learning + DFT verification discovered ~2.2M new crystals (~380K high-stability candidates), lifting the materials stability discovery rate from ~50% to 80% on MatBench Discovery [4].
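
The predict, verify, retrain pattern in the gnome entry can be sketched as a toy loop. Everything here is a stand-in chosen for illustration: oracle_energy is a hypothetical cheap function playing the role of an expensive DFT calculation, predict is a nearest-neighbour surrogate in place of a graph network, and the candidates are plain numbers rather than crystal structures. It is a sketch of the general loop shape under those assumptions, not the GNoME pipeline.

```python
def oracle_energy(x):
    # Hypothetical stand-in for an expensive DFT calculation: lower is more stable.
    return (x - 0.7) ** 2

def predict(x, labelled):
    # Surrogate model: energy of the nearest verified candidate (stand-in for a GNN).
    nearest = min(labelled, key=lambda item: abs(item[0] - x))
    return nearest[1]

def discovery_loop(candidates, seed_points, rounds=5, batch=5, stable_below=0.01):
    labelled = [(x, oracle_energy(x)) for x in seed_points]
    pool = [x for x in candidates if x not in seed_points]
    hit_rates = []
    for _ in range(rounds):
        # Rank the unverified pool by predicted energy and take the most promising batch.
        pool.sort(key=lambda x: predict(x, labelled))
        chosen, pool = pool[:batch], pool[batch:]
        # Verify each chosen candidate with the oracle and feed the results back.
        results = [(x, oracle_energy(x)) for x in chosen]
        labelled.extend(results)
        hits = sum(1 for _, energy in results if energy < stable_below)
        hit_rates.append(hits / batch)
    return labelled, hit_rates

candidates = [i / 100 for i in range(101)]
labelled, hit_rates = discovery_loop(candidates, seed_points=[0.0, 0.5, 1.0])
print("fraction of verified candidates that were stable, per round:", hit_rates)
```

The per-round hit rate is the toy analogue of a discovery rate: as verified results accumulate, the surrogate steers later batches toward the stable region.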

Why domain models, not general LLMs

The same math backbones (Transformer / diffusion / GNN) are reused, but the models differ from general LLMs in four structural ways:

Tokens aren't text — amino acids, atoms, lattices, physical-field grid points.
Symmetry is often hardcoded — SE(3)/E(3)-equivariant networks build in rotation/translation equivariance, which a general LLM would have to learn from data (AlphaFold 3 is a notable exception).
Objective isn't next-token — masked modeling (ESM), diffusion denoising (AlphaFold 3 / MatterGen), or physics-residual loss (PINNs, which can train with zero labelled data).
Scale is 1–2 orders of magnitude smaller — protein/materials foundation models have millions to billions of parameters, not the trillions of frontier LLMs; they trade scale for inductive bias.
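
The symmetry point in the list above can be illustrated with the simplest possible case: features that are invariant by construction. The sketch below is an assumption-laden toy, not an equivariant network. It uses sorted pairwise distances as the hardcoded-symmetry representation, and the helper names (rotate_z, transform, pairwise_distances) and atom coordinates are made up for the example. A rigid motion changes the raw coordinates but leaves the distance features unchanged, so a model consuming the latter never has to learn that symmetry from data.

```python
import math

def rotate_z(point, angle):
    # Rotate a 3D point about the z-axis by the given angle (radians).
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def transform(points, angle, shift):
    # Rigid motion: rotate about z, then translate by shift.
    return [tuple(c + d for c, d in zip(rotate_z(p, angle), shift)) for p in points]

def pairwise_distances(points):
    # Sorted interatomic distances: unchanged by any rotation or translation.
    return sorted(math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:])

def all_close(xs, ys, tol=1e-9):
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(xs, ys))

# Hypothetical four-atom configuration.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.3, 0.2, 2.0)]
moved = transform(atoms, angle=0.8, shift=(5.0, -2.0, 1.0))

same_features = all_close(pairwise_distances(atoms), pairwise_distances(moved))
same_coords = all(all_close(p, q) for p, q in zip(atoms, moved))
print("distance features unchanged:", same_features)
print("raw coordinates unchanged:", same_coords)
```

Equivariant architectures generalize this idea to outputs that rotate along with the input (forces, coordinates), rather than only to scalar features that stay fixed.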

Which approach researchers choose comes down to three questions: is there enough self-supervisable data to pretrain a foundation model (sequences yes, PDEs no); is the problem subject to hard physical constraints (which favours specialized models with physics priors); and is there a cheap verification loop (DFT, experiment) that makes large-scale generation trustworthy?

How this section is organized

Each discipline page tracks the frontier methods, flagship systems, current benchmarks, and open problems:

aifs-biology — protein structure & design, genomic/sequence foundation models (AlphaFold, ESM, Evo)
aifs-chemistry — molecular generation, retrosynthesis, reaction prediction, MLIP for chemistry
aifs-physics — PINNs, neural operators, simulation surrogates, weather/climate, fusion
aifs-mathematics — theorem proving, autoformalization, AI-assisted conjecture (AlphaProof, AlphaGeometry, Lean)
aifs-ml-self — AI for AI: autonomous AI scientists, neural architecture search, AI-driven ML research
aifs-operations-research — ML for combinatorial optimization, learned solvers, OR + RL

Caveats

SOTA is highly time-sensitive — leaderboard positions (MLIP, weather) churn every few months; treat any "best" as "best at publication."
Source quality is layered — peer-reviewed results (AlphaFold 3, GNoME) are high-confidence; many "unifying framework / taxonomy" claims come from un-reviewed arXiv preprints and are one reasonable framing, not settled fact.
The "general-purpose science foundation model" narrative is mostly aspiration as of mid-2026.

Sources

[1] Agentic Science survey — arXiv:2508.14111 (2026-06-14)
[2] NeurIPS 2025 position paper — OpenReview:PegEYWWXvx / arXiv:2510.15280 (2026-06-14)
[3] AlphaFold 3 — Nature s41586-024-07487-w, Abramson et al. 2024 (2026-06-14)
[4] GNoME — DeepMind blog + Nature s41586-023-06735-9 (2026-06-14)
local: ai_for_science_landscape_2026-06-14.md — deep-research compilation (29 sources, 25 adversarially-verified claims)