AI for Science — Landscape & Method Spectrum
How AI/ML is changing the practice of basic science — the method families, the flagship systems, and the line dividing "domain models" from general-purpose LLMs.
Sections: aifs-biology · aifs-chemistry · aifs-physics · aifs-mathematics · aifs-ml-self · aifs-operations-research
The top-level shift: from "AI assistance" to "Agentic Science"
Multiple 2025 surveys and position papers converge on one framing: AI for Science is moving from partial assistance toward full scientific agency — systems that handle hypothesis generation → experimental design → execution → analysis → iterative refinement as a closed loop [1]. A NeurIPS 2025 position paper frames the methodological debate as two camps [2]:
- Paradigm Enhancement: AI accelerates traditional scientific tasks. This is today's mainstream, and it relies overwhelmingly on domain-specific or task-specific models.
- Paradigm Transition: a general model fundamentally redefines how science is done. This is the frontier bet, and the evidence is still thin: the claim that "foundation models make materials science a general-purpose, emergent AI system" failed adversarial verification in our 2026-06 review.
So the accurate framing is not "general vs specialized" but "domain foundation model vs task-specific model" — a single cross-science general model essentially does not yet exist.
The method spectrum (what's actually used)
| Layer | Dominant method | Representative systems |
|---|---|---|
| Solving physical equations | PINNs + Neural Operators (FNO / AFNO / DeepONet) | aifs-physics, NVIDIA PhysicsNeMo |
| Structure prediction & de-novo design | Generative models (diffusion, flow matching) + GNNs | alphafold3, MatterGen, RFdiffusion |
| Atomistic simulation | Machine-learned interatomic potentials (MLIP) | mattersim, MACE-MP-0 |
| Discovery loops | GNN + active learning + first-principles (DFT) verification | gnome |
| Reasoning / proof | LLM + formal verification (Lean) + search | aifs-mathematics |
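
To make the first row concrete: the physics-residual idea behind PINNs is to penalize violations of the governing equation at sample points, so no labelled solution values are needed. The sketch below is a minimal illustration, not how any system in the table is implemented. It assumes a toy ODE, u' + u = 0 with u(0) = 1, and swaps the neural network for a degree-5 polynomial so that the residual loss can be minimized by linear least squares in plain Python. All function names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def phi(k, x):
    """Contribution of the term c_k * x^k to the residual u'(x) + u(x)."""
    return k * x ** (k - 1) + x ** k

def fit_by_residual(degree=5, n_points=50):
    """Fit u(x) = 1 + sum_k c_k x^k on [0, 1] using only the ODE residual.

    The form of u enforces u(0) = 1 exactly. The residual is
    r(x) = 1 + sum_k c_k * phi(k, x), and we minimize sum r(x)^2 over
    collocation points. No solution values (labels) are used anywhere.
    """
    xs = [i / (n_points - 1) for i in range(n_points)]  # collocation points
    ks = range(1, degree + 1)
    # Normal equations of the least-squares problem: (Phi^T Phi) c = -Phi^T 1
    A = [[sum(phi(j, x) * phi(k, x) for x in xs) for k in ks] for j in ks]
    b = [-sum(phi(j, x) for x in xs) for j in ks]
    return solve_linear(A, b)

def u(coeffs, x):
    """Evaluate the fitted polynomial; coeffs[0] multiplies x^1."""
    return 1.0 + sum(c * x ** (k + 1) for k, c in enumerate(coeffs))

coeffs = fit_by_residual()
error_at_1 = abs(u(coeffs, 1.0) - math.exp(-1.0))  # compare with the exact solution
```

The only supervision here is the equation itself: `fit_by_residual` never sees a value of exp(-x), yet `u(coeffs, 1.0)` should land close to it. A real PINN replaces the polynomial with a network and the least-squares solve with gradient descent on the same kind of residual.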
Two flagships that define the playbook
- alphafold3: a unified diffusion model that jointly predicts complexes of proteins, nucleic acids, small molecules, and ions, with substantially higher accuracy on protein-ligand interactions than classical docking tools [3]. Notably, it dropped AF2's equivariant structure module in favor of diffusion over raw atom coordinates, a data-scale win for the "big generative model" side of the equivariance debate.
- gnome: graph networks + active learning + DFT verification discovered 2.2M new crystals (380K high-stability candidates), lifting the materials stability discovery rate from ~50% to 80% on MatBench Discovery [4].
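
The discovery-loop pattern (a cheap model proposes, an expensive calculation verifies, verified results feed back into the model) can be sketched in a few lines. This is a toy illustration under loud assumptions, not GNoME's pipeline: candidates are plain integers, `oracle_energy` is a made-up formula standing in for DFT, and `surrogate_predict` is a nearest-neighbour lookup standing in for the graph network. Selection is purely greedy, whereas real active learning typically also weighs model uncertainty.

```python
def oracle_energy(x):
    """Hypothetical stand-in for an expensive DFT calculation (toy formula).

    A candidate counts as "stable" when its energy is below zero.
    """
    return ((x - 140) / 40) ** 2 - 1.0

def surrogate_predict(labelled, x):
    """Cheap stand-in for the GNN: reuse the energy of the nearest verified candidate."""
    nearest = min(labelled, key=lambda s: abs(s - x))
    return labelled[nearest]

def discovery_loop(candidates, seeds, rounds=5, batch=10):
    labelled = {s: oracle_energy(s) for s in seeds}      # verified so far
    pool = [c for c in candidates if c not in labelled]  # not yet verified
    hit_rates = []
    for _ in range(rounds):
        # 1. Rank the pool with the cheap surrogate (lowest predicted energy first).
        ranked = sorted(pool, key=lambda c: surrogate_predict(labelled, c))
        chosen = ranked[:batch]
        if not chosen:
            break
        # 2. Spend the expensive oracle only on the chosen batch.
        results = {c: oracle_energy(c) for c in chosen}
        # 3. Feed the verified results back into the surrogate's data.
        labelled.update(results)
        chosen_set = set(chosen)
        pool = [c for c in pool if c not in chosen_set]
        hit_rates.append(sum(e < 0 for e in results.values()) / len(chosen))
    stable = sorted(c for c, e in labelled.items() if e < 0)
    return stable, hit_rates, len(labelled)

stable, hit_rates, n_oracle_calls = discovery_loop(
    range(200), seeds=[0, 50, 100, 150, 199]
)
```

`hit_rates` records, per round, the fraction of verified candidates that turned out stable. That is the kind of quantity the "discovery rate" figure above refers to, although the toy numbers carry no meaning beyond showing the mechanics.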
Why domain models, not general LLMs
The same math backbones (Transformer / diffusion / GNN) are reused, but the models differ from general LLMs in four structural ways:
- Tokens aren't text: amino acids, atoms, lattices, physical-field grid points.
- Symmetry is often hardcoded: SE(3)/E(3)-equivariant networks build in rotation/translation equivariance (invariance for scalar outputs such as energy), which general LLMs must learn from data.
- Objective isn't next-token: masked modeling (ESM), diffusion denoising (AlphaFold 3 / MatterGen), or physics-residual loss (PINNs, which can train with zero labelled data).
- Scale is 1-2 orders of magnitude smaller: protein/materials foundation models have millions to billions of parameters, not the trillions of frontier LLMs; they trade scale for inductive bias.
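
The symmetry point can be checked numerically. The sketch below uses made-up coordinates for a three-atom toy molecule and hypothetical helper names. It shows the two properties that equivariant architectures guarantee by construction: interatomic distances are invariant under a rotation plus translation, while a vector-valued quantity such as the centroid is equivariant (it moves with the input). A general-purpose model has no such guarantee and has to approximate it from data.

```python
import math

def rotate_z(p, angle):
    """Rotate a 3D point about the z axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(p, t):
    """Shift a 3D point by the vector t."""
    return tuple(a + b for a, b in zip(p, t))

def transform(p):
    """One fixed rigid motion (rotation, then translation) used for the check."""
    return translate(rotate_z(p, 0.7), (1.0, -2.0, 0.5))

def pairwise_distances(points):
    """Invariant features: all interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def centroid(points):
    """Equivariant feature: the mean position of the atoms."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

# Illustrative coordinates only, not a real geometry.
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = [transform(p) for p in atoms]

# Invariance: f(T(x)) == f(x)
invariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(atoms), pairwise_distances(moved))
)
# Equivariance: f(T(x)) == T(f(x))
equivariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(moved), transform(centroid(atoms)))
)
```

Both flags should come out true up to floating-point tolerance. A model that consumes only distances is invariant for free, and one built from equivariant operations keeps vector outputs consistent under any rigid motion.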
In practice, the choice of approach comes down to three questions: is there enough self-supervisable data to pretrain a foundation model (sequences yes, PDEs no); is the problem under hard physical constraints (which favours specialized models with physics priors); and is there a cheap verification loop (DFT, experiment) that makes large-scale generation trustworthy.
How this section is organized
Each discipline page tracks the frontier methods, flagship systems, current benchmarks, and open problems:
- aifs-biology — protein structure & design, genomic/sequence foundation models (AlphaFold, ESM, Evo)
- aifs-chemistry — molecular generation, retrosynthesis, reaction prediction, MLIP for chemistry
- aifs-physics — PINNs, neural operators, simulation surrogates, weather/climate, fusion
- aifs-mathematics — theorem proving, autoformalization, AI-assisted conjecture (AlphaProof, AlphaGeometry, Lean)
- aifs-ml-self — AI for AI: autonomous AI scientists, neural architecture search, AI-driven ML research
- aifs-operations-research — ML for combinatorial optimization, learned solvers, OR + RL
Caveats
- SOTA is highly time-sensitive — leaderboard positions (MLIP, weather) churn every few months; treat any "best" as "best at publication."
- Source quality is layered — peer-reviewed results (AlphaFold 3, GNoME) are high-confidence; many "unifying framework / taxonomy" claims come from un-reviewed arXiv preprints and are one reasonable framing, not settled fact.
- The "general-purpose science foundation model" narrative is mostly aspiration as of mid-2026.
Sources
- [1] Agentic Science survey — arXiv:2508.14111 (2026-06-14)
- [2] NeurIPS 2025 position paper — OpenReview:PegEYWWXvx / arXiv:2510.15280 (2026-06-14)
- [3] AlphaFold 3 — Nature s41586-024-07487-w, Abramson et al. 2024 (2026-06-14)
- [4] GNoME — DeepMind blog + Nature s41586-023-06735-9 (2026-06-14)
- local: ai_for_science_landscape_2026-06-14.md — deep-research compilation (29 sources, 25 adversarially-verified claims)