AI for Chemistry — Reactions, Molecules & Synthesis

How AI/ML is being applied to the core verbs of chemistry — predict a reaction, plan a synthesis, generate a molecule, compute an energy. A sub-module of ai-for-science; sibling to aifs-biology and aifs-physics.

Scope note: this module is strictly about the AI/ML angle. Wet-lab chemistry, mechanism, and instrumentation appear only where a model touches them. Energy/force models that overlap materials science are summarized here and cross-linked to mattersim.

The four problem families

AI for chemistry clusters into four loosely coupled problem families, each with its own data regime and its own flagship systems:

Reaction prediction — given reactants (and conditions), predict products. The clean sequence-to-sequence framing.
Retrosynthesis & synthesis planning — given a target molecule, search backward to purchasable building blocks.
Molecular generation / inverse design — given desired properties, generate candidate structures.
Machine-learned interatomic potentials (MLIPs) — given atomic coordinates, predict energy and forces fast enough to replace DFT in the loop.

A fifth, cross-cutting layer is LLMs / agents that orchestrate the other four plus literature, code, and robotics.

1. Reaction prediction

The canonical AI/ML framing treats a reaction as a string-to-string translation: reactant SMILES → product SMILES.

Molecular Transformer (Schwaller et al., ACS Central Science 2019) recast forward reaction prediction as neural machine translation over SMILES tokens, and crucially added uncertainty calibration so the model can express confidence in a predicted product [4]. It became the template for nearly all transformer-based reaction work that followed.
Successors and variants extend the same backbone: transfer-learned models specialize the Molecular Transformer for narrow domains (e.g. regio-/stereoselective carbohydrate reactions, enzymatic reactions), and 2025 work explores graph-prior augmented and dual-task (reaction + retrosynthesis) training to share representations across the forward/backward directions.
Graph-based reaction models (e.g. RXNGraphormer) operate on molecular graphs rather than strings. A 2026 reusability report found that they transfer reasonably well to high-throughput experimentation (HTE) datasets, but that the retrosynthetic direction is more sensitive to distribution shift than the forward direction.
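To make the string-to-string framing concrete, the sketch below splits a SMILES string into the tokens a translation model would consume. The regular expression and the helper name `tokenize_smiles` are assumptions for illustration, not the tokenizer of any cited system; the point is only that bracket atoms and two-letter elements must survive as single tokens.

```python
import re
from typing import List

# Assumed token pattern: bracket atoms first, then two-letter halogens,
# two-digit ring closures, single-letter atoms, digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-\\/:.~@*$>])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES (or reaction SMILES) string into model tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Round-trip check: refuse input containing characters the pattern skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Acetyl chloride: the two-letter element "Cl" stays one token.
    print(tokenize_smiles("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

A forward-prediction model would be trained on pairs of such token sequences (reactants as source, products as target), exactly as a translation model is trained on sentence pairs.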

The unifying lesson: forward prediction is the most "solved" of the four families because the task is well-posed and the USPTO patent corpus provides millions of examples — but that same corpus carries heavy bias (see Open Problems).

2. Retrosynthesis & synthesis planning

Synthesis planning is a search problem layered on a single-step model: a policy proposes disconnections, and a tree search (Monte Carlo Tree Search or A*-style) recurses until every leaf is a purchasable reagent.
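The recursion can be sketched in a few lines. The version below is a minimal depth-limited depth-first search, swapped in for MCTS or A* to keep the example short; the molecule names, the `TOY_DISCONNECTIONS` table (standing in for a single-step model), and the `STOCK` set are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (product, precursors)

# Hypothetical single-step "policy": each molecule maps to candidate
# disconnections, ordered from most to least preferred.
TOY_DISCONNECTIONS = {
    "amide": [("acid_chloride", "amine")],
    "acid_chloride": [("carboxylic_acid",)],
}
STOCK = {"amine", "carboxylic_acid"}  # hypothetical purchasable set


def propose(molecule: str) -> Sequence[Tuple[str, ...]]:
    """Stand-in for a neural policy that proposes disconnections."""
    return TOY_DISCONNECTIONS.get(molecule, [])


def find_route(
    target: str,
    stock: Set[str],
    policy: Callable[[str], Sequence[Tuple[str, ...]]],
    max_depth: int = 5,
) -> Optional[List[Step]]:
    """Return a list of steps ending in stock molecules, or None if none found."""
    if target in stock:
        return []  # leaf: purchasable, no further steps needed
    if max_depth == 0:
        return None
    for precursors in policy(target):
        route: List[Step] = [(target, tuple(precursors))]
        for precursor in precursors:
            sub_route = find_route(precursor, stock, policy, max_depth - 1)
            if sub_route is None:
                break  # this disconnection fails; try the next one
            route.extend(sub_route)
        else:
            return route  # every precursor resolved to stock
    return None


if __name__ == "__main__":
    for product, precursors in find_route("amide", STOCK, propose) or []:
        print(product, "<=", " + ".join(precursors))
```

Production planners differ mainly in how they choose which branch to expand next (MCTS rollouts, A*-style cost estimates) and in scoring complete routes, not in this basic termination condition.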

System	Approach	Availability
AiZynthFinder (AstraZeneca)	MCTS + template-based neural policy; recurses to purchasable precursors [3]	Open source
IBM RXN for Chemistry	Transformer (Molecular Transformer lineage) reaction + retro, cloud service	Hosted / freemium
Synthia (formerly Chematica)	Hybrid: expert-coded reaction rules + ML scoring; strong on complex natural products	Commercial
ASKCOS (MIT)	Template-based + neural, integrated condition recommendation	Open source

AiZynthFinder is the de-facto open-source reference: an MCTS that recursively breaks a molecule down to purchasable precursors, guided by a neural policy trained on reaction templates. It is fast (single-target searches typically complete in well under a minute) and widely reproduced [3].
Synthia (descended from Chematica, developed by Bartosz Grzybowski's group) leans on a large body of hand-encoded mechanistic rules plus ML scoring, and has produced routes to complex natural products that experienced chemists judged plausible. This is the strongest published evidence that computer-aided synthesis planning (CASP) can rival human route design on hard targets.
Human-in-the-loop is a 2025 theme: rather than fully automating, recent work adds prompting interfaces that let a chemist steer multistep retrosynthesis (constrain reagents, avoid disconnections), echoing the human-in-the-loop-ai pattern.
A derived signal, the Retrosynthetic Accessibility score (RAscore), is a cheap ML classifier that predicts whether AiZynthFinder would find a route. It is used as a synthesizability filter inside generative pipelines (family 3).

3. Molecular generation / inverse design

Inverse design flips the property-prediction arrow: instead of structure → property, generate structure conditioned on a target property.

Method family	Representative	Mechanism
VAE (junction-tree)	JT-VAE	Encodes molecular graphs via a junction-tree scaffold; decode + optimize in latent space
Diffusion	E(3)/SE(3)-equivariant 3D diffusion, graph diffusion (MG-DIFF), graph-diffusion-transformers	Denoise from noise to valid molecule; naturally supports 3D and multi-condition guidance
GFlowNets	GFlowNet molecular graph generation	Sample structures in proportion to a reward, favoring diverse high-reward modes rather than a single optimum [16]
Genetic / hybrid	JANUS (GA guided by a neural net)	Evolutionary search steered by a learned surrogate

JT-VAE remains the canonical graph-VAE baseline: it guarantees chemically valid decodings by building molecules from a vocabulary of valid subgraphs.
Diffusion models are now the most active line, especially 3D / equivariant variants that generate conformers directly and accept multi-conditional or text-guided prompts; 2025–2026 work pushes property-conditioned ("predictor-free") guidance.
GFlowNets are valued precisely where single-best optimization fails: drug/materials discovery wants a diverse batch of candidates, and GFlowNets sample proportionally to reward, with 2025 work on cheap-reward pretraining to bootstrap the policy [16].
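The difference between single-best optimization and reward-proportional sampling can be shown with plain arithmetic. The sketch below does not train a GFlowNet; it only computes the target distribution p(x) = R(x) / Σ R that a trained sampler aims to match, using hypothetical candidates and rewards.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical candidates with hypothetical rewards (e.g. a docking-like score).
REWARDS: Dict[str, float] = {"mol_A": 8.0, "mol_B": 7.5, "mol_C": 0.5}


def target_distribution(rewards: Dict[str, float]) -> Dict[str, float]:
    """Normalize rewards into the distribution p(x) = R(x) / sum of R."""
    total = sum(rewards.values())
    return {name: reward / total for name, reward in rewards.items()}


def sample_batch(rewards: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw n candidates with probability proportional to reward."""
    rng = random.Random(seed)
    names = list(rewards)
    weights = [rewards[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    best = max(REWARDS, key=REWARDS.get)
    print("argmax picks only:", best)
    print("target distribution:", target_distribution(REWARDS))
    print("sampled batch counts:", Counter(sample_batch(REWARDS, 1000)))
```

An optimizer returns only `mol_A`, while proportional sampling returns `mol_A` and the nearly-as-good `mol_B` at similar rates, which is the diversity property the batch-discovery setting wants.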

The hard part is not generation but validity + synthesizability + property fidelity simultaneously — which is why generators are increasingly chained to RAscore filters (family 2) and MLIP property checks (family 4).
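A minimal sketch of such a chain follows. The three scoring callables are hypothetical placeholders (a validity check, a RAscore-like synthesizability classifier, and a property check such as an MLIP-based estimate), and the thresholds are arbitrary illustration values.

```python
from typing import Callable, Iterable, List


def filter_candidates(
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
    synth_score: Callable[[str], float],
    property_error: Callable[[str], float],
    min_synth: float = 0.5,
    max_error: float = 0.1,
) -> List[str]:
    """Keep candidates that pass validity, synthesizability, and property checks.

    Checks run cheapest-first so expensive evaluations only see survivors.
    """
    kept = []
    for candidate in candidates:
        if not is_valid(candidate):
            continue  # not a well-formed molecule
        if synth_score(candidate) < min_synth:
            continue  # a route is unlikely to be found
        if property_error(candidate) > max_error:
            continue  # too far from the requested property
        kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Toy lookup tables standing in for real models.
    valid = {"cand_1", "cand_2", "cand_3"}
    synth = {"cand_1": 0.9, "cand_2": 0.2, "cand_3": 0.8}
    error = {"cand_1": 0.05, "cand_3": 0.4}
    survivors = filter_candidates(
        ["cand_1", "cand_2", "cand_3", "cand_4"],
        is_valid=lambda c: c in valid,
        synth_score=lambda c: synth[c],
        property_error=lambda c: error[c],
    )
    print(survivors)  # ['cand_1']
```

Applying the filters after generation is the simplest coupling; the same scores can instead be folded into the generator's reward or guidance signal.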

4. Machine-learned interatomic potentials (MLIPs) for chemistry

MLIPs predict potential energy from atomic coordinates, and forces as the negative gradient of that energy (typically via automatic differentiation), at a tiny fraction of DFT cost, enabling reaction-energetics and molecular-dynamics studies that were previously intractable. This family overlaps materials science heavily; see mattersim and aifs-physics.
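The energy-to-force relationship, F = -dE/dr, can be illustrated without any ML library. In the sketch below a Lennard-Jones pair potential stands in for the learned energy model, and central finite differences stand in for automatic differentiation; both substitutions are assumptions made to keep the example dependency-free.

```python
import math
from typing import Callable, List

Coords = List[List[float]]  # one [x, y, z] per atom


def toy_energy(coords: Coords, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy in reduced units; a stand-in for a learned model."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def forces(energy_fn: Callable[[Coords], float], coords: Coords, h: float = 1e-5) -> Coords:
    """Forces as the negative energy gradient, via central differences."""
    result = []
    for i in range(len(coords)):
        row = []
        for k in range(3):
            plus = [list(atom) for atom in coords]
            minus = [list(atom) for atom in coords]
            plus[i][k] += h
            minus[i][k] -= h
            row.append(-(energy_fn(plus) - energy_fn(minus)) / (2.0 * h))
        result.append(row)
    return result


if __name__ == "__main__":
    # Two atoms closer than the energy minimum repel each other.
    dimer = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    for atom_force in forces(toy_energy, dimer):
        print([round(component, 3) for component in atom_force])
```

Because forces are derived from a single scalar energy, they are automatically energy-conserving, which is one reason most MLIPs predict energy and differentiate rather than predicting forces directly.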

Model	Architecture	Niche
ANI (incl. ANI-1xBB, 2025)	Behler–Parrinello-style descriptors	Organic molecules; ANI-1xBB targets reaction energetics, barrier heights, bond-dissociation (13M+ geometries) [9]
MACE	Higher-order equivariant message passing (ACE-inspired tensor contractions)	Accurate, widely-used general MLIP [5]
Allegro	Strictly local, no iterative message passing	Scales to large systems
OrbNet / OrbNet Denali	Features from semi-empirical orbitals	DFT-accuracy at semi-empirical cost for organic/bio chemistry
AIMNet2	Charge-aware neural potential	Neutral, charged, and elemental-organic species

Reactivity is the frontier: most MLIPs were trained near equilibrium, so capturing transition states and bond-breaking needs dedicated reactive datasets. ANI-1xBB (2025) is built for exactly this and reports improved barrier-height and bond-dissociation prediction over conventional ANI, generalizing to pericyclic and radical reactions [9]. A 2025 Chemical Reviews survey on reactive MLIPs is the reference overview [8].
Foundation-scale atomistic models arrived in 2025: Meta FAIR's Open Molecules 2025 (OMol25) — >100M DFT calculations at the ωB97M-V/def2-TZVPD level spanning small molecules, biomolecules, metal complexes, and electrolytes [6] — and the Universal Model for Atoms (UMA) trained across FAIR's combined molecular + materials data [7]. The Open Catalyst Project (OC20/OC22, Meta FAIR + CMU) remains the catalysis-focused antecedent, with EquiformerV2 a notable large model.

5. LLMs and agents for chemistry

The orchestration layer wraps the four families plus literature, code, and (sometimes) robots.

ChemCrow (Bran et al., Nature Machine Intelligence 2024) augments GPT-4 with 18 expert-designed tools (RXN, retrosynthesis, property lookups, web search, code). It autonomously planned and executed syntheses of an insect repellent and three organocatalysts and guided discovery of a novel chromophore [1] — the canonical "tool-using chemistry agent."
Coscientist (Boiko, MacKnight et al., 2023) is an LLM that designs, plans and controls robotic experiments: web search, doc retrieval, code execution, and hardware control. It optimized palladium-catalyzed cross-coupling reactions on real automated hardware [2].
Multi-agent successors (2025): hierarchical systems such as ChemAgents split work across role-specific agents (literature reader, experiment designer, computation performer, robot operator) under a task manager. Benchmark work (e.g. ChemToolAgent) studies when tools actually help an LLM versus when they add noise.
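The common skeleton behind these systems is a loop in which a planner picks a tool, the tool runs, and the result is fed back. The sketch below shows only that skeleton: `scripted_planner` is a hypothetical stand-in for the LLM, and the tool names and return strings are invented placeholders rather than the interface of any system named above.

```python
from typing import Callable, Dict, List, Optional, Tuple

ToolCall = Tuple[str, str]          # (tool name, argument)
Record = Tuple[str, str, str]       # (tool name, argument, observation)
Planner = Callable[[List[Record]], Optional[ToolCall]]


def run_agent(
    planner: Planner,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[Record]:
    """Alternate planning and tool execution until the planner stops."""
    history: List[Record] = []
    for _ in range(max_steps):
        call = planner(history)
        if call is None:
            break  # planner decided the task is finished
        name, argument = call
        tool = tools.get(name)
        observation = tool(argument) if tool else f"error: unknown tool {name!r}"
        history.append((name, argument, observation))
    return history


def scripted_planner(history: List[Record]) -> Optional[ToolCall]:
    """Hypothetical stand-in for an LLM: replays a fixed two-step plan."""
    script = [("literature_search", "target molecule"), ("retrosynthesis", "target molecule")]
    return script[len(history)] if len(history) < len(script) else None


# Placeholder tools; real ones would call a search engine or a planner.
TOY_TOOLS = {
    "literature_search": lambda query: f"found notes on {query}",
    "retrosynthesis": lambda target: f"proposed route for {target}",
}

if __name__ == "__main__":
    for record in run_agent(scripted_planner, TOY_TOOLS):
        print(record)
```

The `max_steps` cap and the explicit unknown-tool observation reflect a general design concern for such loops: the planner's output is untrusted text, so the executor should bound and validate it.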

This connects directly to the "Agentic Science" thesis in ai-for-science: chemistry is one of the few domains where a closed hypothesize→execute→analyze loop has actually been run on physical hardware.

The A-Lab autonomous-synthesis controversy

The cautionary tale of the field. A-Lab (Ceder group, LBNL, with Google DeepMind) reported in Nature (Nov 2023) that an autonomous robotic lab synthesized 41 new inorganic materials in 17 days [10]. Independent chemists, led by Robert Palgrave (UCL), raised "very serious problems," arguing the XRD-based phase identification was unreliable and that several claimed compounds either already exist in the Inorganic Crystal Structure Database or were not convincingly characterized as new [11]. Nature issued a correction, but critics maintain (as of early 2026 reporting) that core concerns about whether genuinely new materials were made remain unresolved [12].

The episode is the standing warning for this whole module: autonomy claims must clear a higher characterization bar than the headline metric. Cite the skeptical coverage alongside the original whenever invoking A-Lab.

Methods table (cross-family summary)

Family	Core ML method	Flagship	Data regime	Maturity
Reaction prediction	Seq2seq transformer / graph net	Molecular Transformer	USPTO patents (millions, biased)	High (forward)
Retrosynthesis	Search + template/neural policy	AiZynthFinder, Synthia, IBM RXN	Reaction templates from patents	Medium–High
Generation	VAE / diffusion / GFlowNet	JT-VAE, 3D diffusion, GFlowNet	Property-labeled molecule sets	Medium (validity gap)
MLIP	Descriptor-based NN / equivariant GNN	MACE, ANI-1xBB, UMA, Allegro	DFT calculations (OMol25, OC20)	High (near-eq.), Medium (reactive)
Agents	Tool-using LLM	ChemCrow, Coscientist	Tools + literature + hardware	Early / demo-stage

General vs specialized: why chemistry trails biology

aifs-biology has an undisputed foundation-model flagship in the AlphaFold family — a single model that redefined a whole task. Chemistry has no equivalent, and the gap is structural, not a matter of effort:

Data fragmentation. Where biology converged on large, standardized resources (PDB, UniProt), chemistry's benchmarks are a patchwork. MoleculeNet is a suite of many small, specialized datasets (roughly thousands to tens of thousands of compounds each), not one unified ImageNet-scale corpus [15]. Models that win one sub-task rarely transfer.
Reaction-condition complexity. A reaction is not just reactant→product; yield depends on solvent, temperature, catalyst, concentration, time, and order of addition — variables that public datasets routinely omit. Most retrosynthesis benchmarks ignore conditions entirely, so a "correct" route may be unrunnable.
Reporting bias. The USPTO patent corpus over-represents a handful of robust, popular reaction types and almost never records failed reactions, so models learn what gets patented, not what's chemically true (see Open Problems) [13][14].
Multi-scale physics. Chemistry spans electronic structure (quantum) up to bulk thermodynamics; no single representation is natural across that range, unlike the sequence/structure duality biology gets from proteins.

The 2025 counter-trend is real but partial: SMILES foundation models (e.g. encoder-decoder models pretrained on ~91M PubChem SMILES; GP-MoLFormer), atomistic foundation models (OMol25 / UMA [6][7]), and tool-using agents all push toward generality [15]. But as of mid-2026 there is no single chemistry model that dominates across reaction prediction, retrosynthesis, generation, and energetics — the four families remain separate stacks.

Open problems

Condition prediction. Predicting whether and how well a route runs (solvent/catalyst/temperature/yield), not just that a disconnection is valid, is the biggest practical gap. Public data is sparse and HTE data is mostly proprietary.
Dataset bias & evaluation. USPTO over-represents common reactions, and random train/test splits leak near-duplicate reactions (same patent or author) across the split, yielding over-optimistic top-k scores [13]. A 2025 "critical look at the USPTO benchmark" argues much measured progress is benchmark artifact [14]. Negative/failed-reaction data is almost entirely missing.
Reactive MLIPs. Equilibrium-trained potentials extrapolate poorly to transition states; building broad, reliable reactive datasets (à la ANI-1xBB) and validating barrier heights remains open [8][9].
Synthesizability of generated molecules. Generators still emit structures that are valid on paper but hard or impossible to make; tighter coupling of generation with retrosynthesis (RAscore-style filters) is improving but unsolved.
Autonomy verification. Post–A-Lab, the field needs standardized characterization and reproducibility protocols before accepting "the robot discovered X" claims [10][11][12].
Foundation-model fragmentation. Whether one model can unify the four families — or whether chemistry stays a federation of specialists — is the open strategic question [15].
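For the dataset-bias item above, one common mitigation for split leakage is to split by group rather than by individual reaction, so that all reactions from one patent land on the same side. The sketch below assumes each record carries a `patent` field; the field name, the record layout, and the function name `group_split` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def group_split(
    records: List[Record],
    group_key: str,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    group_ids = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


if __name__ == "__main__":
    # Toy corpus: 20 reactions drawn from 5 hypothetical patents.
    corpus = [{"rxn": f"rxn_{i}", "patent": f"patent_{i % 5}"} for i in range(20)]
    train, test = group_split(corpus, "patent")
    shared = {r["patent"] for r in train} & {r["patent"] for r in test}
    print(len(train), len(test), "shared patents:", len(shared))
```

Grouping by patent addresses only one leakage channel; splitting by reaction class or scaffold similarity would follow the same pattern with a different `group_key`.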

Cross-links

ai-for-science — parent landscape; the "Agentic Science" framing.
aifs-biology — the foundation-model-mature sibling; useful contrast.
aifs-physics — shares MLIP / equivariant-GNN methods and the simulation-surrogate pattern.
mattersim — materials-side atomistic model; overlaps the MLIP family.
human-in-the-loop-ai — the prompting/steering interfaces emerging in retrosynthesis.

Sources

[1] ChemCrow: "Augmenting large language models with chemistry tools," Nature Machine Intelligence, 2024. https://www.nature.com/articles/s42256-024-00832-8 (2026-06-14)
[2] Coscientist: Boiko, MacKnight et al., "Autonomous chemical research with large language models." https://www.semanticscholar.org/paper/Autonomous-chemical-research-with-large-language-Boiko-MacKnight/6fe3779fe5f2e9402abdd08ad8db41a0f13a99eb (2026-06-14)
[3] AiZynthFinder: Journal of Cheminformatics, 2020. https://link.springer.com/article/10.1186/s13321-020-00472-1 (2026-06-14)
[4] Molecular Transformer: ACS Central Science, 2019. https://pubs.acs.org/doi/10.1021/acscentsci.9b00576 (2026-06-14)
[5] MACE. https://github.com/ACEsuit/mace (2026-06-14)
[6] OMol25 (Open Molecules 2025) Dataset, Evaluations, and Models, arXiv:2505.08762. https://arxiv.org/abs/2505.08762 (2026-06-14)
[7] UMA: A Family of Universal Models for Atoms, arXiv:2506.23971. https://arxiv.org/pdf/2506.23971 (2026-06-14)
[8] Reactive Machine Learning Interatomic Potentials for Chemistry and Materials Science, Chemical Reviews, 2025. https://pubs.acs.org/doi/10.1021/acs.chemrev.5c00728 (2026-06-14)
[9] ANI-1xBB: An ANI-Based Reactive Potential for Small Organic Molecules, JCTC, 2025. https://pubs.acs.org/doi/full/10.1021/acs.jctc.5c00347 (2026-06-14)
[10] A-Lab: "An autonomous laboratory for the accelerated synthesis of inorganic materials," Nature, 2023. https://www.nature.com/articles/s41586-023-06734-w (2026-06-14)
[11] Chemistry World: "New analysis raises doubts over autonomous lab's materials discoveries." https://www.chemistryworld.com/news/new-analysis-raises-doubts-over-autonomous-labs-materials-discoveries/4018791.article (2026-06-14)
[12] C&EN: "Nature robot chemist paper corrected, but some questions remain unanswered," 2026. https://cen.acs.org/research-integrity/Nature-robot-chemist-paper-corrected/104/web/2026/01 (2026-06-14)
[13] An exploration of dataset bias in single-step retrosynthesis, ChemRxiv, 2025. https://chemrxiv.org/doi/pdf/10.26434/chemrxiv-2025-5fcj6 (2026-06-14)
[14] A Critical Look at the USPTO Benchmark, EMNLP Findings 2025. https://aclanthology.org/2025.findings-emnlp.1242.pdf (2026-06-14)
[15] A Perspective on Foundation Models in Chemistry. https://pmc.ncbi.nlm.nih.gov/articles/PMC12042027/ (2026-06-14)
[16] Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation, arXiv:2503.06337. https://arxiv.org/pdf/2503.06337 (2026-06-14)