Sage Journals: Discover world-class research

Abstract

Despite a Cambrian explosion in therapeutic modalities, small-molecule drugs remain a prominent and advantageous medical intervention. The universe of synthesizable, drug-like small molecules is astronomical. Given this scale, efficiently narrowing in on therapeutic candidates that are potent, selective, and tolerable cannot occur by happenstance. Over the past several decades, computational tools have become commonplace among pharma companies seeking to discover new small-molecule drugs. For example, molecular mechanics force fields are used to power molecular dynamics simulations—an effective approach for virtually screening and optimizing candidate molecules. In parallel, data-driven methods such as machine learning have supercharged the field’s ability to design potentially bioactive compounds. Despite these advances, established computational methods still suffer from issues relating to throughput, accuracy, generalizability, or combinations thereof. We argue that a merger of these technologies is inevitable and desirable, allowing the strengths of each to address the weaknesses of the other. This fusion—in the form of neural network potentials (NNPs)—is an exciting frontier for small-molecule discovery and design. Ostensibly, NNPs enable a swift, accurate, and generalizable solution for researchers developing the next generation of small-molecule drugs.

Introduction

Small molecules are the dominant therapeutic modality.¹ They are the therapies of the past and are likely to be the therapies of the future. Defined as any compound with a low molecular weight (often <500 Daltons), small molecules can be designed for oral administration, penetrate cell membranes to reach intracellular proteins, and engage their targets through multiple mechanisms of action.^1,2 Critically, small-molecule drugs are readily manufacturable and are frequently formulated as tablets that do not require cold chain storage or other complex logistics.

Biologics and other emerging modalities (e.g., gene therapies) have some proven advantages over small molecules—such as the ability to deliver or express functional copies of proteins sidelined by genetic mutations—but also disadvantages presenting significant new challenges.^3,4 For these reasons, small-molecule drugs will retain a central position in the future pharmacopoeia.

Most small-molecule drugs bind to proteins—often protein active sites. Binding can inhibit or activate a protein’s function, altering an ostensibly dysregulated gene pathway and potentially rescuing a patient’s phenotype. Some small molecules target protein allosteric sites—regulatory pockets on a protein’s surface.⁵ These so-called allosteric modulators are harder to develop, but are often more selective for their targets, leading to safer medicines.⁶

Transforming random small molecules into drugs cannot happen by serendipity. Scientists estimate there are roughly 10²⁴ synthesizable, drug-like compounds—that is equivalent to the number of stars in the known universe.^7–9 Given this astronomical search space, the pharmaceutical industry increasingly has turned to computer-aided drug design (CADD) over the past decades to assist with rapidly narrowing in on the complex objectives that define a drug discovery program.

Virtually every component of a traditional small-molecule discovery and development campaign benefits from CADD tools. As outlined in Figure 1, pharma companies can use an inexorably expanding array of computational methods across target validation, hit finding, and lead optimization. Molecular mechanics (MM) force fields (FFs) are a key technological foundation upon which many of these tools have been built.^10,11 As such, MM FFs have proven their utility through dozens, if not hundreds, of successful discovery campaigns.

FIG. 1.

A typical small-molecule discovery and development campaign.

Force Fields Are Critical Components of Widely Used CADD Tools

Chemistry underwent a quantitative revolution in the early 20th century. Although scientists understood the relationship between a molecule’s 3D structure and its physicochemical properties, they lacked a rigorous, mathematical bridge linking the two. MM seemed to satisfy this need in a manner within reach of contemporary computing hardware.

MM is rooted in the Born–Oppenheimer approximation, which exploits the separation of timescales between fast electron relaxation and slow nuclear motion to describe a molecule’s potential energy as a function of its atoms’ 3D nuclear coordinates.¹² The mathematical framework connecting potential energy and structure is called a force field (FF), which can be used to compute useful equilibrium properties relevant to chemistry and drug discovery.

Many FFs calculate a system’s potential energy by summing terms for bonded and nonbonded interactions, as shown in Figure 2. The former considers quantum chemical valence interactions (e.g., harmonic bond stretching and angle bending), while the latter may include Van der Waals forces and electrostatics, for example.¹³ These terms are expressed using a variety of functional forms—equations that yield the energy contribution(s) of a particular interaction using a set of learned parameters that must be fit to data such as quantum chemical calculations or physical property measurements.¹⁴ Until the early 1960s, chemists carried out these laborious calculations by hand.¹⁰

FIG. 2.

Common inter- and intramolecular forces contributing to a molecule’s potential energy.
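As a rough illustration of how bonded and nonbonded terms add up to a potential energy, the following minimal Python sketch sums a harmonic bond term, a 12-6 Lennard-Jones term, and a Coulomb term. The functional forms are common textbook choices, and the parameter values, the data layout, and the function names are hypothetical. It does not reproduce the functional form of any specific FF named in this article.

```python
import math

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2); assumed value.
COULOMB_CONSTANT = 332.06


def harmonic_bond_energy(r, r0, k):
    """Bonded term: harmonic stretch, E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones_energy(r, epsilon, sigma):
    """Nonbonded van der Waals term in the 12-6 Lennard-Jones form."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def coulomb_energy(r, qi, qj):
    """Nonbonded electrostatic term between two point charges."""
    return COULOMB_CONSTANT * qi * qj / r


def total_energy(coords, bonds, nonbonded):
    """Sum bonded and nonbonded contributions.

    bonds:     list of (i, j, r0, k)
    nonbonded: list of (i, j, epsilon, sigma, qi, qj)
    """
    energy = 0.0
    for i, j, r0, k in bonds:
        energy += harmonic_bond_energy(math.dist(coords[i], coords[j]), r0, k)
    for i, j, epsilon, sigma, qi, qj in nonbonded:
        r = math.dist(coords[i], coords[j])
        energy += lennard_jones_energy(r, epsilon, sigma) + coulomb_energy(r, qi, qj)
    return energy


# Hypothetical three-atom toy system (coordinates in Angstrom).
coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (4.0, 0.0, 0.0)]
bonds = [(0, 1, 1.0, 600.0)]
nonbonded = [(0, 2, 0.1, 3.4, -0.2, 0.2)]
print(total_energy(coords, bonds, nonbonded))
```

Real FFs add angle, torsion, and exclusion rules on top of this, and the parameters would be fit to quantum chemical or experimental data rather than chosen by hand.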

In 1961, James Hendrickson used MM to compute the conformational dynamics of hydrocarbon rings, catalyzing a new era of computational chemistry.¹⁰ Hendrickson used an IBM 709, a mainframe capable of just 4,000 multiplication operations per second—that is roughly 10¹² fewer operations per second than modern graphics processing units (GPUs).^15,16 With Pandora’s box now opened, scientists raced to accelerate FFs using computational methods. The ensuing decade would see the rise of the first open-source, scientific FFs.

The Allinger laboratory published some of the first open FF software packages (e.g., MM2) optimized for small organic molecules in the 1970s.¹⁷ These made use of minicomputers such as the PDP-11, capable of a more impressive 700,000 operations per second.¹⁸ A decade later, the Karplus and Kollman laboratories released CHARMM and AMBER, FFs specializing in larger biomolecules such as proteins and nucleic acids.^19,20

Despite running on the fastest minicomputers of the era, these frameworks did not advance the underlying functional forms. In fact, CHARMM and AMBER used simplified functional forms compared with the earlier FF models, trading accuracy for speed and simplicity. GROMOS and OPLS, which shared these simplified functional forms, became popular in the 1990s for simulating compounds in aqueous environments.²¹

Schrödinger, a pioneer in molecular simulation and software development, has driven the commercialization of FFs. In the early 2000s, Schrödinger began offering DESMOND through its platform, a molecular dynamics (MD) simulation engine developed by D.E. Shaw Research, which supported Schrödinger’s commercial FF, OPLS3.^22–24 As of 2024, Schrödinger’s software products are used by all the top 20 established pharma companies for small-molecule drug discovery.²⁵ While not required for program success, Schrödinger’s ubiquity is suggestive of the broad utility of commercial FF packages. Its success has spawned numerous other modern FF efforts, such as the open-source Open Force Field Consortium, which is funded by industry partners.

Molecular Dynamics Simulations Are Moving Pictures

MD simulations breathe life into static representations of molecules.²⁶ During an MD simulation, a FF acts on the structure of a small molecule (a ligand) or a protein–ligand complex, causing it to bend, twist, wiggle, and contort through time. Pressing “Go” on an MD simulation generates a trajectory—a movie of how the complex explores the conformational landscape imposed by the molecules’ constituent atoms.²⁷

Provided there are sufficiently long trajectories, MD simulations will describe the equilibrium behavior of a system. This enables chemists to gain a quantitative understanding of the system’s microscopic conformational preferences and macroscopic equilibrium physical properties (e.g., thermodynamics). Comprehensively sampling the conformational landscape is crucial for accurately capturing the behavior of complex biological systems. This is because both energetically favorable (frequently sampled) and unfavorable (rarely visited) conformations can play significant roles.

A practical example is that of DDR1, a tyrosine kinase whose dysregulation has been implicated in myriad cancers. Kinases, one of the largest enzyme families in humans, are generally involved in cell signaling. Adenosine triphosphate (ATP)-competitive inhibitors are classified by the conformation that the activation loop, which contains the highly conserved Asp-Phe-Gly (DFG) motif, adopts upon ligand binding.²⁸ Type I inhibitors stabilize the DFG-in state (Asp facing inward), while type II inhibitors stabilize the DFG-out state (Asp facing bulk solvent). Understanding the full equilibrium behavior between these states of the wild-type DDR1 protein is a critical first step for rational inhibitor design against mutant versions of DDR1, as shown in Figure 3.

FIG. 3.

The free energy landscape of a kinase protein (DDR1).

MM FFs and MD simulations are ubiquitous in industry owing to their usefulness throughout the small-molecule discovery process, from early hit finding through lead optimization. Common as they are, these techniques are replete with limitations that constrain their domain of applicability.

The central challenge is that the accuracy of a FF is generally inversely proportional to its speed.²⁹ Accurate, long-timescale simulations confer the most commercial value but can be computationally intractable. Brief, low-resolution simulations are fast but less predictive, and can be executed at high throughput. This trade-off governs how FFs are used in small-molecule drug discovery.

At the top of the discovery funnel, virtual screening of ultralarge chemical libraries involves checking many millions of candidate ligands against a rigid or semiflexible representation of a protein target.³⁰ Pruning this vast universe involves docking and scoring compounds, oftentimes using methods powered by FFs.³¹ The scale of the problem necessitates speed and throughput. Therefore, the FFs used during screening are lightweight—they drop terms, ignore protein dynamics, and make a plethora of assumptions.³² These shortcuts all contribute to the sobering fact that docking scores are not generally predictive of experimentally determined binding affinities, instead leading practitioners to focus on enrichment (biasing the selection of molecules toward hits more than random chance) and ultralarge library docking (where a small, top-scoring fraction of the docked library has a substantially higher hit rate).^33–35
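To make the notion of enrichment concrete, the sketch below computes an enrichment factor: the hit rate among the top-scoring fraction of a docked library divided by the hit rate of the whole library. The function name, the scores, and the activity labels are hypothetical, and the code assumes that lower (more negative) docking scores are better, which is a common but not universal convention.

```python
def enrichment_factor(scores, is_active, top_fraction):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, int(round(n * top_fraction)))
    # Assumption: lower docking scores indicate better predicted binding.
    ranked = sorted(range(n), key=lambda i: scores[i])
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    hits_all = sum(is_active)
    if hits_all == 0:
        raise ValueError("library contains no known actives")
    return (hits_top / n_top) / (hits_all / n)


# Hypothetical ten-compound library with two known actives.
scores = [-12.0, -11.0, -9.0, -8.5, -8.0, -7.5, -7.0, -6.5, -6.0, -5.0]
actives_ranked_first = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
actives_split = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(enrichment_factor(scores, actives_ranked_first, 0.2))
print(enrichment_factor(scores, actives_split, 0.2))
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 indicate that the scoring function is biasing the selection toward hits even when the scores themselves do not track binding affinity.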

Since screening shrinks the number of candidate molecules drastically, scientists apply more rigorous MD methods during lead optimization. Directly predicting the relative binding affinities (K_d) of candidate ligands is a prime MD use case at this stage.³⁶

Protein–ligand association (k_on) is often diffusion-limited and fast: once a small molecule bumps into its target, they snap together rapidly. However, it takes on the order of hours for some compounds to dissociate (k_off).³⁷ This means that an MD trajectory covering one hour of simulated time might observe a single dissociation event, if one is lucky. Because the dissociation constant K_d is the ratio k_off/k_on, comparing the affinities of any two candidate small molecules with useful statistical precision might require simulating 10²–10³ binding/unbinding cycles. Even with modern hardware, this MD workload may take 10⁸–10⁹ years of compute time.
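The link between kinetics and affinity can be checked with a few lines of arithmetic. The sketch below uses the standard relations K_d = k_off/k_on and ΔG° = RT ln(K_d/c°) with a 1 M standard concentration. The rate constants are hypothetical values chosen to represent a fast-associating ligand with a residence time of about one hour; they are not taken from the cited work.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def dissociation_constant(k_on, k_off):
    """K_d in molar units, given k_on in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy in kcal/mol, relative to a 1 M standard state."""
    return GAS_CONSTANT * temperature * math.log(kd)


# Hypothetical ligand: fast association, roughly one-hour residence time.
k_on = 1.0e6          # 1/(M*s)
k_off = 1.0 / 3600.0  # 1/s
kd = dissociation_constant(k_on, k_off)
print(f"K_d = {kd:.2e} M, dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Because k_off sets how long one must simulate to see a single unbinding event, tight binders are exactly the cases in which brute-force estimation of this ratio becomes impractical.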

Sidestepping this Herculean computational task required the development of alchemical free energy methods.³⁸ Alchemical methods decompose the lengthy, continuous process of protein–ligand binding into a series of more easily computable alchemical intermediates (e.g., steric decoupling). As the name might suggest, alchemical intermediates are oftentimes not real: they are imaginary, interpolated snapshots of molecules as they transmute between different molecules or environments, as shown in Figure 4.

FIG. 4.

Common free energy differences calculable using alchemical methods.

By breaking the binding affinity prediction problem into discrete, solvable pieces, alchemical free energy calculations require orders of magnitude less effort while still retaining all entropic and enthalpic contributions to binding free energy.
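One way to see how a free energy difference can be assembled from discrete pieces is the Zwanzig exponential-averaging estimator, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, applied window by window and summed. The sketch below is only an illustration under stated assumptions: it is one of several alchemical estimators and not necessarily the one used in the cited studies, and the energy differences are synthetic Gaussian samples rather than simulation output. For Gaussian ΔU with mean μ and standard deviation σ, the exact answer is μ − σ²/(2kT), which gives a reference value to compare against.

```python
import math
import random


def zwanzig_free_energy(delta_u, kT):
    """Exponential-averaging estimate of dG from samples of dU (same units as kT)."""
    x = [-du / kT for du in delta_u]
    m = max(x)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(v - m) for v in x) / len(x))
    return -kT * log_mean


def gaussian_reference(mu, sigma, kT):
    """Exact dG when dU is Gaussian with mean mu and standard deviation sigma."""
    return mu - sigma ** 2 / (2.0 * kT)


kT = 0.5925  # kcal/mol near 298 K
rng = random.Random(0)

# Two hypothetical alchemical windows, each described by (mean, std) of dU.
windows = [(2.0, 0.5), (-1.0, 0.4)]
estimate = 0.0
reference = 0.0
for mu, sigma in windows:
    samples = [rng.gauss(mu, sigma) for _ in range(50000)]
    estimate += zwanzig_free_energy(samples, kT)
    reference += gaussian_reference(mu, sigma, kT)

print(f"estimated dG = {estimate:.3f}, Gaussian reference = {reference:.3f} kcal/mol")
```

Splitting the transformation into windows keeps each ΔU distribution narrow, which is what makes the exponential average converge with a manageable number of samples.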

Unlike docking techniques, alchemical methods can produce results that more closely match experimental data, although only in well-behaved instances.⁴⁰ Many contemporary methods perform more poorly for complexes with side-chain motion, multiple ligand binding modes, conformationally dynamic proteins, dynamic protonation states, bound metals, and more. Unfortunately, these obstacles are not uncommon in industry.⁴¹

In essence, binding affinity predictions fail for the following three main reasons: (1) the FF does a poor job modeling the physics of the system, (2) the simulation omits relevant chemical effects that modulate the chemical components in the system (e.g., protonation and tautomerization of the ligand or binding site residues), and (3) the simulation is not sampling all the relevant protein conformations—or combinations of all three.^42,43

Accurately predicting binding affinity is critically valuable because it helps chemists prioritize compounds to synthesize and test—saving time and money by reducing the number of design–make–test–analyze cycles required in a discovery program.

Generative Molecular Modeling Is an Efficient Frontier

Data-driven techniques such as machine learning (ML) are becoming increasingly transformative to the life sciences.⁴⁴ ML excels at learning complex, nonlinear functions [f(x)], provided there are sufficient training data from the distribution of (x) one wishes to predict on or generate from. Chemistry is replete with nonlinear mappings. For example, medicinal chemists seek to connect a molecule’s structure to its bioactivity and other physicochemical properties, which is not straightforward.

Property prediction and de novo generative molecular design both are means to bridge the structure–activity chasm. The latter aims to build a potent molecule from scratch. While ML techniques are commonplace across both regimes, generative de novo design has become the field’s aspirational North Star.

In the ideal scenario, one could express their design objectives in a target candidate profile and directly generate a small number of candidate molecules ready for preclinical development. While this goal may seem far off, it is vital to acknowledge the breathtaking progress the field has made in recent decades.

CADD tools for generative design date back to the early 2000s, before the advent of deep learning techniques. By the early 2010s, medicinal chemists used rules-based algorithms for enumerating chemical libraries and conducting reaction-driven design—enabling the synthesis of plausible, bioactive compounds for high-throughput screening.^45,46

Around the same time, researchers began leveraging early neural architectures (e.g., recurrent neural networks) to generate bioactive ligands in silico.⁴⁷ Many of these early models used text-like representations of molecules termed Simplified Molecular-Input Line-Entry System (SMILES).⁴⁸ ChEMBL is a popular, open-access repository of matched molecule–activity data containing experimental measurements for ∼2.4 million compounds, with compound identities expressed in machine-readable formats such as SMILES (Fig. 5).⁵⁰

FIG. 5.

SMILES notation for ciprofloxacin.
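Because SMILES is plain text, sequence models typically consume it as a list of tokens mapped to integer ids. The sketch below shows one simple way to do this with a regular expression. The token pattern is a simplified, hypothetical one that covers common organic-subset symbols only, and the ciprofloxacin string is the form commonly listed in public databases, used here as an illustrative input rather than as the exact string shown in Figure 5.

```python
import re
from collections import Counter

# Simplified token pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/:.@*~]"
)

CIPROFLOXACIN = "C1CC1N2C=C(C(=O)C3=CC(=C(C=C32)N4CCNCC4)F)C(=O)O"


def tokenize(smiles):
    """Split a SMILES string into tokens, refusing strings with unknown symbols."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("SMILES contains symbols not covered by this simple pattern")
    return tokens


tokens = tokenize(CIPROFLOXACIN)

# Tally non-bracket atom tokens (bracket atoms are ignored by this simple count).
atom_counts = Counter(t for t in tokens if t.isalpha())

# Map tokens to integer ids, as a sequence model would require.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
encoded = [vocab[t] for t in tokens]

print(atom_counts)
print(encoded)
```

A cheminformatics toolkit would normally handle parsing and validation; the point here is only that the text representation turns a molecule into a sequence a language-style model can be trained on.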

The current set of generative small-molecule ML models often incorporates structural data (e.g., co-crystal structures) in some manner. Many generative ML models are structure-implicit, meaning they are not trained on structural data.⁵¹ Instead, these models are exposed to protein structure during inference time. Several structure-implicit models leverage goal-directed optimization, a common strategy through which the output of a generative model is docked and scored using a conventional MM software package. Recently, the number of structure-explicit models trained on structure data has increased. Unfortunately, very few of these models have been prospectively validated in the physical world.⁵¹

A total of 50 years of systematic data curation powered by roughly $20 billion of investment created the Protein Data Bank (PDB).⁵² The PDB contains >200,000 experimentally determined crystal structures with roughly 10% of these containing bound ligands, making it an excellent resource for training ML models, although small compared with typical ML corpora.⁵³ Since 1994, the Critical Assessment of Structure Prediction (CASP) competition has honed metrics of structure prediction accuracy.⁵⁴ Abundant training data and consensus performance benchmarks laid the foundation for the breathtaking results generated by DeepMind’s AlphaFold2 (AF2) in 2021.⁵⁵

Early commenters (including the organizers of CASP) hailed AF2 as the solution to the decades-old protein structure prediction problem.^56,57 Although not directly related to generative chemistry, AF2 fomented enormous excitement within the small-molecule discovery community and the life sciences writ large. Would computational biologists be able to dock (or generate) molecules within digitally folded protein pockets, would this eliminate the need to obtain expensive crystal structures, and would structure become the best data representation for ML models within biology?

A few things seem clear in 2024. While AF2’s scientific and social impacts cannot be overstated, AF2 was not a panacea for replacing crystal structures in small-molecule discovery programs.⁵⁸ Instead, AF2 can be considered a continuation of the trend exploiting structural data representations in biology already set in motion by the rise of geometric deep learning models, which make use of the spatial relationships between parts of each training example.⁵⁹

Although AF2 was powered by one of the few highly curated data sets in the biological sciences (the PDB), more than data curation is needed to succeed in modeling biochemistry. To make the most out of relatively small data sets, ML models can leverage simplifying rules based on the laws of physics. For example, E(3) equivariant models understand that rotating and translating molecules in 3D space do not change their identity, properties, or conformation.⁶⁰ E(3) refers to the Euclidean group of translations, rotations, and reflections in 3D space. By understanding these physical invariances, models may make better guesses in situations where biochemical data are particularly scarce.⁶¹
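The symmetry argument above can be demonstrated numerically. In the sketch below, a toy energy that depends only on interatomic distances is evaluated before and after a rotation, a translation, and a reflection, and the value does not change. The energy function and coordinates are hypothetical. Note that a scalar energy is invariant under E(3), whereas vector quantities such as forces are equivariant (they rotate with the molecule); equivariant network layers are one way of building this behavior into a model rather than asking it to learn it from data.

```python
import math
from itertools import combinations


def pair_energy(coords):
    """Toy energy that depends only on interatomic distances."""
    return sum((math.dist(a, b) - 1.5) ** 2 for a, b in combinations(coords, 2))


def rotate_z(coords, theta):
    """Rotate all atoms about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift all atoms by the same displacement vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


def reflect_x(coords):
    """Mirror all atoms through the yz plane."""
    return [(-x, y, z) for x, y, z in coords]


# Hypothetical four-atom geometry (Angstrom).
molecule = [(0.0, 0.0, 0.0), (1.4, 0.1, 0.0), (2.0, 1.3, 0.4), (0.3, 1.1, 1.2)]

original = pair_energy(molecule)
transformed = pair_energy(reflect_x(translate(rotate_z(molecule, 0.7), (3.0, -2.0, 5.0))))
print(original, transformed)
```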

Indeed, most current state-of-the-art models for protein–ligand complex structure prediction that train on structure use equivariant networks. These include NeuralPLexer2, AlphaFold-latest, and RoseTTAFold All-Atom, among others, with AlphaFold3 being a notable departure.^62–65 While often orders of magnitude slower than docking, the potential for these methods to predict protein conformational changes necessary for ligand binding is tantalizing.⁶⁶

Within the traditional small-molecule discovery lexicon, generative ML models enumerate digital chemical libraries that are “virtually screened” using ML docking algorithms (e.g., DiffDock) or complex structure prediction models.⁶⁷ Recently, benchmark articles such as PoseBusters and PoseCheck have given sobering critiques of ML’s current ability to generate physically viable ligands.^68,69 PoseCheck shows how several generative algorithms propose ligand poses with unfavorable energetics, steric clashes, missing key interactions, and high strain energy.

Regardless of the model architecture or training objective, what is certain is that data are the central deciding factor for model performance and generalizability. Unlike proteins, where sequence data are abundant and inexpensive thanks to the shrinking sequencing prices, experimental small-molecule data are scarce and expensive to generate.⁷⁰ Once filtered for sequence and structural similarity, the PDB contains only a few hundred unique protein–ligand complex structures, posing challenges for ML training.⁷¹ Real proteins are conformationally dynamic (with kinases being one of the few well-understood examples), a characteristic unrepresented in single static 3D crystal structures.⁷²

Directly predicting ligand binding affinity using deep learning methods also is rife with pitfalls. Binding affinity data sets such as ChEMBL and PDBbind contain significant interlaboratory variation, making simple pooling of all IC₅₀ and K_i data ill-suited for training, even when separated by measurement class.^73,74 For this reason, some ML-focused companies (e.g., Terray) have begun generating enormous batches of new affinity data in a consistent, controlled manner.⁷⁵

Altogether, the generative de novo design of small molecules is an exciting frontier. Data-driven methods such as ML are blossoming, opening new opportunities within drug discovery and development. Unfortunately, extant open-access data sets of small molecules are noisy, sparse, and often incompletely annotated, highlighting the need for internal data generation.^73,76,77 Physics-informed techniques and generative modeling are not diametrically opposed. In fact, emerging ML architectures are deeply connected with physical phenomena. Diffusion models, for example, are rooted in equilibrium statistical mechanics—specifically, Boltzmann distributions.^78,79

The present–future intersection of ML and physics beyond MM, one that could reshape what is possible in small-molecule discovery, is the neural network potential (NNP).⁸⁰

NNPs Combine Advantages from Both Worlds

MM-based methods for affinity prediction have a generalizability advantage over current data-driven approaches. As mentioned previously, however, current MM FFs have a somewhat constrained domain of applicability, outside of which their accuracy is diminished. Previously, there seemed to be two paths for improving FF generalizability and accuracy.

The first approach involves tacking on a combinatorial explosion of mathematical terms to existing MM equations to account for specific scenarios. Generally, these are Taylor or Fourier series terms that account for complex quantum chemical multibody valence interactions. Ostensibly, this avenue could expand MM methods’ domain of applicability where currently there are limitations. Early efforts such as MMFF and other “Class II” MM FFs attempted this, but ran into difficulties with increasingly burdensome parameterization due to a combinatorial explosion of coupling terms.^10,81

The second approach involves switching to quantum mechanical (QM) FFs to power MD simulations. QM frameworks are exquisitely accurate owing to their inclusion of chemical features such as electronic polarization, charge transfer, and orbital hybridization. However, QM is computationally intractable at the scales necessary to appropriately capture the statistical mechanics of protein–ligand binding in industry settings.⁸² An emerging third path forward is to use NNPs.

NNPs are ML models designed to directly compute the potential energy of a molecular system as a function of atomic coordinates.⁸⁰ NNPs learn to describe physical interactions between atoms, and implicitly, electrons. Unlike MM potentials that rely on lower order functional forms, NNPs use flexible (and fast) neural forms that can capture higher order multibody interactions with greater fidelity.⁸⁰
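To give a feel for what "potential energy as a function of atomic coordinates" means for a neural model, the sketch below builds a deliberately tiny, untrained example: each atom is described by radial features of its neighborhood, a small neural network maps those features to an atomic energy, and the total energy is the sum over atoms. This atom-centered layout is one common design among several and is not claimed to be the architecture of any model cited here. All names, hyperparameters, and weights are hypothetical, the weights are random rather than fit to quantum chemical data, and all atoms are treated as the same element.

```python
import math
import random

CUTOFF = 5.0                      # Angstrom; neighbors beyond this are ignored
CENTERS = [1.0, 2.0, 3.0, 4.0]    # radial feature centers (hypothetical)
ETA = 4.0                         # radial feature width parameter (hypothetical)


def cosine_cutoff(r):
    """Smoothly switch off contributions as r approaches the cutoff."""
    return 0.5 * (math.cos(math.pi * r / CUTOFF) + 1.0) if r < CUTOFF else 0.0


def radial_features(i, coords):
    """Describe atom i by Gaussian-weighted sums over its neighbor distances."""
    feats = []
    for mu in CENTERS:
        total = 0.0
        for j, xyz in enumerate(coords):
            if j != i:
                r = math.dist(coords[i], xyz)
                total += math.exp(-ETA * (r - mu) ** 2) * cosine_cutoff(r)
        feats.append(total)
    return feats


def make_random_mlp(n_in, n_hidden, seed=0):
    """Random (untrained) weights for a one-hidden-layer network."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return w1, b1, w2


def atomic_energy(feats, params):
    """Map one atom's features to its energy contribution."""
    w1, b1, w2 = params
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, feats)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden))


def nnp_energy(coords, params):
    """Total energy as a sum of per-atom network outputs."""
    return sum(
        atomic_energy(radial_features(i, coords), params) for i in range(len(coords))
    )


params = make_random_mlp(len(CENTERS), n_hidden=8, seed=0)
molecule = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
print(nnp_energy(molecule, params))
```

Because the features depend only on distances and the total is a sum over atoms, the energy is unchanged when atoms are reordered or the molecule is moved, which are properties a physically sensible potential should have.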

Typically, NNPs are trained on quantum chemical calculations (e.g., forces, energies) of clusters of molecules.⁸³ Similar to MM FFs, NNPs may be further fit to experimental physical property data (e.g., liquid densities, heats of vaporization, and other condensed-phase properties).⁸⁴ NNPs are differentiable similar to other neural networks, meaning that their energy predictions are continuous functions of atomic position and model parameters. Why is this so important?

Much of the complexity of MM simulation engines extends from the onerous and manual process of programming efficient routines for calculating position and parameter gradients—how changes in atomic coordinates or model settings influence the system’s potential energy. Owing to their automatic differentiability and just-in-time compilation, NNPs within ML frameworks (e.g., PyTorch, JAX, and TensorFlow) can precisely and efficiently calculate gradients—accelerating simulation without compromising accuracy.
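The idea that gradients can be obtained automatically from the energy expression, without hand-coding a force routine, can be shown with a toy example. The sketch below implements forward-mode automatic differentiation with dual numbers and uses it to get the force on one atom of a harmonic bond. It is a teaching device under stated assumptions: frameworks such as PyTorch and JAX typically rely on reverse-mode differentiation and compiled kernels, and the `Dual` class, the energy function, and the parameters here are hypothetical.

```python
import math


class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = float(value)
        self.deriv = float(deriv)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_sqrt(x):
    """Square root with the chain rule applied to the derivative part."""
    root = math.sqrt(x.value)
    return Dual(root, x.deriv / (2.0 * root))


def bond_energy(p1, p2, r0, k):
    """Harmonic bond energy written once; derivatives follow automatically."""
    d = [Dual.lift(a) - Dual.lift(b) for a, b in zip(p1, p2)]
    r = dual_sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2])
    stretch = r - r0
    return 0.5 * k * stretch * stretch


def force_on_first_atom(p1, p2, r0, k):
    """Force = -dE/dx, obtained by seeding one coordinate at a time."""
    force = []
    for axis in range(3):
        seeded = [Dual(c, 1.0 if i == axis else 0.0) for i, c in enumerate(p1)]
        force.append(-bond_energy(seeded, p2, r0, k).deriv)
    return force


# Stretched bond along x: atom 1 should be pulled toward atom 2.
print(force_on_first_atom((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), r0=1.0, k=300.0))
```

The energy function contains no derivative code at all, which mirrors why writing a potential inside an ML framework removes much of the manual gradient programming described above.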

Recall that MM simulations sometimes fail to predict binding affinity due to inaccurate FFs, omission of certain chemical components (e.g., protonation states), and poor conformational sampling. NNPs promise to improve on all these aspects:

Accuracy—NNPs are trained on substantially more accurate quantum chemical data and can learn much more expressive functions to model physical interactions, and indeed routinely show significantly reduced error in modeling conformational energetics compared with errors typically achieved with general MM FFs.^85–87 Small-molecule binding affinities computed using hybrid ML/MM models have been shown to provide significant improvements over MM.^87–89 Critically, one can fine-tune these models using experimental data on relevant chemical domains to improve performance for specific tasks.

Chemical Features—With traditional MM, changing protonation states or tautomerization requires detailed tracking of atomic data, making simulations sluggish and bookkeeping of valence terms highly complex.⁹¹ NNPs can implicitly deal with any changes in bonded structure, making key algorithms such as constant-pH simulations trivial.⁹²

Sampling—Many contemporary methods sample highly correlated molecular states (conformations) with MD, which makes discovering important but distinct conformations inefficient. There is enormous potential to integrate ML-driven enhanced sampling strategies alongside NNPs within ML frameworks to explore relevant conformations much more efficiently.⁹³

Several concurrent technological advances are driving innovation within NNPs. These models benefit from the explosion of fit-for-purpose model architectures (e.g., ANI-1).⁹⁴ Combined with ML methods to reduce the cost for accurate QM calculations, cheap CPU compute ($0.01/core-hour) makes generating large QM training sets more feasible.⁹⁵ Moreover, pressure on GPU vendors to deliver better hardware as well as the porting over of efficient software algorithms from MM codes should galvanize rapid advances in NNP performance, rapidly bringing it within reach of MM for small systems.⁹⁶

For a given discovery program, an NNP that confers near-experimental predictive accuracy on binding affinity could have profound implications. As shown in Figure 6, models capable of predicting ligand binding affinities to chemical accuracy (∼0.5 kcal/mol) could accelerate lead optimization by a factor of 8 and greatly reduce the number of compounds required for synthesis.⁹⁷

FIG. 6.

Increasingly accurate molecular simulations allow medicinal chemists to more effectively prioritize compounds for lead optimization synthesis.
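A quick calculation shows why fractions of a kcal/mol matter for prioritization. Since ΔG° = RT ln K_d, an error of δ in the predicted free energy corresponds to a multiplicative error of exp(δ/RT) in the predicted K_d. The sketch below evaluates this at 298.15 K; the function name is hypothetical and the calculation is a general thermodynamic relation, not a reproduction of the analysis behind Figure 6.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def fold_error_in_kd(delta_g_error, temperature=298.15):
    """Multiplicative error in K_d implied by an error in dG (kcal/mol)."""
    return math.exp(abs(delta_g_error) / (GAS_CONSTANT * temperature))


for error in (0.5, 1.0, 2.0):
    print(f"{error:.1f} kcal/mol error -> {fold_error_in_kd(error):.1f}-fold in K_d")
```

At room temperature, an error of 0.5 kcal/mol corresponds to roughly a 2.3-fold uncertainty in K_d, while 1 kcal/mol corresponds to roughly 5.4-fold, which is large relative to the potency differences chemists often need to resolve during lead optimization.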

NNPs should coexist beautifully with de novo generative modeling techniques. For example, NNPs could serve as accurate simulators to provide synthetic training data for generative ML models. Alternatively, generative models that share the same fundamental architectures as NNPs could be trained on (1) the same QM data powering the NNP and (2) the data that generative models typically are trained on (e.g., crystal structures), opening the front end of the funnel to a broader universe of data.

There still are risks and unknowns associated with NNPs. As with any ML technique, overfitting is an immediate concern. The training corpus could omit relevant chemical features or be biased toward specific scenarios, creating generalizability issues. Specific models are well-known to struggle with common drug moieties such as sulfonamides, for example. Adequately mapping entire conformational landscapes during training is critical as well for NNP generalizability. There are several active learning and adversarial approaches to remedy this issue.^98,99 Finally, simulations are not always stable.¹⁰⁰ Improper simulations can explode—rapidly deteriorating into nonphysical scenarios.

Although there is more work ahead, NNPs are a powerful evolution of the CADD toolkit for small-molecule discovery efforts. By combining the inference speed and throughput of data-driven modeling and the physical constraints conferred by rich QM and experimental data, NNPs could become the best of both worlds.

What Will the Future Look Like?

The shift to NNPs should be rapid over the next several years. At first, costly yet accurate NNPs should begin replacing expensive QM methods for many applications in small-molecule drug discovery since they are more efficient and well-suited to GPUs.^85,101–103

We are already seeing the emergence of NNP/MM hybrid simulations, in which a part of the system (e.g., the ligand) is treated with more accurate NNPs—which appears to provide substantial accuracy boosts while only causing an ∼5× drag on simulation speed compared with the pure MM counterpart.^96,104

At the same time, we are nearing the limit of what GPU-accelerated hardware can deliver for small biomolecular atomistic MM systems. There are certainly still efficiency gains to be made for larger systems (e.g., large membrane proteins) on current-generation hardware. For small drug-like molecules, however, there is no reason to not utilize more accurate NNPs if there is no speed penalty over MM.

We have barely scratched the surface of what the clever melding of NNP MD simulations and generative ML algorithms could do. These technologies have not been practically intertwined for long enough. There will be enormous creativity brought to bear on this problem in the coming years. Researchers will continue improving NNP inference speed through algorithmic innovation that exploits the simplicity of NNP atomic potentials and the rapid iteration cycles commonly found with ML.¹⁰⁵

The future should follow the paradigm—simulate, emulate, generate. Accurate simulations will serve as the foundation. Fast ML methods will be trained to emulate accurate simulations at a greatly reduced cost. Finally, generative ML models will gain the ability to generate physically valid compounds that meet the complex design objectives required by the small-molecule discovery teams of tomorrow.

Footnotes

Acknowledgments

JDC thanks Justin Smith (NVIDIA), Gianni de Fabritiis (Universitat Pompeu Fabra), Marcus Wieder (Open Molecular Software Foundation), Antonia S. J. S. Mey (University of Edinburgh), Peter Eastman (Stanford University), Michael Shirts (University of Colorado at Boulder), and David L. Mobley (University of California at Irvine) for helpful discussions.

SB thanks Eric Dai, Zavain Dar, Adam Goulburn, and Nan Li from Dimension as well as Henri Palacci (DESRES), Nathan Frey (Prescient Design), Sam Stanton (Prescient Design), Charlie Harris (University of Cambridge), and Eric Atkins for helpful conversations and inspiration.

Author Disclosure Statement

JDC is a current member of the Scientific Advisory Board of OpenEye Scientific Software, Redesign Science, Ventus Therapeutics, and Interline Therapeutics, and has equity interests in Redesign Science and Interline Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, Bayer, XtalPi, Interline Therapeutics, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding.

SB is a current member of Dimension, a venture capital firm that invests at the intersection of technology and the life sciences. Dimension has financial interests in companies developing small-molecule therapeutics, including Enveda Biosciences, Kimia Therapeutics, and Monte Rosa Therapeutics.

Funding Information

JDC acknowledges financial support from the Sloan Kettering Institute.

References

1. Lemurell. A big future for small molecules: Targeting the undruggable. AstraZeneca. 2022. Available from: https://www.astrazeneca.com/r-d/next-generation-therapeutics/small-molecule.html [Last accessed: March 3, 2024].

2. Lipinski, Lombardo, Dominy, et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev, 1997; 23(1–3):3–25; doi: 10.1016/S0169-409X(96)00423-1

3. Biologics vs. Small Molecule Drugs: Which Are Better? AscendiaPharma. 2021. Available from: https://ascendiapharma.com/newsroom/2021/10/27/biologics-vs-small-molecule-drugs [Last accessed: March 3, 2024].

4. Challenges In Gene Therapy. Learn Genetics. Available from: https://learn.genetics.utah.edu/content/genetherapy/challenges [Last accessed: March 3, 2024].

5. Huang, Nussinov, Zhang. Computational tools for allosteric drug discovery: Site identification and focus library design. Methods Mol Biol, 2017; 1529:439–446; doi: 10.1007/978-1-4939-6637-0_23

6. Nussinov, Tsai. The different ways through which specificity works in orthosteric and allosteric drugs. Curr Pharm Des, 2012; 18(9):1311–1316; doi: 10.2174/138161212799436377

7. Reymond, Awale. Exploring chemical space for drug discovery using the chemical universe database. ACS Chem Neurosci, 2012; 3(9):649–657; doi: 10.1021/cn3000422

8. Benet, Hosey, Ursu, et al. BDDCS, the Rule of 5 and drugability. Adv Drug Deliv Rev, 2016; 101:89–98; doi: 10.1016/j.addr.2016.05.007

9. Star Basics. science.nasa.gov. Available from: https://science.nasa.gov/universe/stars/ [Last accessed: March 4, 2024].

10. Dauber-Osguthorpe, Hagler. Biomolecular force fields: Where have we been, where are we now, where do we need to go and how do we get there? J Comput Aided Mol Des, 2018; 33(2):133–203; doi: 10.1007/s10822-018-0111-4

11. Sabe, Ntombela, Jhamba, et al. Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: A review. Eur J Med Chem, 2021; 224:113705; doi: 10.1016/j.ejmech.2021.113705

12. Hanson, Harvey, Sweeney, et al. 10.1: The Born-Oppenheimer approximation. Chemistry LibreTexts. 2013. Available from: https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Book%3A_Quantum_States_of_Atoms_and_Molecules_(Zielinksi_et_al)/10%3A_Theories_of_Electronic_Molecular_Structure/10.01%3A_The_Born-Oppenheimer_Approximation [Last accessed: March 4, 2024].

13. Smirnov, McCarty. 3.1: Potential energy surface and bonding interactions. Chemistry LibreTexts. 2022. Available from: https://chem.libretexts.org/Courses/Western_Washington_University/Biophysical_Chemistry_(Smirnov_and_McCarty)/03%3A_Molecular_Mechanics_and_Statistical_Thermodynamics/3.01%3A_Potential_Energy_Surface_and_Bonding_Interactions#:~:text=Oscillations%20that%20are%20described%20by [Last accessed: March 4, 2024].

14. Force Field (Chemistry). Chemeurope.com. Available from: https://www.chemeurope.com/en/encyclopedia/Force_field_%28chemistry%29.html [Last accessed: March 4, 2024].

15. Fisher, McKie, Mancke. IBM and the U.S. Data Processing Industry: An Economic History. Google Books. Praeger; 1983.

16. NVIDIA Ships World’s Most Advanced AI System—NVIDIA DGX A100 — to Fight COVID-19; Third-Generation DGX Packs Record 5 Petaflops of AI Performance. NVIDIA. 2020. Available from: https://nvidianews.nvidia.com/news/nvidia-ships-worlds-most-advanced-ai-system-nvidia-dgx-a100-to-fight-covid-19-third-generation-dgx-packs-record-5-petaflops-of-ai-performance [Last accessed: March 4, 2024].

17. Structural Biology Software Database. www.ks.uiuc.edu. 2001. Available from: https://www.ks.uiuc.edu/Development/biosoftdb/biosoft.cgi?sortby=date&category=6 [Last accessed: March 4, 2024].

18. Hudson. A brief tour of the PDP-11, the most influential minicomputer of all time. Ars Technica. 2022. Available from: https://arstechnica.com/gadgets/2022/03/a-brief-tour-of-the-pdp-11-the-most-influential-minicomputer-of-all-time/ [Last accessed: March 4, 2024].

19. Building Systems - CHARMM-GUI. ambermd.org. 2023. Available from: https://ambermd.org/tutorials/CHARMM-GUI.php [Last accessed: March 4, 2024].

20. Amber Force Fields. ambermd.org. 2024. Available from: https://ambermd.org/AmberModels.php [Last accessed: March 4, 2024].

21. Supplemental: Overview of the Common Force Fields—Practical considerations for Molecular Dynamics. computecanada.github.io. Available from: https://computecanada.github.io/molmodsim-md-theory-lesson-novice/S01-Force_Fields_Overview/index.html [Last accessed: March 4, 2024].

22. Bowers, Chow, Xu, et al. Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. 2006. pp. 84–es. IEEE Xplore; doi: 10.1145/1188455.1188544

23. D. E. Shaw Research: Who We Are. www.deshawresearch.com. 2023. Available from: https://www.deshawresearch.com/who-we-are.html [Last accessed: March 4, 2024].

24. Harder, Damm, Maple, et al. OPLS3: A force field providing broad coverage of drug-like small molecules and proteins. J Chem Theory Comput, 2015; 12(1):281–296; doi: 10.1021/acs.jctc.5b00864

25. SEC Filings. newsite.schrodinger.com. 2024. Available from: https://ir.schrodinger.com/financials/sec-filings/default.aspx [Last accessed: March 4, 2024].

26. Hollingsworth, Dror. Molecular dynamics simulation for all. Neuron, 2018; 99(6):1129–1143; doi: 10.1016/j.neuron.2018.08.011

27. Likhachev, Balabaev, Galzitskaya. Available instruments for analyzing molecular dynamics trajectories. Open Biochem J, 2016; 10(1):1–11; doi: 10.2174/1874091X01610010001

28. Hanson, Georghiou, Thakur, et al. What makes a kinase promiscuous for inhibitors? Cell Chem Biol, 2019; 26(3):390–399.e5; doi: 10.1016/j.chembiol.2018.11.005

29. Childers, Daggett. Insights from molecular dynamics simulations for computational protein design. Mol Syst Des Eng, 2017; 2(1):9–33; doi: 10.1039/C6ME00083E

30. Gimeno, Ojeda-Montes, Tomás-Hernández, et al. The light and dark sides of virtual screening: What is there to know? Int J Mol Sci, 2019; 20(6); doi: 10.3390/ijms20061375

31. Morris, Lim-Wilby. Molecular docking. Methods Mol Biol, 2008; 443:365–382; doi: 10.1007/978-1-59745-177-2_19

32. Muegge, Rarey. Small molecule docking and scoring. Reviews in Computational Chemistry, 2001; 17:1–60; doi: 10.1002/0471224413.ch1

33. Warren, Andrews, Capelli, et al. A critical assessment of docking programs and scoring functions. J Med Chem, 2006; 49(20):5912–5931; doi: 10.1021/jm050362n

34. McGaughey, Sheridan, Bayly, et al. Comparison of topological, shape, and docking methods in virtual screening. J Chem Inf Model, 2007; 47(4):1504–1519; doi: 10.1021/ci700052x

35. Lyu, Wang, Balius, et al. Ultra-large library docking for discovering new chemotypes. Nature, 2019; 566(7743):224–229. Available from: https://www.nature.com/articles/s41586-019-0917-9

36. Bernetti, Masetti, Rocchia, et al. Kinetics of drug binding and residence time. Annu Rev Phys Chem, 2019; 70(1):143–171; doi: 10.1146/annurev-physchem-042018-052340

37. Roskoski. Classification of small molecule protein kinase inhibitors based upon the structures of their drug-enzyme complexes. Pharmacol Res, 2016; 103:26–48; doi: 10.1016/j.phrs.2015.10.021

38. Chodera, Mobley, Shirts, et al. Alchemical free energy methods for drug discovery: Progress and challenges. Curr Opin Struct Biol, 2011; 21(2):150–160; doi: 10.1016/j.sbi.2011.01.011

39. Mey ASJS, Allen, Bruce Macdonald, et al. Best practices for alchemical free energy calculations [Article v1.0]. Living J Comput Mol Sci, 2020; 2(1); doi: 10.48550/arXiv.2008.03067

40. Wang, Wu, Deng, et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J Am Chem Soc, 2015; 137(7):2695–2703; doi: 10.1021/ja512751q

41. Schindler CEM, Baumann, Blum, et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J Chem Inf Model, 2020; 60(11):5457–5474; doi: 10.1021/acs.jcim.0c00900

42. Charifson, Walters. Acidic and basic drugs in medicinal chemistry: A perspective. J Med Chem, 2014; 57(23):9701–9717; doi: 10.1021/jm501000a

43. Martin. Let’s not forget tautomers. J Comput Aided Mol Des, 2009; 23(10):693–704; doi: 10.1007/s10822-009-9303-2

44. IBM. What is Machine Learning? 2023. Available from: https://www.ibm.com/topics/machine-learning [Last accessed: March 7, 2024].

45. Ruddigkeit, van Deursen, Blum, et al. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model, 2012; 52(11):2864–2875; doi: 10.1021/ci300415d

46. Hartenfeller, Zettl, Walter, et al. DOGS: Reaction-Driven de novo Design of Bioactive Compounds. PLoS Comput Biol, 2012; 8(2):e1002380; doi: 10.1371/journal.pcbi.1002380

47. Gupta, Müller, Huisman BJH, et al. Generative recurrent networks for de novo drug design. Mol Inform, 2017; 37(1–2):1700111; doi: 10.1002/minf.201700111

48. SMILES Tutorial | Research | US EPA. archive.epa.gov. 2016. Available from: https://archive.epa.gov/med/med_archive_03/web/html/smiles.html [Last accessed: March 7, 2024].

49. Wikipedia Contributors. Simplified molecular-input line-entry system. Wikipedia. Wikimedia Foundation; 2019. Available from: https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system [Last accessed: March 9, 2024].

50. ChEMBL Database. www.ebi.ac.uk. 2023. Available from: https://www.ebi.ac.uk/chembl/ [Last accessed: March 7, 2024].

51. Thomas, Bender, de Graaf. Integrating structure-based approaches in generative molecular design. Curr Opin Struct Biol, 2023; 79:102559; doi: 10.1016/j.sbi.2023.102559

52. Berman, Westbrook, Feng, et al. The protein data bank. Nucleic Acids Res, 2000; 28(1):235–242; doi: 10.1093/nar/28.1.235

53. Sen, Young, Berrisford, et al. Small molecule annotation for the Protein Data Bank. Database (Oxford), 2014; 2014(0):bau116; doi: 10.1093/database/bau116

54. Home—Prediction Center. predictioncenter.org. Available from: https://predictioncenter.org/ [Last accessed: March 7, 2024].

55. Jumper, Evans, Pritzel, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 2021; 596(7873):583–589; doi: 10.1038/s41586-021-03819-2

56. Lewis. One of the biggest problems in biology has finally been solved. Scientific American, 2022; doi: 10.1038/scientificamerican0223-28

57. Zwanzig, Szabo, Bagchi. Levinthal’s paradox. Proc Natl Acad Sci U S A, 1992; 89(1):20–22; doi: 10.1073/pnas.89.1.20

58. Lowe. Docking With AlphaFold Structures: Oops. 2023. Available from: https://www.science.org/content/blog-post/docking-alphafold-structures-oops [Last accessed: March 19, 2024].

59. Bronstein, Bruna, Cohen, et al. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv, 2021; doi: 10.48550/arXiv.2104.13478

60. White. Deep Learning for Molecules and Materials. Living J Comput Mol Sci, July 2022; 3(1):1499; doi: 10.33011/livecoms.3.1

61. Karniadakis, Kevrekidis, Lu, et al. Physics-informed machine learning. Nat Rev Phys, 2021; 3(6):422–440; doi: 10.1038/s42254-021-00314-5

62. Qiao, Nie, Vahdat, et al. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat Mach Intell, 2024; 6(2):195–208; doi: 10.1038/s42256-024-00792-z

63. Performance and structural coverage of the latest, in-development AlphaFold model. 2023. Available from: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/a-glimpse-of-the-next-generation-of-alphafold/alphafold_latest_oct2023.pdf [Last accessed: March 19, 2024].

64. Krishna, Wang, Ahern, et al. Generalized biomolecular modeling and design with RoseTTAFold all-atom. Science, 2023; 384(6693):eadl2528; doi: 10.1101/2023.10.09.561603

65. Abramson, Adler, Dunger, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 2024 May 8; 1–3; doi: 10.1038/s41586-024-07487-w

66. Lu, Zhang, Huang, et al. DynamicBind: Predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat Commun, 2024; 15(1):1071; doi: 10.1038/s41467-024-45461-2

67. Corso, Deng, Fry, et al. Deep confident steps to new pockets: Strategies for docking generalization. ArXiv, 2024; doi: 10.48550/arXiv.2402.18396

68. Buttenschoen, Morris, Deane. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem Sci, 2024; 15(9):3130–3139; doi: 10.48550/arXiv.2402.18396

69. Harris, Didi, Jamasb, et al. Benchmarking generated poses: How rational is structure-based drug design with generative models? Biomolecules, 2023; doi: 10.48550/arXiv.2308.07413

70. Wetterstrand. The cost of sequencing a human genome. National Human Genome Research Institute. 2019. Available from: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost [Last accessed: March 19, 2024].

71. Gazizov, Lian, Goverde, et al. AF2BIND: Predicting ligand-binding sites using the pair representation of AlphaFold2. bioRxiv, 2023; doi: 10.1101/2023.10.15.562410

72. Modi, Dunbrack. Defining a new nomenclature for the structures of active and inactive kinases. Proc Natl Acad Sci U S A, 2019; 116(14):6818–6827; doi: 10.1073/pnas.1814279116

73. Landrum, Riniker. Combining IC50 or Ki values from different sources is a source of significant noise. J Chem Inf Model, 2024; 64(5):1560–1567; doi: 10.26434/chemrxiv-2024-2smhk

74. Kalliokoski, Kramer, Vulpetti, et al. Comparability of Mixed IC50 Data – A Statistical Analysis. PLoS One, 2013; 8(4):e61007; doi: 10.1371/journal.pone.0061007

75. We experiment beyond limits to design smarter. Our integrated experimental platform is built for generative AI-driven drug discovery and lets us tackle unsolved problems. Terray, 2024. Available from: https://www.terraytx.com/platform [Last accessed: March 19, 2024].

76. Irwin BWJ, Whitehead, Rowland, et al. Deep imputation on large-scale drug discovery data. Applied AI Letters, 2021; 2(3); doi: 10.1002/ail2.31

77. Papadatos, Gaulton, Hersey, et al. Activity, assay and target data curation and quality in the ChEMBL database. J Comput Aided Mol Des, 2015; 29(9):885–896; doi: 10.1007/s10822-015-9860-5

78. Ambrogioni. The statistical thermodynamics of generative diffusion models. arXiv preprint arXiv:2310.17467. 2023; doi: 10.48550/arXiv.2310.17467

79. Weideman. 7.4: Boltzmann distribution. Chemistry LibreTexts. Available from: https://phys.libretexts.org/Courses/University_of_California_Davis/UCD%3A_Physics_9HE_-_Modern_Physics/07%3A_Multiple_Particles/7.4%3A_Boltzmann_Distribution [Last accessed: March 19, 2024].

80. Kocer, Ko, Behler. Neural network potentials: A concise overview of methods. Annu Rev Phys Chem, 2022; 73:163–186; doi: 10.48550/arXiv.2107.03727

81. Halgren. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J Comput Chem, 1996; 17(5-6):490–519; doi: 10.1002/(SICI)1096-987X(199604)17:5/6%3C490::AID-JCC1%3E3.0.CO;2-P

82. Zheng, Zubatyuk, Wu, et al. Artificial intelligence-enhanced quantum chemical method with broad applicability. Nat Commun, 2021; 12(1):7022; doi: 10.1038/s41467-021-27340-2

83. Tokita, Behler. How to train a neural network potential. J Chem Phys, 2023; 159(12); doi: 10.1063/5.0160326

84. Wen, Afshar, Elliott, et al. KLIFF: A framework to develop physics-based and machine learning interatomic potentials. Computer Physics Communications, 2022; 272:108218; doi: 10.1016/j.cpc.2021.108218

85. Smith, Roitberg, Isayev. Transforming computational drug discovery with machine learning and AI. ACS Med Chem Lett, 2018; 9(11):1065–1069; doi: 10.1021/acsmedchemlett.8b00437

86. Anstine, Zubatyuk, Isayev. AIMNet2: A Neural Network Potential to Meet your Neutral, Charged, Organic, and Elemental-Organic Needs. ChemRxiv, 2024; doi: 10.26434/chemrxiv-2023-296ch-v2

87. Stevenson, Jacobson, Zhao, et al. Schrodinger-ANI: An Eight-Element Neural Network Interaction Potential with Greatly Expanded Coverage of Druglike Chemical Space. arXiv.org. 2019; doi: 10.48550/arXiv.1912.05079

88. Rufa, Bruce MacDonald, Fass. Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning/molecular mechanics potentials. bioRxiv, 2020; doi: 10.1101/2020.07.29.227959

89. Sabanés Zariquiey, Galvelis, Gallicchio, et al. Enhancing protein–ligand binding affinity predictions using neural network potentials. J Chem Inf Model, 2024; 64(5):1481–1485; doi: 10.1021/acs.jcim.3c02031

90. Inizan, Plé, Adjoua, et al. Scalable hybrid deep neural networks/polarizable potentials biomolecular simulations including long-range effects. Chem Sci, 2023; 14(20):5438–5452; doi: 10.1039/D2SC04815A

91. Chen, Roux. Constant-pH hybrid nonequilibrium molecular dynamics–Monte Carlo Simulation Method. J Chem Theory Comput, 2015; 11(8):3919–3931; doi: 10.1021/acs.jctc.5b00261

92. Wieder, Fass, Chodera. Teaching free energy calculations to learn from experimental data. bioRxiv, 2021; doi: 10.1101/2021.08.24.457513

93. Mehdi, Smith, Herron, et al. Enhanced sampling with machine learning. Annu Rev Phys Chem, 2024; 75; doi: 10.1146/annurev-physchem-083122-125941

94. Smith, Isayev, Roitberg. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem Sci, 2017; 8(4):3192–3203; doi: 10.1039/C6SC05720A

95. Bauerdick, Bockelman, Dykstra, et al. CMS Collaboration. Experience in using commercial clouds in CMS. J Phys: Conf Ser, 2017; 898(5); doi: 10.1088/1742-6596/898/5/052019

96. Galvelis, Varela-Rial, Doerr, et al. NNP/MM: Accelerating molecular dynamics simulations with machine learning potentials and molecular mechanics. J Chem Inf Model, 2023; 63(18):5701–5708; doi: 10.1021/acs.jcim.3c00773

97. Shirts, Mobley, Brown. Free-energy calculations in structure-based drug design. In: Drug Design: Structure- and Ligand-Based Approaches. Cambridge University Press; 2010; pp. 61–86.

98. Smith, Nebgen, Lubbers, et al. Less is more: Sampling chemical space with active learning. J Chem Phys, 2018; 148(24):241733; doi: 10.1063/1.5023802

99. Schwalbe-Koda, Tan, Gómez-Bombarelli. Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks. Nat Commun, 2021; 12(1):5104; doi: 10.1038/s41467-021-25342-8

100. Fu, Wu, Wang, et al. Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations. arXiv Preprint, 2022; doi: 10.48550/arXiv.2210.07237

101. Horton, Boothroyd, Wagner, et al. Open force field bespokefit: Automating bespoke torsion parametrization at scale. J Chem Inf Model, 2022; 62(22):5622–5633; doi: 10.1021/acs.jcim.2c01153

102. Galvelis, Doerr, Damas, et al. A scalable molecular force field parameterization method based on density functional theory and quantum-level machine learning. J Chem Inf Model, 2019; 59(8):3485–3493; doi: 10.1021/acs.jcim.9b00439

103. Liu, Zubatiuk, Roitberg, et al. Auto3D: Automatic generation of the low-energy 3D structures with ANI neural network potentials. J Chem Inf Model, 2022; 62(22):5373–5382; doi: 10.1021/acs.jcim.2c00817

104. Lahey, Rowley. Simulating protein–ligand binding with neural network potentials. Chem Sci, 2020; 11(9):2362–2368; doi: 10.1039/C9SC06017K

105. Eastman, Galvelis, Peláez, et al. OpenMM 8: Molecular dynamics simulation with machine learning potentials. J Phys Chem B, 2023; 128(1):109–116; doi: 10.1021/acs.jpcb.3c06662

Neural Network Potentials for Enabling Advanced Small-Molecule Drug Discovery and Generative Design
