The pharmaceutical industry is familiar with the ‘hype cycle’ of technologies, artificial intelligence (AI) being the most recent. AI is best thought of as a nested set of capabilities: machine learning (ML; models that learn from legacy data), deep learning (ML models that mimic human brain processes), generative AI (use of ML to create original content) and the ultimate goal of artificial general intelligence (systems capable of conducting scientific research and discovering new knowledge).

Great claims are made for AI in drug discovery – a revolution is coming according to McKinsey [1]. AI-based startups have attracted substantial backing, with an estimated $4 billion invested between 2018 and 2022 in the leading 20 companies [2] and the size of the AI services market in drug discovery expected to reach almost $8 billion per annum by 2030 [3]. Recently, Xaira Therapeutics spun out of the University of Washington Baker lab with $1 billion in funding [4]. Given the lack of impact on pharma productivity from previous technology ‘game-changers’ [5], how much of this is based on real evidence and how much is wishful thinking? Exactly where and how will AI disrupt established practices in drug discovery? This perspective aims to shed some light on these questions and will hopefully convince you that there is already enough evidence that, this time, the journey along the technology hype curve will be different.

AI methods

Without descending into the nuances of deep learning network architectures and such like, it will be useful to introduce common ML terminology and utility. The comprehensive review by Yang et al. [6] is recommended for further reading.

‘Classical machine learning’ is a term generally applied to the collection of methods which pre-date ‘deep learning’. Supervised learning methods (i.e. those which are trained to predict a specific labelled end-point such as logP) include support vector machines, naïve Bayes and random forests. Unsupervised learning methods (i.e. where the data are unlabelled) include clustering, k-nearest neighbours, principal component analysis and self-organising maps. These methods are fed descriptors (e.g. chemical structure fingerprints [7]) and produce a mathematical model that relates the descriptors to the desired endpoint (supervised) or allows a data-driven representation of the molecules in the descriptor space (unsupervised).
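
As a minimal sketch of how descriptors feed such models, the example below represents each molecule as the set of ‘on’ bits in a binary fingerprint, compares molecules with the Tanimoto coefficient and predicts a labelled endpoint as a similarity-weighted average of the most similar training molecules. The fingerprints, the logP values and the function names are hypothetical and chosen only for illustration; a real workflow would compute fingerprints with a cheminformatics toolkit and use one of the learners named above.

```python
from typing import Dict, Set

Fingerprint = Set[int]  # indices of the fingerprint bits that are switched on


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_endpoint(
    query: Fingerprint,
    training: Dict[str, Fingerprint],
    labels: Dict[str, float],
    k: int = 2,
) -> float:
    """Supervised use: similarity-weighted mean label of the k most similar molecules."""
    ranked = sorted(
        training, key=lambda name: tanimoto(query, training[name]), reverse=True
    )[:k]
    weights = [tanimoto(query, training[name]) for name in ranked]
    total = sum(weights)
    if total == 0.0:
        # No overlap with any neighbour: fall back to a plain average.
        return sum(labels[name] for name in ranked) / len(ranked)
    return sum(w * labels[name] for w, name in zip(weights, ranked)) / total


# Hypothetical toy data: three labelled molecules and one query.
training = {"mol_a": {1, 2, 3, 4}, "mol_b": {1, 2, 3, 5}, "mol_c": {7, 8, 9}}
logp_labels = {"mol_a": 2.0, "mol_b": 3.0, "mol_c": -1.0}
query = {1, 2, 3}
prediction = predict_endpoint(query, training, logp_labels, k=2)
```

The same similarity function could equally drive an unsupervised analysis, for example by grouping molecules whose pairwise Tanimoto similarity exceeds a chosen threshold.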

Deep learning methods have been key to the emergence of modern AI. Deep learning typically refers to a learning system incorporating multiple layers of artificial neural networks. Such networks are very flexible learners and are able to model many types of data (e.g. medical images, face recognition, speech, music and, of course, molecular data) and highly complex, non-linear relationships. They are particularly powerful when given very large data sets, for example the 1.2 million images used by the breakthrough AlexNet image classification system [8].

A key departure from classical ML is the ability of deep learning models to learn the most effective representation of the data, rather than use fixed, human-engineered descriptors. Molecules can be represented as graphs or as SMILES strings [9] or proteins as sequences of their shorthand amino-acid letters, with their actual representation in the model refined by the model training process.

The flexibility of deep learning networks has enabled a large number of variants and types of learning:

  • Multitask learning allows learning of several, related endpoints (e.g. IC50 data from kinase panels) in parallel, making use of a shared representation which can be very useful where some endpoints have large data and others small data.

  • Transfer learning enables fine-tuning of a model which has been pre-trained on a large corpus of data, by further training on a much smaller but focussed data set.

  • Reinforcement learning is a reward-driven learning strategy that enables the optimisation of a model to predict an outcome without a priori knowing how to optimise; it is often used in combination with other computational models which may impose penalties (e.g. developability models) or rewards (e.g. fit to a pharmacophore model).

  • Contrastive learning is a semi-supervised learning method which attempts to learn a latent space where similar data are close and dissimilar data are far apart. It is particularly useful at integrating disparate data types such as image and chemical structure data [10].

  • Diffusion models [11] are a class of supervised learning methods gaining in popularity. Noise is incrementally added to the training data (e.g. a set of images) until the new data set is a Gaussian distribution. A deep learning model is then trained to be able to follow the reverse process (i.e. start with noise and recreate the input image). This creates a model that is a very efficient generative tool for images and, increasingly, molecular design [12,13].

  • Recurrent neural network (RNN) models have proved a powerful tool for generative chemistry [14]. Designed for modelling time-series and sequence data and particularly useful for language, translation and speech models, RNNs (particularly the long short-term memory variant [15]) use the context of a word or character to modify the prediction for what the next word or character will be.

  • Large language models (LLMs) [16] are a hot topic due to the extraordinary impact of OpenAI’s ChatGPT and similar models. These are extremely large models, trained on extremely large data sets. In contrast with RNNs, LLMs use a transformer architecture that enables self-learning plus parallelisation of training. LLMs are able to understand entire conversations and context. Interestingly, the feature of LLMs that often irritates in everyday use (hallucination) is the one that is most useful for molecule design – producing plausible SMILES or protein sequences that have never been reported.
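
The forward (noising) half of a diffusion model, described in the list above, can be sketched in a few lines. The example assumes the closed-form noising step used in denoising diffusion probabilistic models [11], in which a sample at step t is a weighted mix of the original data and Gaussian noise. The linear noise schedule, the number of steps and the toy data vector are assumptions made for illustration; the learned reverse (denoising) network is deliberately omitted.

```python
import math
import random
from typing import List


def alpha_bar(betas: List[float], t: int) -> float:
    """Fraction of the original signal variance remaining after t noising steps."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product


def noise_sample(x0: List[float], betas: List[float], t: int, rng: random.Random) -> List[float]:
    """Jump straight to step t: x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise."""
    a = alpha_bar(betas, t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


# Assumed linear schedule over 1000 steps (illustrative values only).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * step / (T - 1) for step in range(T)]

rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                       # hypothetical data point
halfway = noise_sample(x0, betas, T // 2, rng)
fully_noised = noise_sample(x0, betas, T, rng)
```

After the final step almost none of the original signal remains, which is why a model trained to reverse the process can generate new samples starting from pure noise.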

One other learning technique should be mentioned. Active learning is an optimisation method that uses model uncertainty to guide the next data acquisition, either from an existing data set or from the next experiment in a design–make–test cycle. Generally, active learning approaches will seek to suggest data that will improve the model (‘Explore’), until the model has reached a point where it can confidently predict (‘Exploit’).
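
The explore/exploit behaviour described above can be illustrated with a deliberately simple loop. In this sketch the ‘model’ is a nearest-neighbour lookup, its uncertainty is approximated by the distance to the closest measured point, and the ‘experiment’ is a hypothetical oracle function; all of these are assumptions made for illustration rather than a recommended protocol. Candidates are chosen to reduce uncertainty until the largest remaining uncertainty falls below a threshold, after which the best predicted candidate is chosen.

```python
from typing import Callable, Dict, List


def nearest_labelled(x: float, labelled: Dict[float, float]) -> float:
    """Return the already-measured point closest to x."""
    return min(labelled, key=lambda known: abs(known - x))


def predict(x: float, labelled: Dict[float, float]) -> float:
    """Toy model: copy the value of the nearest measured point."""
    return labelled[nearest_labelled(x, labelled)]


def uncertainty(x: float, labelled: Dict[float, float]) -> float:
    """Toy uncertainty: distance to the nearest measured point."""
    return abs(nearest_labelled(x, labelled) - x)


def active_learning(
    pool: List[float],
    oracle: Callable[[float], float],
    seed_points: List[float],
    n_rounds: int,
    explore_threshold: float,
) -> Dict[float, float]:
    labelled = {x: oracle(x) for x in seed_points}
    for _ in range(n_rounds):
        candidates = [x for x in pool if x not in labelled]
        if not candidates:
            break
        most_uncertain = max(candidates, key=lambda x: uncertainty(x, labelled))
        if uncertainty(most_uncertain, labelled) > explore_threshold:
            choice = most_uncertain  # Explore: improve the model where it knows least
        else:
            choice = max(candidates, key=lambda x: predict(x, labelled))  # Exploit
        labelled[choice] = oracle(choice)  # "run the experiment"
    return labelled


# Hypothetical one-dimensional design space with an optimum at x = 7.
pool = [float(i) for i in range(11)]
measured = active_learning(
    pool, oracle=lambda x: -(x - 7.0) ** 2, seed_points=[0.0, 10.0],
    n_rounds=6, explore_threshold=1.5,
)
best = max(measured, key=measured.get)
```

In a real design–make–test cycle the oracle would be an assay or a simulation, and the uncertainty would come from the model itself (for example, disagreement within an ensemble).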

Application of AI in small molecule discovery

ML in chemistry is not new. In fact, chemistry has its own name for statistical models: quantitative structure–activity relationship (QSAR) models. Initially, these were linear regression models, the first being published in the 19th century(!) by Overton [17] and Meyer [18]. These ideas were famously developed by Hansch & Fujita [19]. QSAR has continued to evolve as new methods were invented [20], with the chemistry community popularising the multivariate technique of partial least squares [21]. QSAR modellers were early adopters of neural networks [22], kernel ML methods [23], random forests [24], active learning [25], automated design [26], AI-based design processes [27], Pareto-based multi-objective designs [28,29] and automated QSAR modelling/MLOps [30,31]. QSAR models have been used in the design of marketed drugs [32] and are established tools in a regulatory setting for risk assessments of organic compounds [33].

If ML is not new to drug design, why then the current excitement, and what has enabled it? The growth in computing power (an iPhone 12 is 5000× faster than the Cray-2, the world’s fastest supercomputer in 1985!) and the almost commodity pricing of very large memory and storage have enabled computational scientists to employ methods that were hitherto either impractical or infeasible. On a practical level, great computational power has also accelerated the speed with which researchers develop new solutions, reducing the iteration time for each cycle of testing. Here is a non-exhaustive list of the most interesting developments (note: not all are AI applications):

  • Large-scale, big data cheminformatics, such as matched molecular pairs [34,35] and series [36]

  • Large-scale ML hyperparameter optimisation to yield optimal models [37]

  • Large enumerations of readily accessible chemistry space [38]

  • Free energy perturbation (FEP, invented in 1987 and now usable!) [39]

  • Deep learning QSAR models for large datasets [40]

  • Deep learning-based generative chemistry [41]

  • ML-based forcefields with DFT levels of accuracy [42]

  • Accurate protein structure prediction with AlphaFold & RoseTTAFold [43,44]

  • Multi-task [45] and multi-modal [46] modelling of complex data sets

  • Small [47] or large language models trained on chemistry [48] and protein sequence [49]

  • Transfer learning from pre-trained/foundational models for small data sets [50,51]

  • Federated modelling for safe data sharing between companies [52,53]

This is an impressive list of capabilities, but do they work in the real world? In short, it appears so. In their review of generative chemistry, Du et al. [41] cite no fewer than 37 published examples of laboratory validated small molecule design using generative chemistry methods.

The first published example of generative chemistry design is that of Insilico Medicine’s DDR1 inhibitor [54], designed, synthesised and tested in 21 days. This was a controversial example, being extremely close to the known marketed drug Ponatinib (Figure 1a) and subject to a ‘well any chemist would have done that’ response. A more charitable view needs to be taken – these new design paradigms must be able to do the ordinary as well as – hopefully – the extraordinary. A more novel DDR1 inhibitor was discovered by Yoshimori et al. [55] (Figure 1a) by coupling a generative chemistry model with a traditional pharmacophore approach. More ambitious was the coupling of an automated design system with an automated on-chip chemical synthesis platform to generate novel LXRα agonists (Figure 1b) [56]. More recently, a collaboration between Pfizer and PostEra reported the ML-driven discovery of a series of potent, selective and orally available SARS-CoV-2 PLpro inhibitors, with the lead compound (active in a mouse model) identified in less than eight months [57] (Figure 1c).

Figure 1:
Chemical structures of compounds designed using generative chemistry methods.

(a) DDR1 inhibitors: the marketed drug Ponatinib and those designed in reference [54] and reference [55]. (b) An LXR ligand designed in reference [56]. (c) The starting point (GRL0617) and the optimised compound (PF-07957472) designed in reference [57].

There are other validated computational protocols for automated design that use more traditional computational chemistry and cheminformatics. The first published example of modern automated design was provided by Besnard et al. [58], whereby novel compounds were generated using cheminformatics methods and scored with QSAR models which were combined to drive multi-objective optimisation. Using this approach, CNS-penetrant, selective dopamine D2 inverse agonists and compounds fitting a polypharmacological profile were designed. Schrödinger has pioneered large-scale cheminformatics and free energy simulation to drive lead optimisation. The discovery of the MALT1 inhibitor SGR-1505 [59] used a computational pipeline involving the generation of 8 billion compounds through reaction-based enumeration, an Active Learning FEP protocol to generate a machine learning model that could triage large numbers of compounds before committing to full free energy simulation, followed by multiparameter optimisation using ML QSAR models. By using this intense computational process, the project needed only 10 months and 78 synthesised compounds to optimise to a clinical candidate [60].

ML has been applied to hit identification or virtual screening. The size of available ‘make to order’ libraries is becoming extremely large – over 10¹² compounds and growing – and searching them with traditional methods (pharmacophore searching, docking) is accordingly expensive. Klarich et al. [61] utilised an active learning approach called Thompson sampling to make the search process more efficient, needing to evaluate only 1% of the virtual library to find >50% of the known hits. The approach can be coupled with any type of screening method; they demonstrate 3D shape searching and docking. An alternative solution to this problem is the NGT (NeuralGenThesis) method of Oliveira et al. [62]. NGT uses deep learning to project a 3 trillion compound vendor catalogue into a ‘latent space’ which has an associated decoder to regenerate chemical structures. The virtual screen can then iteratively sample promising compounds from the latent space, generate the structures via the decoder, and score them using, in this case, docking to a crystal structure of the activated receptor, an AlphaFold model and a homology model. The example given describes the identification of potent and selective inhibitors of the melanocortin-2 receptor.
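
To give a feel for how Thompson sampling avoids scoring a whole combinatorial library, the sketch below treats a library as every pairing of reagents from two lists. Each reagent carries a running belief about the scores of products that contain it; at every iteration a value is sampled from each belief, the reagents with the highest sampled values are combined, and only that one product is scored. This is a simplified illustration under stated assumptions, not the published protocol of reference [61]: the Gaussian belief, the two-component library and the toy scoring function are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, Tuple


class ReagentBelief:
    """Running Gaussian belief about the mean score of products containing one reagent."""

    def __init__(self) -> None:
        self.n_observations = 0
        self.mean_score = 0.0

    def sample(self, rng: random.Random, prior_sd: float = 1.0) -> float:
        # The spread shrinks as more products containing this reagent are scored.
        sd = prior_sd / math.sqrt(self.n_observations + 1)
        return rng.gauss(self.mean_score, sd)

    def update(self, score: float) -> None:
        self.n_observations += 1
        self.mean_score += (score - self.mean_score) / self.n_observations


def thompson_search(
    n_reagents_a: int,
    n_reagents_b: int,
    score_product: Callable[[int, int], float],
    n_iterations: int,
    rng: random.Random,
) -> Dict[Tuple[int, int], float]:
    """Return the products that were actually scored (higher score = better)."""
    beliefs_a = [ReagentBelief() for _ in range(n_reagents_a)]
    beliefs_b = [ReagentBelief() for _ in range(n_reagents_b)]
    evaluated: Dict[Tuple[int, int], float] = {}
    for _ in range(n_iterations):
        # Sample once from every belief and pick the best-looking reagent per position.
        a = max(range(n_reagents_a), key=lambda i: beliefs_a[i].sample(rng))
        b = max(range(n_reagents_b), key=lambda j: beliefs_b[j].sample(rng))
        if (a, b) not in evaluated:
            evaluated[(a, b)] = score_product(a, b)  # the expensive step, e.g. docking
        score = evaluated[(a, b)]
        beliefs_a[a].update(score)
        beliefs_b[b].update(score)
    return evaluated


def toy_score(a: int, b: int) -> float:
    """Hypothetical stand-in for a docking or shape-similarity score."""
    return (a + b) / 100.0


# 50 x 50 = 2500 virtual products, of which at most 250 (10%) are ever scored.
hits = thompson_search(50, 50, toy_score, n_iterations=250, rng=random.Random(0))
```

The design choice worth noting is that beliefs are kept per reagent rather than per product, so the memory and bookkeeping scale with the number of building blocks and not with the size of the enumerated library.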

More ambitious than searching in pre-defined chemical libraries is the de novo generation of hit molecules. Thomas et al. [63] utilised an LLM pre-trained on ChEMBL [64] with the goal of generating novel chemical structures with a low-energy docking score for seven known A2A protein crystal structures, alongside a variety of developability metrics such as logP, hydrogen bond donors and rotatable bonds. After extensive filtering, nine compounds were synthesised, yielding three nanomolar ligands with confirmed functional activity, two of which are novel chemotypes.

An emerging hit discovery strategy is to apply ML to screening data from DNA-encoded libraries and use the resulting model to predict activity in databases of commercially available compounds, thus saving the resource cost of off-DNA resynthesis. An example of this is the discovery of a low micromolar, first-in-class ligand for WDR91 [65], testing only 150 commercial compounds.

Biologics

Protein design is a younger discipline than its small molecule cousin [66]. It has its origins in protein engineering, where known proteins are mutated to gain information, to optimise a function, or to repurpose the protein for another function. In this use case, the protein structure fold, stability and dynamics tend to be retained. This is not a trivial pursuit, as demonstrated by the award of a Nobel Prize in 2018 [67]. In the last two decades, however, protein design has made extraordinary progress utilising both ‘physics-based’ structural modelling and, of course, machine learning [68], culminating in the award of its own Nobel Prize in 2024 [69]. AlphaFold [70], RoseTTAFold [44] and the evolutionary-scale LLM (ESM) family [71] are leading examples of these impressive new capabilities that are set to affect the design of enzymes, antibodies, vaccines, nanomachines and more [68]. These methods are built on the billions of publicly available sequences which sample diverse protein families and encode evolutionary constraints on the sequence–structure relationship. This is supplemented by >200,000 protein structures in the PDB [72].

AlphaFold successfully bridged the disciplines of bioinformatics, structural biology and ML by using multiple sequence alignments (MSA), patterns of conformations/interactions observed in protein crystal structures, and a deep learning architecture adopted from natural language processing [73]. AlphaFold3 was trained to predict not only protein structures but also biomolecular complexes of proteins, nucleic acids and their ligands (Figure 2). AlphaFold3 has an updated learning architecture to reduce dependency on the MSA and has introduced a diffusion model that creates the atomic co-ordinates of the models.

Figure 2:
Example structures predicted using AF3.

(a) Bacterial CRP/FNR family transcriptional regulator protein bound to DNA and cGMP (PDB 7PZB). (b) Human coronavirus OC43 spike protein, 4665 residues, heavily glycosylated and bound by neutralising antibodies (PDB 7PNM). (c) AF3 performance on PoseBusters (v.1, August 2023 release), a recent PDB evaluation set and CASP15 RNA. (d) AF3 architecture for inference. The rectangles represent processing modules and the arrows show the data flow. Yellow, input data; blue, abstract network activations; green, output data. The coloured balls represent physical atom co-ordinates. Reproduced from reference [70] under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

RoseTTAFold builds on its protein-modelling heritage, utilising a residue-based representation of amino acids and DNA bases, 1D sequences, 2D pairwise distance information from homologous proteins and 3D co-ordinate information as input to a deep learning architecture. The RoseTTAFold Diffusion method (RFDiffusion) [74] utilises a diffusion model to create the final atomic model.

The ESM family of models starts from a completely different area of ML – that of LLMs. ESM-2 is trained using over 65 million unique sequences, using a technique known as masked language modelling, whereby sequences in the training set have (in this case) a random 15% of amino acids ‘masked’, and the model is trained to predict them correctly. This strategy removes the need for sequence alignments. The sequence model is then passed to a folding model which benefits from a low-resolution picture of protein structure (such as residue-residue contact probabilities) that has been learnt by the LLM.
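
The masking step at the heart of masked language modelling is easy to sketch. The example below hides a random 15% of the residues in a sequence and records the hidden residues as the targets a model would be trained to recover. It is a simplification made for illustration: every selected position is replaced by a single mask token, the sequence is a hypothetical 20-residue string, and no model is trained.

```python
import random
from typing import Dict, List, Tuple

MASK_TOKEN = "<mask>"  # hypothetical token name


def mask_sequence(
    sequence: str, rng: random.Random, mask_fraction: float = 0.15
) -> Tuple[List[str], Dict[int, str]]:
    """Hide a random fraction of residues; return the masked tokens and the hidden targets."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n_to_mask = max(1, round(mask_fraction * len(sequence)))
    positions = sorted(rng.sample(range(len(sequence)), n_to_mask))
    tokens = list(sequence)
    targets: Dict[int, str] = {}
    for position in positions:
        targets[position] = tokens[position]  # what the model must predict
        tokens[position] = MASK_TOKEN         # what the model actually sees
    return tokens, targets


def unmask(tokens: List[str], targets: Dict[int, str]) -> str:
    """Rebuild the original sequence from the masked tokens and the targets."""
    return "".join(targets.get(i, token) for i, token in enumerate(tokens))


sequence = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue sequence
masked_tokens, hidden = mask_sequence(sequence, random.Random(0))
```

Because the training signal comes from the sequence itself, no labels or alignments are needed, which is what allows training on tens of millions of unannotated sequences.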

Successful applications of state-of-the-art protein design tools are impressive. The AlphaProteo design system [75] (based on AlphaFold) designed novel protein binders for eight diverse target proteins. Binders were experimentally verified for seven proteins, with affinities ranging from 80 picomolar to low nanomolar. Two were tested for biological function, demonstrating inhibition of VEGF signalling in human cells and SARS-CoV-2 neutralisation in Vero monkey cells. Designed binder and binder-target complex structures were confirmed with X-ray crystallography and cryo-EM.

RFDiffusion was able to design de novo protein binders for four protein targets: Influenza Haemagglutinin A, IL-7 Receptor-α, PD-L1 and TrkA receptor with Kd of 28 nM, 30 nM, 1.4 mM and 328 nM, respectively. In the same paper, de novo proteins with mixed alpha-beta topologies are designed, characterised with circular dichroism and their thermostability validated. Symmetric oligomers with unprecedented structures were designed, as were novel proteins designed to ‘scaffold’ known binding sites (e.g. the scaffolding of the p53 helix that binds MDM2) and enzyme active sites (e.g. a retroaldolase active site triad TYR1051-LYS1083-TYR1180).

The ESM LLM was used to affinity mature seven human immunoglobulin G (IgG) antibodies that bind to antigens from coronavirus, ebolavirus and influenza A virus representing diverse degrees of maturity. In each case, affinity was improved after creating 20 or fewer new variants of each antibody, across only two rounds of evolution. Although many of the suggested mutations would be considered common in nature, 5/32 affinity-enhancing mutations involved a rare or uncommon substitution. One surprising but effective substitution was that of a glycine in the wild-type (observed in 99% of natural antibody sequences) to a proline (observed in <1% of natural sequences).

‘One-shot’, ML-enabled de novo antibody design has been reported [76] using a model trained on known antibody-antigen complex structures. As validation of the method, the known product trastuzumab and its antigen HER2 were taken as a case study. Novel HCDR3 and HCDR123 sequences (diverse with respect to trastuzumab and each other) were generated from the model and validated using SPR, with 71 having affinities less than 10 nM. Three antibodies had a higher affinity for HER2 than trastuzumab.

LLMs as an orchestrator of experiments

No article would be complete without mentioning the integration of ML with experiment planning and execution. ChemCrow [77] and Coscientist [78] are LLM-based systems which design, plan and execute complex experiments. The user interface is the LLM and it is augmented with modules or agents which are designed for very specific tasks (e.g. web search, retrosynthetic analysis, structure to price, programming of liquid handlers). The LLM is able to take user instruction, e.g. ‘Find and synthesize a thiourea organocatalyst which accelerates a Diels-Alder reaction’, orchestrate the various tools to produce an answer and even create code to drive an automated synthesis platform. ChemCrow was able to design a new chromophore with a predicted maximum absorption wavelength of 369 nm and a two-step synthetic protocol from available starting materials. Coscientist was able to orchestrate iterative experiments to optimise conditions for both Suzuki coupling and Buchwald–Hartwig reactions (Figure 3).

Figure 3:
Cross-coupling Suzuki and Sonogashira reaction experiments designed and performed by Coscientist.

(a) Overview of Coscientist’s configuration. (b) Available compounds (DMF, dimethylformamide; DiPP, 2,6-diisopropylphenyl). (c) Liquid handler setup. (d) Solving the synthesis problem. (e) Comparison of reagent selection performance with a large dataset. (f) Comparison of reagent choices across multiple runs. (g) Overview of justifications made when selecting various aryl halides. (h) Frequency of visited URLs. (i and j) Analytical data on the synthesised materials compared with pure standards. Reproduced from reference [78] under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Are we there yet?

The above examples illustrate the potential that ML tools have to improve the rate of scientific discovery. However, these examples represent the state of the art, and publication bias (see later) is very real – these are published because they are successful. Perhaps the best way to comment is to explore the limitations of the current tools.

First and foremost, ML relies on good data, preferably in large quantities. Where this exists, the resultant models can be impressive. But large, high-quality scientific data sets are expensive to acquire: the cost of replacing the protein structure data in the PDB is conservatively estimated at $20 billion [79]. Data are the recurring issue, particularly for applications such as drug discovery where the application domain is always outside or on the edge of the training set [80]. Extrapolation is the requirement for useable ML models, and here, we are still struggling to understand what it is these models are actually learning. High-profile docking models were exposed as learning the data but no physics [81], whilst there are justifiable concerns on overfitting to errors in data [82]. Even AlphaFold3 has been shown to memorise conformations and not the physics which underpin them [83]. This potential for memorisation and lack of causal reasoning has led to a call to make AI more scientific [84]. As one researcher noted [85], LLMs and other AI systems ‘lack the basic capacities for intersubjectivity, semantics and ontology that are preconditions for the kind of collaborative world-making that allows scientists to theorize, understand, innovate and discover’.

ML researchers rely on public domain benchmarks to judge the effectiveness of their new algorithms. It was the CASP (Critical Assessment of Structure Prediction) [86] challenges that enabled the revolution in protein structure prediction. There are no comparable benchmarks for real world drug discovery, and this remains a constraint on the field [87]. The literature is full of publication bias – there is no ‘Journal of Failed Chemical Reactions’.

Evolution or revolution in drug design?

There is no ignoring the impact of ML or denying the potential impact in the coming years. How will it benefit drug discovery? It will depend on the implementation because this is a disruptive technology – to get the best out of it requires business process re-engineering [88]. AI demands data. With the appropriate data, ML-driven design and discovery will perform well. Getting the right data quickly and cheaply is the challenge.

Biologics discovery will most probably be first to feel the benefits, as many of the necessary experiments are largely automated and the performance of the foundational models is impressive.

In small molecule discovery, we are likely to see a dual-track adoption. On one hand, new companies are being built around automated design (e.g. Exscientia, now merged with Recursion) in much the same way that companies were formed to pursue Structure Based Design (Vertex) and Fragment Based Design (Astex). On the other hand, more established companies will need to overcome the well-established ‘human-centric’ model [89] of the designer-maker medicinal chemist, which is not well placed to adopt the new approaches. Change management in this community can be a difficult business [90]. Indeed, McKinsey estimates that change management costs are three times the cost of developing generative AI solutions [91]. But change will need to come, and it is not out of place to mention Kodak [92] as a cautionary tale at this point.

Moving ML models away from interpolation and towards extrapolation/reasoning and mechanistic thinking is necessary. AlphaFold3 has probably extracted as much out of current public domain data as is possible. A possible solution to both of these issues could be greater integration of physics-based models and simulation as a source of data [93].

What we do know is that the pace of change in the ML world is faster than any previous technology change we have witnessed. The next few years will see the growth in ‘lab in the loop’ [94,95] and even Autonomous Discovery [96-98] approaches as ML, informatics and experimental automation converge. We will remember this period as the time when a Revolution started. Drug design will look very different in the future, even if at the moment it is difficult to predict what the end state will look like.

Summary Points

  • Machine learning (ML) in drug discovery builds on decades of innovation in bioinformatics, cheminformatics and computational chemistry.

  • ML adds significant capabilities to the computational toolbox, in some cases providing a significant leap in performance.

  • The literature is full of successful examples of ML-driven design in both small molecules and biologics.

  • Having the right data – both quality and quantity – is key to success.

  • This is a disruptive technology which will change how scientists work.

The authors declare that there are no competing interests associated with the manuscript.

Abbreviations

AI, artificial intelligence
FEP, free energy perturbation
LLM, large language model
ML, machine learning
QSAR, quantitative structure activity relationship
RNN, recurrent neural network

1. Devereson, A., Sandler, C. and McKinsey, L. (2022) How AI could revolutionize drug discovery. https://www.mckinsey.com/industries/life-sciences/our-insights/how-ai-could-revolutionize-drug-discovery
2. Mikulic, M. (2023) Statista. Investment in leading AI-focused biotech companies worldwide in 2018-2022, by use case. https://www.statista.com/statistics/1428307/investment-in-ai-focused-biotech-companies-by-use-case/
3. Rawal, J. (2024) Artificial Intelligence (AI) in drug discovery market size. Fortune Business Insights. https://www.fortunebusinessinsights.com/artificial-intelligence-in-drug-discovery-market-105354
4. Temkin, S. (2024) Techcrunch. Xaira, an AI drug discovery startup, launches with a massive $1B, says it’s ‘ready’ to start developing drugs. https://techcrunch.com/2024/04/24/xaira-an-ai-drug-discovery-startup-launches-with-a-massive-1b-says-its-ready-to-start-developing-drugs
5. Scannell, J.W., Blanckley, A., Boldon, H. and Warrington, B. (2012) Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 https://doi.org/10.1038/nrd3681
6. Yang, X., Wang, Y., Byrne, R., Schneider, G. and Yang, S. (2019) Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119, 10520–10594 https://doi.org/10.1021/acs.chemrev.8b00728
7. Yang, J., Cai, Y., Zhao, K., Xie, H. and Chen, X. (2022) Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov. Today 27, 103356 https://doi.org/10.1016/j.drudis.2022.103356
8. Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2017) ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 https://doi.org/10.1145/3065386
9. Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 https://doi.org/10.1021/ci00057a005
10. Nguyen, C.Q., Pertusi, D. and Branson, K.M. (2023) Molecule-morphology contrastive pretraining for transferable molecular representation [preprint arXiv:2305.09790]. arXiv. https://doi.org/10.48550/arXiv.2305.09790
11. Ho, J., Jain, A. and Abbeel, P. (2020) Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, Vol. 33, pp. 6840–6851
12. Guo, Z., Liu, J., Wang, Y., Chen, M., Wang, D., Xu, D. et al. (2024) Diffusion models in bioinformatics and computational biology. Nat. Rev. Bioeng. 2, 136–154 https://doi.org/10.1038/s44222-023-00114-9
13. Igashov, I., Stärk, H., Vignac, C., Schneuing, A., Satorras, V.G., Frossard, P. et al. Equivariant 3D-conditional diffusion model for molecular linker design. Nat. Mach. Intell. 6, 417–427 https://doi.org/10.1038/s42256-024-00815-9
14. Bjerrum, E.J. and Threlfall, R. (2017) Molecular generation with recurrent neural networks (RNNs) [preprint arXiv:1705.04612]. arXiv. https://doi.org/10.48550/arXiv.1705.04612
15. Van Houdt, G., Mosquera, C. and Nápoles, G. (2020) A review on the long short-term memory model. Artif. Intell. Rev. 53, 5929–5955 https://doi.org/10.1007/s10462-020-09838-1
16. Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M. et al. (2023) A comprehensive overview of large language models [preprint arXiv:2307.06435]. arXiv. https://doi.org/10.48550/arXiv.2307.06435
17. Overton, C.E. (1901) Studien über die Narkose: zugleich ein Beitrag zur allgemeinen Pharmakologie. G. Fischer
18. Meyer, H. (1901) Zur Theorie der Alkoholnarkose. Archiv f. experiment. Pathol. u. Pharmakol. 46, 338–346 https://doi.org/10.1007/BF01978064
19. Hansch, C., Maloney, P.P., Fujita, T. and Muir, R.M. (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194, 178–180 https://doi.org/10.1038/194178b0
20. Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I.I., Cronin, M. et al. (2014) QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010 https://doi.org/10.1021/jm4004285
21. Geladi, P. (1988) Notes on the history and nature of partial least squares (PLS) modelling. J. Chemom. 2, 231–246 https://doi.org/10.1002/cem.1180020403
22. Gasteiger, J. and Zupan, J. (1993) Neural networks in chemistry. Angew. Chem. Int. Ed. Engl. 32, 503–527 https://doi.org/10.1002/anie.199305031
23. Harper, G., Bradshaw, J., Gittins, J.C., Green, D.V. and Leach, A.R. (2001) Prediction of biological activity for high-throughput screening using binary kernel discrimination. J. Chem. Inf. Comput. Sci. 41, 1295–1300 https://doi.org/10.1021/ci000397q
24. Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P. and Feuston, B.P. (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 https://doi.org/10.1021/ci034160g
25. Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S. and Lemmen, C. (2003) Active learning with support vector machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 43, 667–673 https://doi.org/10.1021/ci025620t
26. Darvas, F. (1974) Application of the sequential simplex method in designing drug analogs. J. Med. Chem. 17, 799–804 https://doi.org/10.1021/jm00254a004
27. Hodgkin, E.E. The Castlemaine project: development of an AI-based drug design support system. In Molecular Modelling and Drug Design, pp. 137–169, Palgrave https://doi.org/10.1007/978-1-349-12973-7_4
28. Gillet, V.J., Willett, P., Fleming, P.J. and Green, D.V.S. (2002) Designing focused libraries using MoSELECT. J. Mol. Graph. Model. 20, 491–498 https://doi.org/10.1016/s1093-3263(01)00150-4
29. Nicolaou, C.A., Brown, N. and Pattichis, C.S. (2007) Molecular optimization using computational multi-objective methods. Curr. Opin. Drug Discov. Devel. 10, 316–324
30. Cartmell, J., Enoch, S., Krstajic, D. and Leahy, D.E. (2005) Automated QSPR through competitive workflow. J. Comput. Aided Mol. Des. 19, 821–833 https://doi.org/10.1007/s10822-005-9029-8
31. Cox, R., Green, D.V.S., Luscombe, C.N., Malcolm, N. and Pickett, S.D. (2013) QSAR workbench: automating QSAR modeling to drive compound design. J. Comput. Aided Mol. Des. 27, 321–336 https://doi.org/10.1007/s10822-013-9648-4
32. Athanasiou, C. and Cournia, Z. (2018) From computers to bedside: computational chemistry contributing to FDA approval. In Biomolecular Simulations in Structure-Based Drug Discovery, pp. 163–203 https://doi.org/10.1002/9783527806836
33. Schultz, T.W., Diderich, R., Kuseva, C.D. and Mekenyan, O.G. (2018) The OECD QSAR toolbox starts its second decade. In Computational Toxicology: Methods and Protocols, pp. 55–77 https://doi.org/10.1007/978-1-4939-7899-1_2
34. Griffen, E., Leach, A.G., Robb, G.R. and Warner, D.J. (2011) Matched molecular pairs as a medicinal chemistry tool. J. Med. Chem. 54, 7739–7750 https://doi.org/10.1021/jm200452d
35. Hussain, J. and Rea, C. (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J. Chem. Inf. Model. 50, 339–348 https://doi.org/10.1021/ci900450m
36. Ehmki, E.S.R. and Kramer, C. (2017) Matched molecular series: measuring SAR similarity. J. Chem. Inf. Model. 57, 1187–1196 https://doi.org/10.1021/acs.jcim.6b00709
37. Kandasamy, K., Vysyaraju, K.R., Neiswanger, W., Paria, B., Collins, C.R., Schneider, J. et al. (2020) Tuning hyperparameters without grad students: scalable and robust Bayesian optimisation with Dragonfly. J. Mach. Learn. Res. 21, 1–27
38. Grygorenko, O.O., Radchenko, D.S., Dziuba, I., Chuprina, A., Gubina, K.E. and Moroz, Y.S. (2020) Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681 https://doi.org/10.1016/j.isci.2020.101681
39. Abel, R., Wang, L., Harder, E.D., Berne, B.J. and Friesner, R.A. (2017) Advancing drug discovery through enhanced free energy calculations. Acc. Chem. Res. 50, 1625–1632 https://doi.org/10.1021/acs.accounts.7b00083
40. Tropsha, A., Isayev, O., Varnek, A., Schneider, G. and Cherkasov, A. (2024) Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat. Rev. Drug Discov. 23, 141–155 https://doi.org/10.1038/s41573-023-00832-0
41. Du, Y., Jamasb, A.R., Guo, J., Fu, T., Harris, C., Wang, Y. et al. (2024) Machine learning-aided generative molecular design. Nat. Mach. Intell. 6, 589–604 https://doi.org/10.1038/s42256-024-00843-5
42. Smith, J.S., Isayev, O. and Roitberg, A.E. (2017) ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 https://doi.org/10.1039/c6sc05720a
43. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 https://doi.org/10.1038/s41586-021-03819-2
44. Krishna, R., Wang, J., Ahern, W., Sturmfels, P., Venkatesh, P., Kalvet, I. et al. (2024) Generalized biomolecular modeling and design with RoseTTAFold All-atom. Science 384, eadl2528 https://doi.org/10.1126/science.adl2528
45. Ramsundar, B., Liu, B., Wu, Z., Verras, A., Tudor, M., Sheridan, R.P. et al. (2017) Is multitask deep learning practical for pharma? J. Chem. Inf. Model. 57, 2068–2076 https://doi.org/10.1021/acs.jcim.7b00146
46. Kaufman, B., Williams, E.C., Underkoffler, C., Pederson, R., Mardirossian, N., Watson, I. et al. (2024) COATI: multimodal contrastive pretraining for representing and traversing chemical space. J. Chem. Inf. Model. 64, 1145–1157 https://doi.org/10.1021/acs.jcim.3c01753
47. Segler, M.H.S., Kogej, T., Tyrchan, C. and Waller, M.P. (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 https://doi.org/10.1021/acscentsci.7b00512
48. Ahmad, W., Simon, E., Chithrananda, S., Grand, G. and Ramsundar, B. (2022) ChemBERTa-2: towards chemical foundation models [preprint arXiv:2209.01712]. arXiv. https://doi.org/10.48550/arXiv.2209.01712
49. Madani, A., Krause, B., Greene, E.R., Subramanian, S., Mohr, B.P., Holton, J.M. et al. (2023) Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 https://doi.org/10.1038/s41587-022-01618-2
50. Fluetsch, A., Di Lascio, E., Gerebtzoff, G. and Rodríguez-Pérez, R. (2024) Adapting deep learning QSPR models to specific drug discovery projects. Mol. Pharm. 21, 1817–1826 https://doi.org/10.1021/acs.molpharmaceut.3c01124
51. King-Smith, E. (2024) Transfer learning for a foundational chemistry model. Chem. Sci. 15, 5143–5151 https://doi.org/10.1039/d3sc04928k
52. Bassani, D., Brigo, A. and Andrews-Morger, A. (2023) Federated learning in computational toxicology: an industrial perspective on the Effiris hackathon. Chem. Res. Toxicol. 36, 1503–1517 https://doi.org/10.1021/acs.chemrestox.3c00137
53. Heyndrickx, W., Mervin, L., Morawietz, T., Sturm, N., Friedrich, L., Zalewski, A. et al. (2024) MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 https://doi.org/10.1021/acs.jcim.3c00799
54. Zhavoronkov, A., Ivanenkov, Y.A., Aliper, A., Veselov, M.S., Aladinskiy, V.A., Aladinskaya, A.V. et al. (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 https://doi.org/10.1038/s41587-019-0224-x
55. Yoshimori, A., Asawa, Y., Kawasaki, E., Tasaka, T., Matsuda, S., Sekikawa, T. et al. (2021) Design and synthesis of DDR1 inhibitors with a desired pharmacophore using deep generative models. ChemMedChem 16, 955–958 https://doi.org/10.1002/cmdc.202000786
56. Grisoni, F., Huisman, B.J.H., Button, A.L., Moret, M., Atz, K., Merk, D. et al. (2021) Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, eabg3338 https://doi.org/10.1126/sciadv.abg3338
57. Garnsey, M.R., Robinson, M.C., Nguyen, L.T., Cardin, R., Tillotson, J., Mashalidis, E. et al. (2024) Discovery of SARS-CoV-2 papain-like protease (PLpro) inhibitors with efficacy in a murine infection model. Sci. Adv. 10, eado4288 https://doi.org/10.1126/sciadv.ado4288
58. Besnard, J., Ruda, G.F., Setola, V., Abecassis, K., Rodriguiz, R.M., Huang, X.P. et al. (2012) Automated design of ligands to polypharmacological profiles. Nature 492, 215–220 https://doi.org/10.1038/nature11691
59. Olszewski, A., Kahn, D., Yoo, B., Tan, J.B., Gupta, V.K., Schuck, E. et al. (2023) A Phase 1, open-label, multicenter, dose-escalation study of SGR-1505 as monotherapy in subjects with mature B-cell malignancies. Blood 142, 3102 https://doi.org/10.1182/blood-2023-182838
60. Nie, Z. and Schrodinger Inc. Hit to development candidate in 10 months: rapid discovery of a novel, potent MALT1 inhibitor. https://www.schrodinger.com/life-science/learn/case-studies/hit-development-candidate-10-months-rapid-discovery-novel-potent-malt1-inhibitor/
61. Klarich, K., Goldman, B., Kramer, T., Riley, P. and Walters, W.P. (2024) Thompson sampling – an efficient method for searching ultralarge synthesis on demand databases. J. Chem. Inf. Model. 64, 1158–1171 https://doi.org/10.1021/acs.jcim.3c01790
62. de Oliveira, S., Pedawi, A., Kenyon, V. and van den Bedem, H. (2024) NGT: generative AI with synthesizability guarantees identifies potent inhibitors for a G-protein associated melanocortin receptor in a tera-scale vHTS screen. ChemRxiv. https://doi.org/10.26434/chemrxiv-2024-fz37h-v3
63. Thomas, M., Matricon, P.G., Gillespie, R.J., Napiórkowska, M., Neale, H., Mason, J.S. et al. (2024) Modern hit-finding with structure-guided de novo design: identification of novel nanomolar adenosine A2A receptor ligands using reinforcement learning. ChemRxiv. https://doi.org/10.26434/chemrxiv-2024-wh7zw-v2
64. Mendez, D., Gaulton, A., Bento, A.P., Chambers, J., Veij, M., Félix, E. et al. (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 https://doi.org/10.1093/nar/gky1075
65. Ahmad, S., Xu, J., Feng, J.A., Hutchinson, A., Zeng, H., Ghiabi, P. et al. (2023) Discovery of a first-in-class small-molecule ligand for WDR91 using DNA-encoded chemical library selection followed by machine learning. J. Med. Chem. 66, 16051–16061 https://doi.org/10.1021/acs.jmedchem.3c01471
66. Woolfson, D.N. (2021) A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 https://doi.org/10.1016/j.jmb.2021.167160
67. NobelPrize.org. The Nobel Prize in Chemistry 2018. https://www.nobelprize.org/prizes/chemistry/2018/summary/
68. Notin, P., Rollins, N., Gal, Y., Sander, C. and Marks, D. (2024) Machine learning for functional protein design. Nat. Biotechnol. 42, 216–228 https://doi.org/10.1038/s41587-024-02127-0
69. Callaway, E. (2024) Chemistry Nobel goes to developers of AlphaFold AI that predicts protein structures. Nature 634, 525–526 https://doi.org/10.1038/d41586-024-03214-7
70. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A. et al. (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 https://doi.org/10.1038/s41586-024-07487-w
71. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W. et al. (2023) Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 https://doi.org/10.1126/science.ade2574
72. Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 35, D301–D303 https://doi.org/10.1093/nar/gkl971
73. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N. et al. (2017) Attention is all you need. Adv. Neural Inf. Process. Syst. 2017-December, 5999–6009
74. Watson, J.L., Juergens, D., Bennett, N.R., Trippe, B.L., Yim, J., Eisenach, H.E. et al. (2023) De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 https://doi.org/10.1038/s41586-023-06415-8
75. Zambaldi, V., La, D., Chu, A.E., Patani, H., Danson, A.E., Kwan, T.O. et al. (2024) De novo design of high-affinity protein binders with AlphaProteo [preprint arXiv:2409.08022]. arXiv. https://doi.org/10.48550/arXiv.2409.08022
76. Shanehsazzadeh, A., McPartlon, M., Kasun, G., Steiger, A.K., Sutton, J.M., Yassine, E. et al. (2023) Unlocking de novo antibody design with generative artificial intelligence. bioRxiv. https://doi.org/10.1101/2023.01.08.523187
77. M Bran, A., Cox, S., Schilter, O., Baldassari, C., White, A.D. and Schwaller, P. (2024) Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 https://doi.org/10.1038/s42256-024-00832-8
78. Boiko, D.A., MacKnight, R., Kline, B. and Gomes, G. (2023) Autonomous chemical research with large language models. Nature 624, 570–578 https://doi.org/10.1038/s41586-023-06792-0
79. Burley, S.K., Bhikadiya, C., Bi, C., Bittrich, S., Chao, H., Chen, L. et al. (2023) RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 51, D488–D508 https://doi.org/10.1093/nar/gkac1077
80. Durant, G., Boyles, F., Birchall, K. and Deane, C.M. (2024) The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 1–9 https://doi.org/10.1038/s43588-024-00699-0
81. Buttenschoen, M., Morris, G.M. and Deane, C.M. (2024) PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 https://doi.org/10.1039/d3sc04185a
82. Crusius, D., Cipcigan, F. and Biggin, P.C. (2025) Are we fitting data or noise? Analysing the predictive power of commonly used datasets in drug-, materials-, and molecular-discovery. Faraday Discuss. 256, 304–321 https://doi.org/10.1039/d4fd00091a
83. Chakravarty, D., Schafer, J.W., Chen, E.A., Thole, J.F., Ronish, L.A., Lee, M. et al. (2024) AlphaFold predictions of fold-switched conformations are driven by structure memorization. Nat. Commun. 15, 7296 https://doi.org/10.1038/s41467-024-51801-z
84. Coveney, P.V. and Highfield, R. (2024) Artificial intelligence must be made more scientific. J. Chem. Inf. Model. 64, 5739–5741 https://doi.org/10.1021/acs.jcim.4c01091
85. Birhane, A., Kasirzadeh, A., Leslie, D. and Wachter, S. (2023) Science in the age of large language models. Nat. Rev. Phys. 5, 277–280 https://doi.org/10.1038/s42254-023-00581-4
86. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. and Moult, J. (2023) Critical assessment of methods of protein structure prediction (CASP)-round XV. Proteins 91, 1539–1549 https://doi.org/10.1002/prot.26617
87. Wognum, C., Ash, J.R., Aldeghi, M., Rodríguez-Pérez, R., Fang, C., Cheng, A.C. et al. (2024) A call for an industry-led initiative to critically assess machine learning for real-world drug discovery. Nat. Mach. Intell. 6, 1120–1121 https://doi.org/10.1038/s42256-024-00911-w
88. Rosemann, M., Brocke, J. vom, Van Looy, A. and Santoro, F. (2024) Business process management in the age of AI – three essential drifts. Inf. Syst. E-Bus. Manage. 22, 1–15 https://doi.org/10.1007/s10257-024-00689-9
89. Green, D.V.S. (2019) Using machine learning to inform decisions in drug discovery: an industry perspective. In Machine Learning in Chemistry. ACS Symposium Series (Pyzer-Knapp, E., ed), pp. 81–101 https://doi.org/10.1021/bk-2019-1326.ch005
90. Oprea, T.I. and Weininger, D. (2024) Rethinking medicinal chemistry in the cheminformatics age. J. Med. Chem. 67, 17935–17939 https://doi.org/10.1021/acs.jmedchem.4c02179
91. McKinsey & Company (2024) Moving past gen AI’s honeymoon phase: seven hard truths for CIOs to get from pilot to scale. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/moving-past-gen-ais-honeymoon-phase-seven-hard-truths-for-cios-to-get-from-pilot-to-scale
92. Lucas, H.C. and Goh, J.M. (2009) Disruptive technology: how Kodak missed the digital photography revolution. The Journal of Strategic Information Systems 18, 46–55 https://doi.org/10.1016/j.jsis.2009.01.002
93. Frasnetti, E., Cucchi, I., Pavoni, S., Frigerio, F., Cinquini, F., Serapian, S.A. et al. (2024) Integrating molecular dynamics and machine learning algorithms to predict the functional profile of kinase ligands. J. Chem. Theory Comput. 20, 9209–9229 https://doi.org/10.1021/acs.jctc.4c01097
94. Zenil, H., Tegnér, J., Abrahão, F.S., Lavin, A., Kumar, V., Frey, J.G. et al. (2023) The future of fundamental science led by generative closed-loop artificial intelligence [preprint arXiv:2307.07522]. arXiv. https://doi.org/10.48550/arXiv.2307.07522
95. Saikin, S.K., Kreisbeck, C., Sheberla, D., Becker, J.S. and Aspuru-Guzik, A. (2019) Closed-loop discovery platform integration is needed for artificial intelligence to make an impact in drug discovery. Expert Opin. Drug Discov. 14, 1–4 https://doi.org/10.1080/17460441.2019.1546690
96. Coley, C.W., Eyke, N.S. and Jensen, K.F. (2020) Autonomous discovery in the chemical sciences part I: progress. Angew. Chem. Int. Ed. Engl. 59, 22858–22893 https://doi.org/10.1002/anie.201909987
97. Coley, C.W., Eyke, N.S. and Jensen, K.F. (2020) Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. Engl. 59, 23414–23436 https://doi.org/10.1002/anie.201909989
98. Sparkes, A., Aubrey, W., Byrne, E., Clare, A., Khan, M.N., Liakata, M. et al. (2010) Towards robot scientists for autonomous scientific discovery. Autom. Exp. 2, 1 https://doi.org/10.1186/1759-4499-2-1
This is an open access article published by Portland Press Limited on behalf of the Biochemical Society and distributed under the Creative Commons Attribution License 4.0 (CC BY).