Dynamic pathway engineering aims to build metabolic production systems embedded with intracellular control mechanisms for improved performance. These control systems enable host cells to self-regulate the temporal activity of a production pathway in response to perturbations, using a combination of biosensors and feedback circuits for controlling expression of heterologous enzymes. Pathway design, however, requires assembling together multiple biological parts into suitable circuit architectures, as well as careful calibration of the function of each component. This results in a large design space that is costly to navigate through experimentation alone. Methods from artificial intelligence (AI) and machine learning are gaining increasing attention as tools to accelerate the design cycle, owing to their ability to identify hidden patterns in data and rapidly screen through large collections of designs. In this review, we discuss recent developments in the application of machine learning methods to the design of dynamic pathways and their components. We cover recent successes and offer perspectives for future developments in the field. The integration of AI into metabolic engineering pipelines offers great opportunities to streamline design and discover control systems for improved production of high-value chemicals.
Introduction
A key aim in metabolic engineering is the production of high-value chemicals using the metabolic machinery of microorganisms [1,2]. In a typical metabolic engineering pipeline, microbial strains are transformed with enzymatic genes that convert native precursors of the host into target products. However, production is typically limited by multiple factors such as pathway sensitivity to fermentation conditions, accumulation of toxic intermediates, and difficulties in scaling up production. To overcome these challenges, the last decade has witnessed the birth of dynamic pathway engineering, a technology where production strains are endowed with built-in feedback control systems. Such control systems can adapt the temporal expression of pathway enzymes in response to changes in cellular or bioreactor conditions [3]. This strategy can improve robustness and diminish the impact of toxic intermediate accumulation, gene expression burden, and other common challenges encountered in applications [4].
Dynamic pathways contain two core components [5]: a backbone production pathway and a set of biosensors that control enzymatic expression in response to metabolite signals. But assembling these systems requires bringing together various disparate molecular components such as catalytic enzymes, metabolite-sensing proteins and genetic elements (e.g. promoters or ribosomal binding sites). The implementation of these systems thus requires costly experimental work for assembling, testing and fine-tuning the system components. Computational methods can help accelerate the design cycle with effective tools for in silico modelling and simulation of system performance. To date, such computational tools have been largely dominated by kinetic models using ordinary differential equations. More recently, there has been an increased interest in methods from artificial intelligence (AI) and machine learning [6], owing to their flexibility and ability to detect patterns in complex datasets.
Here, we discuss recent applications of AI and machine learning to aid the design of dynamic pathways. We focus on three aspects of pathway design where machine learning methods have the potential to provide substantial benefits over traditional modelling approaches (Figure 1): pathway assembly via retrosynthesis, design of small molecule biosensors, and the selection of suitable control architectures. For conciseness, we do not discuss details of specific machine learning models, as this is an extensive subject beyond the scope of this review. For a primer on AI and machine learning for biological applications, we refer the reader to the excellent review by Greener et al. [7]. We restrict this review to dynamic pathway engineering, as machine learning applications for static pathways have been covered extensively elsewhere in the literature [6,8–10].
Figure 1. Application areas of machine learning in dynamic pathway engineering: retrosynthesis, biosensor design, and circuit architecture design.
(A) Exemplar dynamic pathway whereby metabolites bind to transcriptional biosensors that control the temporal enzyme expression. (B) Pathway assembly begins with retrosynthesis of the pathway backbone from native metabolic substrates. Retrosynthesis algorithms predict a given reactant and enzyme which produce the desired product. Machine learning models can be trained on reaction rules or SMILES strings to find the best route from substrates to products [12–15]. (C) Metabolite biosensors such as transcription factors or RNA aptamers can be engineered to bind to small molecule ligands [16,17]; progress in protein design guided by machine learning offers exciting routes for the design of ligand-specific biosensors [18,19]. The biosensor dose-response curves can be tuned by changing the promoter sequence or other non-coding genetic elements. Several works have built sequence-to-expression machine learning models that can be employed for the design of such non-coding sequences [20–23]. (D) Specific pathway dynamics can be achieved by different control architectures that differ in their implementation costs. The selection of optimal architectures can be aided with optimization methods from machine learning [24–26].
Pathway retrosynthesis
The first step when designing a production pathway is the identification of enzymatic conversion routes from host metabolites to the target product. Finding such routes involves specifying sequences of reaction steps catalyzed by enzymes that need to be expressed in the host of interest. This is a pathway retrosynthesis problem [11] for which numerous computational tools have been developed [27–30]. Typical approaches to retrosynthesis employ template-based strategies, whereby databases of expert-curated pathways and substrate-enzyme pairs are converted into reaction rules. Computational algorithms are then employed to find suitable pathway components and stoichiometries among a combinatorially large design space. These tools produce retrosynthesis networks linking target compounds to metabolites of the host strain, typically ranking the possible pathways based on enzyme availability, performance, product and intermediate toxicities, or theoretical yield.
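As a minimal sketch of the template-based idea, the search for a route can be framed as a backward graph traversal from the target compound to any host metabolite. The compound names, enzyme names, and rule table below are hypothetical placeholders rather than entries from a real reaction database, and the breadth-first search stands in for the more elaborate ranking schemes used by actual retrosynthesis tools.

```python
from collections import deque

# Hypothetical reaction rules: product -> list of (enzyme, precursor) pairs.
# All names are illustrative placeholders, not real database entries.
RULES = {
    "target": [("enzC", "inter2")],
    "inter2": [("enzB", "inter1"), ("enzD", "inter3")],
    "inter1": [("enzA", "host_metabolite")],
    "inter3": [],
}
HOST_METABOLITES = {"host_metabolite"}


def find_route(target, rules, host):
    """Breadth-first backward search from the target to a host metabolite.

    Returns the shortest route as a list of (precursor, enzyme, product)
    steps in forward (synthesis) order, or None if no route exists.
    """
    queue = deque([(target, [])])
    visited = {target}
    while queue:
        compound, steps = queue.popleft()
        if compound in host:
            return steps
        for enzyme, precursor in rules.get(compound, []):
            if precursor not in visited:
                visited.add(precursor)
                # Prepend so the final list reads from substrate to product.
                queue.append((precursor, [(precursor, enzyme, compound)] + steps))
    return None


route = find_route("target", RULES, HOST_METABOLITES)
for precursor, enzyme, product in route:
    print(f"{precursor} --{enzyme}--> {product}")
```

In practice the candidate routes would be scored rather than simply taking the shortest one, for instance using the criteria listed above (enzyme availability, toxicity, or theoretical yield).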
Machine learning algorithms are finding a growing number of applications in pathway retrosynthesis. For example, retrosynthesis software packages incorporate supervised machine learning models to score candidate pathways based on their ability to retrieve the correct product [28]. Baylon et al. [31] built a machine learning retrosynthesis pipeline with two stages: first, a neural network predicts a group of rules which can be applied to the target chemical, and then a second network predicts a specific chemical transformation within a predicted group of rules. Another approach relied on reinforcement learning to build a tree search algorithm that selects chemical transformations and then ranks the results based on chemical similarity between the current transformation and the native chemical reaction [12]. It has been shown that combining expert curation with machine learning methods can improve accuracy as compared with either of them in isolation [32]. Recent work has also focussed on using graph neural networks (GNNs) for chemical retrosynthesis [33], and their application to biochemical pathways holds substantial promise.
Most recently, progress in large language models has triggered a new wave of template-free retrosynthesis algorithms. These algorithms train machine learning models directly on molecular representations such as SMILES strings, so that the models learn chemical reaction rules from a vast corpus of chemical structure data. An initial attempt at the problem was made using an encoder-decoder structure with recurrent neural networks [13]. Following the enormous success of the Transformer architecture [34], several works employed it for pathway retrosynthesis with prediction accuracy surpassing that of template-based methods [14,35,36]. Extensions of this work include architecture modification [37] as well as training on raw patent data rather than SMILES strings, which appears to learn reaction description information in addition to the reaction details [38].
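To illustrate how SMILES strings can be fed to a sequence-to-sequence model, the sketch below tokenizes a product and its reactants and maps the tokens to integer ids. The regular expression, the special tokens, and the helper names are assumptions made for illustration and are not taken from the cited works; the esterification example (acetic acid and ethanol giving ethyl acetate) is included only as a familiar reaction.

```python
import re

# Simple SMILES tokenizer (an illustrative assumption, not the scheme of any
# cited work). Bracket atoms, Cl/Br, and %NN ring closures are kept whole.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign an integer id to every token, after three special tokens."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Wrap a token list in begin/end markers and convert it to ids."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


# In retrosynthesis the model input is the product and the output the reactants.
product = "CC(=O)OCC"      # ethyl acetate
reactants = "CC(=O)O.CCO"  # acetic acid + ethanol

src = tokenize(product)
tgt = tokenize(reactants)
vocab = build_vocab([src, tgt])
src_ids = encode(src, vocab)
tgt_ids = encode(tgt, vocab)
```

Pairs such as `(src_ids, tgt_ids)` are the kind of input an encoder-decoder model would be trained on; the model itself is omitted here.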
Design of metabolite biosensors
Biosensors are used throughout metabolic engineering as screening or strain selection tools, and have been built to respond to many signals, including cellular stress responses, temperature, and small molecules [17]. In the case of dynamic pathway engineering, robust production requires up- or down-regulation of enzyme expression in response to metabolic signals. To this end, genetically-encoded metabolite biosensors have been widely adopted to close the loop between pathway activity and enzyme expression. Biosensors employed so far are mostly based on metabolite-responsive transcription factors [5] or RNA aptamers [39], both of which can be used to control gene expression in response to a target metabolite of interest.
Biosensor design comprises primarily two tasks: engineering specificity/affinity toward a target metabolite [17], and engineering the shape of the biosensor dose-response curve, including key parameters such as its sensitivity, dynamic range, and leaky expression levels [40]. Modifications to affinity or specificity are typically done with techniques from protein or DNA engineering [16]. While not specifically aimed at biosensor design, a large portion of current work at the interface of AI and synthetic biology focuses on protein engineering [41,42]. Protein structure prediction algorithms such as AlphaFold2 have advanced significantly and can learn sequence representations that are predictive of protein secondary and tertiary structure [43,44]. Unsupervised language models have made significant progress in learning high-level protein representations that are predictive of both structure and function [45]. These developments are revolutionizing the predictive design of proteins with novel or improved functions and offer exciting opportunities for biosensor design in dynamic pathway engineering. Beyond protein design, a number of works developed machine learning pipelines to design or improve metabolite-responsive RNA devices. For example, Groher et al. [46] employed supervised learning to improve the function of a tetracycline-dependent riboswitch composed of two aptamers, and other works have incorporated models of RNA secondary structure for the design of S-adenosyl methionine (SAM) riboswitches, one of the most well-studied metabolite-responsive RNA aptamers [47]. A number of other approaches have employed deep learning models of varied complexity for the design of RNA toehold switches that respond to small molecules [20,48,49].
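The dose-response parameters named above are often summarized with a Hill function. The sketch below assumes an activating biosensor and defines the dynamic range as the ratio of maximal to leaky expression; both choices, as well as the parameter names and values, are assumptions for illustration. The half-maximal constant `k_half` and the Hill coefficient `n` set the threshold and steepness of the curve, which are both sometimes used as measures of sensitivity.

```python
def hill_response(ligand, leak, max_expr, k_half, n):
    """Expression of an activating biosensor as a function of ligand level.

    leak     : expression without ligand (leaky expression)
    max_expr : expression at saturating ligand
    k_half   : ligand concentration giving the half-maximal response
    n        : Hill coefficient (steepness of the curve)
    """
    return leak + (max_expr - leak) * ligand**n / (k_half**n + ligand**n)


def dynamic_range(leak, max_expr):
    """Fold change between saturated and leaky expression (assumed definition)."""
    return max_expr / leak


# Illustrative parameter values in arbitrary units.
params = dict(leak=1.0, max_expr=11.0, k_half=2.0, n=2)
for ligand in (0.0, 1.0, 2.0, 4.0, 100.0):
    print(ligand, round(hill_response(ligand, **params), 3))
print("dynamic range:", dynamic_range(params["leak"], params["max_expr"]))
```

Tuning promoters or ribosomal binding sites can then be thought of as moving `leak`, `max_expr`, `k_half`, or `n` toward values that suit the pathway.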
The design of biosensor dose-response curves, on the other hand, has primarily relied on controlling transcriptional and translational efficiency via non-coding elements such as promoters, ribosomal binding sites and terminators [17,50]. Thanks to progress in high-throughput DNA synthesis and sequencing, there is a growing interest in massively parallel reporter assays [51,52] to characterize sequence-function associations [53], and a number of works have employed deep learning to build models for the design of promoters [21,23] and sequences that impact translational efficiency [22,54]. These sequence-to-expression models can be particularly powerful for design, as they can be wrapped into sampling or optimization routines to discover sequences with improved phenotypes [21,55,56]. Using the lac repressor as a model system, machine learning algorithms have also been employed to design sequences that influence the shape of the dose-response curve [57]; the work by Zhou et al. [58] applied such an approach to improve the dynamic range of a malonyl-CoA responsive transcription factor. Several approaches to response curve engineering have also utilized natural motifs found in related organisms. For example, Ding et al. [59] employed ribosomal binding site data to build a machine learning model that allows predictable tuning of the dynamic range of a glucarate biosensor. Wang et al. [60] successfully used generative adversarial networks (GANs) to generate synthetic promoters after training on Escherichia coli promoter activity data. Recent work employed GANs to generate entire regulatory sequences with models trained on natural sequences [61].
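The idea of wrapping a sequence-to-expression model into an optimization routine can be sketched as follows. The predictor here is a deliberately trivial, hypothetical stand-in (it counts matches to an arbitrary motif) for a trained model, and the random single-base hill climbing is just one simple choice of optimizer; neither is taken from the cited works.

```python
import random

BASES = "ACGT"
# Arbitrary motif used only to give the toy predictor something to score.
TOY_MOTIF = "GATTACA"


def toy_predict_expression(seq):
    """Hypothetical stand-in for a trained sequence-to-expression model.

    Returns the best number of matches to TOY_MOTIF over all windows.
    """
    k = len(TOY_MOTIF)
    return max(
        sum(a == b for a, b in zip(seq[i:i + k], TOY_MOTIF))
        for i in range(len(seq) - k + 1)
    )


def optimize_sequence(seq, predict, n_steps=500, seed=0):
    """Hill climbing: propose single-base mutations, keep non-worsening ones."""
    rng = random.Random(seed)
    best, best_score = seq, predict(seq)
    for _ in range(n_steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(BASES) + best[pos + 1:]
        score = predict(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score


start = "CCCCCCCCCCCC"
start_score = toy_predict_expression(start)
designed, designed_score = optimize_sequence(start, toy_predict_expression)
print(start, start_score, "->", designed, designed_score)
```

Replacing `toy_predict_expression` with a model trained on reporter assay data would leave the optimization loop unchanged, which is what makes such models convenient for design.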
In many applications of interest, there are few or no biosensors that can respond to intermediates of a specific pathway of interest [62]. To bridge this knowledge gap, several groups have assembled databases of metabolites and transcription factor interactions [63–65]. These datasets can potentially be employed to train machine learning models for biosensor discovery and expand the range of detectable metabolites, particularly considering recent successes in molecular discovery using phenotypic screening data [66,67].
Design of control architectures
Once a production pathway and the required metabolite biosensors have been established, the next step is the design of a control architecture, i.e. deciding which enzymes should be controlled by the biosensor and how. This is an important design decision because similar control systems can be built with several combinations of positive and negative feedback loops. Such architectures can differ substantially in their complexity and cost of implementation, for example because they require a different number of engineered promoters and transcription factors. To date, the selection of control architectures has been done largely on a trial-and-error basis guided by pathway-specific knowledge [5], or with the use of computational pathway models based on differential equations [68]. Several works have employed such models to identify architectures that can support a specific production phenotype [69–72], analyze their temporal dynamics [73–76], and identify architectures that optimize production [77–79].
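As a minimal sketch of how a differential equation model can compare two architectures, the example below simulates a toy pathway in which an intermediate is consumed by an enzyme that is either expressed constitutively (open loop) or activated by the intermediate through a biosensor (closed loop). The model structure, parameter values, and forward Euler integration are all assumptions chosen for illustration and do not correspond to a specific published pathway.

```python
def simulate(v_in, closed_loop, n_steps=20000, dt=0.01):
    """Integrate a toy two-variable pathway model with forward Euler.

    m : intermediate metabolite, produced at rate v_in and consumed by e
    e : enzyme, expressed constitutively or via a metabolite biosensor
    """
    k_cat, deg = 1.0, 1.0                           # assumed rate constants
    leak, max_expr, k_half, n = 0.2, 5.0, 2.0, 2    # assumed biosensor curve
    m, e = 0.0, 0.0
    for _ in range(n_steps):
        if closed_loop:
            # Biosensor activates enzyme expression with a Hill dose-response.
            synthesis = leak + (max_expr - leak) * m**n / (k_half**n + m**n)
        else:
            synthesis = 1.0                         # constitutive expression
        dm = v_in - k_cat * e * m
        de = synthesis - deg * e
        m += dt * dm
        e += dt * de
    return m, e


# Fold change of the intermediate when the influx increases four-fold.
fold_open = simulate(4.0, False)[0] / simulate(1.0, False)[0]
fold_closed = simulate(4.0, True)[0] / simulate(1.0, True)[0]
print(f"open loop: {fold_open:.2f}x, closed loop: {fold_closed:.2f}x")
```

In this toy setting the closed-loop architecture buffers the intermediate against the change in influx, at the cost of requiring a biosensor and a responsive promoter.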
Recently, several studies have proposed the use of machine learning methods for optimizing the architecture of biological circuits [80,81]. Work by Hiscock [24] exploited gradient descent algorithms commonly employed for training machine learning models to find gene circuit architectures that match a desired temporal output. Another recent work by Shen et al. [26] employed recurrent neural networks to design synthetic gene circuits, while Frank [82] used automatic differentiation methods from machine learning to select optimal architectures in transcription factor circuits. This body of work has focussed mostly on genetic circuits that do not interact with metabolic pathways. In the case of dynamic pathway engineering, a recent work proposed the use of Bayesian optimization, a technique widely used for model selection in deep learning, to simultaneously optimize control architectures and biosensor dose-response curves [25]. The use of machine learning approaches for circuit design allows exploring large design spaces in a computationally efficient manner, and provides a first step toward integrated design pipelines aimed at dynamic pathway engineering.
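The gradient-based idea can be illustrated on a deliberately small problem: adjusting a single expression rate so that a simulated time course matches a desired temporal output. This is a sketch under strong simplifying assumptions and not the method of any cited work; the one-gene model is hypothetical, and central finite differences stand in for the automatic differentiation that the cited approaches rely on.

```python
def simulate(alpha, deg=1.0, dt=0.05, n_steps=100):
    """Euler time course of a single gene: dE/dt = alpha - deg * E."""
    e, trajectory = 0.0, []
    for _ in range(n_steps):
        e += dt * (alpha - deg * e)
        trajectory.append(e)
    return trajectory


def loss(alpha, target):
    """Mean squared error between simulated and desired time courses."""
    sim = simulate(alpha)
    return sum((s - t) ** 2 for s, t in zip(sim, target)) / len(target)


def fit_alpha(target, alpha=0.5, lr=0.5, n_iter=50, eps=1e-4):
    """Gradient descent on alpha using a central finite-difference gradient."""
    for _ in range(n_iter):
        grad = (loss(alpha + eps, target) - loss(alpha - eps, target)) / (2 * eps)
        alpha -= lr * grad
    return alpha


# Desired temporal output, generated here from a known rate for checking.
target = simulate(2.0)
alpha_fit = fit_alpha(target)
print(f"fitted expression rate: {alpha_fit:.4f}")
```

Extending this to architecture selection would involve many parameters, including interaction strengths whose fitted values indicate which regulatory links are needed; that extension is not shown here.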
Conclusions
AI and machine learning are rapidly being adopted across many biological design tasks [6,83,84]. In the case of dynamic pathway engineering, recent works highlight how such methods can assist in various stages of the pathway design process. Here, we have discussed such progress along three key directions: pathway retrosynthesis, biosensor design, and control architecture design. The pace and depth of deployment of AI varies significantly across these three areas. For pathway retrosynthesis, the enormous success of language models already has produced new approaches to discover enzymatic conversion routes from host intermediates to target products. In the case of biosensor design, there are numerous AI approaches that support tasks in protein and DNA sequence engineering, which are both required for optimizing biosensor function; while most of these methods have not been specifically tailored for biosensor engineering yet, their increasing adoption will likely permeate to the design of metabolite-responsive molecular mechanisms. Finally, the design of control architectures is the most recent application area of AI in dynamic pathway engineering, and offers exciting avenues for the development of powerful algorithms to screen competing designs and identify those that meet specifications and experimental implementation constraints.
As the current literature shows, machine learning methods have so far been applied to a wide variety of design tasks, many of which require different input data modalities, model architectures and strategies for performance evaluation. Although this flexibility endows designers with a wide range of powerful algorithms, it comes at the cost of large data requirements for model training. Progress in laboratory automation and high-throughput screening is paving the way for such data-rich approaches to biological design. The development of biofoundries across the globe [85] together with progress in self-driving laboratories [86] offers exciting opportunities for large-scale data acquisition, which can support the systematic integration of AI and machine learning into pathway design pipelines.
The interface between AI and dynamic pathway engineering is a relatively new and evolving field, with much of the recent work still at a proof-of-concept stage. Future efforts will likely place an increasing focus on more user-friendly software tools that can bring this technology into the hands of wetlab practitioners, much like in other areas that enjoy a growing number of bespoke software packages [87–89]. One area of particular interest is the use of active learning for pathway design. Active learning is a machine learning paradigm where the model selects the most informative designs to implement, thereby reducing the number of experiments required to explore the design space effectively. Several software packages such as BioAutomata [90], ART [91], ActiveOpt [92], and METIS [93] have implemented active learning pipelines for the design of static production pathways. In the case of dynamic pathways, however, there is a pressing lack of comprehensive computational tools that support end-to-end system design. Given the complexity and number of designable components of dynamic pathways, the application of active learning tools could lead to important efficiency gains in implementation and prototyping. With the growing number of applications of machine learning in pathway engineering, and the continued efforts to develop comprehensive software packages, we can expect significant advancements in this area in the coming years that will support the wider adoption of AI and machine learning for strain design.
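The active learning loop described above can be sketched in a few lines. Everything in this example is a hypothetical simplification: designs are integers, the "experiment" is a toy function, the surrogate model is a nearest-neighbour predictor, and the acquisition rule adds a distance-based exploration bonus to the prediction. It does not reproduce the algorithms of BioAutomata, ART, ActiveOpt, or METIS.

```python
def toy_experiment(design):
    """Hypothetical stand-in for a wet-lab measurement (e.g. product titer)."""
    return -(design - 7) ** 2


def nearest(design, measured):
    """Measured design closest to the query (nearest-neighbour surrogate)."""
    return min(measured, key=lambda m: abs(m - design))


def acquisition(design, measured, beta):
    """Predicted value plus an exploration bonus that grows with distance."""
    ref = nearest(design, measured)
    return measured[ref] + beta * abs(ref - design)


def active_learning(candidates, measured, budget, beta=10.0):
    """Repeatedly pick the untested design with the highest acquisition."""
    measured = dict(measured)  # copy so the caller's data is not modified
    for _ in range(budget):
        untested = [c for c in candidates if c not in measured]
        if not untested:
            break
        choice = max(untested, key=lambda c: acquisition(c, measured, beta))
        measured[choice] = toy_experiment(choice)  # "run" the experiment
    return measured


candidates = list(range(11))
initial = {0: toy_experiment(0)}
results = active_learning(candidates, initial, budget=3)
best_design = max(results, key=results.get)
print("tested designs:", sorted(results), "best so far:", best_design)
```

The parameter `beta` trades off exploring untested regions against exploiting designs predicted to perform well; with a real surrogate model, the distance term would be replaced by a proper uncertainty estimate.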
Perspectives
Dynamic pathway engineering offers promising routes for building robust production strains, but these require assembly of many biological components into complex circuits. Computational methods can rapidly screen potential designs in silico, thus accelerating the navigation of large and experimentally intractable design spaces.
There is a growing interest in artificial intelligence methods for the design of dynamic pathways, particularly for pathway retrosynthesis, design of metabolite-responsive biosensors, and the optimization of circuit architectures. Machine learning models can improve over classic algorithms and help solve previously intractable design problems.
Progress in laboratory automation and high-throughput screening will pave the way for more data-centric approaches to biological design, and enable the wider adoption of AI and machine learning in the field.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
C.M. and D.A.O. were supported by the United Kingdom Research and Innovation (grant EP/S02431X/1, UKRI Centre for Doctoral Training in Biomedical AI).
Open Access
Open access for this article was enabled by the participation of University of Edinburgh in an all-inclusive Read & Publish agreement with Portland Press and the Biochemical Society under a transformative agreement with JISC.
Author Contributions
C.M. researched the literature; C.M. and D.A.O. wrote the manuscript.