Computational pathway design tools often face the challenges of balancing the stoichiometry of co-metabolites and cofactors, and dealing with reaction rule utilization in a single workflow. To this end, we provide an overview of two complementary stoichiometry-based pathway design tools optStoic and novoStoic developed in our group to tackle these challenges. optStoic is designed to determine the stoichiometry of overall conversion first which optimizes a performance criterion (e.g. high carbon/energy efficiency) and ensures a comprehensive search of co-metabolites and cofactors. The procedure then identifies the minimum number of intervening reactions to connect the source and sink metabolites. We also further the pathway design procedure by expanding the search space to include both known and hypothetical reactions, represented by reaction rules, in a new tool termed novoStoic. Reaction rules are derived based on a mixed-integer linear programming (MILP) compatible reaction operator, which allow us to explore natural promiscuous enzymes, engineer candidate enzymes that are not already promiscuous as well as design de novo enzymes. The identified biochemical reaction rules then guide novoStoic to design routes that expand the currently known biotransformation space using a single MILP modeling procedure. We demonstrate the use of the two computational tools in pathway elucidation by designing novel synthetic routes for isobutanol.
Synthetic chemists transform molecules by devising a series of catalytic reactions which involve fine-tuning the buffer, temperature, catalyst, reaction time and pressure. Their approaches have been employed industrially to produce a variety of products . However, product yield in organic multi-step synthesis is limited by factors such as high temperature, high pressure, utilization of toxic heavy metals and difficulty in retaining enantiopurity . In contrast, biosynthesis often naturally occurs under moderate temperature and pressure conditions catalyzed by enzymes with high specificity and selectivity . Driven by the advent of genome sequencing, increased understanding of cellular metabolism, and the development of genetic and enzyme engineering techniques, there are many emerging case studies which employ complete enzymatic routes for the synthesis of biofuels and commodity chemicals [4,5]. In addition, chemical and biological syntheses can act in synergy as shown by the recent successes in employing a semi-synthetic approach to produce small-molecule drugs artemisinin and taxol, wherein their precursors (artemisinic acid  and taxadiene ) are produced using biological routes.
Despite the many successes of applying pathway design tools to aid the design of synthetic routes , it is still challenging to identify the optimal stoichiometry of co-metabolites and cofactors while simultaneously maximizing the carbon flux directed toward the desired products . This motivated us to develop the pathway prospecting algorithm optStoic  that evaluates the problem as a design of an overall balanced stoichiometry beyond a simple connection between a single source and a single target metabolite. In addition, exploring the uncharted chemical space and finding a potential enzyme (from the highly diverse nature) that can act on a new substrate with high efficiency remains a complicated task. Along with similar limitations of ignoring cofactors and alternative co-reactant/co-product combinations, current de novo pathway design tools often have additional combinatorial challenges as they require the generation of an extended metabolic network a priori with a large number of reaction rule combinations. To tackle the cofactor/co-metabolite balance and combinatorial issue, we developed a stoichiometry-based de novo pathway design tool termed novoStoic  that combines the mass/redox balance consideration into a single step and searches both known and putative reactions simultaneously. In this review, we focus on the methodology and application of the two computational pathway prospecting tools, optStoic  and novoStoic . These tools are available at our group's website (https://github.com/maranasgroup) as open source software.
Design overall stoichiometry and intervening reactions of metabolic pathways
It is intuitive to study metabolism as a graph where metabolites and reactions are represented as vertices and edges. Graph-theory algorithms, such as breadth-first search, can thus be applied for pathway design. However, most graph-based pathway design tools  miss cyclic routes that often have high carbon and energy efficiencies (e.g. the non-oxidative glycolysis pathway ). These tools also exclude cofactor usage and do not recycle intermediate metabolites because stoichiometry information is absent from the graph representation of a metabolic network . Stoichiometry-based constraints can be applied in conjunction with a graph-based carbon exchange network as shown in carbon flux paths to restore the stoichiometry balance of pathways . However, such methods require a network pruning step to remove non-carbon transfer edges in the graph, but they may still predict biologically irrelevant pathways due to the incorrect atom mappings . Elementary flux mode (EFM)-based tools [14–18] have been applied to identify all non-decomposable flux modes that satisfy the steady-state and thermodynamics-based reaction irreversibility constraints of a metabolic network. However, the exhaustive enumeration of EFMs is currently limited to small metabolic networks as the process is computationally intensive . Moreover, in many situations, the choice of co-reactants/co-products, ATP yield or requirement, and cofactor possibilities greatly affect the performance of the designed strain . Therefore, the optimal stoichiometry of alternative co-reactants/co-products and combinations of cofactors should be explored first during pathway design to ensure a more comprehensive search for the optimal pathway(s).
optStoic  is designed to determine the stoichiometry of overall conversion with an optimized performance criterion (see Figure 1A). The performance criterion determines the stoichiometric coefficients which are then chosen as variables to maximize the given objective function. For example, in Figure 1A to determine the maximum yield of target product B, we maximize b subject to a fixed primary carbon source a, whereas to maximize co-utilization of other inexpensive carbon sources (e.g. CO2), we maximize c with a fixed b, but to maximize yield of another co-product D, we can maximize d with a fixed b. The stoichiometry variables can be defined as either integer (i.e. overall stoichiometry is scaled up to integers) or continuous variables, so that optStoic is an integer programming problem or a mixed-integer linear programming (MILP) problem. optStoic uses a comprehensive database MetRxn  to provide the list of metabolites and searches for the best combinations of metabolites while simultaneously considering elemental balance, charge balance, and overall Gibbs-free energy change. Next, we use the obtained overall stoichiometry to restrict the uptake rate and secretion rate and then search for the minimum number of intervening reactions connecting the source and sink metabolites. This step is similar to many stoichiometry-based pathway design tools  or EFM-based tools [14–18] (e.g. k-shortest EFMs ), which solve a MILP problem (minRxn ; see Figure 1A) using reactions from the MetRxn  database. Nevertheless, unlike the EFM-based approaches, the intervening reactions are not restricted to a single elementary flux mode as the overall stoichiometry may involve multiple source metabolites and target metabolites. It is noteworthy that minFlux , an approximation approach of minRxn, is often used, as solving any given MILP problem is computationally more expensive than solving the corresponding relaxed linear programming problem. However, minFlux may sometimes identify a pathway with two independent subnetworks, one of which regenerates the cofactors consumed by the main pathway. For example, in Figure 1B, the desired main pathway consumes ATP, while a subnetwork regenerates ATP as identified by minFlux. Thus, the optStoic/minFlux procedure often has to be supplemented with modified loopless constraints  to remove such cycles (Ng, Chowdhury, Maranas, submitted). It is also possible that sometimes no feasible intervening reactions can be selected by minFlux/minRxn using only reactions from a single genome-scale metabolic network (e.g. iAF1260 ) or even a large reaction network (e.g. MetRxn ) to match the overall stoichiometry designed by optStoic. In such a case, one may ‘dial back’ the overall stoichiometry by lowering the target yield and/or allow for an ATP or NAD(P)H less efficient overall stoichiometry. Alternatively, one can proceed by allowing for non-native reactions or de novo reactions enabled by novoStoic.
The schematic workflow of optStoic and modified minFlux/minRxn.
optStoic/minFlux has been successfully applied to design six pathways converting glucose into acetate with 100% carbon efficiency, nine pathways co-utilizing methanol and CO2 to produce 10 C2+ metabolites, and nine pathways co-utilizing CO2 and methane to produce acetate with possible C2+ byproducts . In addition, it has been applied to prospect for over 10 000 pathways that are capable of converting glucose into pyruvate while generating predetermined numbers of ATP to study the energy efficiencies and robustness of Entner–Doudoroff (ED) and Embden-Meyerhof-Parnas (EMP) pathways (Ng, Chowdhury, Maranas, submitted).
Explore uncharted chemical space with the optimization-based de novo pathway design tool
A pathway design tool does not have to be limited to natural enzymes as there remains a large uncharted biochemical (sequence) space. For example, Rand et al.  recently uncovered a previously unknown biodegradation pathway for levulinic acid by unraveling the specific enzymes and their corresponding biotransformations. Similarly, Ju et al.  discovered many naturally produced novel phosphonic acid products by genome mining of over 10 000 actinomycetes. Many enzymes in nature are promiscuous and their alternate enzyme activity has often not been characterized . Advanced de novo protein design  and protein engineering techniques (e.g. site-directed mutagenesis, random mutagenesis and directed evolution) allow for the targeted expansion of the substrate range of existing enzymes . Many de novo pathway design tools (such as BNICE  and RetroPath ) have established the potential to exploit promiscuous enzymes in the construction of novel pathways. The key concept behind de novo pathway design tools is a reaction operator that can generalize multiple reactions with the same mechanisms into a single reaction rule. The most widely used reaction operator is a bond–electron matrix  which encodes non-bonded valence electrons and bond orders. However, existing reaction operators are not compatible with the MILP optimization framework introduced for optStoic. Most of the current de novo pathway design tools are graph-based, supplemented with reaction rules to explore those uncharted biochemical spaces following a retrosynthetic search method . In addition, the combinatorial explosion of reaction rules requires additional steps to control the expanded network size by restricting the number of possible reaction rules such as using prior knowledge of reaction rules (e.g. UM-PPS ) and creating a diameter-based reaction operator (e.g. XTMS/RetroPath [29,31,32]).
Here, we provide an overview of an optimization-based de novo pathway design tool, novoStoic , that simultaneously searches known reactions and reaction rules (generated using a new reaction operator) using a single MILP optimization problem. In addition to the mass balance equations used to generate a ‘reaction network’ (as seen in optStoic ), we used ‘moiety balance’ equations to achieve rule-based transformation. First, we infer the reaction rules from the reaction database based on a reaction operator using the CLCA atom mapping procedure . For example in Figure 2a, at distance 1, we consider the single carbon atom as a moiety, whereas at distance 2, the moiety includes the carbon atom and its immediate neighboring atoms [i.e. moiety ‘[C]([C][O]=[O])’]. Similar to RetroPath , the different distances of a moiety are defined to control the specificity of reaction rules. Next, we define the molecular signature (i.e. a vector representing the count of each unique moiety) of each metabolite from distances one to three (see Figure 2b). Each moiety is uniquely represented by a prime number (e.g. atom feature 3-4-06-0 is encoded by prime number 17). Thus, for the molecular signature of a metabolite at moiety size 1, it is deconstructed to count the number of unique moieties represented by a vector (e.g., in Figure 2b, there are four carbon atoms with atom feature 3-4-06-0 which are represented as the number four in . As shown in Figure 2b, if a prime number is used to represent each moiety, molecular signatures at higher moiety sizes can be generated by the product of prime numbers [e.g. moiety ‘[C]([C][O]=[O])’ is represented by the product of atom C's prime number with itself and its surrounding prime numbers]. The prime product will encode the surrounding information into a single number (e.g. 171 955 in Figure 2b), which is then replaced with a new unique prime number (e.g. prime number 23). Finally, a reaction rule is derived by enumerating the change of moieties between reactants and products (see Figure 2c).
CLCA-based workflow to derive reaction rules encoded by a vector representation of the count of moieties.
As shown in Figure 3a, the de novo steps of the desired pathway will be explored through the rule-based biochemical space, which is represented by the molecular signatures of metabolites and the putative reactions using reaction rules. In addition to the stoichiometric matrix (represented by metabolites and reactions) used in the optStoic/minRxn/minFlux framework, a reaction rule matrix (represented by moieties and reaction rules) is defined to describe hypothetical reactions (see Figure 3b). Therefore, constraints of moiety change balance are defined per moiety to represent the balance between total moiety changes and the changes in both existing reactions and the putative steps. It is noteworthy that moiety representation in existing tools is incompatible with the use of MILP formulations . In contrast, a MILP compatible moiety balance is enabled in novoStoic  using the prime number representation. The MILP approach allows for the use of a performance metric (e.g. carbon yield) to rank order designs and the imposition of constraints, such as negative total Gibbs-free energy change limits the maximum number of reaction rules (see Figure 3c). Instead of iteratively applying reaction operators on all intermediate metabolites, novoStoic  only invokes reaction rules when there is a missing link between existing reactions or when they can lead to a reduction in the pathway length.
The novoStoic workflow for de novo pathway design.
novoStoic  has been applied to synthesize 1,4-butanediol (BDO), which improved upon a previously engineered pathway by controlling cofactor regeneration as well as the number of novel reactions. It has also identified novel biosynthetic routes for valuable pharmaceuticals (e.g. phenylephrine) from cost-effective aromatic precursors using complex pathway and cofactor-balanced designs. It has also been applied to design the oxidative degradation of benzo[a]pyrene, elucidating the biodegradation possibilities for a xenobiotic .
Application: design pathways for isobutanol synthesis
As a case study, we demonstrate the use of optStoic and novoStoic to design the pathways for the production of isobutanol from glucose and xylose. The KEGG  database catalogs a large number of carbohydrate catabolic pathways, but does not contain the reactions/pathways generating isobutanol. Therefore, we first used optStoic/minFlux to design pathways from the carbon sources to 2-ketoisovaleric acid, which is subsequently employed next in novoStoic as a source metabolite to prospect for reactions that can generate isobutanol. 2-Ketoisovaleric acid is a key precursor for the Ehrlich pathway-based isobutanol biosynthesis found natively in Klebsiella pneumoniae  and Saccharomyces cerevisiae , and successfully engineered in Escherichia coli , Clostridium cellulolyticum  and C. thermocellum. It is chosen here to establish a proof of concept for the identification of known pathways and to benchmark the potential of novel pathways even for small molecules using novoStoic.
Pathways produce 2-ketoisovaleric acid from glucose and xylose.
Using optStoic, the overall stoichiometry for production of 2-ketoisovaleric acid from glucose and xylose is as follows:
We imposed a constraint to ensure that the stoichiometric ratio of 2-ketoisovaleric acid to NADPH is 1 : 1. In other words, the reducing equivalent generated in the upstream pathway will be consumed during the conversion of 2-ketoisovaleric acid into isobutanol to achieve an overall pathway redox balance. (The conversion for the downstream pathway is 2-ketoisovaleric acid + NADPH + H+ ↔ isobutanol + NADP+ + CO2.) In Figure 4A, two alternative pathways for glucose-based conversions are shown as candidate pathways predicted by optStoic/minFlux. The first pathway is an ED pathway variant in which the substrate-level phosphorylation is bypassed (i.e. from glyceraldehyde 3-phosphate to glycerate 3-phosphate without ATP generation). The second pathway is the non-phosphorylative ED pathway that can be found in hyperthermophilic archaea . Compared with the EMP pathway, there is a deficit of ATP production, but the ED pathway variants are more thermodynamically favorable with a potentially higher glycolytic flux . A more complicated branched pathway (see Figure 4B) is predicted for converting xylose into 2-ketoisovaleric acid. The pathway bypasses xylose uptake from the pentose phosphate pathway and instead uses the same four reactions as the non-phosphorylative pathway (steps 1–4) engineered by Tai et al.  in E. coli that converts five carbon sugars into BDO. An alternative design (see Figure 4C) converts xylose into 2-ketoisovaleric acid with a requirement of an NADPH investment in contrast with pathways in Figure 4A,B (which generate the reducing equivalent). By employing a carbon fixation step to counter the loss of carbon dioxide by oxaloacetate decarboxylase, carbon conversion efficiency of xylose to 2-ketoisovaleric acid could be increased to 100%. The overall conversion requires an additional reducing power and the stoichiometry is as follows:
The pathway identified bypasses the carbon loss from pyruvate to 2-acetolactate (see Figure 4C).
We next explore the pathways from 2-ketoisovaleric acid to isobutanol using novoStoic. Three alternative pathways are identified and two of them recapitulate the pathways reported in the literature [43–46]. As shown in Figure 5, Rules (R1, R2 and R3) recover the native pathway of C. thermocellum [43,44]. Rules R3 and R4 recapitulate the heterologous pathway that has been introduced into C. thermocellum, E. coli and S. cerevisiae [45,46]. In addition, a new pathway (R1 and R5) is identified by novoStoic using an existing enzyme from C. thermocellum (R1) and a reaction rule (R5) that uses HMG-CoA reductase to convert isobutyryl-CoA directly into isobutanol. HMG-CoA reductase has a complex mechanism involving multiple chemical steps . It is predicted to catalyze the de novo step because the reaction from 3-hydroxy-3-methylgutaryl-CoA to mevalonate undergoes the same moiety changes as the transformation from isobutyryl-CoA to isobutanol. This defines the protein engineering substrate specificity change task to achieve the novel conversion. By combining optStoic and novoStoic results, the overall reactions from glucose and xylose to isobutanol with no supplementation of additional reducing equivalents are
Both of these reactions achieve the theoretical maximum yield  of isobutanol with glucose or xylose as the sole substrate.
optStoic and novoStoic provide the computational means to support the design of new pathways from substrates to products. However, many additional considerations remain. The reaction rules in the novoStoic algorithm  are currently treated in a reversible manner. The reversibility of a reaction rule can only be ascertained based on experimental evidence or computational predictions . A given reaction rule invoked in the design of a pathway may map to many reactions. Conversely, any given reaction can be described by a hierarchy of general to more specific reaction rules. Atom mapping information that is currently not involved in the workflow can be a posteriori applied (as implemented in RouteSearch  and ReTrace ) to compare the bond breaking and forming of the substrates and products with the corresponding predicted reaction rules, and remove degenerate reaction rules. Catalytic promiscuity of enzymes is difficult to predict. Given sufficient data, machine learning techniques can be used to explore the chemical space beyond rule-based systems .
A naturally promiscuous enzyme is usually the first option to catalyze the putative steps. However, on many occasions, protein engineering is required to alter enzyme activity and specificity or design de novo enzymes. Computational protein engineering techniques [26,27,53,54] that have been applied successfully to engineer existing enzymes as well as design de novo enzymes should be combined with pathway design tools to realize the ultimate goal of a completely automated workflow .
Pathways for isobutanol production from 2-ketoisovaleric acid.
The authors acknowledge the insights provided by Anupam Chowdhury that greatly improved the paper.
The authors gratefully acknowledge funding from the National Science Foundation (NSF) (http://www.nsf.gov/) awards no. EEC-0813570 and no. NSF/MCB-1546840, and Department of Energy (DOE) (http://www.energy.gov/) Center for Bioenergy Innovation (CBI). The Center for Bioenergy Innovation is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science.
The Authors declare that there are no competing interests associated with the manuscript.