The number of possible evolutionary paths in nature is nearly infinite, so it is not yet possible to predict how evolution will proceed. However, science can now explain the mechanisms underpinning the evolution of biological systems and use this knowledge to mimic nature experimentally. The fundamentals of evolution have been applied in vitro to improve enzymes as biocatalysts, in a process called ‘Directed Evolution of Enzymes’ (DEE). DEE replicates nature's evolutionary steps: introducing genetic variability into enzymes, selecting the fittest variants and transmitting their genetic information to the next generation. DEE has tailored biocatalysts for applications and expanded the repertoire of enzymatic activities, while also providing experimental evidence to support mechanistic hypotheses of molecular evolution and deepen our understanding of nature. In this mini review, I discuss the basic concepts of DEE, the most widely used methodologies and current technical advances, with examples of applications and perspectives.
Life relies on a circuit of chemical reactions catalysed by enzymes to fulfil all its requirements, from obtaining energy to the reproduction of organisms. Enzymes involved in primary metabolism, such as glycolysis, are highly conserved, whereas all organisms also have a secondary metabolism composed of unique enzymes that are essential to thrive under particular environmental conditions. Some secondary synthetic pathways produce essential metabolites that cover interesting chemical spaces for exploitation, whereas secondary pathways for detoxification can modify and degrade xenobiotics and toxic metabolic by-products [3,4]. In either case, enzymes involved in secondary metabolism have been continuously shaped by nature through molecular evolution to confer a gain of fitness on organisms.
Enzymes are catalysts with remarkable specificity towards their natural substrates, though several show substrate ambiguity and side activities [6,7]. For molecular evolution to happen, nature needs enzymes that are flexible enough to mutate and take on parallel roles while keeping their main function. Thus, organisms carry a collection of promiscuous enzymatic activities that can be repurposed into new chemical pathways under different selective pressures. An enzyme with an advantageous side activity can become redundant in the genome, mutate and diverge until its side activity becomes the main activity of a new secondary pathway (gain of function). The gain of new functions in hydrolases seems to be driven by mutations that are nearly neutral to the main activity but have a large impact on the side activity. The strength of the selective pressure seems similarly important for the gain of a new function: for TEM-1 β-lactamase, mutations conferring resistance to cefotaxime (gain of function) emerged under low ampicillin pressure (wild-type resistance), whereas high levels of ampicillin were detrimental to the emergence of resistance to cefotaxime.
The discovery of new side activities of enzymes offers a great biotechnological opportunity to create chemical toolboxes that synthetic biologists and chemists can use to exploit new chemical spaces and pharmacophores, expanding the applications of enzymes as catalysts. As an example, Butelase 1 is an asparaginyl endopeptidase-like enzyme that has lost its main protease activity and evolved into an efficient cyclase that produces head-to-tail cyclic peptides. However, side activities often need improvements in catalytic efficiency and substrate specificity to fulfil their potential as biocatalysts. To overcome this, scientists have combined the growing understanding of evolution with advanced molecular biology techniques and screening methodologies to mimic nature's molecular evolution and evolve enzymes in vitro for applications. This process, called ‘Directed Evolution of Enzymes’ (DEE), for which Dr. Frances H. Arnold was awarded the Nobel Prize in Chemistry in 2018, uses iterative rounds of introducing variability into genes, screening variants and genetically tracking the fittest mutants. The diversification of enzymes confers the phenotypic plasticity that allows them to explore different fitness landscapes, and evolutionary paths can be funnelled using selective pressures to favour desired phenotypic traits. The use of DEE has enabled scientists to improve enzymes by accelerating slow activities, modifying substrate preferences and changing their robustness to temperature and organic solvents, as well as generating enzymatic activities unusual to biological systems.
In the following sections, I will discuss the most common methods to evolve proteins in vitro, giving recent examples of applications and perspectives for DEE.
Strategies for the introduction of variability
Libraries of enzymes are traditionally assembled using random mutagenesis, DNA shuffling and structure-guided rational design, which selects specific positions for mutagenesis (Figure 1). Machine-learning (ML) guided mutagenesis has also been used to rationally engineer proteins and will be discussed later. The size of a library grows exponentially with the number of randomised positions and needs to be carefully planned, as the construction of a DNA library for a fully randomised protein of 60 amino acids would require all the mass available in the universe. The limit for the transformation of libraries in Escherichia coli is ∼10¹⁰ colonies, which covers the full randomisation of only 7 to 8 positions in a protein (20⁷ ≈ 1.3 × 10⁹ and 20⁸ ≈ 2.6 × 10¹⁰ variants), hence the need to constrain and plan the introduction of variability.
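The exponential growth of library size can be checked with a few lines of arithmetic. The sketch below is only an illustration: it counts variants at the amino-acid level (20 options per position), and the function names and the `alphabet` parameter are my own labels rather than part of any published tool.

```python
# Minimal sketch: how many protein variants does full randomisation produce,
# and how many positions fit under a given transformation limit?
# Assumption: diversity is counted at the amino-acid level (20 per position).

AMINO_ACIDS = 20  # canonical amino acids


def library_size(positions: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct variants when `positions` sites are fully randomised."""
    return alphabet ** positions


def max_fully_randomised_positions(limit: float, alphabet: int = AMINO_ACIDS) -> int:
    """Largest number of positions whose full library still fits under `limit`."""
    n = 0
    while alphabet ** (n + 1) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 7, 8, 10):
        print(n, "positions ->", library_size(n), "variants")
    print("positions covered by 1e10 transformants:",
          max_fully_randomised_positions(1e10))
```

Passing a different `alphabet` (for example, the number of codons encoded by a degenerate primer) gives the size of the corresponding DNA library instead of the protein library.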
Schematic introduction of variability in a gene coding for an enzyme subjected to directed evolution.
The bigger the library of mutants, the higher the chances of finding an improved mutant; however, the costs of screening all variants and deconvoluting mutations also rise, as the genetic identification of the fittest mutants is essential for heredity in DEE. To mitigate the high cost of phenotyping and genotypic deconvolution, DEE can alternatively be performed in multiple rounds, testing smaller libraries and using the outputs of initial rounds to inform subsequent iterations, which makes screening and genotypic identification affordable. DEE can also combine different methods to introduce variability in each round; for instance, the directed evolution of an epoxide hydrolase was performed by introducing mutations via random mutagenesis, followed by DNA shuffling of the 15 best mutants.
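The iterative logic of diversification, screening and carrying the fittest variant forward can be sketched as a toy simulation. Everything in it is hypothetical: the `toy_fitness` function stands in for a real assay by scoring similarity to an arbitrary target sequence, and the library size, number of rounds and one-mutation-per-variant rule are illustrative choices only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq: str, target: str) -> float:
    """Hypothetical assay: fraction of positions identical to a target sequence."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq: str, rng: random.Random, n_mut: int = 1) -> str:
    """Replace `n_mut` randomly chosen positions with random amino acids."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mut):
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def directed_evolution(parent, target, rounds=10, library_size=200, seed=0):
    """Run rounds of mutagenesis and screening, keeping the best variant each round."""
    rng = random.Random(seed)
    best = parent
    history = [toy_fitness(best, target)]
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        library.append(best)  # the parent is re-screened, so fitness never drops
        best = max(library, key=lambda s: toy_fitness(s, target))
        history.append(toy_fitness(best, target))
    return best, history


if __name__ == "__main__":
    best, history = directed_evolution("AAAAAAAAAA", "MKVLATGHWY")
    print(best, history)
```

Each round screens only a few hundred variants, yet the fitness trace in `history` can only stay level or rise, which is the rationale for running several small rounds rather than one exhaustive library.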
Diversity is often introduced by random mutagenesis using error-prone PCR (Figure 1A), which is run either at high Mg²⁺ concentrations or in the presence of Mn²⁺ to decrease the fidelity of DNA polymerases, or which uses low-fidelity DNA polymerases (for which commercial kits are currently available). Alternatively, errors can be introduced using E. coli strains deficient in DNA repair pathways, such as XL1-Red. The main advantage of random mutagenesis is the exploration of regions distant from the usual target, the active site. Mutations in these regions might alter protein dynamics or allostery, effects that are difficult to predict through rational design. For DEE, the desired error rate is 1–4 amino acid substitutions per protein [10,12]. A recent highly error-prone system called OrthoRep has been developed in yeast; it uses an orthogonal low-fidelity DNA polymerase that replicates a cytoplasmic linear plasmid to introduce mutations at high rates (10⁻³–10⁻⁵ per bp). By evolving genes in a fast and uninterrupted fashion, OrthoRep raised mutants of malarial DHFR highly resistant to its inhibitor pyrimethamine in 90 independent replicates that uncovered different fitness landscapes, showcasing how random mutations can help DEE and expand our knowledge of molecular evolution.
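To relate a per-base error rate to the mutational load per gene, a simple Poisson model can be used. This is a rough sketch under the assumption that errors occur independently at every base; it counts nucleotide changes (not amino acid substitutions), and the function names are illustrative.

```python
import math


def expected_mutations(rate_per_bp: float, gene_length_bp: int) -> float:
    """Mean number of nucleotide changes per gene copy."""
    return rate_per_bp * gene_length_bp


def prob_k_mutations(k: int, rate_per_bp: float, gene_length_bp: int) -> float:
    """Poisson probability that a gene copy carries exactly k nucleotide changes.

    Assumption: errors are independent and equally likely at every base.
    """
    lam = expected_mutations(rate_per_bp, gene_length_bp)
    return math.exp(-lam) * lam ** k / math.factorial(k)


if __name__ == "__main__":
    # Hypothetical 1 kb gene at three error rates spanning the quoted range.
    for rate in (1e-3, 1e-4, 1e-5):
        mean = expected_mutations(rate, 1000)
        unmutated = prob_k_mutations(0, rate, 1000)
        print(f"rate {rate:g}/bp: mean {mean:g} changes, "
              f"{unmutated:.3f} of copies unmutated")
```

Because some nucleotide changes are silent, the number of amino acid substitutions per protein will be lower than the nucleotide count estimated here.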
Rational design selects hotspots for the introduction of variability based on a structural and mechanistic understanding of catalysis and substrate specificity (Figure 1B). As a rule of thumb, mutations at conserved catalytic residues should be avoided, as they are usually deleterious. On the other hand, residues adjacent to the catalytic ones can be good targets to improve enzyme activities. Similarly, residues involved in substrate binding can be targeted to modulate enzyme specificity towards different substrates, as observed for β-glycosidases. In this way, when trying to improve the secondary activity of an enzyme, one could target positions that can form new interactions with the promiscuous substrate. The exponentially increasing amount of public 3D structures, sequences and biochemical data has supported the use of rational design in DEE. Usually, saturation mutagenesis is applied to the selected positions, but multiple sequence alignments can further constrain the diversity to the amino acids most commonly accepted at each position. ‘Statistical Coupling Analysis’ (SCA) is a method that calculates covariation between positions in a multiple sequence alignment and can ultimately point to hotspots (covariant positions in a protein) and evolutionarily favoured mutations, producing more focused libraries [15,16]. In addition, HotSpot Wizard is an automated web server that helps design optimised libraries based on structural and evolutionary features.
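How an alignment can narrow a saturation library is easy to illustrate. The sketch below is a simplified frequency filter on a made-up toy alignment; it is not SCA or HotSpot Wizard, and the 25% frequency cut-off, the sequences and the function names are arbitrary assumptions for the example.

```python
from collections import Counter


def allowed_residues(alignment, position, min_freq=0.25):
    """Residues seen at `position` in at least `min_freq` of the aligned sequences."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    if not column:
        return []
    counts = Counter(column)
    return sorted(aa for aa, n in counts.items() if n / len(column) >= min_freq)


def focused_library_size(alignment, positions, min_freq=0.25):
    """Number of variants if each position is limited to its allowed residues."""
    size = 1
    for pos in positions:
        size *= len(allowed_residues(alignment, pos, min_freq))
    return size


if __name__ == "__main__":
    # Hypothetical alignment of four homologues (columns are 0-indexed).
    toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKLLA"]
    targets = [1, 2, 4]
    for pos in targets:
        print(pos, allowed_residues(toy_alignment, pos))
    print("focused library:", focused_library_size(toy_alignment, targets))
    print("full saturation:", 20 ** len(targets))
```

For the three targeted columns of this toy alignment the focused library contains 12 variants, against 8000 for full saturation of the same positions.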
Another strategy for the introduction of variability is DNA shuffling, which recombines homologous genes to create functional chimeras (Figure 1C). Genes with high homology (usually over 70%) are randomly digested using DNase I, and the resulting fragments, which can pair because of their high homology, are used to create hybrid fragments in primerless reassembly reactions (cycles of denaturation, annealing and extension), followed by PCR amplification to reconstitute full-length chimeras. A similar method for chimera production recombines homologous genes via PCR (staggered extension process, or StEP). The recombination approach can greatly increase the chances of success in DEE, as chimeric enzymes are assembled from gene fragments containing multiple mutations previously selected by nature. To help increase the chance of producing functional chimeras, the SCHEMA software was developed to calculate contact disruptions when homologous domains are shuffled, minimising the structural impact of exchanging whole domains of proteins [20,21].
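The outcome of recombination, a chimera built from blocks of different parents, can be mimicked in silico. The sketch below is an abstraction only: it assumes pre-aligned parents of equal length and switches template at random crossover points, and it does not model DNase I fragmentation, annealing or the SCHEMA scoring. The parent sequences and names are invented for illustration.

```python
import random


def shuffle_chimera(parents, n_crossovers, rng):
    """Build one chimera by switching between aligned parents at random crossovers."""
    length = len(parents[0])
    if len(parents) < 2 or any(len(p) != length for p in parents):
        raise ValueError("need at least two aligned parents of equal length")
    # Crossover points lie strictly inside the sequence.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    current = rng.randrange(len(parents))
    blocks = []
    for start, end in zip(bounds, bounds[1:]):
        blocks.append(parents[current][start:end])
        # Switch to a different parent for the next block.
        current = rng.choice([i for i in range(len(parents)) if i != current])
    return "".join(blocks)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical homologous genes, already aligned.
    parents = ["ATGGCTAAAGGTCTG", "ATGGCAAAGGGTTTG", "ATGGCCAAAGGACTC"]
    for _ in range(3):
        print(shuffle_chimera(parents, n_crossovers=2, rng=rng))
```

Every position of a chimera is inherited from one of the parents, so the library only samples residues that already exist in functional homologues.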
Strategies for phenotypic selection of variants
Phenotyping comprises a functional assay to detect the desired enzymatic improvement. In DEE, you get what you screen for, because the selective pressure introduces a bias towards the desired phenotype. For instance, to improve the thermal resistance of an enzyme, the phenotyping step must challenge protein thermostability with an incubation at high temperature, followed by the functional assay to detect the fittest (thermoresistant) variants. Phenotyping is usually the bottleneck of DEE and requires the development of robust methods to test all variants, avoid false positives and select truly improved mutants. Enzymatic phenotyping can be performed through biochemical screens using multiwell plates (variants tested individually); cell-based screens, either agar plate-based assays or rescue selections (in which the survival of the expression system is conditional on the sought-after phenotype); in vitro compartmentalisation, which tests pooled variants of the library in a high-throughput fashion; and cell-surface displays, which express enzymes at the outer cell membrane and, by trapping a fluorescent product at the cell surface, allow sorting of the fittest mutants (Figure 2). Other ultra-high-throughput strategies use the polymerisation of a fluorescent hydrogel matrix around the cell surface driven by product formation, or genetic circuits that rely on the enzyme activity to modulate the expression of fluorescent proteins. For more details about these, I refer the reader to an excellent recent review by Markel et al.
Methods of triaging mutants from libraries.
Enzymatic assays can be a straightforward way to identify improvements in an enzyme and, with the aid of automation, offer a relatively high-throughput method to profile variants (∼10⁴ per day using microtiter plates). As biochemical screens test individual variants per well, the genetic deconvolution of beneficial mutations is direct (Figure 2A).
Cell-based assays can be performed as cell rescue selections and plate-based screens (Figure 2B). Plate-based assays are suitable to screen ∼10⁵ mutants; colonies are tested by stamping a filter paper onto the agar plate, which is then incubated with a substrate that allows visualisation of the enzymatic product. Alternatively, substrates able to penetrate colonies are dispensed onto the plate and, upon enzymatic reaction, colonies expressing improved enzymes become identifiable. Cell rescue selections usually make cell survival conditional on the desired enzyme phenotype, which makes the genotyping of mutants easier. Because the throughput bottleneck of cell-survival selections is the transformation step, they can be used to test larger libraries. As an example, a library of over 3 million recombinant cyclic peptides was screened against a two-hybrid system to search for inhibitors of HIF-1 heterodimerisation, which is involved in hypoxia signalling in cancer cells. Cyclic peptides disrupting heterodimerisation rescue cells, and cyclo-CLLFVY was found to be a potent inhibitor, demonstrating the potential of cyclic peptides in drug discovery. Survival assays are often linked to antibiotic resistance, as exemplified by the selection of TEM-1 β-lactamase variants active on cefotaxime. Similarly, a split intein, a protein capable of protein splicing to link two polypeptides or cyclise linear ones, was evolved by coupling its splicing activity to the rescue of the functional kanamycin-resistance enzyme aminoglycoside phosphotransferase. Three rationally selected positions were fully randomised, yielding inteins with increased substrate promiscuity.
The use of in vitro compartmentalisation allows the development of ultra-high-throughput selection platforms that circumvent the often-prohibitive cost of screening larger libraries or the need to couple enzyme activity to cell survival. Small aqueous droplets of water-in-oil emulsions containing enzymatic substrates can encapsulate individual clones; as the products of the enzymatic reaction and the gene coding for the enzyme are contained inside the same droplet, phenotype and genotype are directly linked (Figure 2C). Compartmentalised self-replication (CSR) is an example of a selection platform in which variants of DNA polymerase replicate their own plasmid inside droplets, so that plasmids act as both substrate and product of the reaction. Faster variants of DNA polymerase produce more copies of their own plasmids, which serve as substrate for further reactions and result in a higher accumulation of plasmids coding for the faster mutants. Microfluidics combined with fluorescence- or absorbance-activated droplet sorting enables ultra-high-throughput triaging (up to 10⁶ clones per hour) to cherry-pick individual variants (droplets) based on the detection of fluorescent or colorimetric products.
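The enrichment of faster polymerase variants during self-replication inside droplets can be illustrated with a simple proportional-growth model. This is a sketch under strong assumptions: each variant copies only its own gene, yield is proportional to a single relative rate, and the pool is renormalised after every round. The variant names and rates are hypothetical.

```python
def self_replication_round(freqs, rates):
    """One selection round: each variant's share grows in proportion to its rate."""
    weighted = {name: freqs[name] * rates[name] for name in freqs}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


def enrich(freqs, rates, rounds):
    """Apply several rounds of self-replication and return the final frequencies."""
    for _ in range(rounds):
        freqs = self_replication_round(freqs, rates)
    return freqs


if __name__ == "__main__":
    # Hypothetical library: a rare variant that copies its gene twice as fast.
    start = {"wild_type": 0.99, "fast_mutant": 0.01}
    rates = {"wild_type": 1.0, "fast_mutant": 2.0}
    for n in (0, 1, 5, 10):
        print(n, enrich(start, rates, n))
```

Even a modest twofold advantage takes a variant from 1% to the majority of the pool within ten rounds in this model, which is why linking genotype and phenotype inside each droplet is so effective.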
Cell-surface display exposes a recombinant protein on the outer surface of cells, also linking the mutant genotype to its phenotype. The use of cell-surface displays for DEE depends on trapping a fluorescent product either intracellularly or on the surface of the cell expressing the enzyme variant, for subsequent cell sorting (Figure 2C). As an example, a cell-surface display was used to evolve a human β-glucuronidase in mammalian cells. Upon the enzymatic reaction at the membrane, the fluorescent product ELF-97 accumulated inside the cell and enabled fluorescence-activated cell sorting (FACS). The same principles can be applied to E. coli and yeast cells as expression systems. Cell displays could also be combined with in vitro compartmentalisation to generate high-throughput selection platforms in which droplets contain both the cell-displayed enzyme and its reactants.
The last example of a phenotyping strategy is a recent selection platform (VEGAS) that uses the RNA virus Sindbis to express proteins in mammalian cells. VEGAS works in continuous evolution cycles without the need for handling between steps, allowing the simultaneous introduction of variability and selection of the fittest mutants. By applying different selective pressures to this highly mutagenic system, VEGAS was able to evolve GPCRs that constitutively activate a signalling pathway, nanobodies able to activate GPCRs and a tetracycline transactivator, with a full evolution campaign completed in as little as 7 days.
DEE towards applications
Enzymes are used as affordable and green catalysts in a wide range of applications, as they offer a great combination of efficiency, substrate specificity and stereoselectivity. However, while enzymes normally work under mild conditions of temperature, pH and organic solvent content, some processes require biocatalysts able to endure more stressful and extreme conditions. For instance, industries that require biocatalysts resistant to high temperatures (such as food, biofuels, paper production and laundry) have extensively exploited enzymes from thermophilic organisms. Raising thermal stability also benefits the process of DEE itself: as mutations are often destabilising for enzymes, thermostable variants offer a good starting point that tolerates further tradeoffs during the engineering of enzyme activity. The understanding of the structural features of thermophilic enzymes has supported the use of DEE to improve enzyme thermoresistance, activity at cold temperatures [38,39] and tolerance to higher organic solvent contents [40–42]. Two main structural features of thermoresistant enzymes (increased core packing and burial of hydrophobic surface area) were used to develop an objective computer algorithm that predicted beneficial mutations in the Streptococcal protein G β1 domain (Gβ1). The resulting mutant is a hyperthermostable protein with a melting temperature increased from 83°C to above 100°C. Similarly, an automated computational protocol (RosettaVIP) predicted mutations that improve core packing, which increased the stability of two underpacked proteins (λ repressor and Protein L) and the thermostability of a methionine aminopeptidase. Higher rigidity is another feature of thermoresistant proteins, and it is exploited by a strategy called B-FIT to increase the thermal resistance of mesophilic enzymes. B-FIT targeted residues with high B-factors in an epoxide hydrolase to improve its thermostability and increase its half-life at 60°C by 80-fold.
Interestingly, the thermoresistant mutant evolved from neutral or slightly inferior parental variants, as also observed for the gain of function in TEM-1 β-lactamase. This supports the idea that evolution from nearly neutral variants allows the exploration of different fitness landscapes, both for gain of function and for thermoresistance. It also suggests that iterating only from improved mutants might lead to sub-optimal traps in the fitness landscape. This phenomenon has been described in terms of diminishing returns and tradeoffs, in which variants distant from their optimum in the landscape (i.e. carrying nearly neutral mutations) offer more opportunities for improvement.
Increased thermoresistance has also been linked to increased robustness to organic solvents. The mutagenesis of a lipase from Bacillus subtilis targeted 10 positions identified by B-FIT, resulting in a quintuple mutant with higher thermostability and robustness to organic solvents. These properties arose from a cooperative network of H-bonds and salt bridges at the protein surface that increased protein rigidity. In another example, the stability of a haloalkane dehalogenase in 40% DMSO was improved from minutes to weeks by introducing mutations at four positions, which also increased enzyme thermoresistance by 19°C owing to better hydrophobic core packing. The increased robustness of haloalkane dehalogenases to organic solvents enables their application as catalysts in processes that require high cosolvent concentrations to solubilise substrates, such as bioproduct recycling, bioremediation and decontamination.
The development of detergents is an example of an application that, besides using DEE to raise thermoresistant enzymes, has also engineered: cold-active proteases to work at temperatures below 15°C [38,50]; psychrophilic lipases (usually unstable at temperatures higher than 20°C) for improved thermoresistance ; and alpha-amylases more active at higher pHs .
DEE has also enabled the acceleration of slow side activities and the discovery of new activities not previously described for natural enzymes. In the first example, DNA polymerases were evolved to produce genetic material from synthetic nucleotides (XNA). The evolved XNA polymerases demonstrated that heredity and evolution, two main features of living organisms, are not restricted to DNA and RNA but apply to any system able to store and retrieve genetic information. As an application, XNA polymers can be used as a genetic firewall in xenobiology for the creation of harmless genetically modified organisms, as the synthesis of their XNA genetic material would require supplementation with nucleotides not found in nature. A second example is the evolution of cytochrome c to catalyse the formation of carbon–silicon bonds, an enzymatic activity unknown until then. The wild-type enzyme from a thermohalophilic bacterium showed great enantioselectivity (>97%) but slow activity (total turnover of 4) in producing organosilicon products. The rational introduction of mutations near the heme group resulted in a triple mutant 15-fold faster than the benchmark synthetic catalysts (>1500 turnovers) for the production of organosilicon compounds, with broad acceptance of silicon reagents as substrates and excellent chemoselectivity (>99%), even in the presence of possible competing carbene-transfer groups. Lastly, a slow cocaine-degrading butyrylcholinesterase was computationally designed to improve the interactions between the residues in the oxyanion hole and the substrate cocaine, and DEE resulted in a mutant (E30-6) with similar hydrolytic efficiency for cocaine and acetylcholine (natural substrate).
Perspectives: use of machine learning for DEE
ML encompasses a set of powerful computational methods that predict a system's behaviour based on previous data. ML has been applied to understand and engineer protein function and to build models that map structural modifications to phenotypes. Two main ML approaches have been applied to predict changes in enzyme stability upon mutation: neural networks, exemplified by the tools PoPMuSiC [55,56] and NeEMO; and Gaussian processes, exemplified by mGPfusion. In addition to predicting protein stability, a Gaussian process has been applied to estimate enzyme-substrate affinity. A more in-depth discussion of ML approaches for DEE can be found in an excellent work by Frances Arnold's group.
Ideally, ML filters the library size (in silico design) to optimise experimental efforts, but also allows greater sampling of sequence diversity to avoid sub-optimal fitness traps. UniRep is a recurrent neural network able to predict different features of a protein from its sequence only . UniRep has been trained using millions of amino acid sequences to initially learn the statistical representation of proteins. Without the introduction of additional enzyme information, such as structure, stability or evolutionary information, UniRep could predict stability for natural and de novo designed proteins, and phenotypic effects of single mutations with more accuracy than ML tools trained using previous information. As a result, UniRep can dramatically reduce the size of the designed library, decreasing the costs of synthesis in DEE programs. Also, as UniRep training is not constrained by phenotypic features, it allows the exploitation of distant landscapes to find variants that otherwise would be inaccessible via traditional DEE.
DEE is a powerful tool to engineer enzymes for applications and to help expand our understanding of molecular evolution. The recent development of new tools and methods opens new possibilities for the use of DEE and decreases the cost of engineering new catalysts. Therefore, we can expect a marked increase in its application in the coming years.
DEE is an in vitro process that mimics nature's evolution to engineer improved enzymes towards applications.
This process comprises rounds of genetic diversification and phenotypic selection that also help increase our understanding of molecular evolution.
New technical advances improve the design of libraries and the phenotypic triaging of mutants, two critical steps in directed evolution.
Recent ML methods increase the possibilities to exploit new landscapes inaccessible via experimental directed evolution.
The author declares that there are no competing interests associated with this manuscript.
F.K.T. wrote the manuscript.
F.K.T. is supported by the ‘Bill & Melinda Gates Foundation' — Grant OPP1191579 — ‘Lead Optimization Research identifying novel agents for Diseases of the Developing World (LEOPARD)’ and the ‘Wellcome Centre for Anti-Infectives Research' (WCAIR — University of Dundee).
Thanks to Dr. Pedro Augusto Galvao Tizei and Dr. Gabriela Mol Avelar for the invaluable discussions during the preparation of this review.