A central goal in molecular evolution is to understand how genes and proteins evolve in response to changing environments. In the absence of intact DNA from fossils, ancestral sequence reconstruction (ASR) can be used to infer the evolutionary precursors of extant proteins. To date, ancestral proteins from eubacteria, archaea, yeast and vertebrates have been inferred, with hypothesized ages ranging from several million to over 3 billion years. ASR has yielded insights into the early history of life on Earth and into the evolution of proteins and macromolecular complexes. Recently, however, ASR has developed from a tool for testing hypotheses about protein evolution into a useful means of designing novel proteins. The strength of this approach lies in the ability to infer ancestral sequences encoding proteins that have desirable properties compared with contemporary forms, particularly thermostability and broad substrate range, making them good starting points for laboratory evolution. Advances in DNA sequencing and synthesis and in computational phylogenetic analysis have led to a sharp increase in the number of ancient proteins resurrected in the last decade and have greatly facilitated the use of ASR in the burgeoning field of synthetic biology. However, the primary challenge of ASR remains the accurate inference of ancestral states, given the uncertainty arising from evolutionary models, incomplete sequences and limited phylogenetic trees. This review will focus, firstly, on the use of ASR to uncover links between sequence and phenotype and, secondly, on the practical application of ASR in protein engineering.

Introduction

Proteins, along with the genes that encode them, are the product of a continuous process of mutation and natural selection. Understanding how and why proteins have evolved to work as they do necessitates an investigation of their structure and evolutionary history. While the seminal work on protein crystallography [1] and the idea of resurrecting extinct ancient proteins [2] emerged around the same time in the mid-20th century, decades passed before the first synthesis of an ancestral protein was undertaken [3].

The power of this approach, termed ancestral sequence reconstruction (ASR), lies in the ability to test evolutionary hypotheses by inferring the sequence of ancient proteins based on their extant descendants, then experimentally resurrecting the predicted ancestors and analysing them in the laboratory. To date, more than 50 biomolecules have been resurrected in the laboratory and tested either biochemically or biophysically. This approach has been most heavily applied to proteins, in particular to study the functional divergence of a protein family [4–9]. However, ASR has also been implemented on other ancient biomolecules such as transposable elements, transcription factors, regulatory sequences, RNA, viruses and entire genomes [10–14]. In this review, we will focus on ancestral protein reconstruction and provide a practical perspective on the use of this approach for protein engineering.

Approaches used for ASR

The resurrection of ancient proteins requires a set of related extant sequences and a phylogenetic tree that shows their evolutionary relationships. ASR has become ever more feasible in recent years due to several factors: the explosion in genome sequence data; the proliferation of bioinformatics tools for the alignment of homologous extant sequences [15] and the determination of the phylogeny that describes their evolutionary relationships; and the wide commercial availability of custom gene synthesis services [16]. Once an ancestral protein is inferred, a DNA molecule encoding it can now be readily designed and synthesized, allowing it to be expressed and characterized experimentally.

Most of the ancestral sequences studied to date have been inferred using computational methods originally developed for phylogenetic analysis. The maximum parsimony (MP) method [17] was used in initial reconstruction studies because of its ease of implementation. MP assumes that the tree, and the set of ancestral states on it, requiring the minimum number of substitutions is the most likely. However, MP does not take into account biased amino acid substitution patterns and is susceptible to long branch attraction (i.e. the erroneous grouping of two or more long branches as sister groups). In this situation, rapidly evolving genes and unequal rates of molecular evolution between lineages can cause homoplasy (i.e. characteristics shared by descendants that were not present in their common ancestor) to be mistaken for homology. This problem can be minimized using methods that incorporate differential rates of substitution among lineages.
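To make the parsimony criterion concrete, the following is a minimal Python sketch of Fitch's algorithm, the classic small-parsimony procedure, applied to a single alignment column. It assumes a rooted binary tree written as nested tuples and a hypothetical four-taxon column; only the bottom-up pass is shown, so it returns the candidate state set at the root and the minimum number of substitutions rather than a complete ancestral assignment.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass: (candidate states at this node, min substitutions below it)."""
    if isinstance(tree, str):
        # Leaf: the observed residue is the only candidate, at zero cost.
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state, so no extra substitution is needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep both possibilities and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical column: a Thr/Met difference between two pairs of taxa.
tree = (("human", "chimp"), ("mouse", "rat"))
column = {"human": "T", "chimp": "T", "mouse": "M", "rat": "M"}
states, changes = fitch(tree, column)
print(sorted(states), changes)  # ['M', 'T'] 1
```

The root set {M, T} is a tie: parsimony alone cannot say which residue was ancestral, whereas the probabilistic methods discussed next attach a probability to each alternative.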

For these reasons, probabilistic approaches, namely maximum likelihood (ML) [18,19] and Bayesian reconstruction [20], have been used in more recent ASR studies. ML and Bayesian methods give more reliable results than MP [21] and estimate the confidence in each inferred ancestral state, commonly expressed as the posterior probability of each amino acid at each node of the phylogenetic tree.

There are two types of ML reconstruction methods, namely marginal and joint [18]. Marginal reconstruction compares the probabilities of different character states at a single internal node at a given site and selects the amino acid that maximizes the likelihood of the tree at that site. Whereas marginal reconstruction assigns a character state to a single node, joint reconstruction assigns the most likely set of character states to all ancestral nodes simultaneously. Marginal reconstruction is more suitable when one wants the sequence at a particular node (e.g. to synthesize the hypothetical nuclear receptor of the ancestor of Metazoa), whereas joint reconstruction is more suitable for counting changes at each site (e.g. the number of threonine-to-methionine replacements over the entire tree) [18]. The results of joint and marginal reconstruction may differ; marginal reconstruction, although computationally efficient, does not necessarily give the globally optimal set of ancestral states and can only be considered an approximation to the joint reconstruction [22]. In practice, the marginal approach is most often employed in ancestral protein reconstruction experiments. Both ML and empirical Bayesian methods assume that the phylogenetic tree and evolutionary model are known without error, but these assumptions often do not hold, especially for highly divergent proteins. To account for this uncertainty, the hierarchical Bayesian approach calculates the probabilities of the possible ancestral states and averages them over all possible trees and models of evolution, in proportion to how probable these trees and models are given the observed data [23]. However, in studies on simulated data sets, Bayesian integration has not substantially improved the accuracy of the inferred ancestral sequences compared with ML [24], and it is more challenging to implement.
Very recently, an experimental phylogeny of fluorescent proteins was used to show that Bayesian methods reflected the phenotypes of ancestral nodes more accurately than MP; however, ML methods were not examined [25].
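The difference between marginal and joint reconstruction can be illustrated by brute force on a tiny tree. The sketch below assumes a hypothetical four-leaf nucleotide tree with arbitrary branch lengths and the Jukes-Cantor model; it enumerates every assignment of states to the three internal nodes, takes the best complete assignment as the joint reconstruction, and sums over the other nodes to obtain the marginal posterior at each node. Real software avoids this exponential enumeration: marginal posteriors are computed with Felsenstein's pruning algorithm and joint reconstructions with dynamic programming (e.g. the algorithm of Pupko and colleagues).

```python
import itertools
import math
from typing import Dict, Tuple

STATES = "ACGT"

# Hypothetical tree ((a,b)X,(c,d)Y)R; each branch length (substitutions/site)
# belongs to the branch leading to the named node.
BRANCHES = {"X": 0.1, "Y": 0.3, "a": 0.05, "b": 0.05, "c": 0.2, "d": 0.4}
# leaf -> (parent node, observed state at this site)
LEAVES = {"a": ("X", "A"), "b": ("X", "A"), "c": ("Y", "G"), "d": ("Y", "A")}


def p_jc(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def joint_prob(r: str, x: str, y: str) -> float:
    """P(observed leaves, R=r, X=x, Y=y), with a uniform prior on the root state."""
    internal = {"X": x, "Y": y}
    p = 0.25 * p_jc(r, x, BRANCHES["X"]) * p_jc(r, y, BRANCHES["Y"])
    for leaf, (parent, observed) in LEAVES.items():
        p *= p_jc(internal[parent], observed, BRANCHES[leaf])
    return p


# Enumerate all 4**3 = 64 assignments of states to (R, X, Y).
assignments: Dict[Tuple[str, str, str], float] = {
    combo: joint_prob(*combo) for combo in itertools.product(STATES, repeat=3)
}
site_likelihood = sum(assignments.values())

# Joint reconstruction: the single most probable assignment to all internal nodes.
joint_best = max(assignments, key=assignments.get)


def marginal(node_index: int) -> Dict[str, float]:
    """Marginal posterior at one node (0=R, 1=X, 2=Y), summing over the other nodes."""
    post = {s: 0.0 for s in STATES}
    for combo, p in assignments.items():
        post[combo[node_index]] += p / site_likelihood
    return post


for name, idx in (("R", 0), ("X", 1), ("Y", 2)):
    post = marginal(idx)
    best = max(post, key=post.get)
    print(f"marginal {name}: {best} (posterior {post[best]:.3f})")
print("joint (R, X, Y):", joint_best)
```

Because each marginal is maximized separately, the states chosen node by node need not coincide with the jointly best assignment, which is why the two approaches can disagree at some sites.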

The process of ancestral sequence inference

The first step of any reconstruction study is the generation of a multiple sequence alignment (MSA) of extant sequences (Figure 1, [26]). Homologous sequences are retrieved from databases such as NCBI or UniProt using the Basic Local Alignment Search Tool (BLAST) and aligned using MSA algorithms (reviewed in [15]) such as ClustalW [27], MAFFT [28], MUSCLE [29], T-Coffee [30] or PRANK (used for sequences with insertions [31]). Either DNA or protein sequences can be used, depending on which class of model is applied for the inference: nucleotide, amino acid or codon models. Most ASR studies have used protein sequences, because nucleotide sequences are less reliable when aligning regions with many insertions and deletions (indels), and nucleotide alignments do not take into account the redundancy of the genetic code and its implications for conservation of function versus conservation of nucleotide sequence. For both DNA and protein sequences, the collected sequences should represent a diverse set of homologues, generally from different evolutionary lineages or domains of life. A phylogenetic tree is then built using a distance-based, ML or Bayesian approach. The quality of the MSA and phylogenetic tree is crucial for the accuracy of reconstruction; thus, often only full-length sequences or sequences that contain critical residues such as active sites are included in the alignment [32]. Implausible sequences (e.g. sequences that do not appear to represent a true homologue) or sequences that contain many insertions or deletions (indicative of pseudogenes) may need to be culled. Often the alignment must be fine-tuned manually to improve its reliability at gap positions [15]; the conservation of the presence or absence of gaps within a particular taxonomic group can guide manual gap handling.
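Part of the culling step can be automated. The sketch below is an illustrative heuristic, not a published criterion: it flags aligned sequences that are dominated by gaps or whose ungapped length is much shorter than the median (possible fragments or pseudogenes). Both thresholds are arbitrary assumptions that would need tuning for a real data set, and such filters complement, rather than replace, manual review.

```python
from statistics import median
from typing import Dict, List


def cull_sequences(aln: Dict[str, str], max_gap_frac: float = 0.3,
                   min_len_frac: float = 0.8) -> List[str]:
    """Return the names of aligned sequences to keep.

    A sequence is dropped if more than `max_gap_frac` of its alignment columns
    are gaps, or if its ungapped length is below `min_len_frac` times the
    median ungapped length of the whole set (thresholds are assumptions).
    """
    lengths = {name: len(seq.replace("-", "")) for name, seq in aln.items()}
    med = median(lengths.values())
    keep = []
    for name, seq in aln.items():
        gap_frac = seq.count("-") / len(seq)
        if gap_frac <= max_gap_frac and lengths[name] >= min_len_frac * med:
            keep.append(name)
    return keep


# Hypothetical toy alignment; s3 looks like a fragment.
aln = {"s1": "MKT-LLV", "s2": "MKTALLV", "s3": "M----LV", "s4": "MKSALLI"}
print(cull_sequences(aln))  # ['s1', 's2', 's4']
```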
Using the pruned alignment, a phylogenetic tree (gene tree) is built to determine the evolutionary relationships between sequences, and it is compared with an accepted evolutionary tree (species tree) for the organisms represented in the sequence collection. If discrepancies are found, the MSA may require adjustment or the initial collection of sequences may need to be reassessed (e.g. by sampling from a greater number of species [33], investigating sequences from limited representative taxa or excluding implausible sequences more stringently). Masking highly divergent regions of the alignment during tree construction may also improve the accuracy of the tree. However, many factors that lead to discordance between gene trees and species trees cannot always be controlled [34–36].
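Masking of divergent regions is usually done with dedicated alignment-trimming tools such as Gblocks or trimAl. As a simplified illustration of the idea only, the following sketch drops alignment columns that are mostly gaps or whose residue distribution has high Shannon entropy; both thresholds are assumptions, and real trimming tools use more elaborate criteria.

```python
import math
from collections import Counter
from typing import Dict, List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    counts = Counter(c for c in column if c != "-")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def mask_columns(aln: Dict[str, str], max_gap_frac: float = 0.5,
                 max_entropy: float = 3.0) -> Dict[str, str]:
    """Return the alignment with gap-rich or highly variable columns removed."""
    names = list(aln)
    ncol = len(aln[names[0]])
    kept: List[int] = []
    for i in range(ncol):
        column = "".join(aln[name][i] for name in names)
        gap_frac = column.count("-") / len(column)
        if gap_frac <= max_gap_frac and column_entropy(column) <= max_entropy:
            kept.append(i)
    return {name: "".join(aln[name][i] for i in kept) for name in names}


# Hypothetical toy alignment: column 4 is mostly gaps, column 5 is fully variable.
aln = {"a": "MKVLA", "b": "MRV-G", "c": "MKI-W", "d": "MKV-Y"}
print(mask_columns(aln, max_entropy=1.0))
```

With the stricter entropy cut-off used here, only the first three columns survive; the tree would then be built from the masked alignment, while ancestral inference itself is usually still run on the full alignment.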

Figure 1.
The process of ASR and resurrection.

The inference of ancestral sequences from extant homologues involves several stages, many of which include manual, discretionary optimization steps [15,26]. Sequences collected from public databases must first be reviewed and any unreliable sequences removed. The selected sequences are then aligned and used to build a phylogenetic tree. Manual fine-tuning of the alignment may be necessary, and the evolutionary relationships indicated by the tree should conform to the accepted phylogeny for the organisms in question. Ancestral sequences are then inferred for each internal node of the tree from the optimized MSA and phylogenetic tree. Depending on the evolutionary relatedness of the sequences and the gap-handling accuracy of the inference algorithm, decisions may need to be made as to whether to retain or remove sequence insertions in the ancestor. A DNA sequence encoding the ancestral protein is then synthesized and cloned into an expression plasmid, enabling heterologous expression and characterization of the ancestral protein. For each stage, many bioinformatics tools are publicly available, as indicated on the right.

Recently, an elegant method has been developed to overcome uncertainty in ancestral sequence prediction arising from ambiguity in tree topology (e.g. when the gene tree is not in accordance with the tree of life). Reconciling the gene tree with a species tree (the phylogeny that best describes the evolutionary relationships among the species concerned, derived from phylogenetic analysis of multiple genes) yielded a biochemically more realistic and kinetically more stable ancestral protein [37].
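The sketch below is not the method of [37]; it only illustrates the classic lowest-common-ancestor (LCA) mapping that underlies gene tree/species tree reconciliation. Each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node that maps to the same clade as one of its children is inferred to be a duplication. Trees are written as nested tuples, and all gene and species names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node, in post-order (the last entry is this node's own set)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clusters(tree[0]), clusters(tree[1])
    return left + right + [left[-1] | right[-1]]


def lca(species: FrozenSet[str], species_clusters: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree clade containing every species in `species`."""
    return min((c for c in species_clusters if species <= c), key=len)


def count_duplications(gene_tree: Tree, gene_to_species: Dict[str, str],
                       species_tree: Tree) -> int:
    """Number of gene-tree nodes inferred as duplications under LCA reconciliation."""
    sp_clusters = clusters(species_tree)

    def walk(node: Tree) -> Tuple[FrozenSet[str], int]:
        # Returns (species clade this node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca(frozenset([gene_to_species[node]]), sp_clusters), 0
        left_map, left_dups = walk(node[0])
        right_map, right_dups = walk(node[1])
        node_map = lca(left_map | right_map, sp_clusters)
        is_dup = node_map in (left_map, right_map)
        return node_map, left_dups + right_dups + int(is_dup)

    return walk(gene_tree)[1]


species_tree = (("human", "mouse"), "chicken")
gene_to_species = {"hA": "human", "mA": "mouse", "cA": "chicken"}
print(count_duplications((("hA", "mA"), "cA"), gene_to_species, species_tree))  # 0
print(count_duplications((("hA", "cA"), "mA"), gene_to_species, species_tree))  # 1
```

A gene tree that contradicts the species tree can only be reconciled by invoking extra duplications (and losses), which gives one way of quantifying how implausible a discordant topology is.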

Another source of ambiguity is the evolutionary model used for inference. The most widely used models are the Dayhoff [38], JTT [39], WAG [40] and LG [41] substitution matrices. Allowing rate variation among sites, in the form of a discrete γ-distribution, substantially improves the fit of models of protein evolution [42], and the discrete-γ model has since been used to improve the accuracy of phylogenetic analysis.
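The discrete-γ model replaces a continuous gamma distribution of rates (mean 1, shape parameter α) with k equally probable categories, each represented by its mean rate; the likelihood at a site is then averaged over the k rates. For a category with boundaries a and b, the mean rate works out to k times the difference of the regularized incomplete gamma function P(α + 1, αx) evaluated at b and at a. The dependency-free sketch below computes these category rates, using a power series for P and bisection for the category boundaries; it is written for clarity rather than numerical robustness, and in practice phylogenetics packages compute these values internally.

```python
import math
from typing import List


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(q: float, alpha: float) -> float:
    """q-quantile of Gamma(shape=alpha, rate=alpha), whose mean is 1, by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> List[float]:
    """Mean rate of each of k equal-probability categories of the gamma distribution."""
    cuts = [0.0] + [gamma_quantile(j / k, alpha) for j in range(1, k)] + [math.inf]
    rates = []
    for lower, upper in zip(cuts[:-1], cuts[1:]):
        p_up = 1.0 if math.isinf(upper) else reg_lower_gamma(alpha + 1.0, alpha * upper)
        p_lo = reg_lower_gamma(alpha + 1.0, alpha * lower)
        rates.append(k * (p_up - p_lo))
    return rates


print([round(r, 4) for r in discrete_gamma_rates(0.5, 4)])
```

Small values of α give strongly skewed rates (most sites nearly invariant, a few evolving rapidly), whereas large values of α approach equal rates at all sites.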

Using the MSA and phylogenetic tree as inputs, software packages such as PAUP (Phylogenetic Analysis Using Parsimony; [43,44]), PAML (phylogenetic analysis by maximum likelihood) [45], MrBayes [46], ANCESCON [47] or FastML [48] can then be used to find the most probable sequence at each interior node of the phylogenetic tree. Table 1 compares the available software packages, their respective features and examples of their usage. While many options are now available for the inference of ancestral states, a general limitation of most is the treatment of gaps and insertions, which often requires the investigator to make potentially subjective judgements about the relative likelihood of alternative explanations for the evolution of specific sequence features [15]. Moreover, each software package has an effective upper limit on the number of sequences that can be used for the reconstruction, constrained by the available computational power. For the more user-friendly tools such as FastML, this may be as few as ∼250–300 average-length sequences, which restricts the information that can be exploited for the inference.
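As an example of how such a package is driven in practice, PAML's codeml program reads its settings from a control file. The sketch below generates a minimal control file for ancestral reconstruction from an amino acid alignment. The option names follow the codeml.ctl template distributed with PAML, but the file names, the location of the WAG matrix file, the choice of WAG and the starting α are assumptions; the PAML documentation should be consulted for the full option list and defaults.

```python
def codeml_ctl_text(seqfile: str, treefile: str,
                    aa_rate_file: str = "dat/wag.dat",
                    alpha: float = 0.5, ncat_gamma: int = 4) -> str:
    """Return the text of a minimal codeml control file for ancestral reconstruction.

    Option names follow PAML's codeml.ctl template; every value is an assumption.
    """
    options = {
        "seqfile": seqfile,          # alignment (PHYLIP format)
        "treefile": treefile,        # fixed tree topology (Newick)
        "outfile": "mlc",            # main results file
        "noisy": 3,                  # amount of screen output
        "verbose": 1,                # detailed output
        "runmode": 0,                # 0: use the user-supplied tree
        "seqtype": 2,                # 2: amino acid sequences
        "model": 2,                  # 2: empirical rate matrix read from aaRatefile
        "aaRatefile": aa_rate_file,  # assumed path to the WAG matrix file
        "fix_alpha": 0,              # 0: estimate the gamma shape parameter
        "alpha": alpha,              # initial value of alpha
        "ncatG": ncat_gamma,         # number of discrete-gamma rate categories
        "RateAncestor": 1,           # 1: also reconstruct ancestral states (file 'rst')
        "cleandata": 0,              # 0: keep alignment columns containing gaps
    }
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"


# Hypothetical input file names.
ctl = codeml_ctl_text("ancestors.phy", "ancestors.tree")
print(ctl)
```

Writing the returned text to a file (e.g. with `pathlib.Path("codeml.ctl").write_text(ctl)`) and running `codeml codeml.ctl` should then write the reconstructed ancestral sequences and their posterior probabilities to the `rst` output file.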

Table 1
Comparison of tools for ancestral sequence inference
Software | Evolutionary models | Inference method | Features of note | Exemplar ASR studies
PAUP¹ | DNA, protein | MP, ML and distance methods are all incorporated in version 4.0 | Various options for phylogenetic methods | [5]
PAML² | DNA, protein | ML | Likelihood ratio tests; estimation of synonymous and non-synonymous rates; detection of positive Darwinian selection; estimation of species divergence times under molecular clock models | [9,50–52,54,55,59,67,100,101,104,118–122]
MrBayes³ | DNA, codon, protein | Empirical and hierarchical Bayesian | Incorporates uncertainty (e.g. topology, model parameters); wide range of evolutionary models; estimation of positively selected sites | [8,68,123]
ANCESCON⁴ | Protein | Distance-based | Considers rate variation among sites | [124]
Lazarus⁵ | DNA, protein | Empirical Bayesian, ML | Uses PAML; runs parallelized PAML jobs (one for each of a set of user-specified topologies) | [105,125]
FastML⁶ | DNA, codon, protein | ML | User-friendly web server; reconstructs ancestral indel states | [106]
Phylobot⁷ | DNA, protein | Empirical Bayesian, ML | User-friendly web server; automated pipeline; uses PAML and Lazarus for inferences | [105,125,126]
GASP⁸ | Protein | ML | Can infer ancestral sequences from gap-containing alignments, although not as accurately as some of the more sophisticated ML approaches | [54,97,140]

¹[127], ²[45], ³[20], ⁴[47], ⁵[24], ⁶[48], ⁷[128], ⁸[129]. For a more comprehensive review of software used for ancestral reconstruction, see ref. [130].

Resurrection in the laboratory

Once an ancestral protein sequence is inferred, it must be back-translated to derive a DNA sequence that can be used to express the recombinant ancestral protein. Most early ASR studies used site-directed mutagenesis of a cloned extant gene to obtain an open reading frame (ORF) encoding an ancestral protein [3,49]. However, this approach is limited to recent ancestors, for which the evolutionary distance between the extant and ancestral forms is relatively short. As older proteins have been resurrected in more recent studies, gene synthesis or assembly of gene fragments has become the method of choice [50–52].

Because of the cost and time involved in characterizing ancestral proteins in the laboratory, typically only the single most probable ancestral sequence is synthesized for each node to be studied. This sequence is obtained by assigning to each position in the alignment the amino acid inferred with the highest posterior probability at that position. However, it is often difficult to assign an amino acid conclusively at every position in a protein; the inference is typically less definitive at some positions than at others. To resurrect variants at ambiguous sites (e.g. sites at which two or more ancestral states have similar probabilities in a marginal reconstruction), a conservative approach can be taken: an ancestor encoding the most probable amino acid at each position is synthesized first, and site-directed mutagenesis is then used to introduce alternative ancestral states at ambiguous positions. Another way of introducing variability is limited saturation mutagenesis, either by using oligonucleotides that are degenerate at ambiguous sites during gene synthesis or by synthesizing oligonucleotides encoding different possible ancestral states and mixing them at frequencies that reflect the posterior distribution [53]. If there is no significant difference in biochemical properties (e.g. ligand- or substrate-binding characteristics, enzyme specificity, turnover rate, thermal stability) between ancestral sequences sampled from the posterior distribution and the most likely sequence, then it is reasonable to assume that conclusions drawn about the ancestral sequences are robust to small changes in sequence. Indeed, it may be more appropriate to consider a population of possible ancestors, weighted by the Bayesian posterior probabilities of all possible states, rather than a single ancestral sequence.
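Selecting the most probable ancestor, identifying ambiguous sites and sampling alternative ancestors from the posterior distribution can each be expressed in a few lines. The sketch below assumes that the marginal posterior probabilities for one node have already been parsed from the output of an inference program into one dictionary per site; the posteriors shown are invented for illustration, and the 0.2 cut-off for calling a site ambiguous is an assumption rather than a standard.

```python
import random
from typing import Dict, List, Tuple

# Marginal posteriors for one ancestral node: one {residue: probability} dict per site.
Posteriors = List[Dict[str, float]]


def most_probable_sequence(post: Posteriors) -> str:
    """The 'most probable' ancestor: the highest-posterior residue at every site."""
    return "".join(max(site, key=site.get) for site in post)


def ambiguous_sites(post: Posteriors, min_alt: float = 0.2) -> List[Tuple[int, str, float]]:
    """(1-based position, runner-up residue, its posterior) where the runner-up >= min_alt."""
    hits = []
    for i, site in enumerate(post, start=1):
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] >= min_alt:
            hits.append((i, ranked[1][0], ranked[1][1]))
    return hits


def sample_sequence(post: Posteriors, rng: random.Random) -> str:
    """Draw one ancestor by sampling each site independently from its posterior."""
    return "".join(rng.choices(list(site), weights=list(site.values()))[0] for site in post)


# Invented posteriors for a four-residue stretch.
post = [
    {"M": 1.0},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"V": 0.92, "I": 0.08},
    {"L": 0.51, "F": 0.49},
]
print(most_probable_sequence(post))  # MKVL
print(ambiguous_sites(post))         # [(2, 'R', 0.4), (4, 'F', 0.49)]
rng = random.Random(0)
print([sample_sequence(post, rng) for _ in range(3)])
```

Sampling each site independently ignores any correlation between sites, so sampled sequences are best treated as a way of probing the robustness of conclusions rather than as realistic alternative ancestors.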

The DNA sequences that encode the ancestral proteins can be codon-optimized during gene synthesis to increase their expression in the chosen host organism (e.g. Escherichia coli). However, designing a gene for optimal expression, particularly of eukaryotic proteins in bacteria, may require additional sequence modification, especially in the region encoding the N-terminus of the protein. Moreover, the N- and C-termini of homologous proteins often show a high degree of divergence relative to the rest of the sequence. Where the extant sequences share little identity in these regions, substituting the N-terminus of the inferred ancestral protein with a sequence known to increase the expression of cognate modern proteins can be a practical solution. In these instances, care must be taken in interpreting the ‘ancestral’ properties of such hybrid sequences.
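Back-translation with a fixed codon choice per amino acid is the simplest form of codon optimization. In the sketch below, the codon table is an illustrative assumption that roughly follows Escherichia coli codon preferences, not a validated codon-usage table; gene-synthesis pipelines typically use full codon-usage tables and also consider mRNA secondary structure, GC content and repeats, none of which is modelled here. The optional `avoid` argument is a hypothetical convenience for checking motifs such as restriction sites needed for cloning.

```python
from typing import Tuple

# One codon per amino acid, chosen to roughly follow E. coli preferences (assumption).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein: str, avoid: Tuple[str, ...] = ()) -> str:
    """Back-translate a protein sequence to DNA and append a stop codon.

    Raises ValueError for residues without a codon or if any motif in `avoid`
    (e.g. a restriction site) occurs, so codons can be adjusted by hand.
    """
    codons = []
    for aa in protein.upper():
        if aa not in PREFERRED_CODON:
            raise ValueError(f"no codon defined for residue {aa!r}")
        codons.append(PREFERRED_CODON[aa])
    dna = "".join(codons) + STOP
    for motif in avoid:
        if motif in dna:
            raise ValueError(f"unwanted motif {motif} in back-translated sequence")
    return dna


print(back_translate("MKV"))  # ATGAAAGTGTAA
```

For example, `back_translate("MHM", avoid=("CATATG",))` raises an error because the His-Met junction recreates an NdeI site (CATATG), which would then have to be removed by choosing an alternative histidine codon.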

Since 1990, resurrections and functional analyses have been carried out on putative ancestral proteins belonging to eubacteria, archaea, animals, plants and fungi. Resurrection of ancestral proteins has provided direct insights into how natural selection has shaped proteins found in nature and has elucidated the evolutionary processes behind their stability, specificity and structure [52,54–57]. It has also provided insights into the environments in which the earliest forms of bacterial life lived [50]. Figure 2 depicts the biomolecules that have been resurrected in the past 25 years relative to their evolutionary age. Particularly notable amongst these are the studies discussed below, which illustrate how ASR has advanced the understanding of protein evolution in specific gene families.

Figure 2.
Biomolecules that have been resurrected to date.

Ancestral proteins and other biomolecules have been resurrected from three eons: the Archaean (4000–2500 Mya), the Proterozoic (2500–550 Mya) and the Phanerozoic, which is further divided into the Paleozoic (550–250 Mya), Mesozoic (250–65 Mya) and Cenozoic (65–0 Mya) eras. The age estimates of the resurrected molecules are derived either from the relevant literature or from molecular clocks of evolution [137–139]. Ancestral proteins that have been crystallized are shown in italics with the corresponding structures.

ASR as a tool to probe the evolutionary history of a protein family

The first resurrection of an ancestral protein involved full-length ancestral ruminant RNases and revealed the history of the emergence of the digestive function of ribonucleases from a non-digestive precursor. RNases resurrected from ancestors older than the last common ancestor of the true ruminants did not behave like digestive enzymes (having significantly lower catalytic activity against single-stranded RNA or short RNA fragments than extant forms), yet displayed an increased ability to bind and melt double-stranded DNA [3]. Subsequent resurrections of ancestral RNase paralogues revealed the origins of the diverse physiological functions of RNases. Zhang and Rosenberg showed that the ancestor of eosinophil-derived neurotoxin (RNase 2) and eosinophil cationic protein (RNase 3) had weak antiviral activity and that only two substitutions were required to produce the large enhancement of RNase activity seen in the descendants [58]. This suggests that the extant digestive RNases evolved from an ancestor that performed different biological functions, and that these activities have been retained to a greater or lesser extent in the diverse extant RNase paralogues that arose from the expansion of the family during early mammalian evolution.

Another of the first seminal ASR studies concerned steroid receptors [7], which were thought to have evolved in the vertebrate lineage (being apparently absent from invertebrates). However, after the isolation of an oestrogen receptor orthologue from the mollusc Aplysia californica and the characterization of the ancestral protein from which all extant steroid receptors evolved, it was hypothesized that this protein family already existed in the common ancestor of protostomes and deuterostomes and was lost in the ecdysozoan lineage [7]. The DNA-binding domain of the resurrected ancestral steroid receptor activated transcription from an oestrogen response element, whereas the ligand-binding domain of the ancestral receptor activated transcription in the presence of oestrogens, supporting the prediction that the ancestor functioned like an oestrogen receptor [7]. Resurrection of a more recent ancestor of this protein family, the ancestral corticosteroid receptor, revealed how hormone selectivity evolved in the glucocorticoid and mineralocorticoid receptors [59–61].

Another good example of using ASR to understand the factors underpinning the diversity of a protein family is the study of fluorescent GFP-like proteins from corals [8]. Resurrecting the common ancestor of all extant colour variants, together with the intermediate nodes, demonstrated elegantly how the red colour evolved from a green ancestor through stepwise adaptation [8].

ASR as a tool for testing hypotheses about protein structure–function relationships

ASR has been used to examine various aspects of protein structure and function. Traditional site-directed approaches to testing hypotheses about the residues responsible for particular properties of proteins are confounded by the fact that the change is usually introduced into a different background from the one in which it originally evolved [62,63]. The results, and the conclusions that can be drawn from them, are therefore complicated by the possibility of epistatic interactions. Epistasis is the phenomenon whereby the effect of a substitution at one site is altered by the residue present at a second site, i.e. the effect of a given mutation on protein function depends on the sequence background into which it is introduced [62]. Put differently, where epistasis occurs, the combined effect of two substitutions differs quantitatively or qualitatively from the sum of their effects when introduced separately. Positive epistatic interactions are thought to arise during evolution when an initial mutation allows a subsequent change to be tolerated; for example, an initial mutation may increase the stability of a protein such that a subsequent destabilizing mutation can be accommodated. If epistasis exists, mutations responsible for desirable characteristics in one branch of an evolutionary tree may not show the same effect when introduced into members of a different branch, either because those members lack the neutral permissive mutations (i.e. substitutions of no immediate functional consequence that are nevertheless required to buffer the protein against destabilizing changes) or because the introduced mutations disrupt the protein through unfavourable steric or other physicochemical interactions (restrictive mutations) [63].
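Epistasis is usually quantified as the deviation of a multiple mutant from the additive expectation of its single mutants, measured on a scale on which independent effects should add (e.g. free energies or log activities). The sketch below uses invented phenotype values for a permissive scenario: a mutation F is deleterious on its own but beneficial once a neutral mutation P is present, an instance of sign epistasis. The mutation labels and values are hypothetical.

```python
from typing import Dict, FrozenSet

# Phenotype on an additive scale, keyed by the set of mutations present
# (the empty set is the ancestral background). All values are invented.
phenotype: Dict[FrozenSet[str], float] = {
    frozenset(): 0.0,            # ancestral background
    frozenset({"P"}): 0.0,       # permissive mutation: no effect on its own
    frozenset({"F"}): -1.0,      # functional mutation: deleterious alone
    frozenset({"P", "F"}): 1.0,  # beneficial once P is present
}


def epistasis(pheno: Dict[FrozenSet[str], float], mutations: FrozenSet[str]) -> float:
    """Observed combined effect minus the sum of single-mutant effects."""
    base = pheno[frozenset()]
    additive = sum(pheno[frozenset([m])] - base for m in mutations)
    observed = pheno[mutations] - base
    return observed - additive


def effect_in_background(pheno: Dict[FrozenSet[str], float], mutation: str,
                         background: FrozenSet[str]) -> float:
    """Effect of adding `mutation` to a given genetic background."""
    return pheno[background | {mutation}] - pheno[background]


print(epistasis(phenotype, frozenset({"P", "F"})))              # 2.0
print(effect_in_background(phenotype, "F", frozenset()))        # -1.0
print(effect_in_background(phenotype, "F", frozenset({"P"})))   # 1.0
```

Because the effect of F changes sign depending on whether P is present, introducing F alone into a descendant that lacks P would give a misleading picture of its historical role, which is precisely the problem that an ancestral background helps to avoid.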

Nature has navigated efficiently through the epistatic protein space, producing functional proteins at each node on a phylogeny, despite large numbers of accumulated mutations. Thus, ancestral proteins can be used as a suitable background for studying the effects of mutations on functional diversification in the context in which they originally arose, and prior to the introduction of any subsequent, confounding epistatic interactions. While it is never possible to be certain that the inferred ancestor represents the historically accurate evolutionary intermediate, it is nevertheless likely to be closer to that antecedent than any contemporary form.

Using ancestral protein resurrection, the Thornton group [60,64] demonstrated that two permissive and five restrictive mutations played important roles in the loss of aldosterone sensitivity in modern glucocorticoid receptors. Zhang and Rosenberg [58] revealed how neutral substitutions, Arg to Ser at site 64 and Thr to Arg at site 132, may have played constructive roles in the subsequent evolution of the ribonuclease genes of higher primates. By introducing into the ancestral background five conserved amino acids that differ between red and green vertebrate opsins, Yokoyama et al. [65] recapitulated the shift in the opsin absorbance spectrum from red to green, whereas previous mutagenesis studies using modern proteins had produced contradictory results concerning the functional importance of key mutations. When the contributions of these five mutations were dissected (in single, double or triple mutants), 27% of the λmax shift from red to green was found to be attributable to epistatic interactions rather than to the direct effects of individual mutations.

Structural characterization of ancient proteins

To date, few ancestral protein structures have been characterized. Initial studies on lysozymes of modern game birds in 1990 yielded seven structures of ancestral intermediate variants [49]. However, as these ancestral variants differed from contemporary ones at only three positions, no significant structural changes were observed among them, apart from a correlation between the side-chain volume of the triplet and the thermal stability of the protein [49]. The first structures of truly ancient proteins to be solved were those of an ancestral steroid receptor ligand-binding domain [60] and of full-length ancestral fish galectins [66]. Since then, 46 crystal structures of ancestral proteins have been deposited in the Protein Data Bank (PDB) (Table 2). The oldest ancestral protein whose crystal structure has been solved is the thioredoxin of the last bacterial common ancestor, hypothesized to have existed 4 billion years ago [57]. These ancient proteins share around 50–70% sequence identity with their contemporary counterparts, and they fold into structures similar to those of their modern descendants.

Table 2
Reconstructed ancestors for which structural information is available
Protein family | Ancestors resurrected and respective PDB codes | References
Steroid hormone receptors | |
- Ancestors of the ligand-binding domain | AncCR (2Q3Y, 2Q1H, 2Q1V), AncSR1 (3RY9), AncSR2 (4E2J) | [60,61,131,132]
- Ancestors of the DNA-binding domain | AncSR1 (4OLN), AncSR2 (4OND, 4OOR, 4OV7), AncGR (5CBX), AncGR2 (5CBY), AncMR (5CBZ) |
Congerins | Con-anc’ (3AJZ), Con-anc (3AJY) | [66]
β-Lactamase | Last common ancestors of enzymes from enterobacteria (3ZDJ), Gram-negative bacteria (4B88), and Gram-positive and Gram-negative bacteria (4C6Y) | [52,98]
Thioredoxin | Last common ancestors of thioredoxins from the cyanobacterial, Deinococcus and Thermus groups (4ULX, 2YJ7), eukaryotes (2YOI), archaea and eukaryotes (3ZIV), archaea (2YNX), fungi and animals (2YPM), γ-proteobacteria (2YN1) and bacteria (4BA7) | [57,133]
GFP-like proteins | Least evolved ancestor (4GOB, 4DXN), ancestor of all Faviina proteins (4DXI) | [69,134]
Uricase | Ancestor of mammalian enzymes (4MB8) | [121]
PyrR family of pyrimidine operon attenuators | AncGREENPyrR (4P80), AncORANGEPyrR (4P81) | [68]
Tyrosine kinase | Ancestor of Src and Abl enzymes (4UEU) | [56]
Lactate dehydrogenase | Ancestor of Apicomplexa enzymes (4PLC, 4PLF, 4PLG) | [135]
Malate dehydrogenase | Ancestor of Apicomplexa enzymes (4PLH, 4PLT, 4PLV, 4PLW, 4PLY, 4PLZ) | [135]
Lactate and malate dehydrogenase | Ancestor of trichomonad enzymes (4UUP) | [67]
Ribonuclease H1 | Ancestor of E. coli and T. thermophilus enzymes (4LY7) | [101]
3-Isopropylmalate dehydrogenase | Ancestor of Bacillus enzymes (3U1H) | [100]
Amino acid-binding proteins | Ancestor of arginine-binding proteins (4ZV1, 4ZV2) | [136]

The study of ancestral protein structures has confirmed that functional divergence in a protein family can arise either from structural modification or from changes in protein dynamics. The evolution of substrate specificity in the congerin and lactate/malate dehydrogenase families paralleled structural changes in dimerization and in α-helix or β-sheet conformation [66,67]. Conformational epistasis, defined by Ortlund et al. [60] as the effect of allosteric mutations that reposition other residues (usually active-site residues) and thereby change the effect of substitutions at those sites, was shown to remodel receptor-ligand contacts in the ancestral glucocorticoid receptor and to switch the oligomeric state of PyrR proteins [60,68]. Changes in conformational dynamics, without significant rearrangement of the protein backbone, were the underlying mechanism for the green-to-red photoconversion of GFP-like proteins and for the ancestral promiscuity of β-lactamase [69,70].

Limitations of ASR

As with other prediction methods, ASR suffers from the pitfall that there is no guarantee that the inferred sequences are historically correct. In many cases, a single most probable ancestral sequence cannot be unambiguously identified, and several nearly equally probable amino acids may be assigned to a site in the alignment. Moreover, when an incorrect model of amino acid substitution is used, inaccurate inferences can be made that nonetheless have high apparent probability. These ambiguities arise from external factors, such as limited available sequence data (especially from key evolutionary intermediates), and from intrinsic factors, such as the evolutionary model used to infer the ancestral sequences. Ambiguous gap placement in MSAs and uncertainty in the phylogenetic tree joining the protein family members also contribute. Uncertainty in inferring ancestral residues is normally greatest at sites that are highly divergent among the contemporary proteins; such sites generally have little or no effect on the properties of the protein, i.e. they reflect neutral genetic drift. The marginal posterior probability of the reconstructed residue at each site can be used as a measure of prediction accuracy [71].
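
To make the per-site measure concrete, the following sketch assumes the marginal posterior distribution at each site is already available as a residue-to-probability mapping (for example, parsed from the output of an ML reconstruction program; the parsing step, the data and the 0.8 threshold are assumptions, not features of any particular tool). It picks the most probable residue at each site, flags sites below the threshold, and reports the mean of the top posterior probabilities, which, if the substitution model is correct, estimates the expected fraction of correctly reconstructed sites.

```python
from typing import Dict, List, Tuple

# One dict per alignment site: residue -> marginal posterior probability.
# The values below are invented for illustration.
SitePosterior = Dict[str, float]


def ml_states_and_ambiguity(
    posteriors: List[SitePosterior], threshold: float = 0.8
) -> Tuple[str, List[int]]:
    """Return the most probable residue at each site and the 0-based indices
    of sites whose top posterior probability is below `threshold`."""
    residues: List[str] = []
    ambiguous: List[int] = []
    for i, site in enumerate(posteriors):
        best_residue, best_p = max(site.items(), key=lambda kv: kv[1])
        residues.append(best_residue)
        if best_p < threshold:
            ambiguous.append(i)
    return "".join(residues), ambiguous


def mean_top_posterior(posteriors: List[SitePosterior]) -> float:
    """Average of the per-site maximum posterior probabilities; if the
    substitution model is correct, this estimates the expected fraction of
    correctly reconstructed sites."""
    return sum(max(site.values()) for site in posteriors) / len(posteriors)


sites = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"V": 0.92, "I": 0.08},
    {"E": 0.48, "D": 0.47, "N": 0.05},
]
seq, low_conf = ml_states_and_ambiguity(sites, threshold=0.8)
print(seq, low_conf)  # MKVE [1, 3]
print(round(mean_top_posterior(sites), 3))  # 0.735
```

Site 3 illustrates the problem described above: E and D are nearly equally probable, so the single "best" residue carries little more weight than its runner-up.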

In theory, to account for the uncertainty in ancestral reconstruction, all possible combinations of the ancestral states at the ambiguous sites would need to be resurrected. For example, Jermann et al. [4] synthesized a variety of alternative candidate ancestral RNase sequences to cover the ambiguity in the tree topology. This solution is generally not feasible, however, particularly when there are many ambiguous sites, since the number of possible combinations of ancestral states rises exponentially with the number of uncertain positions. Uncertainty in the phylogenetic tree (e.g. unresolved connectivity of deep branches) can be reduced by obtaining more sequence data, especially from organisms representing poorly sampled branches of the evolutionary tree. For example, Zhang and Rosenberg [72] sequenced additional genes from various primates to better reconstruct ancestral sequences of RNase paralogues. An improved phylogenetic tree leads to a more precise parsimony analysis or alters the posterior probabilities in an ML analysis, thereby increasing confidence in the ancestral inference.
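
The exponential growth is easy to see in code. This sketch, using hypothetical ambiguous sites and candidate residues, counts and enumerates every combination of alternative states; with only two states at each of 20 ambiguous sites there are already 2^20 (1,048,576) candidate ancestors.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List

# Hypothetical ambiguous sites (0-based index) and their candidate residues.
alternatives: Dict[int, List[str]] = {
    1: ["K", "R"],
    3: ["E", "D"],
    5: ["A", "S", "T"],
}


def count_variants(alts: Dict[int, List[str]]) -> int:
    """Number of candidate ancestors: product of the number of states per site."""
    return prod(len(states) for states in alts.values())


def enumerate_variants(reference: str, alts: Dict[int, List[str]]) -> Iterator[str]:
    """Yield every combination of alternative states applied to `reference`."""
    sites = sorted(alts)
    for combo in product(*(alts[s] for s in sites)):
        seq = list(reference)
        for site, residue in zip(sites, combo):
            seq[site] = residue
        yield "".join(seq)


variants = list(enumerate_variants("MKVELA", alternatives))
print(count_variants(alternatives), len(variants))  # 12 12
# Two states at each of 20 ambiguous sites already gives over a million.
print(count_variants({i: ["X", "Y"] for i in range(20)}))  # 1048576
```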

Whereas statisticians have developed better evolutionary models to address bias in phylogenetic inference, protein engineers have tried to overcome this issue more effectively by targeting ambiguous sites with low marginal probability and making libraries of possible alternative ancestors. Ancestral libraries, which contain multiple candidate ancestral sequences, can be functionally characterized to determine whether the properties subject to biological interpretation are robust to the ambiguity [73] (Gumulya et al., unpublished). Uncertainties in the reconstruction process have been shown to result in apparently inactive yet thermostable proteins, such as the ancestor of the glutaminase subunit (HisH) of imidazole glycerol phosphate synthase [74], which, despite having a melting temperature of ∼79°C, showed no glutamine hydrolysis activity. Unlike the ancestral reconstruction of the cyclase subunit (HisF), which yielded a thermostable and catalytically active enzyme, the resurrection of HisH was much more prone to uncertainty. The MSA of extant HisH forms contains several insertions and deletions, and only 21 of 226 residues are conserved in the alignment. Taken together, these features suggest that the inferred sequence of the HisH ancestor is largely ambiguous and may be inaccurate at residues that changed frequently during evolution, e.g. through insertions and deletions. However, the possibility cannot be excluded that the ancestor simply lacked the expected catalytic activity. This explanation is somewhat less likely, however, for proteins with an essential, highly conserved physiological function, e.g. in a primary metabolic pathway.
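
The conservation statistic quoted for HisH (21 of 226 residues, about 9%) can be computed directly from an alignment. A minimal sketch, assuming a column counts as conserved only if every sequence carries the same residue and none is gapped (stricter or looser definitions are equally possible); the toy alignment is invented:

```python
from typing import List


def conserved_columns(msa: List[str], gap: str = "-") -> List[int]:
    """0-based indices of columns in which every sequence has the same residue
    and no sequence has a gap."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all sequences must have the same aligned length")
    conserved = []
    for col in range(length):
        states = {seq[col] for seq in msa}
        if len(states) == 1 and gap not in states:
            conserved.append(col)
    return conserved


toy_msa = [
    "MGK-LVDA",
    "MGRALIDA",
    "MSK-LVEA",
]
cols = conserved_columns(toy_msa)
print(cols, f"{len(cols)}/{len(toy_msa[0])}")  # [0, 4, 7] 3/8
print(f"{21 / 226:.1%}")  # 9.3%
```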

As the above studies exemplify, one can be more confident that a given ancestral sequence is historically correct if a convincing evolutionary narrative can be put together when the properties of the ancestor and extant proteins are considered in the context of their respective environments. In this regard, improved character mapping within phylogenies should be helpful [75]. However, it is important to avoid subjectivity in the evaluation of ancestors (i.e., to avoid what has been called an evolutionary ‘just so story’ [33]), especially since it is, at best, not trivial, and in most cases not possible, to independently verify assumptions about co-evolving proteins and environmental conditions at the time a given ancestor existed.

The emergence of ASR as a tool in protein engineering

The field of ASR initially developed from the interest of evolutionary biologists in understanding how biomolecules adapt their function to changing environments. It is not surprising, therefore, that the approach can be adapted to the deliberate alteration of protein function for new environments, such as those encountered in industrial processes. Concern over the inability to validate whether resurrected proteins are the ‘true’ antecedents of modern sequences is less relevant when the ancestor is used simply as a starting point for protein engineering. ASR has already proved to be a highly useful protein engineering tool in drug discovery [76,77] and has great potential in other applications, such as the development of biocatalysts.

Whereas directed evolution can be seen as moving forwards in evolution, ASR can be considered as travelling back towards the origin. Traversing back and forth along an evolutionary timeline between extant and ancestral enzymes can enhance the search for proteins with novel or enhanced properties in areas of sequence space that are enriched in functional proteins, i.e. sequences that have survived natural selection. Importantly, this approach allows more sequence diversity to be assessed than, e.g., random mutagenesis or recombination of extant forms, where the introduction of deleterious mutations or adverse epistatic interactions is more likely to compromise the viability of the libraries created [77].

Extant specialist enzymes have been hypothesized to have evolved from ancestral generalist enzymes that catalysed many related reactions [52,78]. Moreover, based on the assumption that the Earth experienced periods of elevated ambient temperature in primordial times [79,80], it has been predicted that ancient proteins may have been generally more thermostable than extant forms, a prediction supported by several studies [50,52,81–83]. ASR, which can deliver dramatic changes in sequence, has been shown to be more effective than any other currently available protein engineering method for increasing enzyme stability while preserving catalytic activity [51,52,54,84]. A handful of ancestral resurrection studies have revealed that at least some ancestral proteins from the deep past are both thermostable and highly active, making them highly suitable and evolvable starting points for engineering new traits not present in contemporary enzymes.

The ancestral mutation method (AMM)

The first ASR-related approach implemented for protein engineering was termed ancestral mutation [85], and involved introducing ancestral residues into a modern protein sequence. Introducing ancestral residues moderately increased the stability of four proteins, namely 3-isopropylmalate dehydrogenase [86,87], isocitrate dehydrogenase (IDH) [88], glycyl-tRNA synthetase [89] and β-amylase [90], supporting the idea of a thermophilic common ancestor of extant life on Earth. The ancestral mutation approach can be useful for combining desired properties of modern and ancestral proteins. Ancestral variants of serum paraoxonases and cytosolic sulfotransferases are highly active and functionally diverse, and just a few ancestral mutations were sufficient to reshape the enzyme active site to adopt a new specificity [91]. However, the lack of strategies for identifying which ancestral residues should be selected for substitution makes the approach less amenable to wider implementation. Moreover, epistatic interactions between ancestral and non-ancestral residues can affect the property of interest, compromising the success of this approach.
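
Applying the AMM requires translating the differences between an aligned ancestor and an extant protein into point substitutions in the extant numbering. The sketch below is a hypothetical helper, not a published tool: it reports candidate mutations in the conventional extant residue, position, ancestral residue format and skips gapped columns, because insertions and deletions cannot be introduced as single substitutions. Deciding which of the listed mutations to test remains the open problem noted above.

```python
from typing import List


def ancestral_mutations(ancestor_aln: str, extant_aln: str, gap: str = "-") -> List[str]:
    """List substitutions that would introduce ancestral residues into the
    extant protein, numbered by position in the ungapped extant sequence."""
    if len(ancestor_aln) != len(extant_aln):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    extant_pos = 0
    for anc, ext in zip(ancestor_aln, extant_aln):
        if ext != gap:
            extant_pos += 1  # 1-based numbering of the extant protein
        if anc == gap or ext == gap:
            continue  # indels are not point substitutions
        if anc != ext:
            mutations.append(f"{ext}{extant_pos}{anc}")
    return mutations


# Invented aligned sequences (hypothetical ancestor and extant protein).
ancestor = "MKVELAGT-R"
extant = "MRVELSG-QR"
print(ancestral_mutations(ancestor, extant))  # ['R2K', 'S6A']
```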

The reconstructing evolutionary adaptive paths approach (REAP)

To identify which subset of ancestral residues should be targeted for substitution, the REAP approach [32] uses signatures of functional divergence found in the phylogenetic analysis of a protein family. The hypothesis behind REAP is that, when a phenotypic difference (e.g. a qualitative change in activity) exists between two subfamilies in a tree, the residues that changed along the diverging branches leading to the two subfamilies may be responsible for the observed functional switch. This approach has been used to broaden the substrate scope of a DNA polymerase to include a new class of triphosphate substrates. By screening a library of only 93 variants, each carrying replacements at three or four of the 35 sites identified by REAP, eight variants with an improved ability to accept the new class of triphosphate substrate were obtained [92].
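
The REAP idea of looking for residues that changed between functionally distinct subfamilies can be approximated, very crudely, by scanning an alignment for columns that are invariant within each subfamily but differ between them (sometimes described as a "conserved but different" or Type-II pattern). The sketch below is a simplified stand-in that ignores the tree and the reconstructed ancestors used by REAP; the subfamily sequences are invented.

```python
from typing import List


def conserved_but_different(group_a: List[str], group_b: List[str], gap: str = "-") -> List[int]:
    """0-based alignment columns that are invariant within each group but
    carry a different residue in the two groups."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for col in range(length):
        states_a = {s[col] for s in group_a}
        states_b = {s[col] for s in group_b}
        invariant = len(states_a) == 1 and len(states_b) == 1
        if invariant and gap not in states_a | states_b and states_a != states_b:
            sites.append(col)
    return sites


# Invented subfamilies assumed (hypothetically) to differ in function.
subfamily_1 = ["MKTLEA", "MKSLEA", "MKALEA"]
subfamily_2 = ["MRTLDA", "MRSLDA", "MRVIDA"]
print(conserved_but_different(subfamily_1, subfamily_2))  # [1, 4]
```

Columns that vary within a subfamily (positions 2 and 3 here) are excluded, on the reasoning that a residue free to change within a functional group is less likely to underlie the functional difference between groups.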

The consensus mutation method

An alternative to ancestral mutation, which is superficially similar and sometimes inappropriately assumed to be effectively the same, is the consensus approach, in which the residues most commonly found at a given position or positions in a protein family are introduced into an extant protein [93]. The consensus approach is commonly used in enzyme engineering to generate variants with increased stability [94], based on the hypothesis that the residue most commonly found at a given position is likely to be the one conferring the greatest fitness, often the greatest stability [95]. However, it must be stressed that the consensus approach incorporates no information on phylogenetic relationships, and this may compromise its usefulness [33,96]. Four independent studies [50,54,97,98] comparing the phenotypes of consensus variants with those of ancient proteins resurrected using evolutionary model-based methods show the superiority of true ASR in producing proteins that are expressed at high levels and show greater thermal stability. Despite having high sequence similarity to the ancestral sequences, the consensus forms of elongation factor (EF) Tu, the β-subunit of DNA gyrase, nucleoside diphosphate kinases and β-lactamase showed substantially lower stability than the corresponding laboratory-resurrected enzymes [50,54,97,98].
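
The lack of phylogenetic information in the consensus approach can be illustrated with a short sketch. The consensus below simply takes the most frequent non-gap residue per column (ties go to the residue encountered first), so its output depends on how densely each clade happens to be sampled, even though the relationship between the two invented clades is unchanged. Phylogeny-based reconstruction, by contrast, accounts for closely related sequences sharing ancestry instead of counting each as an independent vote.

```python
from collections import Counter
from typing import List


def consensus(msa: List[str], gap: str = "-") -> str:
    """Most frequent non-gap residue per column; ties go to the residue
    encountered first (Counter.most_common keeps insertion order for ties)."""
    result = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] != gap)
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


# Two invented clades; only the sampling density changes between the runs.
clade_a = ["MKVE", "MKVE", "MKVE", "MKIE"]
clade_b = ["MRVD", "MRVD"]
print(consensus(clade_a + clade_b))       # MKVE
print(consensus(clade_a + ["MRVD"] * 5))  # MRVD (clade B now oversampled)
```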

Application of ASR to specific protein engineering objectives

Common goals of protein engineering are: to make enzymes more robust to industrial conditions, e.g. with respect to temperature and solvent concentration; to alter substrate or reaction specificity; to enhance enantioselectivity; to improve catalytic rates; and to adapt enzymes to a different cofactor. ASR has the potential to address several of these objectives in enzyme engineering, as described below. In addition, recent studies have revealed improvements in properties of particular benefit to specific classes of proteins, such as the immunogenicity of protein drugs [76].

Thermal stability

Thermal stability is desirable in proteins used for industrial applications, both to allow reactions to be run at higher temperatures and because thermostable proteins have longer half-lives at mild temperatures. In addition, a robust template is often required for directed evolution, as mutations that confer new functions are frequently destabilizing [99]. The seminal study indicating that ASR could be used to engineer thermostable proteins was the reconstruction of EF proteins, very loosely dated to between 3.5 and 0.5 billion years old. The thermostability of more than 25 resurrected ancestral EFs suggested that the ancestors of current organisms were thermophilic and that their descendants subsequently adapted to lower temperatures [50,81]. Later ASR studies on thioredoxin [51], β-lactamase [52] and nucleoside diphosphate kinase [54] documented similar increases in thermal stability going back in time, a result that has been interpreted as evidence that ancient proteins needed to survive a hot global environment. However, older ancestors do not necessarily show the highest thermal stability, as demonstrated for the resurrected ancestors of 3-isopropylmalate dehydrogenase [100] and ribonuclease H1 [101]. Thermostability is therefore not exclusively a primitive trait: it can be readily gained and lost throughout evolutionary history as proteins traverse separate evolutionary paths, adapting to the local conditions under which they were selected and to the functions they evolved to perform. Controversially, a theoretical study [102] suggested that the thermal stability of resurrected proteins might be overestimated by the ML method used to infer ancestral sequences, because ML tends to disregard less frequent variants at a position that are detrimental to stability, including variants that were never fixed. The ML ancestor can thus be considered an average of the genotypes in the ancestral population rather than a sequence that would be expected to be sampled from any given individual. The alternative Bayesian method, which sometimes chooses less probable residues from the posterior probability distribution, can eliminate this bias and thus provide more robust conclusions. However, the authors acknowledge that it is unclear whether this interpretation holds for reconstructions of real proteins, especially where epistatic effects influence the stability of the evolving proteins. Interestingly, in the same theoretical study, ML produced the most accurate reconstructions [102].
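
The difference between the two strategies can be sketched as follows, assuming per-site posterior distributions are available and that sites are independent (both simplifications; the distributions are invented). The "ML" ancestor takes the most probable residue at every site, whereas the Bayesian alternative draws each residue in proportion to its posterior probability. For the distributions below, only about 21% of sampled sequences (1.0 × 0.6 × 0.7 × 0.5) are expected to match the most probable sequence.

```python
import random
from typing import Dict, List

# Invented per-site posterior distributions (residue -> probability).
posteriors: List[Dict[str, float]] = [
    {"M": 1.0},
    {"K": 0.6, "R": 0.4},
    {"V": 0.7, "I": 0.3},
    {"E": 0.5, "D": 0.3, "N": 0.2},
]


def most_probable_sequence(post: List[Dict[str, float]]) -> str:
    """'ML'-style ancestor: the most probable residue at every site."""
    return "".join(max(site, key=site.get) for site in post)


def sampled_sequence(post: List[Dict[str, float]], rng: random.Random) -> str:
    """Bayesian-style ancestor: one residue per site drawn in proportion to
    its posterior probability (sites treated as independent)."""
    seq = []
    for site in post:
        residues = list(site)
        weights = [site[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


rng = random.Random(0)
ml = most_probable_sequence(posteriors)
samples = [sampled_sequence(posteriors, rng) for _ in range(1000)]
frac_ml = sum(s == ml for s in samples) / len(samples)
print(ml)  # MKVE
print(frac_ml)  # expected to be near 0.21 (= 1.0 * 0.6 * 0.7 * 0.5)
```

If less probable residues tend to be destabilizing, the most probable sequence systematically avoids them, whereas sampled sequences include them at their posterior frequencies; this is the bias argument summarized above.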

The increases in denaturation temperature of ∼30–40°C that have been obtained by resurrecting ancestors from the deep past are much larger than those typically achieved in engineering studies aimed at protein stabilization (as reviewed recently in [103]). Moreover, ancestral resurrection allows proteins to be stabilized without detailed structural information. To date, most studies have used the AMM to engineer stability into proteins of interest. However, Jackson and colleagues used the ancestral form of periplasmic-binding proteins to develop thermostable variants for the construction of robust biosensors [104].

Altered (broadened) substrate scope or novel function

Jensen originally hypothesized, in 1976 [78], that specialized enzymes evolved from more generalist (i.e. more promiscuous) primordial forms and that the specificity of modern enzymes has been tuned to an optimal state by natural selection. ASR studies have revealed that some ancient proteins appear to have had a broad substrate scope, and that some had either the same function as their descendants or a different, novel one. Two- to three-billion-year-old β-lactamases can degrade a variety of antibiotics with catalytic efficiencies similar to those of an average modern enzyme, suggesting that the predecessors of β-lactamases were moderately efficient promiscuous enzymes [52]. The common ancestor of cyclin-dependent kinases (CDKs), mitogen-activated protein kinases (MAPKs) and the related kinase Ime2 appears to have been able to phosphorylate peptides with either a proline or an arginine residue at the +1 position, in contrast with the more stringent proline requirement of the extant enzymes [105]. The ancestor of a fungal glucosidase showed some activity towards both maltose and isomaltose, whereas the modern enzyme appears to have specialized over time to hydrolyse one substrate, with a significant loss of activity towards the other [55]. The bifunctionality of the resurrected mammalian ancestor of serum paraoxonases, namely its quorum-quenching and detoxifying lactonase activities, explained the overlapping specificities of some extant paraoxonases [106].

The broader substrate scope seen in some resurrected ancestors may reflect an intrinsic bias in the inference, whereby the predicted ancestor ‘averages’ the different specificities seen in the descendant families [106]. Alternatively, it may indicate sub-functionalization following a gene duplication event in a protein family, where the old and new copies of the gene have each evolved to perform just one of several ancestral functions. Neofunctionalization, i.e. the gain of a new function, can also occur following duplication [107], with or without retention of the original function in either paralogue, depending on subsequent selection pressures.

‘Artificial’, prospective evolution studies have demonstrated that the ability to evolve a promiscuous function often requires an existing weak initial activity in the wild-type template [108]. Thus, it is highly advantageous to be able to use a more promiscuous ancestral protein as a template for protein engineering, i.e., to be able to improve on a property that is already present to some extent. The results of many ancestral resurrection studies support the prevailing view that ancestral forms may be less specialized than contemporary ones, e.g. the ancestral corticoid receptor apparently had affinity for both classes of hormones (glucocorticoids and mineralocorticoids), whereas the contemporary proteins are activated selectively by only one of these types of ligands [59]; similarly, ancestral β-lactamases showed activity towards a broader range of antibiotics [52].

Ancestral proteins could, however, have phenotypes different from those of modern proteins, or catalyse entirely distinct reactions, as a result of gene duplication events followed by sub-functionalization or via orthologous neofunctionalization. The Filozoan ancestor of the GK protein interaction domain appears to have had different biochemical functions, such as in spindle orientation, which play important roles in forming organized tissues in multicellular organisms and which have been lost in the evolution of guanylate kinase activity in the extant forms [109].

While many studies indicate that ancestral forms with broadened specificity and modest levels of activity could serve as templates for further optimization, examples of this approach are, to date, lacking (Gumulya et al., unpublished). Only the AMM has been used to alter the specificity of an enzyme, by incorporating ancestral mutations [91]. Understanding the change or loss of catalytic activities in ancestral proteins can guide the design of novel enzymes, yet identifying the amino acid residues responsible for a switch in enzyme activity in a protein family can be challenging. To identify such mutations in the seminal ribonuclease family, Sassi et al. [110] calculated the dN/dS ratio (the ratio of non-synonymous to synonymous substitution rates) for individual branches in the tree and detected episodes of positive selection that led to an increase in immunosuppressive activity in modern RNases. Regardless of the evolutionary model, the choice of out-group or the tree topology, only the branch leading to the modern seminal RNase in ox underwent adaptive evolution (high dN/dS ratio), and replacement of Gly 38 by Asp accounted for the increased catalytic activity against duplex RNA. Another approach has been to calculate the rates of amino acid substitution at the sites that differ between two protein superfamilies. By selecting residues with low rates, on the hypothesis that this characteristic indicated importance in substrate binding, Sun et al. [11] showed that replacement at a single position (His 47 to Asn) is sufficient to cause a nearly complete swap in specificity between two supergroups of Pax genes. Combining a non-homogeneous γ-model (which allows site-specific rate shifts) with the available structural information for a protein allows identification of sites that are likely to be involved in a change of function [111].
The DIVERGE software has been used to detect variability in the evolutionary rate associated with Type-I evolutionary functional divergence [112].
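
The dN/dS ratio (ω) compares the proportion of non-synonymous differences per non-synonymous site with the proportion of synonymous differences per synonymous site; ω > 1 is taken as evidence of positive selection, ω ≈ 1 as neutral evolution and ω < 1 as purifying selection. The sketch below assumes the site and difference counts for a branch have already been obtained (as in Nei-Gojobori-style counting) and applies the Jukes-Cantor correction for multiple hits. It is a simplified counting estimate, not the likelihood-based branch models implemented in programs such as PAML's codeml, and the counts are invented.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits: d = -3/4 ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites: float, nonsyn_sites: float, syn_diffs: float, nonsyn_diffs: float) -> float:
    """Ratio of corrected non-synonymous to synonymous divergence (omega)."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: omega is undefined")
    return d_n / d_s


# Invented counts for one branch: S = 100 synonymous sites, N = 300
# non-synonymous sites, 10 synonymous and 15 non-synonymous differences.
omega = dn_ds(100, 300, 10, 15)
print(round(omega, 3))  # 0.482 (omega < 1: purifying selection dominates)
```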

Cofactor switching

Unlike approaches for improving thermal stability and broadening substrate scope, there is currently no straightforward strategy in enzyme engineering for altering an enzyme's cofactor preference, for example from the more costly nicotinamide adenine dinucleotide phosphate (NADP) to nicotinamide adenine dinucleotide (NAD) in oxidoreductases. Phylogenetic analysis of the IDH family suggested that the ancestral form of IDH used NAD(H), and that a switch to NADP(H) occurred ∼3.5 billion years ago [113]. Introducing three ancestral residues into the binding pocket (Lys344Asp, Tyr345Ile and Val351Ala) switched the cofactor specificity of E. coli IDH from a 7000-fold preference for NADP(H) to a 200-fold preference for NAD(H) [114]. As sequence divergence within the IDH enzyme family is very high, the authors could not reconstruct the ancestor without significant ambiguity. Additional data, including the X-ray structures of two extant proteins (E. coli IDH with bound NADP and Thermus thermophilus isopropylmalate dehydrogenase with bound NAD), were used to align conserved residues in the sequence alignment. The study showed that, with two additional substitutions outside the binding pocket (Cys201Met and Cys332Tyr), the cofactor-switched IDH catalysed the decarboxylation of isocitrate as effectively as naturally occurring NAD-dependent members of the family. That only this single example of cofactor switching using the AMM [113] has so far been reported in the literature indicates the difficulty of identifying two enzyme families that use different cofactors yet are not so divergent as to compromise the feasibility of ASR.
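
The magnitude of the cofactor switch can be expressed as a single fold change. Assuming the quoted preferences are ratios of catalytic efficiencies (k_cat/K_M), and defining the NADP/NAD specificity ratio R as below, the three ancestral substitutions change R by roughly six orders of magnitude:

```latex
R = \frac{(k_{\mathrm{cat}}/K_{\mathrm{M}})_{\mathrm{NADP}}}{(k_{\mathrm{cat}}/K_{\mathrm{M}})_{\mathrm{NAD}}},
\qquad R_{\mathrm{wt}} = 7000,
\qquad R_{\mathrm{mut}} = \frac{1}{200}

\frac{R_{\mathrm{wt}}}{R_{\mathrm{mut}}} = 7000 \times 200 = 1.4 \times 10^{6}
```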

Ancestral libraries

The first ancestral libraries were created to address uncertainty in the inference of ancestral sequences by experimentally generating a ‘cloud’ of probable estimates of the true ancestral sequence. Ugalde et al. [8] varied ambiguous sites in the ancestral sequences of GFP-like proteins, creating a library that sampled alternative ancestral states predicted using different evolutionary models. Several groups have resurrected alternative ancestral states at positions for which the predictions are ambiguous, using either site-directed or saturation mutagenesis, depending on the degeneracy to be introduced. However, there is no consensus on what constitutes an appropriate ‘uncertainty’ threshold [33]. In one study, low-confidence sites were defined as those with a second plausible reconstruction having a posterior probability of 20% [109], whereas two other studies treated sites as ambiguous when the posterior probability was <90% and <80%, respectively [9,115]. A more desirable approach would be to recreate experimentally the calculated posterior distribution of ancestral sequences and to characterize the phenotypic traits of a sample of these sequences [116]. Unfortunately, constructing a degenerate gene in which the variants are represented in unequal, predefined proportions is a significant technical challenge, and such libraries are not currently available routinely through commercial gene synthesis.
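
A posterior-weighted library of the kind described above can at least be specified computationally, even if synthesizing it is hard. The sketch below assumes per-site posterior distributions and independence between sites; it keeps every residue with a posterior probability of at least 0.2 at each site (a free parameter, chosen here only to echo one of the thresholds above), renormalizes, and assigns each variant a target proportion equal to the product of its per-site probabilities. The data are invented.

```python
from itertools import product
from typing import Dict, List


def weighted_library(posteriors: List[Dict[str, float]], min_prob: float = 0.2) -> Dict[str, float]:
    """Map each library variant to its target proportion.

    At each site, residues with posterior >= min_prob are kept (the top residue
    is always kept) and renormalised; a variant's proportion is the product of
    its per-site probabilities, i.e. sites are treated as independent.
    """
    per_site = []
    for site in posteriors:
        kept = {r: p for r, p in site.items() if p >= min_prob}
        if not kept:
            best = max(site, key=site.get)
            kept = {best: site[best]}
        total = sum(kept.values())
        per_site.append({r: p / total for r, p in kept.items()})

    library: Dict[str, float] = {}
    for combo in product(*(list(s.items()) for s in per_site)):
        variant = "".join(residue for residue, _ in combo)
        proportion = 1.0
        for _, p in combo:
            proportion *= p
        library[variant] = proportion
    return library


# Invented posteriors for a three-residue stretch.
post = [
    {"M": 1.0},
    {"K": 0.6, "R": 0.3, "Q": 0.1},
    {"E": 0.5, "D": 0.5},
]
for variant, share in weighted_library(post).items():
    print(variant, round(share, 3))
# MKE 0.333, MKD 0.333, MRE 0.167, MRD 0.167
```

The unequal proportions (here 2:1 between K- and R-containing variants) are exactly what a standard equimolar degenerate codon cannot deliver, which is the synthesis difficulty noted above.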

More recently, ancestral libraries have been used to generate ‘smart libraries’ for directed evolution. Ancestral libraries in which phylogenetic analysis and ancestral inference are used to choose sites important for functional diversification could distinguish functionally important amino acid replacements from neutral ones, and thus increase the chance of finding improved variants during screening. Tawfik's group used such a strategy to create ancestral libraries of 300 variants of serum paraoxonases and cytosolic sulfotransferases, from which mutants with up to 50-fold higher activity were obtained [91]. Assuming the inference is robust, combinatorial ancestral libraries widen the protein sequence space that can be explored compared with conventional directed evolution libraries (Figure 3). Ancestral libraries can also be used to deconvolute the combinatorial mutations that led to functional diversification in protein families. Matz and colleagues [117] identified epistatic interactions responsible for the evolution of the red phenotype from a green ancestor of GFP-like proteins by generating a library of ancestral intermediates, each carrying roughly half of the 37 amino acid changes that occurred between the two ancestors.
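
Why a library of intermediates carrying about half of the changes is informative can be shown with a small simulation. Assuming, hypothetically, that each of the 37 changes is included independently with probability 0.5 (the actual library design may differ), each variant carries 18.5 changes on average and any given pair of changes co-occurs in about a quarter of the variants, so pairwise combinations, and hence candidate epistatic interactions, are well represented even in a modest library.

```python
import random
from typing import FrozenSet, List


def random_intermediates(n_changes: int, n_variants: int, p: float = 0.5, seed: int = 0) -> List[FrozenSet[int]]:
    """Hypothetical design: each variant carries each change independently
    with probability p, so variants carry p * n_changes changes on average."""
    rng = random.Random(seed)
    return [
        frozenset(i for i in range(n_changes) if rng.random() < p)
        for _ in range(n_variants)
    ]


def pair_fraction(library: List[FrozenSet[int]], i: int, j: int) -> float:
    """Fraction of variants carrying both change i and change j."""
    return sum(1 for v in library if i in v and j in v) / len(library)


lib = random_intermediates(n_changes=37, n_variants=2000)
mean_size = sum(len(v) for v in lib) / len(lib)
print(round(mean_size, 1))                 # expected to be near 18.5
print(round(pair_fraction(lib, 0, 1), 2))  # expected to be near 0.25 (0.5 * 0.5)
```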

Figure 3.
ASR and directed evolution explore sequence space going back and forward in time, respectively.

The two principal methods of directed evolution (site-saturation mutagenesis and recombination) enable the exploration of sequence space of the descendants (forward evolution). When the same techniques are applied to ancestral sequences, it is possible to replay the evolutionary process under different selection pressures. Phylogenetic information can be used in two other approaches to alter the properties of the protein family in question. In the ancestral mutation method (AMM), selected ancestral residues are incorporated into extant forms to confer desired properties. In an analogous fashion, in the Reconstructing Evolutionary Adaptive Pathways (REAP) approach, residues believed to confer properties inherent to particular evolutionary branches are introduced into extant forms.

Conclusions and future perspectives

Despite ongoing debate in the field about the accuracy of resurrected ancestral proteins relative to the true antecedents, and about the best methods to use for inference, ASR has been used with increasing frequency both by evolutionary biologists seeking to understand natural evolutionary processes and, more recently, by protein engineers designing novel enzymes. While it may never be possible to identify conclusively a single, historically accurate ancestor, approaches involving greater sampling of predicted sequences, such as the characterization of ancestral libraries that encompass the uncertainty in predictions, can indicate whether the inferences made about the ancestral characteristics of a given protein family are likely to stand the test of time. Importantly, new methods are urgently needed that better handle insertions and deletions, ideally in an alignment-independent fashion, and that can deal with larger numbers of sequences, so as to fully exploit the copious information becoming available from whole-genome sequences.

The near future should see an expansion in the use of ancestral sequences as templates for directed, prospective evolution. Combined with the ever-growing collection of sequenced genomes and the refinement in computational methods for phylogenetic analysis, ASR opens the door to areas of sequence space that were not previously accessible to protein engineers. In particular, resurrection of ancestors may provide thermostable forms of proteins that are robust to mutagenesis. Whether this is an artefact of the methods used for inference or reflects a true ancestral characteristic is somewhat irrelevant for protein engineering. While objectives such as engineering a switch in cofactor specificity may remain challenging to solve by ASR due to the evolutionary distance between homologous protein sequences, altering the substrate and reaction specificity of ancestral proteins, by directed evolution or conventional protein engineering, should be more tractable.

Ultimately ASR has the potential to allow biochemists to look, Janus-like, into both the past and future of protein evolution. The prospect of resetting the evolutionary clock using inferred ancestors and rerunning evolutionary ‘experiments’ under altered selection pressures should provide insights into how proteins have evolved in nature and also how we can better direct artificial evolution so as to produce proteins that are useful for the future.

Abbreviations

  • AMM: ancestral mutation method
  • ASR: ancestral sequence reconstruction
  • CDK: cyclin dependent kinase
  • EF: elongation factor
  • GASP: gapped ancestral sequence prediction
  • IDH: isocitrate dehydrogenase
  • MAPK: mitogen activated protein kinase
  • ML: maximum likelihood
  • MP: maximum parsimony
  • MSA: multiple sequence alignment
  • Mya: million years ago
  • NAD: nicotinamide adenine dinucleotide
  • NADP: nicotinamide adenine dinucleotide phosphate
  • NCBI: National Center for Biotechnology Information
  • PAML: phylogenetic analysis by maximum likelihood
  • PAUP: Phylogenetic Analysis Using Parsimony
  • PDB: Protein Data Bank

Acknowledgments

The authors would like to thank Drs. M. Boden, V. L. Arcus and J. W. Thornton for helpful discussions during the preparation of this review.

Competing Interests

The Authors declare that there are no competing interests associated with the manuscript.

References

1. Kendrew, J.C., Bodo, G., Dintzis, H.M., Parrish, R.G., Wyckoff, H. and Phillips, D.C. (1958) A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature 181, 662–666
2. Pauling, L. and Zuckerkandl, E. (1963) Chemical paleogenetics. Molecular ‘restoration studies’ of extinct forms of life. Acta Chem. Scand. 17, 9–16
3. Stackhouse, J., Presnell, S.R., McGeehan, G.M., Nambiar, K.P. and Benner, S.A. (1990) The ribonuclease from an extinct bovid ruminant. FEBS Lett. 262, 104–106
4. Jermann, T.M., Opitz, J.G., Stackhouse, J. and Benner, S.A. (1995) Reconstructing the evolutionary history of the artiodactyl ribonuclease superfamily. Nature 374, 57–59
5. Chandrasekharan, U.M., Sanker, S., Glynias, M.J., Karnik, S.S. and Husain, A. (1996) Angiotensin II-forming activity in a reconstructed ancestral chymase. Science 271, 502–505
6. Asano, Y. (2002) Overview of screening for new microbial catalysts and their uses in organic synthesis: selection and optimization of biocatalysts. J. Biotechnol. 94, 65–72
7. Thornton, J.W., Need, E. and Crews, D. (2003) Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling. Science 301, 1714–1717
8. Ugalde, J.A., Chang, B.S. and Matz, M.V. (2004) Evolution of coral pigments recreated. Science 305, 1433
9. Thomson, J.M., Gaucher, E.A., Burgan, M.F., De Kee, D.W., Li, T., Aris, J.P. et al. (2005) Resurrecting ancestral alcohol dehydrogenases from yeast. Nat. Genet. 37, 630–635
10. Adey, N.B., Tollefsbol, T.O., Sparks, A.B., Edgell, M.H. and Hutchison, C.A. (1994) Molecular resurrection of an extinct ancestral promoter for mouse L1. Proc. Natl Acad. Sci. U.S.A. 91, 1569–1573
11. Sun, H.M., Merugu, S., Gu, X., Kang, Y.Y., Dickinson, D.P., Callaerts, P. et al. (2002) Identification of essential amino acid changes in paired domain evolution using a novel combination of evolutionary analysis and in vitro and in vivo studies. Mol. Biol. Evol. 19, 1490–1500
12. Lu, Q. and Fox, G.E. (2011) Resurrection of an ancestral 5S rRNA. BMC Evol. Biol. 11, 218
13. Ducatez, M.F., Bahl, J., Griffin, Y., Stigger-Rosser, E., Franks, J., Barman, S. et al. (2011) Feasibility of reconstructed ancestral H5N1 influenza viruses for cross-clade protective vaccine development. Proc. Natl Acad. Sci. U.S.A. 108, 349–354
14. Blanchette, M., Baniré Diallo, A., Green, E.D., Miller, W. and Haussler, D. (2008) Computational reconstruction of ancestral DNA sequences. In Methods in Molecular Biology: Phylogenomics (Murphy, W.J., ed.), pp. 171–184, Humana Press Inc., Totowa, NJ
15. Anisimova, M., Cannarozzi, G. and Liberles, D.A. (2010) Finding the balance between the mathematical and biological optima in multiple sequence alignment. Trends Evol. Biol. 2, e10
16. Anisimova, M. (2015) Darwin and Fisher meet at Biotech: on the potential of computational molecular evolution in industry. BMC Evol. Biol. 15, 76
17. Fitch, W.M. (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20, 406–416
18. Yang, Z.H., Kumar, S. and Nei, M. (1995) A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141, 1641–1650
19. Koshi, J.M. and Goldstein, R.A. (1996) Probabilistic reconstruction of ancestral protein sequences. J. Mol. Evol. 42, 313–320
20. Ronquist, F. and Huelsenbeck, J.P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574
21. Zhang, J.Z. and Nei, M. (1997) Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44, S139–S146
22. Pupko, T., Pe'er, I., Shamir, R. and Graur, D. (2000) A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol. Biol. Evol. 17, 890–896
23. Huelsenbeck, J.P. and Bollback, J.P. (2001) Empirical and hierarchical Bayesian estimation of ancestral states. Syst. Biol. 50, 351–366
24. Hanson-Smith, V., Kolaczkowski, B. and Thornton, J.W. (2010) Robustness of ancestral sequence reconstruction to phylogenetic uncertainty. Mol. Biol. Evol. 27, 1988–1999
25. Randall, R.N., Radford, C.E., Roof, K.A., Natarajan, D.K. and Gaucher, E.A. (2016) An experimental phylogeny to benchmark ancestral sequence reconstruction. Nat. Commun. 7, 12847
26. Anisimova, M., Liberles, D.A., Philippe, H., Provan, J., Pupko, T. and von Haeseler, A. (2013) State-of-the-art methodologies dictate new standards for phylogenetic analysis. BMC Evol. Biol. 13, 161
27. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680
28. Katoh, K., Misawa, K., Kuma, K. and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066
29. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797
30. Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217
31. Löytynoja, A. and Goldman, N. (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. U.S.A. 102, 10557–10562
32. Cole, M.F., Cox, V.E., Gratton, K.L. and Gaucher, E.A. (2013) Reconstructing evolutionary adaptive paths for protein engineering. Methods Mol. Biol. 978, 115–125
33. Benner, S.A., Sassi, S.O. and Gaucher, E.A. (2007) Molecular paleoscience: systems biology from the past. Adv. Enzymol. Relat. Areas Mol. Biol. 75, 1–132, xi
34. DeGiorgio, M. and Rosenberg, N.A. (2016) Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure. Theor. Popul. Biol. 110, 12–24
35. Mehta, R.S., Bryant, D. and Rosenberg, N.A. (2016) The probability of monophyly of a sample of gene lineages on a species tree. Proc. Natl Acad. Sci. U.S.A. 113, 8002–8009
36. Stadler, T., Degnan, J.H. and Rosenberg, N.A. (2016) Does gene tree discordance explain the mismatch between macroevolutionary models and empirical patterns of tree shape and branching times? Syst. Biol. 65, 628–639
37. Groussin, M., Hobbs, J.K., Szollosi, G.J., Gribaldo, S., Arcus, V.L. and Gouy, M. (2015) Toward more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees. Mol. Biol. Evol. 32, 13–22
38. Dayhoff, M.O., Eck, R.V. and Park, C.M. (1972) A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Dayhoff, M.O., ed.), pp. 89–99, National Biomedical Research Foundation, Washington, DC
39. Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282
40. Whelan, S. and Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699
41. Le, S.Q. and Gascuel, O. (2008) An improved general amino acid replacement matrix. Mol. Biol. Evol. 25, 1307–1320
42. Yang, Z.H. (1994) Maximum-likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314
43. Swofford, D.L. (1991) PAUP: Phylogenetic Analysis Using Parsimony, Version 3.1. Computer program distributed by the Illinois Natural History Survey, Champaign, Illinois
44. Swofford, D.L. and Sullivan, J. (2009) Phylogeny inference based on parsimony and other methods using PAUP*. In The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, 2nd edn (Lemey, P., Salemi, M. and Vandamme, A.-M., eds), pp. 267–312, Cambridge University Press, Cambridge
45. Yang, Z. (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591
46. Huelsenbeck, J.P. and Ronquist, F. (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755
47. Cai, W., Pei, J. and Grishin, N.V. (2004) Reconstruction of ancestral protein sequences and its applications. BMC Evol. Biol. 4, 33
48. Ashkenazy, H., Penn, O., Doron-Faigenboim, A., Cohen, O., Cannarozzi, G., Zomer, O. et al. (2012) FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Res. 40, W580–W584
49. Malcolm, B.A., Wilson, K.P., Matthews, B.W., Kirsch, J.F. and Wilson, A.C. (1990) Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature 345, 86–89
50. Gaucher, E.A., Govindarajan, S. and Ganesh, O.K. (2008) Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451, 704–707
51. Perez-Jimenez, R., Inglés-Prieto, A., Zhao, Z.-M., Sanchez-Romero, I., Alegre-Cebollada, J., Kosuri, P. et al. (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat. Struct. Mol. Biol. 18, 592–599
52. Risso, V.A., Gavira, J.A., Mejia-Carmona, D.F., Gaucher, E.A. and Sanchez-Ruiz, J.M. (2013) Hyperstability and substrate promiscuity in laboratory resurrections of Precambrian β-lactamases. J. Am. Chem. Soc. 135, 2899–2902
53. Pollock, D.D. and Chang, B.S.W. (2007) Ancestral Sequence Reconstruction, pp. 85–94, Oxford University Press, Oxford
54. Akanuma, S., Nakajima, Y., Yokobori, S.-i., Kimura, M., Nemoto, N., Mase, T. et al. (2013) Experimental evidence for the thermophilicity of ancestral life. Proc. Natl Acad. Sci. U.S.A. 110, 11067–11072
55. Voordeckers, K., Brown, C.A., Vanneste, K., van der Zande, E., Voet, A., Maere, S. et al. (2012) Reconstruction of ancestral metabolic enzymes reveals molecular mechanisms underlying evolutionary innovation through gene duplication. PLoS Biol. 10, e1001446
56. Wilson, C., Agafonov, R.V., Hoemberger, M., Kutter, S., Zorba, A., Halpin, J. et al. (2015) Using ancient protein kinases to unravel a modern cancer drug's mechanism. Science 347, 882–886
57. Ingles-Prieto, A., Ibarra-Molero, B., Delgado-Delgado, A., Perez-Jimenez, R., Fernandez, J.M., Gaucher, E.A. et al. (2013) Conservation of protein structure over four billion years. Structure 21, 1690–1697
58. Zhang, J. and Rosenberg, H.F. (2002) Complementary advantageous substitutions in the evolution of an antiviral RNase of higher primates. Proc. Natl Acad. Sci. U.S.A. 99, 5486–5491
59. Bridgham, J.T., Carroll, S.M. and Thornton, J.W. (2006) Evolution of hormone-receptor complexity by molecular exploitation. Science 312, 97–101
60. Ortlund, E.A., Bridgham, J.T., Redinbo, M.R. and Thornton, J.W. (2007) Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–1548
61. McKeown, A.N., Bridgham, J.T., Anderson, D.W., Murphy, M.N., Ortlund, E.A. and Thornton, J.W. (2014) Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 159, 58–68
62. Starr, T.N. and Thornton, J.W. (2016) Epistasis in protein evolution. Protein Sci. 25, 1204–1218
63. Harms, M.J. and Thornton, J.W. (2010) Analyzing protein structure and function using ancestral gene reconstruction. Curr. Opin. Struct. Biol. 20, 360–366
64. Bridgham, J.T., Ortlund, E.A. and Thornton, J.W. (2009) An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–519
65. Yokoyama, S., Tada, T., Zhang, H. and Britt, L. (2008) Elucidation of phenotypic adaptations: molecular analyses of dim-light vision proteins in vertebrates. Proc. Natl Acad. Sci. U.S.A. 105, 13480–13485
66. Konno, A., Kitagawa, A., Watanabe, M., Ogawa, T. and Shirai, T. (2011) Tracing protein evolution through ancestral structures of fish galectin. Structure 19, 711–721
67. Steindel, P.A., Chen, E.H., Wirth, J.D. and Theobald, D.L. (2016) Gradual neofunctionalization in the convergent evolution of trichomonad lactate and malate dehydrogenases. Protein Sci. 25, 1319–1331
68. Perica, T., Kondo, Y., Tiwari, S.P., McLaughlin, S.H., Kemplen, K.R., Zhang, X. et al. (2014) Evolution of oligomeric state through allosteric pathways that mimic ligand binding. Science 346, 1254346
69. Kim, H., Zou, T., Modi, C., Dörner, K., Grunkemeyer, T.J., Chen, L. et al. (2015) A hinge migration mechanism unlocks the evolution of green-to-red photoconversion in GFP-like proteins. Structure 23, 34–43
70. Zou, T., Risso, V.A., Gavira, J.A., Sanchez-Ruiz, J.M. and Ozkan, S.B. (2015) Evolution of conformational dynamics determines the conversion of a promiscuous generalist into a specialist enzyme. Mol. Biol. Evol. 32, 132–143
71. Thornton, J.W. (2004) Resurrecting ancient genes: experimental analysis of extinct molecules. Nat. Rev. Genet. 5, 366–375
72. Zhang, Y.-X., Perry, K., Vinci, V.A., Powell, K., Stemmer, W.P.C. and del Cardayré, S.B. (2002) Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature 415, 644–646
73. Bar-Rogovsky, H., Stern, A., Penn, O., Kobl, I., Pupko, T. and Tawfik, D.S. (2015) Assessing the prediction fidelity of ancestral reconstruction by a library approach. Protein Eng. Des. Sel. 28, 507–518
74. Reisinger, B., Sperl, J., Holinski, A., Schmid, V., Rajendran, C., Carstensen, L. et al. (2014) Evidence for the existence of elaborate enzyme complexes in the Paleoarchean Era. J. Am. Chem. Soc. 136, 122–129
75. Bollback, J.P. (2006) SIMMAP: stochastic character mapping of discrete traits on phylogenies. BMC Bioinf. 7, 88
76. Zakas, P.M., Brown, H.C., Knight, K., Meeks, S.L., Spencer, H.T., Gaucher, E.A. et al. (2016) Enhancing the pharmaceutical properties of protein drugs by ancestral sequence reconstruction. Nat. Biotech. in press
77. Skovgaard, M., Kodra, J.T., Gram, D.X., Knudsen, S.M., Madsen, D. and Liberles, D.A. (2006) Using evolutionary information and ancestral sequences to understand the sequence-function relationship in GLP-1 agonists. J. Mol. Biol. 363, 977–988
78. Jensen, R.A. (1976) Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30, 409–425
79. Knauth, L.P. and Lowe, D.R. (1978) Oxygen isotope geochemistry of cherts from Onverwacht Group (3.4 billion years), Transvaal, South Africa, with implications for secular variations in isotopic composition of cherts. Earth Planet. Sci. Lett. 41, 209–222
80. Knauth, L.P. and Lowe, D.R. (2003) High Archean climatic temperature inferred from oxygen isotope geochemistry of cherts in the 3.5 Ga Swaziland Supergroup, South Africa. Geol. Soc. Am. Bull. 115, 566–580
81. Gaucher, E.A., Thomson, J.M., Burgan, M.F. and Benner, S.A. (2003) Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins. Nature 425, 285–288
82. Perez-Jimenez, R., Inglés-Prieto, A., Zhao, Z.-M., Sanchez-Romero, I., Alegre-Cebollada, J., Kosuri, P. et al. (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat. Struct. Mol. Biol. 18, 592–599
83. Romero-Romero, M.L., Risso, V.A., Martinez-Rodriguez, S., Gaucher, E.A., Ibarra-Molero, B. and Sanchez-Ruiz, J.M. (2016) Selection for protein kinetic stability connects denaturation temperatures to organismal temperatures and provides clues to archaean life. PLoS ONE 11
84. Trudeau, D.L., Kaltenbach, M. and Tawfik, D.S. (2016) On the potential origins of the high stability of reconstructed ancestral proteins. Mol. Biol. Evol. 33, 2633–2641
85. Cole, M.F. and Gaucher, E.A. (2011) Exploiting models of molecular evolution to efficiently direct protein engineering. J. Mol. Evol. 72, 193–203
86. Miyazaki, J., Nakaya, S., Suzuki, T., Tamakoshi, M., Oshima, T. and Yamagishi, A. (2001) Ancestral residues stabilizing 3-isopropylmalate dehydrogenase of an extreme thermophile: experimental evidence supporting the thermophilic common ancestor hypothesis. J. Biochem. 129, 777–782
87. Watanabe, K., Ohkuri, T., Yokobori, S.-i. and Yamagishi, A. (2006) Designing thermostable proteins: ancestral mutants of 3-isopropylmalate dehydrogenase designed by using a phylogenetic tree. J. Mol. Biol. 355, 664–674
88. Iwabata, H., Watanabe, K., Ohkuri, T., Yokobori, S.-i. and Yamagishi, A. (2005) Thermostability of ancestral mutants of Caldococcus noboribetus isocitrate dehydrogenase. FEMS Microbiol. Lett. 243, 393–398
89. Shimizu, H., Yokobori, S.-i., Ohkuri, T., Yokogawa, T., Nishikawa, K. and Yamagishi, A. (2007) Extremely thermophilic translation system in the common ancestor commonote: ancestral mutants of glycyl-tRNA synthetase from the extreme thermophile Thermus thermophilus. J. Mol. Biol. 369, 1060–1069
90. Yamashiro, K., Yokobori, S., Koikeda, S. and Yamagishi, A. (2010) Improvement of Bacillus circulans beta-amylase activity attained using the ancestral mutation method. Protein Eng. Des. Sel. 23, 519–528
91. Alcolombri, U., Elias, M. and Tawfik, D.S. (2011) Directed evolution of sulfotransferases and paraoxonases by ancestral libraries. J. Mol. Biol. 411, 837–853
92. Chen, F., Gaucher, E.A., Leal, N.A., Hutter, D., Havemann, S.A., Govindarajan, S. et al. (2010) Reconstructed evolutionary adaptive paths give polymerases accepting reversible terminators for sequencing and SNP detection. Proc. Natl Acad. Sci. U.S.A. 107, 1948–1953
93. Steipe, B., Schiller, B., Plückthun, A. and Steinbacher, S. (1994) Sequence statistics reliably predict stabilizing mutations in a protein domain. J. Mol. Biol. 240, 188–192
94. Porebski, B.T. and Buckle, A.M. (2016) Consensus protein design. Protein Eng. Des. Sel. 29, 245–251
95. Steipe, B. (2004) Consensus-based engineering of protein stability: from intrabodies to thermostable enzymes. Methods Enzymol. 388, 176–186
96. Arenas, M. and Posada, D. (2010) Computational design of centralized HIV-1 genes. Curr. HIV Res. 8, 613–621
97. Akanuma, S., Iwami, S., Yokoi, T., Nakamura, N., Watanabe, H., Yokobori, S.-i. et al. (2011) Phylogeny-based design of a B-subunit of DNA gyrase and its ATPase domain using a small set of homologous amino acid sequences. J. Mol. Biol. 412, 212–225
98. Risso, V.A., Gavira, J.A., Gaucher, E.A. and Sanchez-Ruiz, J.M. (2014) Phenotypic comparisons of consensus variants versus laboratory resurrections of Precambrian proteins. Proteins Struct. Funct. Bioinf. 82, 887–896
99. Tokuriki, N., Stricher, F., Serrano, L. and Tawfik, D.S. (2008) How protein stability and new functions trade off. PLoS Comput. Biol. 4, e1000002
100. Hobbs, J.K., Shepherd, C., Saul, D.J., Demetras, N.J., Haaning, S., Monk, C.R. et al. (2012) On the origin and evolution of thermophily: reconstruction of functional Precambrian enzymes from ancestors of Bacillus. Mol. Biol. Evol. 29, 825–835
101. Hart, K.M., Harms, M.J., Schmidt, B.H., Elya, C., Thornton, J.W. and Marqusee, S. (2014) Thermodynamic system drift in protein evolution. PLoS Biol. 12, e1001994
102. Williams, P.D., Pollock, D.D., Blackburne, B.P. and Goldstein, R.A. (2006) Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput. Biol. 2, e69
103. Wijma, H.J., Floor, R.J. and Janssen, D.B. (2013) Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr. Opin. Struct. Biol. 23, 588–594
104. Whitfield, J.H., Zhang, W.H., Herde, M.K., Clifton, B.E., Radziejewski, J., Janovjak, H. et al. (2015) Construction of a robust and sensitive arginine biosensor through ancestral protein reconstruction. Protein Sci. 24, 1412–1422
105. Howard, C., Hanson-Smith, V., Kennedy, K.J., Miller, C.J., Lou, H.J., Johnson, A.D. et al. (2014) Ancestral resurrection reveals evolutionary mechanisms of kinase plasticity. eLife 3
106. Bar-Rogovsky, H., Hugenmatter, A. and Tawfik, D.S. (2013) The evolutionary origins of detoxifying enzymes: the mammalian serum paraoxonases (PONs) relate to bacterial homoserine lactonases. J. Biol. Chem. 288, 23914–23927
107. Khersonsky, O., Roodveldt, C. and Tawfik, D.S. (2006) Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 10, 498–508
108. Peisajovich, S.G. and Tawfik, D.S. (2007) Protein engineers turned evolutionists. Nat. Methods 4, 991–994
109. Anderson, D.P., Whitney, D.S., Hanson-Smith, V., Woznica, A., Campodonico-Burnett, W., Volkman, B.F. et al. (2016) Evolution of an ancient protein function involved in organized multicellularity in animals. eLife 5, e10147
110. Sassi, S.O., Braun, E.L. and Benner, S.A. (2007) The evolution of seminal ribonuclease: pseudogene reactivation or multiple gene inactivation events? Mol. Biol. Evol. 24, 1012–1024
111. Gaucher, E.A., Gu, X., Miyamoto, M.M. and Benner, S.A. (2002) Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27, 315–321
112. Gu, X. and Vander Velden, K. (2002) DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18, 500–501
113. Zhu, G., Golding, G.B. and Dean, A.M. (2005) The selective cause of an ancient adaptation. Science 307, 1279–1282
114. Chen, R., Greer, A. and Dean, A.M. (1995) A highly active decarboxylating dehydrogenase with rationally inverted coenzyme specificity. Proc. Natl Acad. Sci. U.S.A. 92, 11666–11670
115. Chang, B.S.W., Jonsson, K., Kazmi, M.A., Donoghue, M.J. and Sakmar, T.P. (2002) Recreating a functional ancestral archosaur visual pigment. Mol. Biol. Evol. 19, 1483–1489
116. Chang, B.S.W., Matz, M.V., Field, S.F., Müller, J. and van Hazel, I. (2007) Dealing with model uncertainty in reconstructing ancestral proteins in the laboratory: examples from ancestral visual pigments and GFP-like proteins. In Ancestral Sequence Reconstruction (Liberles, D.A., ed.), Oxford University Press, Oxford
117. Field, S.F. and Matz, M.V. (2010) Retracing evolution of red fluorescence in GFP-like proteins from Faviina corals. Mol. Biol. Evol. 27, 225–233
118. Yokoyama, S. and Radlwimmer, F.B. (2001) The molecular genetics and evolution of red and green color vision in vertebrates. Genetics 158, 1697–1710
119. Konno, A., Ogawa, T., Shirai, T. and Muramoto, K. (2007) Reconstruction of a probable ancestral form of conger eel galectins revealed their rapid adaptive evolution process for specific carbohydrate recognition. Mol. Biol. Evol. 24, 2504–2514
120. Miller, S.R., McGuirl, M.A. and Carvey, D. (2013) The evolution of RuBisCO stability at the thermal limit of photoautotrophy. Mol. Biol. Evol. 30, 752–760
121. Kratzer, J.T., Lanaspa, M.A., Murphy, M.N., Cicerchi, C., Graves, C.L., Tipton, P.A. et al. (2014) Evolutionary history and metabolic insights of ancient mammalian uricases. Proc. Natl Acad. Sci. U.S.A. 111, 3763–3768
122. McLean, R., Hobbs, J.K., Suits, M.D., Tuomivaara, S.T., Jones, D.R., Boraston, A.B. et al. (2015) Functional analyses of resurrected and contemporary enzymes illuminate an evolutionary path for the emergence of exolysis in polysaccharide lyase family 2. J. Biol. Chem. 290, 21231–21243
123. Castro-Fernandez, V., Bravo-Moraga, F., Ramirez-Sarmiento, C.A. and Guixe, V. (2014) Emergence of pyridoxal phosphorylation through a promiscuous ancestor during the evolution of hydroxymethyl pyrimidine kinases. FEBS Lett. 588, 3068–3073
124. Butzin, N.C., Lapierre, P., Green, A.G., Swithers, K.S., Gogarten, J.P. and Noll, K.M. (2013) Reconstructed ancestral Myo-inositol-3-phosphate synthases indicate that ancestors of the Thermococcales and Thermotoga species were more thermophilic than their descendants. PLoS ONE 8, e84300
125. Finnigan, G.C., Hanson-Smith, V., Stevens, T.H. and Thornton, J.W. (2012) Evolution of increased complexity in a molecular machine. Nature 481, 360–364
126. Baker, C.R., Hanson-Smith, V. and Johnson, A.D. (2013) Following gene duplication, paralog interference constrains transcriptional circuit evolution. Science 342, 104–108
127. Wilgenbusch, J.C. and Swofford, D. (2003) Inferring evolutionary trees with PAUP*. Curr. Protoc. Bioinf. Chapter 6, Unit 6.4
128. Hanson-Smith, V. and Johnson, A. (2016) PhyloBot: a web portal for automated phylogenetics, ancestral sequence reconstruction, and exploration of mutational trajectories. PLoS Comput. Biol. 12, e1004976
129. Edwards, R.J. and Shields, D.C. (2004) GASP: Gapped Ancestral Sequence Prediction for proteins. BMC Bioinf. 5, 123
130. Joy, J.B., Liang, R.H., McCloskey, R.M., Nguyen, T. and Poon, A.F.Y. (2016) Ancestral reconstruction. PLoS Comput. Biol. 12, e1004763
131. Carroll, S.M., Ortlund, E.A. and Thornton, J.W. (2011) Mechanisms for the evolution of a derived function in the ancestral glucocorticoid receptor. PLoS Genet. 7, e1002117
132. Kohn, J.A., Deshpande, K. and Ortlund, E.A. (2012) Deciphering modern glucocorticoid cross-pharmacology using ancestral corticosteroid receptors. J. Biol. Chem. 287, 16267–16275
133. Risso, V.A., Manssour-Triedo, F., Delgado-Delgado, A., Arco, R., Barroso-DelJesus, A., Ingles-Prieto, A. et al. (2015) Mutational studies on resurrected ancestral proteins reveal conservation of site-specific amino acid preferences throughout evolutionary history. Mol. Biol. Evol. 32, 440–455
134. Kim, H., Grunkemeyer, T.J., Modi, C., Chen, L., Fromme, R., Matz, M.V. et al. (2013) Acid-base catalysis and crystal structures of a least evolved ancestral GFP-like protein undergoing green-to-red photoconversion. Biochemistry 52, 8048–8059
135. Boucher, J.I., Jacobowitz, J.R., Beckett, B.C., Classen, S. and Theobald, D.L. (2014) An atomic-resolution view of neofunctionalization in the evolution of apicomplexan lactate dehydrogenases. eLife 3
136. Clifton, B.E. and Jackson, C.J. (2016) Ancestral protein reconstruction yields insights into adaptive evolution of binding specificity in solute-binding proteins. Cell Chem. Biol. 23, 236–245
137. Kumar, S. and Hedges, S.B. (1998) A molecular timescale for vertebrate evolution. Nature 392, 917–920
138. Battistuzzi, F.U., Feijao, A. and Hedges, S.B. (2004) A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol. Biol. 4, 44
139. Hedges, S.B., Blair, J.E., Venturi, M.L. and Shoe, J.L. (2004) A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol. Biol. 4, 2
140. Alderson, R.G., Barker, D. and Mitchell, J.B.O. (2014) One origin for metallo-beta-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees. J. Mol. Evol. 79, 117–129