Apoptotic caspases evolved with metazoans more than 950 million years ago (MYA), and a series of gene duplications resulted in two subfamilies consisting of initiator and effector caspases. The effector caspase genes (caspases-3, -6, and -7) were subsequently fixed into the Chordata phylum more than 650 MYA when the gene for a common ancestor (CA) duplicated, and the three effector caspases have persisted throughout mammalian evolution. All caspases prefer an aspartate residue at the P1 position of substrates, so each caspase evolved discrete cellular roles through changes in substrate recognition at the P4 position combined with allosteric regulation. We examined the evolution of substrate specificity in caspase-6, which prefers valine at the P4 residue, compared with caspases-3 and -7, which prefer aspartate, by reconstructing the CA of effector caspases (AncCP-Ef1) and the CA of caspase-6 (AncCP-6An). We show that AncCP-Ef1 is a promiscuous enzyme with little distinction between Asp, Val, or Leu at P4. The specificity of caspase-6 was defined early in its evolution, where AncCP-6An demonstrates a preference for Val over Asp at P4. Structures of AncCP-Ef1 and of AncCP-6An show a network of charged amino acids near the S4 pocket that, when combined with repositioning a flexible active site loop, resulted in a more hydrophobic binding pocket in AncCP-6An. The ancestral protein reconstructions show that the caspase-hemoglobinase fold has been conserved for over 650 million years and that only three substitutions in the scaffold are necessary to shift substrate selection toward Val over Asp.
The caspase family of proteases offers an attractive model for examining protein evolution because a common protein scaffold (the caspase-hemoglobinase fold) was used to develop subfamilies that differ in oligomeric states, enzyme specificity, and allosteric regulation. Caspase genes predate multicellularity and are represented in all kingdoms of life. Caspase proteases are thought to have evolved from an ancestral immune system into two general classes, inflammatory or apoptotic caspases, consisting of over twelve proteins [3,4]. Within the apoptotic caspases, two subfamilies further evolved into apoptotic initiators or effectors [3,5]. Following additional gene duplications, an ancestral initiator caspase gave rise to four genes (caspases-8, -10, -18, c-FLIP), while an ancestral effector caspase gave rise to three genes (caspases-3, -6, -7) (Figure 1A).
Ancestral caspase proteins.
Caspases are produced in the cell as inactive zymogens, and the oligomeric form of the zymogen is a key to regulation [7–9]. The initiator caspases are monomers, and the ability to form heterodimers versus homodimers in response to cellular conditions is a key characteristic in cell fate decisions regarding the activation of necroptosis or apoptosis pathways [10,11]. In contrast, effector caspases evolved as obligate homodimers that are processed by initiator caspases, so discrete cellular functions of effector caspases developed through a combination of changes in substrate specificity and allosteric regulation [9,12]. Studies of enzyme families have identified features that contribute to enzyme specificity [13–15]. Enzyme active sites provide the stereo-selective environment for reaction ground or transition states, or the protein scaffold provides the proper conformational dynamics that facilitate substrate binding and reaction chemistry [16,17], so there may be multiple combinations of residues that provide the proper environment within the context of the protein scaffold [18,19].
Caspases cleave target proteins through recognition of a tetrapeptide motif with the noted exception of caspase-2, which recognizes a pentapeptide sequence. In some cases, enzyme specificity is coupled to exosites that facilitate substrate selection [21–25]. Positions P1–P4 on the peptide are co-ordinated by their corresponding substrate pockets, S1–S4 in the active site, and the P1 residue is almost always an aspartate. Because specificity is determined primarily by the amino acid at the P4 position, caspases are sub-categorized into three groups based on recognition of the P4 amino acid: group I prefers a bulky residue (W,H); group II prefers hydrophilic residues (D,E); and group III prefers aliphatic residues (I,L,V) (Figure 1B). Although the effector caspases are relatively closely related, caspases-3 and -7 are characterized as group II specificity, while caspase-6 shows group III specificity. The selection at P4 (D vs V) results in overlapping but nonidentical substrate profiles based on degradome analyses [27–30]. In the evolution of chordates, new caspase substrate specificities were important in developmental stages of the brain and nervous systems [31–34], so what may appear to be subtle changes in enzyme selection have large consequences in cellular development.
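The P4-based grouping described above can be written as a small lookup. The following is a minimal sketch, not part of the study's analysis; the function name `p4_group` and the dictionary layout are hypothetical, and only the residues listed in the text are assigned to a group.

```python
# Hypothetical helper: assign a tetrapeptide to the P4 specificity group
# described in the text. Residues not listed in the text return None.
P4_GROUPS = {
    "I": "WH",     # group I: bulky residues
    "II": "DE",    # group II: hydrophilic residues
    "III": "ILV",  # group III: aliphatic residues
}


def p4_group(motif):
    """Return the group for a tetrapeptide written P4-P3-P2-P1 (e.g. 'DEVD')."""
    if len(motif) != 4:
        raise ValueError("expected a tetrapeptide written P4-P3-P2-P1")
    p4 = motif[0].upper()
    for group, residues in P4_GROUPS.items():
        if p4 in residues:
            return group
    return None
```

For the three tetrapeptides used in the activity assays, this sketch places DEVD in group II and both VEID and LETD in group III.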
Current models suggest that modern enzymes evolved from promiscuous ancestral proteins through amino acid substitutions that were coupled to the selection of pre-existing suboptimal activity [35–37]. While the caspase-8 subfamily evolved into cell fate determinants, with largely uniform substrate selection ((I/L)EXD), changes in effector caspases resulted in two distinct specificities: DxxD versus VxxD. The evolutionary trajectories that resulted in the distinct substrate specificities are not known. Although horizontal studies that compare extant enzymes may identify the importance of key active site residues, they rarely uncover the set of residues that are responsible for functional diversity in large protein families. Generally, substitutions from horizontal studies lack evolutionary context, where protein epistasis affects the specific combination of amino acids along discrete evolutionary paths [39,40]. Directed evolutionary approaches expand the sequence space that can be examined, and such studies identified a combination of amino acids that relax specificity in caspase-7, resulting in a shift in substrate cleavage profiles in cellulo of evolved-caspase-7 enzymes. Evolutionary biochemical methods further expand the sequence space to include the entire protein, yet the methods simultaneously narrow the scope of the problem by also examining changes that occurred between common ancestral proteins.
To determine the evolutionary pathways leading to VxxD versus DxxD specificity in effector caspases, we reconstructed sequences and resurrected ancestral proteins for the common ancestor (CA) of caspases-3, -6, and -7 (called AncCP-Ef1) and for the CA of the caspase-6 branch (AncCP-6An), where ‘An' refers to anamniotes. We show that AncCP-Ef1 is indeed a promiscuous enzyme that exhibits low activity and little preference for Val, Asp, or Leu at the P4 position. The selection of Val over Asp occurred early in the evolution of caspase-6, with AncCP-6An. Structures of AncCP-Ef1 and of AncCP-6An determined by X-ray crystallography show a unique mechanism of introducing a network of charged amino acids that increase the hydrophobicity of the S4 binding pocket in the caspase-6 evolutionary pathway. Introduction of the network into AncCP-Ef1 shifted specificity to Val over Asp, as in caspase-6. Together, the data demonstrate that only three amino acid substitutions are required in the ancestral scaffold to shift specificity in the caspase-6 proteins.
Materials and methods
Ancestral protein reconstruction
Two lists of taxa were used for ancestral protein reconstruction (APR). One list (APR_1, Supplementary Information, Table S1) was generated using a precursor to the CaspBase (caspbase.org), and one list (APR_2, Supplementary Information, Table S2) queried the CaspBase. Each list has high sequence coverage spanning the majority of known proteins within the caspase family. While APR_1 has a total of 253 caspase sequences, APR_2 has a total of 258 caspase sequences. APR_2 emphasizes mammal lineages over non-mammal lineages, with 127 mammalian caspase sequences, while APR_1 has 82 mammalian caspase sequences. There is an overlap of 39.6% between the two data sets. Care was taken in both lists to mitigate erroneous sequences, to eliminate incomplete lineage sorting by including high coverage across all representative taxa from all major vertebrate groups, and to incorporate full gene tree representation of each known caspase family member within each vertebrate group. The prodomain was pruned from our sequences because prodomains vary widely in sequence and length in the caspase family, and their inclusion introduces missing data and noise into downstream analyses. The multiple sequence alignment (MSA) was computed using PROMALS3D [43,44]. Alignments were checked in Geneious to assess alignment accuracy, and we utilized Prottest 3 to select the model for phylogenetic analysis, using AICc (corrected Akaike Information Criterion) weights to identify the most probable model of protein evolution. The phylogenetic tree was generated using IQTREE, using a combination of hill-climbing approaches and stochastic perturbation methods for accuracy and time-efficiency, and the tree was bootstrapped 1000 times as a test of phylogeny. The tree was examined to remove erroneous and mislabeled sequences and to mitigate missing data, resulting in a highly effective alignment for APR.
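As an illustration of how AICc weights rank candidate substitution models, the following minimal sketch computes the small-sample corrected criterion and the corresponding Akaike weights from their standard definitions. It does not reproduce Prottest 3 itself; the function names are hypothetical, and any log-likelihoods, parameter counts, or sample sizes passed in are the user's own.

```python
import math


def aicc(log_likelihood, k, n):
    """Corrected AIC: -2 lnL + 2k + 2k(k + 1) / (n - k - 1).

    k is the number of free parameters and n is the sample size
    (for an alignment, commonly the number of sites).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: weight}; the weights sum to 1."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}
```

The model with the lowest AICc receives the largest weight, and the weight can be read as the relative support for that model among the candidates compared.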
The APRs were constructed with FastML, using codon-based reconstruction models for accuracy since the models were generated from whole annotated genomes with complete metadata. We used an LG model of substitution generated by Prottest 3, and our framework used maximum likelihood (ML) for indel reconstruction. We provided our ML tree as a guide, optimized branch lengths for highly divergent sequences, set a gamma distribution, and computed a joint reconstruction to generate APRs at each node of interest. Sequences were codon-optimized for expression in E. coli, cloned into the pET11a vector, and included a C-terminal His6-tag (GenScript, U.S.A.). The AncCP-6An was also designed similarly to the caspase-6 CT (constitutive two-chain) construct described previously. All proteins were purified as described previously [53–55].
Crystallization and data collection
Each protein was dialyzed in a buffer of 10 mM Tris–HCl, pH 8.5, and 1 mM DTT, concentrated to 8–10 mg/ml, and stored at −80°C. The molar extinction coefficients for the APRs were determined by ProtParam under reduced conditions (Supplementary Information, Table S3). The inhibitors (Ac-DEVD-CHO (acetyl-Asp-Glu-Val-Asp-aldehyde) or Ac-VEID-CHO (acetyl-Val-Glu-Ile-Asp-aldehyde) in DMSO) were added at a 1:5 (w/w) ratio of protein:inhibitor, and solutions were incubated on ice for 1 h in the dark. Initial crystallization conditions were found using Hampton crystal screens (crystal screen 1 and PEG/ion screen 1). For each well, a solution of the screen (490 µl), DTT (5 µl of 1 M solution), and sodium azide (5 µl of 300 mM solution) were added. Crystals grew using the hanging drop vapor diffusion method at 18°C using 4 µl drops that contained equal volumes of protein and reservoir solutions. For AncCP-Ef1(DEVD), optimal conditions were found in a solution of 0.2 M ammonium acetate, pH 4.6, 0.1 M sodium acetate trihydrate, 30% PEG 4000. For AncCP-6An(VEID), optimal conditions were found in a solution of 0.2 M ammonium fluoride, pH 6.2, 20% PEG 3350. Crystals were flash frozen in liquid nitrogen following the addition of 20% MPD (2-methylpentane-2,4-diol) or 20% glycerol plus well buffer. Data were collected at 100 K at the SER-CAT synchrotron beamline (Advanced Photon Source, Argonne National Laboratory, Argonne, IL, U.S.A.). Each data set contained 180 frames at 1° rotation. The proteins crystallized in the orthorhombic space group P2₁2₁2₁ and were phased with a previously published human CASP3 structure (PDB entry 2J30). Data reduction and model refinements were done using HKL2000, COOT, and Phenix, and a summary of the data collection and refinement statistics is shown in Supplementary Information, Table S4.
Enzyme specificity was determined by substrate-phage display as described previously. Briefly, phage libraries consisting of caspase recognition sequences, with either random or fixed (aspartate) P1 position, were bound to Ni-NTA resin. An enzyme (10–100 nM) was added to initiate the reaction, and samples were incubated between 1 and 20 h. E. coli ER2738 cells were used to amplify the previous round by infecting cells with the supernatant from the reaction. The cells were grown for 4 h, and the supernatant was collected and used as the library for the following round of selection. Colony counting was used to determine the endpoint of the experiment, when the amount of library bound to the resin was similar to the amount released during the treatment. The amount of phage released during the reaction versus the control (without enzyme) was monitored to ensure progress in substrate selectivity.
Enzyme activity assay
Enzyme activity was determined in a buffer of 150 mM Tris–HCl, pH 7.5, 50 mM NaCl, 10 mM DTT, 1% sucrose, 0.1% CHAPS (assay buffer) at 25°C, as previously described [58,59]. The total reaction volume was 200 µl, and the final enzyme concentration was 10 nM. Following the addition of substrate (Ac-DEVD-AFC (acetyl-Asp-Glu-Val-Asp-7-amino-4-trifluoromethylcoumarin), Ac-LETD-AFC (acetyl-Leu-Glu-Thr-Asp-7-amino-4-trifluoromethylcoumarin), or Ac-VEID-AFC (acetyl-Val-Glu-Ile-Asp-7-amino-4-trifluoromethylcoumarin)), the samples were excited at 400 nm, and emission was monitored at 505 nm for 60 s. The steady-state parameters, KM and kcat, were determined from plots of initial velocity versus substrate concentration and are presented in Supplementary Information, Table S5.
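To illustrate how kcat and KM can be extracted from initial velocity versus substrate concentration data, the sketch below uses a Hanes-Woolf linearization ([S]/v = [S]/Vmax + KM/Vmax) with an ordinary least-squares line. This is a stand-in chosen to keep the example dependency-free and is not necessarily the fitting procedure used for Table S5; a nonlinear fit of the Michaelis-Menten equation is generally preferred for noisy data. The function name and the synthetic data (kcat = 0.5 s⁻¹, KM = 15 µM, 10 nM enzyme) are illustrative assumptions.

```python
def fit_michaelis_menten(substrate, velocity, enzyme_conc):
    """Estimate (kcat, KM, kcat/KM) by Hanes-Woolf linear regression.

    substrate: concentrations in M; velocity: initial rates in M/s;
    enzyme_conc: active enzyme concentration in M.
    """
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two matched (S, v) points")
    if any(v <= 0 for v in velocity):
        raise ValueError("velocities must be positive")
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]  # [S]/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # 1 / Vmax
    intercept = mean_y - slope * mean_x  # KM / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    kcat = vmax / enzyme_conc
    return kcat, km, kcat / km


# Synthetic, noise-free data generated from assumed parameters (not measured values).
true_kcat, true_km, enzyme = 0.5, 15e-6, 10e-9
conc = [2e-6, 5e-6, 10e-6, 20e-6, 50e-6, 100e-6]
rates = [true_kcat * enzyme * s / (true_km + s) for s in conc]
kcat, km, efficiency = fit_michaelis_menten(conc, rates, enzyme)
```

With noise-free input the regression recovers the assumed kcat and KM, and the returned kcat/KM is the catalytic efficiency in M⁻¹ s⁻¹ that is compared across enzymes and substrates.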
Crystal structures have been deposited in the Protein Data Bank, www.wwpdb.org (PDB ID codes: 6PDQ, AncCP-Ef1(DEVD); 6PPM, AncCP-6An).
Reconstruction of ancestral effector caspases
A comparison of human caspases-3 and -6 shows that there are 118 differences out of 237 amino acids, not including the intersubunit linker or prodomain (Figure 1C). Many of the differences (Figure 2A, orange) map to three active site loops (called L1, L3, and L4) as well as the five α-helices on the surface of the protein. In particular, helix 2 (H2) and the adjoining short surface strands (β1–β3) are known to undergo a coil-to-helix transition in caspase-6, which allosterically inhibits the enzyme. Likewise, in caspase-3 a conserved network of water molecules binds to helices 1, 4, and 5 to couple an allosteric site in the dimer interface to changes in the active site. While each of the 118 sites that differ between caspases-3 and -6 could be considered for horizontal mutations, we reasoned that differences near the active site were more likely to affect substrate specificity while changes in the surface helices may be coupled to allosteric regulation. In addition, rather than simply swapping activity between the two enzymes, we wanted to determine the evolutionary changes in caspases that resulted in the specificity of the extant enzymes.
Comparison of changes in extant and ancestral caspases.
To examine changes that occurred in the active site binding pockets, we first queried the CaspBase for effector caspase sequences, and based on the 667 sequences returned, we generated a consensus sequence for effector caspases (Figure 1C). We previously developed the common position (CP) numbering scheme in order to compare positions in evolutionarily divergent caspases, where the CP system describes CPs in all caspases as well as more divergent regions, such as active site loops (called gapped positions, or GP). Here, the amino acid position for each caspase is shown by superscript, whereas the CP is preceded by ‘CP-’.
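A consensus sequence of the kind described above can be derived from an alignment by taking the most frequent residue in each column. The following is a minimal sketch under stated assumptions: the input is a list of equal-length aligned strings, gaps are written as "-" and ignored when counting, and ties are broken alphabetically. The function name is hypothetical and this is not the procedure used by the CaspBase.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Return the per-column majority residue of an alignment.

    Columns that contain only gaps yield the gap character. Ties are
    broken alphabetically so that the result is deterministic.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            out.append(gap)
            continue
        # Highest count first; alphabetical order among equal counts.
        out.append(min(counts, key=lambda res: (-counts[res], res)))
    return "".join(out)
```

A simple majority rule discards information about how strongly each column is conserved, so in practice the per-column frequencies are usually kept alongside the consensus string.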
The active site S1 binding pocket of caspase-6 is the same as all other caspases and consists of Arg64 (CP-018), Gln161 (CP-115), and Arg220 (CP-161) (Supplementary Information, Figure S1). The S2 binding pocket is formed by the side chains of Tyr217 (CP-158), His168 (CP-122), and His219 (CP-160). While Tyr217 (CP-158) is highly conserved in effector caspases, CP-122 and CP-160 are both highly conserved (>90%) histidine residues that are unique to caspase-6. The S3 binding pocket is partially formed with Arg220 (CP-161), and the S4 binding pocket is formed by His219 (CP-160), Glu221 (CP-162), Trp227 (CP-168), and Val261 (GP9-V01). In caspase-6, CP-162 is conserved as glutamate, whereas caspases with group II substrate specificity (caspases-2, -3, and -7) utilize asparagine (Figure 1C). In addition, GP9-01 is conserved as valine in caspase-6, whereas in the group II caspases the position is not completely conserved, although glutamate is used most frequently.
To determine the evolutionary changes in the active site that led to modern substrate selection, we reconstructed common ancestral proteins for the effector caspases and for caspase-6 (Figure 1A). APR techniques can reveal amino acid substitutions that result in neofunctionalization of proteins by creating an evolutionary map leading to the new functions. The effector caspases diverged from a CA ∼650 million years ago into the caspase-6 and caspase-3/-7 branches, and caspases-3 and -7 diverged into separate branches later (Figure 1A). Utilizing data from the CaspBase, we reconstructed proteins at the node that represents the CA of caspase-3/6/7, called AncCP-Ef1, and for the CA of caspase-6, called AncCP-6An (Figure 1C). The data show only 42 sites changed from AncCP-Ef1 to AncCP-6An, referred to as vertical substitutions, with few changes in active site loops L1 and L4 (Figures 1C and 2B).
APR analyses result in site-specific probabilities for each position in the protein, with the ultimate goal to examine the characteristics of the protein rather than the precise ancestral sequence [63,64]. Thus, the predictions made with APR may differ for each input dataset such that the nodes of a phylogenetic tree actually represent a pool of possible ancestors. Due to evolutionary divergence for each site, the APR analysis identifies sites that may be ambiguous, defined here as a site that has <70% probability. For the caspase APRs, ambiguous sites generally occurred in two types, those in which two amino acids show nearly equal probability (∼50% each) in leading to extant proteins (here referred to as A1 or ambiguity type 1), or those in which multiple amino acids show lower probabilities (here referred to as A2 or ambiguity type 2). In the first case, A1, the amino acids are generally conserved, but differ, for two branches of a family. In the second case, A2, the sites are generally less conserved among family members. For AncCP-Ef1, we mapped the site-specific probability onto the protein structure (Figure 2C), and the data show low ambiguity in the active site and protein core, and higher ambiguity in the surface helices, particularly helices 2 and 3. The data are similar to conservation maps of extant enzymes, which display lower conservation in the surface helices . Together, the data suggest that the ambiguous sites are largely due to highly variant regions in extant caspases.
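The ambiguity classes defined above can be expressed as a simple rule on the per-site posterior probabilities. In this minimal sketch, only the 70% threshold comes from the text; the cut-offs used to separate A1 from A2 (`pair_mass` and `tol`) are illustrative assumptions, as is the function name.

```python
def classify_site(posteriors, threshold=0.70, pair_mass=0.80, tol=0.20):
    """Classify one reconstructed site from {residue: posterior probability}.

    Returns "well-determined" if the top residue reaches the threshold,
    "A1" if two residues share most of the probability roughly equally,
    and "A2" otherwise. pair_mass and tol are assumed cut-offs.
    """
    ranked = sorted(posteriors.values(), reverse=True)
    if not ranked:
        raise ValueError("no posterior probabilities supplied")
    top = ranked[0]
    if top >= threshold:
        return "well-determined"
    second = ranked[1] if len(ranked) > 1 else 0.0
    if top + second >= pair_mass and top - second <= tol:
        return "A1"
    return "A2"
```

Under these assumed cut-offs, a site with posteriors of 0.52 and 0.46 for two residues would be labeled A1, whereas a site whose probability is spread over four residues would be labeled A2.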
To examine the robustness of the AncCP-Ef reconstruction, we carried out two separate APR experiments by using different datasets and comparing their posterior probabilities. The two proteins, called AncCP-Ef1 and AncCP-Ef2, represent the same pool of possible ancestors from the different reconstructions, and both proteins were resurrected to corroborate the experimental results. For cloning purposes, we used the prodomain sequences from caspases-3 or -6 from Homo sapiens and intersubunit linker (IL) sequences computed from the APRs (Figure 1C). Previous studies showed that the prodomain does not affect enzymatic function in vitro, and the sequences are removed during zymogen maturation. In addition, we estimated the length of the IL as the average length in the organisms that diverged around the time of the ancestral node, and the reconstructed residues were used for the IL of that node. For example, AncCP-6An represents the ancestor of caspase-6 up to bony fish, so we used the average length of the IL for Actinopterygii (ray-finned fishes) for reconstructing AncCP-6An. In this case, the IL is three residues shorter than that of human caspase-6 (Figure 1C).
A comparison of the protein sequences shows that while human caspases-3 and -6 are only 40% identical, AncCP-Ef1 and AncCP-6An have higher sequence identity with each other (68%) and with extant enzymes (54–75%) (Supplementary Information, Figure S2). In addition, the data show that the ancestral enzymes are more acidic compared with extant enzymes, with calculated pI of 5.2–5.4 versus 5.7–6.5, due to higher percentages of glutamate and aspartate (Supplementary Information, Table S3). The two reconstructions from the same ancestral node, AncCP-Ef1 and AncCP-Ef2, have 81% sequence identity and differ in 39 sites, not including the IL (Figure 1C and Supplementary Information, Figure S3). While several sites in helices 1 and 4 vary, many of the differences between AncCP-Ef1 and AncCP-Ef2 occur in active site loops L1 and L4. For example, in AncCP-Ef1 the sequence (CP-195)VSLRS(CP-199) is more similar to that of human caspase-6 (VSQRR) than that of human caspase-3 (VATEF) (Figure 1C), while the opposite is true for AncCP-Ef2 (VAQKF). In both reconstructions, however, the amino acid at position GP9-01, which forms part of the S4 binding pocket, is well-determined as glutamate. As described below, the differences between AncCP-Ef1 and AncCP-Ef2 have little effect on enzyme activity.
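The sequence identities quoted above are pairwise comparisons over aligned positions. A minimal sketch of such a calculation follows, applied to the CP-195 to CP-199 fragments given in the text; the function name and the convention of skipping any column that contains a gap are assumptions, and the identity values reported in the paper were computed over the full aligned proteins rather than these five-residue fragments.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumed convention: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# CP-195 to CP-199 fragments quoted in the text.
fragments = {
    "AncCP-Ef1": "VSLRS",
    "AncCP-Ef2": "VAQKF",
    "HsCaspase-6": "VSQRR",
    "HsCaspase-3": "VATEF",
}
ef1_vs_casp6 = percent_identity(fragments["AncCP-Ef1"], fragments["HsCaspase-6"])
ef1_vs_casp3 = percent_identity(fragments["AncCP-Ef1"], fragments["HsCaspase-3"])
```

For these fragments, AncCP-Ef1 matches human caspase-6 at three of five positions and human caspase-3 at one of five, while AncCP-Ef2 shows the reverse trend, in line with the comparison made in the text.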
Ancestral effector caspase was a promiscuous enzyme
We examined the enzyme activity of AncCP-Ef1 and of AncCP-Ef2 against three tetrapeptide sequences (DEVD, VEID, and LETD), representing specificities for caspases-3, -6, and -8, respectively (Supplementary Information, Table S5). As described previously, kcat/KM values are in the range of ∼0.5–2 × 10⁵ M⁻¹ s⁻¹ for human caspases on the optimal P4 residue [20,26,66], with kcat values generally ∼0.3 to 1 s⁻¹ and KM values of ∼15 µM. In the case of AncCP-Ef1, kcat values were similar to those of extant caspases, but the activity was >5-fold lower due to significantly higher KM values (>90 µM), resulting in activities of ∼10³ M⁻¹ s⁻¹. A comparison of the two ancestral effector caspase reconstructions shows that AncCP-Ef1 and AncCP-Ef2 exhibited similar activity for DEVD substrate (2.8–6.8 × 10³ M⁻¹ s⁻¹), but the activity of AncCP-Ef2 for VEID was below the detection limit for the assay (∼5 × 10² M⁻¹ s⁻¹) due to high KM values (Supplementary Information, Table S5). Together, the data for AncCP-Ef1 and AncCP-Ef2 show that the CA of effector caspases has low activity for the three substrates, due to high KM values, indicating weaker substrate binding compared with extant enzymes. There was a significant improvement in the activity of AncCP-6An compared with the CA, particularly regarding valine at P4 (Supplementary Information, Table S5). In this case, we find ∼3-fold increase in kcat/KM with Val, little change with P4 Leu, and no activity against Asp at P4. The change in activity for AncCP-6An was due to a decrease in KM.
We also compared the substrate specificity of AncCP-Ef1 and of AncCP-6An using substrate-phage selection. As described previously, we utilized two substrate-phage libraries, either with randomized P5–P1′ positions or with aspartate fixed at the P1 position, and the data were combined to define substrate selection. As shown in Figure 3A, HsCaspase-6 shows preference (P5–P1′) for (L/S)(T/V)EVDA. The results are similar to those of previous studies that used quantitative mass spectrometry or fluorescence reporter screens. In contrast, AncCP-Ef1 was less selective for the P4 amino acid, with Leu, Ile, and Asp being the most prevalent (Figure 3B) and an overall P5–P1′ selection of T(L/I/D)E(T/V)DG. AncCP-6An, however, showed a preference for Val at P4, with an overall P5–P1′ selection of Y(V/T)LTDS (Figure 3C), which is consistent with the change in specificity as shown in the tetrapeptide activity studies. Together, the enzyme activity data on small peptides and on substrate-phage libraries show that the ancestral effector caspase had >10-fold lower activity compared with extant enzymes, and with little preference for Asp, Val, or Leu. In contrast, the specificity of caspase-6 for hydrophobic versus charged amino acids at P4 arose in the early caspase-6 ancestor.
Substrate preferences determined by substrate-phage display.
Structures of ancestral caspases show evolutionary changes in enzyme selection
We determined the structures of AncCP-Ef1 and of AncCP-6An with either DEVD or VEID, respectively, bound in the active site. The proteins crystallized in the P2₁2₁2₁ space group between 1.83 Å and 2.61 Å resolution (Supplementary Information, Table S4). The data show that the structures are very similar to extant caspases, with <0.5 Å RMSD (Figure 4A), demonstrating that the caspase-hemoglobinase fold has been conserved for more than 650 million years. For AncCP-Ef1, active site loop 4 is partially disordered. For example, we observe no electron density for residues GP9-S03 to GP9-A08, although there is good electron density for the P4 Asp (Figure 4B). In the case of AncCP-6An (VEID), loop 4 and the P4 Val are well-ordered (Figure 4C). We note that the two proteins crystallized in different unit cells, so crystal packing may contribute to the ordering of loop 4 in AncCP-6An. However, loop 4 is often disordered in caspase crystal structures, but typically not when substrate-based inhibitors are covalently bound in the active site. The disorder observed in loop 4 with substrate bound suggests that the DEVD substrate is not well recognized. In Figure 4B, it is surprising that a side chain from the inhibitor has poor density. To our knowledge, there are no other cases of structures of inhibited caspases in which a side chain from the inhibitor has poor density. The lack of density suggests that the substrate-based inhibitor is not well bound.
Structures of AncCP-Ef1 and of AncCP-6An.
In HsCaspase-3, the P4 Asp makes three hydrogen bonds with the side-chain of CP-N162, on active site loop L3, and to water molecules. The waters link the backbone atoms of GP9-E01 and of GP9-S02 to the P4 Asp (Figure 5A). As described above, CP-162 exhibits A1-type ambiguity since it is conserved as Asn in group II caspases or Glu in group III caspases (Figure 5A,B). In order for Val to bind in the S4 pocket of HsCaspase-6, a network of four charged residues interacts with CP-E162 so that it is positioned away from the P4 Val (CP-N165, CP-Q171, CP-D172, and CP-R198) (Figure 5B). Only one of the four charged residues is found in HsCaspase-3 (CP-Q171), although CP-165 is a conservative substitution (N to D).
Electrostatic network near S4 subsite.
In AncCP-Ef1, CP-N162 interacts directly with the P4 aspartate, as in HsCaspase-3, but the hydrogen-bonding pattern is incomplete. We observe no hydrogen bonds with backbone atoms in loop 4, and CP-Q171 has rotated away from the active site and interacts with CP-N27 on helix 1 (Figure 5C). In addition, CP-R198 is rotated toward the P4 Asp. Thus, while AncCP-Ef1 contains three of the five amino acids in the network of HsCaspase-6, the hydrogen-bonding network is not established. In contrast, the charged network is fully formed in AncCP-6An, with the CP-E162 and CP-D172 substitutions, and the same H-bonding pattern is observed in AncCP-6An as in HsCaspase-6 (Figure 5D). Finally, we note that in HsCaspase-3, GP9-E01, on loop 4, is rotated away from the active site and toward solvent (Figure 5A). In caspase-6, the Glu is substituted with Val, which rotates toward the active site and forms part of the S4 binding pocket. In addition, CP-R199 and GP9-D09 flank the hydrophobic Val and form a salt bridge in HsCaspase-6 that may stabilize loop 4 (Figure 5B). In AncCP-Ef1 and AncCP-Ef2, one observes Glu at GP9-01, and the position changes to Val in AncCP-6An (Figure 1C). The CP-R199:GP9-D09 salt bridge is not observed until subsequent evolutionary nodes.
To examine the putative role of the charged network in substrate selection, we introduced several substitutions into AncCP-Ef1 and determined changes in enzyme activity on DEVD, VEID, and LETD tetrapeptide substrates. We first introduced substitutions at CP-162 (Asn to Glu) and GP9-01 (Glu to Val), as well as the double mutant (CP-N162E,GP9-E01V). The data show that activity increased ∼4-fold in the CP-N162E single mutant, but there was little effect on the selectivity (Supplementary Information, Table S5). In contrast, we observed a large decrease in activity (>10-fold) for the GP9-E01V single mutant. In this case, one observes changes in kcat and in KM, resulting in the low activity. When the two substitutions were combined, however, the activity (kcat/KM) of the double mutant (CP-N162E,GP9-E01V) increased ∼10-fold for P4 Val or Leu and only ∼3-fold for P4 Asp. The increased selection was due to both an increase in kcat and a decrease in KM for the two hydrophobic P4 residues compared with P4 Asp (Figure 6). We observed the largest change in activity when we substituted CP-S172 with Asp along with the CP-N162E,GP9-E01V substitutions. The data for the triple mutant show ∼80-fold increase in activity for Val and ∼20-fold increase for Leu, with only ∼4-fold change for Asp at P4 (Figure 6A and Supplementary Information, Table S5). The increase in activity is largely due to a decrease in KM (Figure 6B,C), which is below 10 µM for P4 Val. Overall, the series of mutants of AncCP-Ef1 show that the substitutions had little effect on the activity with P4 Asp (with the exception of the GP9-E01V single mutant). However, the three substitutions resulted in selection of Val over Asp through a large decrease in KM. In support of the tetrapeptide substrate data, we observed a similar shift in specificity using the substrate-phage display assay. 
In this case, the triple mutant of AncCP-Ef1 demonstrated a specificity (P5–P1′) of (S/T)(V/T)DVDA, with little or no selection for Leu or Asp at P4 (Figure 3D). We saw no further improvement in selection by including the CP-R199:GP9-D02 salt bridge with the three substitutions (Figure 6 and Supplementary Information, Table S5), so the importance, if any, of the salt bridge in stabilizing loop 4 of caspase-6 is not yet clear. Overall, the data show that only three substitutions are required in AncCP-Ef1 to change substrate selection, where two substitutions complete the network of charged amino acids that position CP-E162 away from the S4 pocket while one substitution positions the hydrophobic GP9-V01 in the S4 pocket.
Enzyme activity of ancestral effector caspases.
Variations in the C-terminus of helix-5
In caspase-6, the amino acid at CP-198 is a highly conserved arginine residue at the C-terminus of helix-5 at the junction with loop L4. As described above, the arginine participates in an interaction network that co-ordinates the charged carboxyl group of CP-E162 on loop L3 to position it away from the S4 pocket (Figure 5B), while also increasing the hydrophobicity by exposing the hydrophobic β and γ carbons of the side-chain of CP-E162 to the S4 pocket. The arginine at CP-198 (CP-R198) is also present in AncCP-Ef1; however, the interaction network in AncCP-Ef1 is missing a highly conserved aspartate residue on helix 4 observed in caspase-6 at CP-172, which forms hydrogen bonds with the side-chain of CP-R198. In contrast, the serine at CP-172 of AncCP-Ef1 is highly conserved in caspase-3, and the shorter side-chain does not interact with CP-E162. The substitutions in helix 5 result in structural changes in the last turn of the helix that affect loop L4 (Figure 7A). In HsCaspase-3 and in AncCP-Ef1, the backbone atoms of loop L4 are closer to the S4 binding pocket, and GP9-E01 is rotated toward solvent and away from the S4 pocket. In contrast, in HsCaspase-6 and AncCP-6An, the orientation of loop L4 is shifted by one residue as a result of the structural changes in helix 5 such that the backbone atoms are further from the S4 pocket. The shift causes GP9-V01 in HsCaspase-6 and in AncCP-6An to move into the S4 pocket.
Changes in helix 5 and the adjacent active site loop 4.
An analysis of initiator caspases revealed that caspase-2 also has a highly conserved arginine at CP-198, while caspases-8 and -9 utilize lysine residues in the same position. The structures of the initiator caspases show the same orientation of the helix 5-loop L4 residues as observed for HsCaspase-6 and AncCP-6An (Figure 7B). Comparatively, caspases-3 and -7 are outliers. The CP-E198 (caspase-3) or CP-H198 (caspase-7), as well as the more regular helical turn, orient the charged groups away from loop L3 and the charged interaction network observed in HsCaspase-6. Therefore, caspases that recognize hydrophobic residues (I/L/W/V) in the P4 substrate position utilize the long positively charged side-chain at CP-198 and subsequent alterations to the final turn of helix 5. We note that the interaction of CP-R198 and CP-D172 is uniformly conserved in all chordate caspase-6 genes.
Since the pioneering work of Zuckerkandl and Pauling over 50 years ago [68,69], research in the evolutionary biochemistry of protein structure-function relationships has aimed to apply rigorous biochemical and biophysical methods to understand how protein sequence changes affect protein structure and function. Understanding how random chance, selection pressure, and changes in free-energy landscapes determine the characteristics of divergent proteins within a family allows one to track form and function along a phylogeny. Specific, and limited, combinations of amino acids gleaned from comparative studies of extant proteins do not include the historical context that may allow the combination of amino acids to function in the active site environment. APR adds the dimension of evolutionary time and fitness to the structure-function relationship [63,64] and provides powerful tools to characterize evolutionary changes in proteins experimentally [70–72]. The evolutionary trajectory is thus determined by a combination of biochemical, biophysical, and regulatory factors, but the interplay among these factors and their role in protein evolution remains an unresolved question.
We used APR techniques to infer ancestral sequences and to resurrect the CA of effector caspases-3/6/7 and of caspase-6. The analysis provides site-specific probabilities throughout the protein sequence such that one can examine the robustness of the APR methods by characterizing multiple proteins from the pool of possible sequences in an ancestral node. Ambiguity, or uncertainty in ancestral sequence inference, is typically attributed to insufficient sequence data, uncertain gap placement in the multiple sequence alignment, tree topology, and the extent of sequence divergence relative to tree articulation [64,73]. As we described previously, the large dataset compilation from the CaspBase minimizes the major causes of ambiguity. Regions with higher ambiguity are generally observed in sites that are not evolutionarily constrained for proper structure-function, such as surface-exposed residues, loops, turns, and intrinsically disordered regions. In addition, assessing data from multiple reconstructions provides insight about positions of interest. For example, we resurrected two proteins from the common ancestral node (AncCP-Ef1 and AncCP-Ef2) and showed similar characteristics in that both enzymes had low activity and little selection against P4 valine, aspartate, or leucine, suggesting that both proteins robustly represent the pool of possible proteins in the ancestral node. Amino acid positions that were ambiguous were mostly found in the surface helices as well as two regions that are flexible or disordered, the intersubunit linker and prodomain (type A2 ambiguity). The surface helices were previously characterized as sites of allosteric regulation [22,60,61], so our data suggest that allosteric sites may have different evolutionary pressures compared with the active site residues.
That is, the allosteric sites may have evolved in a species-dependent manner based on individual needs, leading to larger sequence variations at those sites and higher ambiguity in the APR analysis.
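The use of site-specific probabilities to flag ambiguous positions and to draw an alternative sequence from the pool at an ancestral node can be illustrated with a minimal sketch. The posterior values, the 0.7 cutoff, and the function names below are hypothetical placeholders chosen for illustration; they are not data or parameters from this study, and the sketch is not the reconstruction pipeline itself.

```python
# Hypothetical per-site posterior probabilities at one ancestral node.
# These values are illustrative only and are NOT from this study.
posteriors = [
    {"D": 0.98, "E": 0.02},
    {"S": 0.55, "D": 0.40, "N": 0.05},
    {"V": 0.91, "I": 0.09},
    {"E": 0.48, "V": 0.47, "A": 0.05},
]


def most_probable_sequence(sites):
    """Build the sequence from the highest-probability residue at each site."""
    return "".join(max(site, key=site.get) for site in sites)


def ambiguous_sites(sites, threshold=0.7):
    """Return 1-based positions whose top posterior is below the cutoff.

    The 0.7 cutoff is an assumed value for illustration.
    """
    return [
        position
        for position, site in enumerate(sites, start=1)
        if max(site.values()) < threshold
    ]


def alternative_sequence(sites, threshold=0.7):
    """Swap in the second-most-probable residue at each ambiguous site."""
    residues = []
    for site in sites:
        ranked = sorted(site, key=site.get, reverse=True)
        is_ambiguous = site[ranked[0]] < threshold and len(ranked) > 1
        residues.append(ranked[1] if is_ambiguous else ranked[0])
    return "".join(residues)


best = most_probable_sequence(posteriors)
flagged = ambiguous_sites(posteriors)
alternate = alternative_sequence(posteriors)
print(best, flagged, alternate)
```

Characterizing both the most probable sequence and an alternative of this kind is one way to test whether a functional conclusion depends on the residues chosen at ambiguous sites.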
Overall, our data agree with previous studies showing that the pool of possible ancestors at the nodes of a phylogenetic tree reflects ancestral mechanistic function experimentally [75–77]. In addition, several studies also suggest that proteins have changed from lower to higher specificity over long evolutionary time [78,79]. Enzymes with low specificity presumably provided primordial organisms the ability to perform diverse chemical reactions with a limited proteome. So, the broadened specificity of the ancestral effector caspase most likely resulted in off-target effects, which benefited the ancestral organisms in a manner that is unknown at this time. Increased specificity also follows gene duplication, which is also shown in our data by the changes in the ancestral caspase-6, AncCP-6An.
The structure of AncCP-Ef1 with DEVD bound in the active site showed that the caspase-hemoglobinase fold has been conserved for over 650 million years as the protein exhibits <0.5 Å RMSD with extant HsCaspase-3. The data also showed that active site loop L4 was partially disordered in the ancestral effector caspase. In contrast, loop L4 was well-formed in the ancestral caspase-6 enzyme, AncCP-6An, with VEID inhibitor bound in the active site. For AncCP-Ef1 (DEVD), the hydrogen-bonding network with the P4 Asp, observed in HsCaspase-3, was incomplete in that interactions were missing between loop L4 and the P4 Asp while interactions with loop L3 were retained. The multiple structures showed flexibility in the side-chains of CP-R198 and CP-Q171 as the side-chains were observed in different orientations. In AncCP-6An, both amino acids are part of a network of charged residues between loop L3, helix 5, and loop L4 that ultimately interact with CP-E162 to position the charged side-chain away from the hydrophobic S4 pocket. In AncCP-Ef1 and HsCaspase-3, the incomplete network, as well as the shorter side-chain of CP-N162, positions CP-N162 to interact directly with the P4 Asp and thus retains the hydrophilic character of the S4 pocket. Two of the five residues in the network are not found in AncCP-Ef1, so we completed the network by substituting the two sites (CP-S172D, CP-N162E), as well as introducing GP9-V01. In HsCaspase-6 and in AncCP-6An, the substitution of GP9-E01V results in rotation of the valine into the S4 pocket. Overall, the data for the triple mutant showed ∼80-fold increase in the activity against VEID substrate with little change in the activity for DEVD. The activity increase was due primarily to a large decrease in KM, suggesting that the substitutions resulted in improved binding of P4 Val without affecting the binding of P4 Asp.
Thus, with only three substitutions in the ancestral effector caspase scaffold, the enzyme became selective for P4 Val versus Asp.
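The reasoning that a decrease in KM alone can account for a large gain in activity against one substrate, while leaving activity against another unchanged, can be made concrete with a short calculation of catalytic efficiency (kcat/KM). The sketch below is illustrative only: the kcat and KM values are placeholders chosen to give round numbers, not measurements from this study, and the function names are hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/KM in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


def fold_change(parent_efficiency, variant_efficiency):
    """Return the variant efficiency relative to the parent enzyme."""
    return variant_efficiency / parent_efficiency


# Placeholder kinetic parameters (NOT measured values from this study).
# kcat is held constant so that only KM differs between parent and variant.
parent_veid = catalytic_efficiency(kcat_per_s=0.5, km_molar=400e-6)
variant_veid = catalytic_efficiency(kcat_per_s=0.5, km_molar=5e-6)
parent_devd = catalytic_efficiency(kcat_per_s=0.5, km_molar=50e-6)
variant_devd = catalytic_efficiency(kcat_per_s=0.5, km_molar=50e-6)

veid_gain = fold_change(parent_veid, variant_veid)  # arises from KM alone
devd_gain = fold_change(parent_devd, variant_devd)  # unchanged parameters
selectivity_shift = veid_gain / devd_gain           # shift toward P4 Val

print(f"VEID gain: {veid_gain:.1f}-fold")
print(f"DEVD gain: {devd_gain:.1f}-fold")
print(f"Shift in Val/Asp selectivity: {selectivity_shift:.1f}-fold")
```

With kcat fixed, the fold change in kcat/KM equals the ratio of the parent KM to the variant KM, which is why a change in binding of the P4 residue can shift selectivity without any change in turnover.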
Extant genes accumulate mutations as described by the neutral mutation theory, and their probability of fixation into a population is described by models of genetic drift. A few outcomes can occur for nascent genes, but the majority of mutations are deleterious and will likely cause loss of function, or pseudogenization, which leads to purification from the population. Although there are forty-two evolutionary changes between the ancestral effector caspase and the ancestral caspase-6, only three vertical substitutions are sufficient to result in altered substrate specificity in the caspase-6 lineage. The remaining vertical mutations may contribute to protein epistasis or to the evolution of allosteric sites. Together the data show that enzyme specificity was established early in the evolution of caspase-6 and that allosteric regulation likely followed through subsequent evolution. Intriguingly, the data suggest that features of the conformational landscape in the ancestral effector caspase remain in extant enzymes. Because so few substitutions to the conserved scaffold were required for neofunctionalization of substrate specificity (three substitutions out of ∼260 amino acids), subsequent mutations may then affect access to various conformational states through changes in the free-energy landscape. If this is true, then the unique coil-to-helix transition observed in caspase-6, for example, may still be present in the conformational landscapes of other caspases but is inaccessible due to evolutionary changes that introduced high barriers to the state.
Understanding the evolutionary changes in conformational landscapes that resulted in substrate selection and in allosteric regulation may provide strategies for re-engineering caspases with desired substrate selection coupled to unique conformational states. Our APR studies show that the methodology is effective for characterizing evolutionary changes in ancestral proteins in order to infer functional changes in the extant caspase proteases. By focusing on peptide mutations (substitutions, insertions, and deletions) to quantify the probability of evolutionary change, the APR methodology allows for targeted assays by examining the fewer positions that changed between evolutionary nodes, as compared with the larger substitutions between extant enzymes.
R.D.G., S.S., J.T.-M., P.D.S., P.T.H., and A.C.C. designed the experiments; R.D.G., S.S., and J.T.-M. carried out the experiments. R.D.G. and A.C.C. wrote the manuscript, and all authors contributed to data analysis. All authors have approved the manuscript.
This work was supported by a grant from the National Institutes of Health [grant number GM127654 (to A.C.C.)] and by funds from UT Arlington [Office of the Vice President for Research (to A.C.C.)]. Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under contract number W-31-109-ENG-38.
The Authors declare that there are no competing interests associated with the manuscript.