Asparaginyl endopeptidases (AEPs) are versatile enzymes that in biological systems are involved in producing three different catalytic outcomes for proteins, namely (i) routine cleavage by bond hydrolysis, (ii) peptide maturation, including macrocyclisation by a cleavage-coupled intramolecular transpeptidation and (iii) circular permutation involving separate cleavage and transpeptidation reactions resulting in a major reshuffling of protein sequence. AEPs differ in their preference for cleavage or transpeptidation reactions, catalytic efficiency, and preference for asparagine or aspartate target residues. We look at structural analyses of various AEPs that have laid the groundwork for identifying important determinants of AEP function in recent years, with much of the research impetus arising from the potential biotechnological and pharmaceutical applications.
Asparaginyl endopeptidases (AEPs) are cysteine proteases of the C13 family (clan CD) that cleave at the carboxyl-terminal side of Asx (Asp or Asn). First discovered in legume plants, these enzymes were categorised as legumains (EC 3.4.22.34) and are referred to as such, even for mammalian orthologues [3,4]. AEPs are widely distributed in land plants, where they are also sometimes called vacuolar processing enzymes due to their localisation in vacuoles and their role in processing vacuolar proteins [1,5].
The discovery of butelase 1, an extremely efficient AEP (catalytic efficiency of up to 1 340 000 M⁻¹ s⁻¹ on a non-native substrate) [6,7] that forms peptide bonds via a cleavage-coupled transpeptidation reaction, has led to a surge of interest in AEPs. Transpeptidation is an ATP-independent reaction. The putative mechanism in AEPs involves an initial cleavage reaction, in which an acyl-enzyme intermediate is first formed between the catalytic cysteine and the carbonyl carbon of the substrate P1 residue (nomenclature according to Schechter and Berger). This intermediate is then resolved by nucleophilic attack by either a water molecule or a nearby N-terminal α-amine, resulting in hydrolysis or aminolysis (transpeptidation), respectively [9,10]. Transpeptidation by plant AEPs is often intramolecular, in that the free N-terminal nucleophile originates from the same molecule in which the initial cleavage reaction occurs, resulting in peptide macrocyclisation. Many plant AEPs are therefore also referred to as cyclases. The term AEP emphasises the inherent peptidase activity, but the discovery of AEPs with efficient transpeptidase activity has led to certain AEPs being referred to as ligases or Peptide Asparaginyl Ligases instead. This is useful in emphasising the most notable enzyme activity, but it introduces a grey area for AEPs whose efficiency and dominant activity depend on the pH and substrate [10–15].
Perhaps fitting for an enzyme with so many names, AEPs are also involved in processing a wide range of proteins in plants. They have been proposed to activate other proteases involved in seed storage protein processing . Their role in vacuolar rupture to initiate a proteolytic cascade in programmed cell death has also been hypothesised to be through the activation of other vacuolar enzymes . AEPs were for some time believed to also play a role in protein degradation during plant senescence, a process closely related to programmed cell death, but that has been disproven . Here, we discuss the more well-characterised types of modifications performed by AEPs, including the maturation of seed storage proteins by hydrolysis, peptide processing and macrocyclisation, and the unique circular permutation of concanavalin A. Additionally, in light of knowledge generated from the recent wave of AEP structural analyses, we summarise recent studies on the structural features of AEP that influence their ability to perform transpeptidation.
Maturation of seed storage proteins by hydrolysis
Seed storage proteins are a source of amino acids and elemental nutrients for the plant during germination ; they make up to half of the seed protein content and have no apparent function during seed development [20,21]. As the first systematic study on seed storage proteins was carried out by a chemist , these proteins were broadly classified based on a chemical property — their solubility in various solvents, namely the water-soluble albumins, salt-soluble globulins, alcohol-soluble prolamins, and acid- or alkali-soluble glutelins. Depending on the plant species, different types of storage proteins predominate [20,21].
It was the investigation into the maturation process of globulins that led to the discovery of asparagine-specific thiol proteases [1,23], which we now refer to as AEPs. During seed development, AEPs are thought to degrade unassembled or misfolded proteins, and have been shown to cleave seed storage proteins to facilitate their oligomeric assembly for final deposition in seeds [25–27]. AEP involvement in seed storage protein maturation was demonstrated in vivo with an Arabidopsis thaliana AEP quadruple knockout line, which produced alternatively processed storage proteins that could be fully or partially rescued by transgenic expression of native and even non-native AEPs (Figure 1A). The altered storage protein profile in the absence of AEPs is a combination of failure to cleave at the usual cut sites and abnormal cleavage at other sites by other proteases. This is illustrated by 11S globulin, which cannot adopt the mature oligomeric complex conformation in the absence of AEP cleavage and is thus exposed to digestion by other proteases. Unexpectedly, the altered seed storage protein profile in the AEP knockout A. thaliana line appears not to affect plant viability or development in any detectable way [28,30]. It remains to be seen whether phenotypic effects can be discerned in AEP-deficient mutant plants when they are exposed to various stresses.
Figure 1. Types of modifications by plant AEPs.
Peptide processing and macrocyclisation
Macrocyclic peptides are small backbone-cyclic peptides. These plant peptides are structurally diverse and typically fewer than 40 amino acids in size (Figure 1B). They are categorised into four broad classes: cyclotides or kalata-type cyclic peptides (e.g. kalata B1), which contain an embedded cystine knot of three disulfide bonds that can interact with lipid membranes; cyclic knottins (e.g. MCoTI-II), often grouped with cyclotides as they also contain a cystine knot, but are more closely related to acyclic knottins and are protease inhibitors [33,34]; Preproalbumin with SFTI (PawS)-Derived Peptides (PDPs) (e.g. SFTI-1 and PDP-23), which are genetically embedded in the seed-storage albumin gene and contain one disulfide bond [35–37], with only PDP-23 containing two; and orbitides (e.g. PLP-2), which are 5–16 amino acids in size and lack disulfide bonds [39,40]. The albumin-buried PLP orbitides appear to be evolutionary precursors of PDPs, and some PDPs have gained the ability to inhibit proteases as well [41,42]. Due to their observed in vitro bioactivity, many macrocyclic peptides are thought to confer resistance to pests and pathogens [32,35,43–46]. Macrocyclic peptides have also garnered considerable interest as drug scaffolds [47,48] as their small size, constrained rigid structure and absence of termini make them highly resistant to thermal, chemical, or enzymatic breakdown [33,36,49,50]. One of the most interesting scaffolds is that of the recently discovered and unusual PDP-23 (Figure 1B), which forms a symmetrical homodimer that dissociates upon interacting with membranes.
In nature, macrocyclic peptides are produced either by thioesterase-mediated non-ribosomal peptide synthesis or by post-translational modification of ribosomally synthesised linear peptide precursors [9,10]. The latter process, which produces all four classes of macrocyclic peptides mentioned above, involves excising a short peptide from a protein chain via two cleavage reactions. AEP involvement in peptide macrocyclisation was initially inferred from a highly conserved Asx residue at the proto-C-terminus of peptide precursors [52–54]. This was confirmed in vivo by macrocyclisation of transgenic macrocyclic peptide precursors in plants devoid of macrocyclic peptides, and by demonstrating that inhibiting or silencing endogenous AEPs in the transformed plant results in failure to macrocyclise these precursors [34,37,55]. Several plant AEPs have since been characterised and shown to carry out peptide macrocyclisation in vitro [6,9,10,13,14,56]. Furthermore, a plant-based expression system has recently been developed, whereby transgenic AEPs and peptide precursors are co-expressed to produce macrocyclic peptides. Orbitides are a diverse class, and cyclisation of some has been shown to be mediated by the serine protease peptide cyclase 1 in some plant species [58,59], but analogy to the PDPs and sequence conservation in the recently discovered large PLP family of orbitides suggest AEP involvement for these. These findings suggest that macrocyclic peptides evolved by exploiting already-existing biochemical machinery in plants. It was hypothesised that this evolutionary convergence on AEPs was due to the reactive thioester acyl-intermediate that AEPs form after cleavage of the scissile peptide bond, and to the way the active site is structured to encourage interactions with an N-terminal nucleophile instead of water.
Cleavage at the proto-N-terminus of the peptide must therefore occur first so that it can participate in cleavage-coupled transpeptidation reaction at the proto-C-terminus [9,10,34,37,55]. Besides the conserved proto-C-terminal Asx residue, other hallmarks of AEP-mediated macrocyclisation are the highly conserved proto-N-terminal glycine, the small P1′ residue and the hydrophobic P2′ leucine .
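As a rough illustration of how these hallmarks could be screened for in a precursor sequence, the Python sketch below checks the four features named above. The function name, the definition of a "small" residue (Gly, Ala or Ser) and the C-terminal flank used in the example are assumptions made for illustration only; the core is the SFTI-1 sequence. Passing the check does not imply that a peptide is actually macrocyclised by an AEP.

```python
# Assumption: "small" P1' residue is taken here to mean Gly, Ala or Ser.
SMALL_RESIDUES = {"G", "A", "S"}


def macrocyclisation_hallmarks(core: str, c_flank: str) -> dict:
    """Report sequence hallmarks of AEP-mediated macrocyclisation.

    core    -- peptide from its proto-N-terminus to the P1 residue (one-letter code)
    c_flank -- residues following the scissile bond (P1', P2', ...)
    """
    return {
        "proto_N_terminal_Gly": core[:1] == "G",
        "P1_is_Asx": core[-1:] in {"N", "D"},
        "small_P1_prime": c_flank[:1] in SMALL_RESIDUES,
        "P2_prime_Leu": c_flank[1:2] == "L",
    }


# SFTI-1 core sequence; the flank "GLDN" is an illustrative placeholder.
example = macrocyclisation_hallmarks("GRCTKSIPPICFPD", "GLDN")
print(example)
```

Returning one boolean per hallmark, rather than a single verdict, keeps the sketch honest about the fact that acyclic peptides such as PDP-10 can share some of these features.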
The conserved Asx processing sites are found in acyclic peptides as well. PDP-10 (Figure 1B) is an example of a peptide similar to SFTI-1 but appears to undergo AEP-mediated hydrolysis rather than transpeptidation . A recently discovered large group of non-cyclic peptides known as vicilin-buried peptides (e.g. C2) also contain the conserved Asx processing sites . These peptides are characterised by disulfide bonds forming between CXXXC motifs. Over 20 years ago, the prototypic member of what would be later called the vicilin-buried peptide family was shown in vitro to be matured by AEP cleavages , but the trypsin-inhibitory activity it displayed in 1999 was not reproducible with a synthetic version .
Circular permutation of concanavalin A
Concanavalin A (conA), found in the seeds of jack bean plants (Canavalia ensiformis), is a carbohydrate-binding protein (lectin) that binds specifically and reversibly to mannose and glucose. Even though conA has been functionalised in many ways and is widely used, especially in glycoprotein purification [63–69], its function in vivo is far from certain. ConA has been hypothesised to be involved in seed storage and plant defence, and its likely roles are mostly inferred from studies on other legume lectins [70–73]. The biosynthesis of conA and of the highly similar conA-like lectins found in plants that, like C. ensiformis, belong to the Diocleinae subtribe involves a maturation process that is the only known example of post-translational circular permutation in nature.
ConA circular permutation represents not only one of the earliest examples of protein splicing observed in nature [75,76], but also offered the first hint of the existence of an enzyme capable of transpeptidation in higher organisms . Consequently, the jack bean AEP was one of the first AEPs to be studied extensively [78,79]. Circular permutation of conA involves cleaving the precursor of conA in the middle and ligating the two original termini via a transpeptidation reaction, resulting in a shuffling of the primary sequence such that the two halves of the protein are swapped  (Figure 1C). Circular permutation does not affect the general tertiary structure of conA or its functional sites, which is reflected in the apparently similar carbohydrate-binding abilities between conA and its precursor . However, conA was recently shown to be more thermal and pH stable than its precursor in vitro, with a difference in complex formation in the crystal structures providing a possible explanation, revealing a functional consequence for what has been thought to be an inconsequential modification since its discovery 35 years ago [73,75].
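The sequence-level effect of circular permutation can be sketched as a simple string operation: cleave the precursor at an internal position, then join the original C-terminus to the original N-terminus. The Python function below is a minimal sketch under that simplification; the function name and the toy sequence are hypothetical, and the sketch ignores any trimming of residues that may accompany the real maturation of conA.

```python
def circularly_permute(precursor: str, cut_after: int) -> str:
    """Swap the two halves of a sequence around an internal cleavage site.

    precursor -- linear precursor sequence (one-letter code)
    cut_after -- 1-based position of the residue after which cleavage occurs
    """
    if not 0 < cut_after < len(precursor):
        raise ValueError("cleavage site must lie inside the sequence")
    n_half = precursor[:cut_after]   # original N-terminal half
    c_half = precursor[cut_after:]   # original C-terminal half
    # Ligating the original termini places the C-terminal half first.
    return c_half + n_half


# Toy sequence, not conA: cleaving after residue 3 swaps the halves.
print(circularly_permute("ABCDEF", 3))
```

The permuted product has the same length and composition as the input, which mirrors the point that the primary sequence is reshuffled rather than altered.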
AEPs are synthesised in the inactive form consisting of the N-terminal ‘core’ domain, which contains the active site, and a smaller C-terminal auto-inhibitory ‘cap’ domain, which is also termed the Legumain Stabilisation and Activity Modulation domain as it blocks access to the active site, conferring enzymatic latency and protein stability at neutral pH conditions (Figure 2) . Conversion of AEP into the active form has been studied in exquisite detail in two isoforms of A. thaliana AEP (AtAEP2 and AtAEP3) [80–82]. AEP activation occurs via autocatalytic cleavage around a flexible linker region between the cap and core domains, and requires disruption of electrostatic interactions between the two domains under acidic conditions [80–82].
The catalytic triad (or dyad) of all AEPs is structurally conserved (Figure 3). The catalytic cysteine was confirmed by covalent linking with a peptidic chloromethylketone-based inhibitor in human AEP. The imidazole group of the catalytic histidine has been hypothesised to act as a general base that activates the catalytic cysteine by deprotonating the cysteine thiol sidechain. However, all plant AEP structures solved so far show that the catalytic cysteine and histidine sidechains are more than 5.4 Å apart. The catalytic histidine more likely acts upon the carbonyl carbon of the substrate directly or via a catalytic water in a pH-dependent manner [3,84]. In either scenario, the result is an acyl-enzyme intermediate that is resolved by hydrolysis or transpeptidation. A nearby asparagine makes up the last residue of the catalytic triad. It was hypothesised to facilitate substrate deprotonation by influencing the orientation of the catalytic histidine, but was shown to be a non-crucial catalytic residue in other cysteine proteases. Studies on sunflower AEP1 (HaAEP1), a predominant hydrolase, showed that a catalytic cysteine to serine mutation abolished enzyme activity, a catalytic histidine to alanine mutation drastically reduced activity, and a catalytic asparagine to alanine mutation resulted in a higher transpeptidation to hydrolysis ratio.
Structural features that influence AEP activity.
Structural features that influence AEP function
Unlike the specialised enzyme PatG, which evolved an additional dedicated protein domain to facilitate transpeptidation in cyanobacteria by shielding its reaction intermediates from water , AEPs that are efficient at transpeptidation do not possess an obvious additional domain. The structural features that convert an AEP from a predominant protease to a predominant transpeptidase must be discerned from subtle differences in the substrate-binding pocket. Delineation of the substrate-binding sites was made possible by the structural elucidation of activated plant AEPs in covalent complex with various substrates [11,12].
To facilitate comparison of equivalent residues amongst different AEPs, all residue numberings used in this review are based on the butelase 1 sequence. Residues around the active site have been proposed to facilitate interaction between the acyl-enzyme intermediate and the N-terminal nucleophile. Asn59, which is part of the catalytic triad, and Glu208 (Figure 3) were shown to influence transpeptidation efficiency in HaAEP1. Residue 59 was hypothesised to influence the conformational flexibility of the nearby catalytic histidine; a smaller sidechain here may therefore reduce steric hindrance for nucleophilic attack on the acyl-enzyme intermediate by an incoming N-terminal residue. Glu208 was shown in docking studies with an AtAEP to form an ionic interaction with the N-terminal nucleophile and was suggested to deprotonate the incoming N-terminus in the transpeptidation reaction.
The primed region of the AEP active site, i.e. AEP residues that interact with substrate residues following the scissile peptide bond, has been hypothesised, with support from molecular dynamics analyses, to promote transpeptidation by maintaining affinity with the cleaved substrate until it is displaced by an incoming N-terminal nucleophile [12–14]. Mutating residue 167 in the S1′ pocket (Figure 3) to an amino acid with a small sidechain resulted in a higher transpeptidation/hydrolysis ratio and increased catalytic efficiency in two AEPs from different plant species. It was suggested that a bulky sidechain at this position interferes with proper replacement of the leaving group by the N-terminal nucleophile, thus allowing water to enter the active site. The highly conserved hydrophobic P2′ residue in cyclic peptide precursors [10,52,88] and the conserved deep hydrophobic S2′ pocket (residues 170, 172, 178, 180) (Figure 3) indicate that the S2′-P2′ interaction is crucial for transpeptidation. The depth of the S2′ pocket is attributable to the lack of a sidechain at residue 172 and a large aromatic sidechain at residue 178. Molecular dynamics simulations showed that non-hydrophobic P2′ residues resulted in poor retention of residues C-terminal of the scissile bond. Reduction in hydrophobicity at the S2′ pocket also negatively impacts cyclisation: HaAEP1 and AtAEP2, which have low to intermediate cyclisation efficiencies, contain a histidine rather than an aromatic amino acid at residue 178 [9,11,12]. The S2′ pocket was also hypothesised to shield the active site from water.
Several residues in the non-primed region, i.e. AEP residues that interact with substrate residues preceding the scissile peptide bond, have been shown to influence transpeptidation efficiency. Substrate affinity, hypothesised by some to positively influence transpeptidase activity, was attributed to the non-primed region. However, the complex interplay amongst the residues in this region makes it difficult to provide mechanistic explanations for their influence on AEP activity. Residue 237, the α5–β6 loop, and the proline-rich loop (also previously referred to as the ‘Gate-keeper residue’, ‘Marker of Ligase Activity’, and ‘poly-proline loop’, respectively) of the non-primed region have been the most widely discussed (Figure 3). One of the most notable single point mutations was made at residue 237 in Oldenlandia affinis AEP1 (OaAEP1), which improved cyclisation kinetics (kcat) by 160-fold. However, mutating this residue to the residues found at this position in other efficient transpeptidases resulted in poorer transpeptidation efficiency in OaAEP1 [6,13,89], but improved the cyclisation ratio of a Clitoria ternatea AEP that is a dominant hydrolase. In another study, mutating residue 237 did not improve transpeptidation in a petunia AEP unless coupled with mutations in the α5–β6 loop. This flexible α5–β6 loop was shown to influence the yield of macrocyclic peptides but not the activity preference: deleting this loop in the dominant ligase OaAEP1 did not result in enhanced hydrolysis, indicating that the loop likely influences substrate affinity but is not a determinant of AEP functional preference. The large difference in α5–β6 loop size between two extremely efficient AEP transpeptidases, butelase 1 (Figure 3A) and OaAEP1 (Figure 3B), suggests that transpeptidation efficiency is not necessarily correlated with the size of the α5–β6 loop. Furthermore, AEPs that are inefficient at transpeptidation, such as C. ensiformis AEP1 (Figure 3C), have a similar α5–β6 loop size to butelase 1. The more hydrophobic character of the α5–β6 loop in butelase 1, as compared with inefficient transpeptidases, was postulated to be a distinguishing feature of an efficient transpeptidase. The conserved proline-rich loop has been shown to interact with the substrate non-primed region. However, it is not known whether substrate recognition at the proline-rich loop influences transpeptidation efficiency.
AEP activity is also substrate- and condition-dependent [10–14]. AEPs such as HaAEP1 that have been structurally and biochemically characterised as predominant peptidases have turned out to also have high cyclic to linear product ratio when assayed with different substrates . Asparagine/aspartate preference is pH-dependent — aspartate was hypothesised to be accepted as a P1 residue below pH 5.5 because protonating the aspartate carboxylic sidechain makes it sterically and electrostatically similar to asparagine, which is the preferred P1 residue at higher pH conditions [3,81,91]. Studies on AtAEP3 showed that transpeptidation activity is typically confined to near-neutral pH conditions  where N-terminal amino groups are more likely deprotonated, either on their own or by a deprotonated catalytic histidine, and therefore able to perform nucleophilic attack on the acyl-enzyme intermediate . Zauner and colleagues  have hypothesised that the association of the cap and core domain in AtAEP3 results in the exclusion of water from the active site, and that a deprotonated N-terminal nucleophile can interfere with this association to resolve the acyl-enzyme intermediate.
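The pH argument for the N-terminal nucleophile can be made concrete with the Henderson-Hasselbalch relationship, which gives the fraction of a titratable group that is deprotonated at a given pH. The Python sketch below is illustrative only: the pKa of 8.0 used for an N-terminal α-amine is an assumed, textbook-style value, not one taken from the studies discussed here, and the local environment of an enzyme active site can shift it considerably.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its deprotonated form."""
    return 1.0 / (1.0 + 10 ** (pKa - pH))


# Assumption: pKa of about 8.0 for a peptide N-terminal alpha-amine.
ASSUMED_N_TERMINAL_PKA = 8.0

for pH in (5.0, 6.0, 7.0):
    fraction = fraction_deprotonated(pH, ASSUMED_N_TERMINAL_PKA)
    print(f"pH {pH}: {fraction:.4f} deprotonated")
```

Under this assumption, the deprotonated (nucleophilic) fraction rises from roughly 0.1% at pH 5 to about 9% at pH 7, which is consistent in direction with transpeptidation being favoured near neutral pH.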
Recent studies highlight that the preferred AEP activity and catalytic efficiency depend not only on the structural features of a given AEP, but also on the nature of the substrate and the pH. This is illustrated by the structural diversity of both the peptide substrates (Figure 1B) and the AEP substrate-binding sites (Figure 3). Mutagenesis studies point to a complex interplay amongst residues of the non-primed region in the AEP substrate-binding site, as single substitution of a residue or a loop in this region does not consistently result in the expected enhancement of transpeptidase activity, whereas multiple substitutions are more likely to do so [14,56]. Nevertheless, developments in understanding AEP function, especially that of butelase 1 from Clitoria ternatea and a mutant variant of OaAEP1 from Oldenlandia affinis, have led to their use as protein engineering tools, especially for biotechnological and pharmaceutical applications [7,92–98]. These AEPs are efficient transpeptidases that leave only a P1 Asx residue and typically an aliphatic P2″ residue, such as isoleucine, leucine or valine, as processing artefacts [6,10,90,92]. The broad specificity for any nucleophilic P1″ amino acid except proline [6,99] in the transpeptidation reaction makes AEPs ideal enzymes for a variety of protein or peptide substrates. However, this promiscuity comes at a cost, as ingenious workarounds are required to minimise nucleophilic attack by the cleaved by-product [92,100,101]. The study and genetic engineering of AEPs provide an alternative to the chemical ligation of peptides and are therefore crucial for developing novel macrocyclic peptides for use in agriculture and medicine. Even though applications of synthetic macrocyclic peptides have to date been modest, much research continues to evaluate their potential as drug leads for human diseases [102–105].
Asparaginyl endopeptidases perform a variety of functions in plants. There is enormous potential to harness their versatility for protein engineering, especially for the development of peptidic compounds and drugs.
Recent discoveries and developments have improved our understanding of the structural features that can make an asparaginyl endopeptidase an efficient transpeptidase.
This knowledge will aid not only our search for better transpeptidases but also the development of the current known ones into effective biotechnological tools.
The authors declare that there are no competing interests associated with the manuscript.
This work was supported in part by Australian Research Council (ARC) grant DP160100107 to J.S.M.
J.H. was supported by an ARC Discovery Early Career Researcher Award DE180101445. S.G.N. was supported by the Australian Research Training Program.
Open Access Statement
Open access for this article was enabled by the participation of University of Western Australia in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with CAUL.
S.G.N. and J.S.M. conceptualised the manuscript. S.G.N. drafted the manuscript. J.H. assisted with generating figures. S.G.N., J.S.M., and J.H. revised the manuscript.
The authors thank Amy James for providing an SDS gel image of the seed protein profiles, which is similar to the SDS gel image used in her published work in Plant Journal.