The first crystal trans-structure of a naturally occurring split intein has been determined for the Npu (Nostoc punctiforme PCC73102) DnaE split intein. Guided by this structure, the residues NArg50 and CSer35, which are well conserved in DnaE split inteins, are identified as critical for trans-splicing by the Npu DnaE split intein. An in vitro splicing assay demonstrates that NArg50 and CSer35 play synergistic roles in modulating intein activity. The C-terminal CAsn36 exhibits two orientations of its side chain and interacts with both NArg50 and CSer35 through hydrogen bonding. These interactions probably facilitate the cyclization of asparagine in the course of protein splicing. Mutation of either residue reduces intein activity, and this correlates with the low activity of the Ssp (Cyanobacterium synechocystis sp. strain PCC6803) DnaE split intein. In addition, NArg50 forms a hydrogen bond with the highly conserved F-block CAsp17, thus influencing the N–S acyl shift during N-terminal cleavage. Sequence alignments show that residues NArg50 and CSer35 are well conserved in those split inteins that lack a penultimate histidine residue. The conserved non-catalytic residues of split inteins thus modulate the efficiency of protein trans-splicing through hydrogen-bond interactions with the catalytic residues at the splice junction.
Protein splicing is an intriguing autocatalytic process in which an intervening sequence (intein) excises itself from a precursor protein with concomitant ligation of the flanking protein sequences (exteins) [1–4]. Over 550 inteins have been identified so far. Inteins can be classified into two types: those with continuous sequences that mediate cis-splicing, and those with split sequences that mediate trans-splicing. Both types of inteins have been widely used as tools in protein chemistry and molecular biology. The trans-splicing inteins appear to be more convenient for studying protein interactions, as the two protein domains can be expressed separately [6–16].
Protein splicing involves a four-step mechanism: (i) an N–S (or N–O) acyl shift, (ii) trans-esterification, (iii) cyclization of asparagine and (iv) spontaneous S–N (or O–N) acyl migration, to yield the final spliced products [1,17]. The trans-splicing displayed by split inteins (Figure 1) requires the additional step of association of the two split domains. Inteins contain four blocks of conserved sequence, and several highly conserved residues from these blocks form a catalytic core in the intein structure (Supplementary Figure S1 at http://www.biochemj.org/bj/461/bj4610247add.htm). These residues are conserved in all inteins and are essential for protein splicing [1,18]. A systematic investigation of split inteins showed that residues distant from the catalytic core modulate splicing efficacy. These residues, including NGlu89 and CGly19 (where N and C preceding the residue designation refer to the splicing domain concerned), are conserved in the split inteins with high activity, but less so in the inteins with low activity (Supplementary Figure S2 at http://www.biochemj.org/bj/461/bj4610247add.htm). Thus, although conserved residues in the catalytic core are required for protein splicing, residues outside the catalytic core can modulate intein activity. Of all of the split inteins, two have been studied most extensively: the DnaE intein from Cyanobacterium synechocystis sp. strain PCC6803 (Ssp DnaE intein) and the DnaE intein from Nostoc punctiforme PCC73102 (Npu DnaE intein) [9,20–22]. In vitro splicing assays showed that the Npu DnaE intein catalyses trans-splicing highly efficiently (yield >98% and t1/2 ~60 s), whereas the Ssp DnaE intein showed much lower activity (t1/2 ~3 h). Sequence comparisons confirm that the conserved non-catalytic residues of the high-activity inteins are present in the Npu DnaE intein, but are absent from the Ssp DnaE intein (Supplementary Figure S2).
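To put the two reported half-lives on a common scale, the following minimal sketch converts them into rate constants. It assumes simple first-order kinetics, which is an assumption made here for illustration only; the half-lives (~60 s and ~3 h) are those quoted above, and the function names are hypothetical.

```python
import math

def rate_constant_from_half_life(t_half_s: float) -> float:
    """First-order rate constant k (s^-1) from a half-life in seconds.

    Assumes first-order kinetics: k = ln(2) / t_half.
    """
    return math.log(2) / t_half_s

def fraction_reacted(k: float, t_s: float) -> float:
    """Fraction of precursor converted after t_s seconds (first-order model)."""
    return 1.0 - math.exp(-k * t_s)

# Approximate half-lives quoted in the text.
NPU_T_HALF_S = 60.0          # Npu DnaE intein, ~60 s
SSP_T_HALF_S = 3 * 3600.0    # Ssp DnaE intein, ~3 h

k_npu = rate_constant_from_half_life(NPU_T_HALF_S)
k_ssp = rate_constant_from_half_life(SSP_T_HALF_S)

# Ratio of the two rate constants under the first-order assumption.
rate_ratio = k_npu / k_ssp

if __name__ == "__main__":
    print(f"k(Npu) = {k_npu:.3e} s^-1")
    print(f"k(Ssp) = {k_ssp:.3e} s^-1")
    print(f"k(Npu) / k(Ssp) = {rate_ratio:.0f}")
```

Under this assumption the ratio of rate constants equals the inverse ratio of the half-lives, so the two inteins differ by roughly two orders of magnitude in apparent rate.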
Both the Npu DnaE intein and the Ssp DnaE intein belong to a subfamily of non-canonical inteins characterized by the lack of the highly conserved penultimate histidine (Supplementary Figure S2). In most inteins, this histidine facilitates cyclization of asparagine in preparation for C-terminal cleavage in the third step of the splicing reaction.
The four-step mechanism of protein splicing
The 3D structures of the Ssp DnaE intein and the Npu DnaE intein have been solved by X-ray crystallography and NMR respectively [25–27]. In both cases the two split domains were fused into a continuous sequence, whereas attempts at co-crystallization of the two separate domains of the Ssp DnaE intein were unsuccessful. Although the two structures are highly similar, comparing them provides few clues to the difference in activity between the two inteins. Moreover, the artificial fusion of two split domains might lead to subtle rearrangements of protein structure, even if the overall structure of the two inteins is not perturbed. Because non-catalytic residues can modulate intein activity, differences in protein splicing efficiency might result from subtle differences in protein structure.
We solved the crystal structure of the Npu DnaE intein in the split form. This is the first trans-structure of a split intein. Guided by this structure, we identified two residues, NArg50 and CSer35, which play a significant role in modulating the protein splicing reaction. Sequence comparisons show these residues to be conserved in the DnaE split inteins. The absence of CSer35 in the Ssp DnaE intein might explain its low activity.
MATERIALS AND METHODS
Plasmids and proteins for crystallization
Genes encoding the NpuN and NpuC intein domains were subcloned into a modified pETSG vector. The gene encoding NpuN (102 amino acids) with a preceding His6 tag was inserted between NdeI and EcolI restriction sites and the gene encoding NpuC (36 amino acids) was inserted between AlwNI and XhoI restriction sites. The construct has the architecture [RBS]-[His6-NpuN]-[RBS]-[NpuC]. The plasmid was transformed into Escherichia coli strain BL21 cells. Protein expression was induced with 0.4 mM IPTG at a D600 of 0.6–0.8 and continued at 37°C for 5 h. The cells were harvested by centrifugation and resuspended in buffer A [50 mM Tris/HCl, 200 mM NaCl, 5 mM 2-mercaptoethanol and 10 mM imidazole (pH 8.0)]. The cells were then lysed by sonication on ice for 8 min. Cell lysates were centrifuged at 34500 g for 30 min and the supernatant was filtered through 0.45-μm and 0.22-μm membranes. The solution was loaded on to Ni-NTA (Ni2+-nitrilotriacetate; Qiagen) columns, which were washed with buffer B [50 mM Tris/HCl, 200 mM NaCl, 5 mM 2-mercaptoethanol and 25 mM imidazole (pH 8.0)]. The target NpuN/NpuC protein complex was then eluted with buffer C [50 mM Tris/HCl, 200 mM NaCl, 5 mM 2-mercaptoethanol and 250 mM imidazole (pH 8.0)]. The eluate containing the NpuN/NpuC complex was further purified by gel filtration (Superdex-75 16/60; GE Healthcare), with the crystallization buffer [20 mM Tris/HCl, 100 mM NaCl and 2 mM DTT (pH 8.0)] as the eluent. The Se-Met (seleno-methionine) NpuN/NpuC protein complex was produced in minimal medium supplemented with 100 mg/l lysine, phenylalanine and threonine and 50 mg/l isoleucine, leucine, valine and Se-Met. The Se-Met protein was produced and purified as described above. The concentration of the NpuN/NpuC complex was determined by UV absorption at 280 nm.
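The conversion from UV absorbance at 280 nm to concentration is conventionally done with the Beer–Lambert law, c = A / (ε·l). The sketch below illustrates that calculation only; the absorbance, molar absorption coefficient and path length are hypothetical placeholder values, since the source does not state the coefficient that was used for the complex.

```python
def concentration_molar(a280: float,
                        epsilon_m1_cm1: float,
                        path_length_cm: float = 1.0) -> float:
    """Protein concentration (mol/l) from A280 via the Beer-Lambert law.

    c = A / (epsilon * l), with epsilon in M^-1 cm^-1 and l in cm.
    """
    if epsilon_m1_cm1 <= 0 or path_length_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon_m1_cm1 * path_length_cm)

# Hypothetical illustrative inputs (NOT values from the study).
EXAMPLE_A280 = 0.5
EXAMPLE_EPSILON = 10000.0   # M^-1 cm^-1, placeholder

example_conc_um = concentration_molar(EXAMPLE_A280, EXAMPLE_EPSILON) * 1e6

if __name__ == "__main__":
    print(f"Concentration: {example_conc_um:.1f} uM")
```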
Plasmids and proteins for cleavage and splicing assays
The pSKDuet01 plasmid encoding the His6–GB1 (Protein G B1 domain)–NpuN protein was used for the splicing assay, and protein expression was induced with IPTG. The plasmid encoding the NpuC–CFN–GB1–His6 protein (where CFN is three amino acids from the native C-terminal extein sequence of the DnaE intein) was modified from the pSKBAD02 vector, and protein expression was induced with L-arabinose. Plasmids for the different variants, including the NArg50Ala, NLys73Ala and CSer35Ala mutants, were constructed using the QuikChange® Site-Directed Mutagenesis kit. Proteins were purified by affinity chromatography using Ni-NTA resin (Qiagen) followed by gel filtration (Superdex-75 16/60; GE Healthcare). Protein concentrations were determined by UV absorbance at 280 nm and by BCA assays. All plasmids used in the present study were verified by DNA sequencing.
Protein crystallization and structure determination
Initial crystallization conditions for the Npu DnaE intein were identified using several sparse matrix screens (Hampton Research) with the sitting-drop vapour-diffusion method at room temperature. The best crystal was obtained in a solution of 0.2 M ammonium nitrate and 20% (w/v) PEG 3350 after 3–4 days.
Diffraction data were collected on the beamline 1W2B of Beijing Synchrotron Radiation Facility. Before data collection, crystals were soaked for 5 s in a cryoprotectant consisting of 20% (v/v) glycerol in the crystal mother liquid and then flash-frozen in liquid nitrogen. The temperature was held at 100 K by liquid nitrogen during data collection. Data were processed with the program HKL2000.
The crystal structure of the Npu DnaE intein was determined by single wavelength anomalous diffraction. Heavy-atom (Se) location and primary phasing were carried out using SOLVE, and the phases were improved by density modification in the program RESOLVE. The program Buccaneer was used for model building with the improved phases. Refinement and manual correction were carried out using Phenix and Coot respectively. The final R and Rfree factors for the Npu DnaE intein were 16.9% and 19.4% respectively. The stereochemistry of the structure was checked using MolProbity. Figures were prepared using PyMOL. The statistics of data collection and refinement are listed in Supplementary Table S1 (http://www.biochemj.org/bj/461/bj4610247add.htm).
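For readers less familiar with the R and Rfree values quoted above, the sketch below shows the conventional crystallographic residual, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. Rfree is the same quantity evaluated on reflections excluded from refinement. The amplitudes used here are hypothetical illustrative numbers, not data from the deposited structure, and the sketch assumes that observed and calculated amplitudes are already on a common scale.

```python
from typing import Sequence

def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional crystallographic R-factor.

    R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)
    Assumes both amplitude lists are on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator

# Hypothetical structure-factor amplitudes for illustration only.
example_f_obs = [100.0, 200.0, 300.0]
example_f_calc = [90.0, 210.0, 290.0]
example_r = r_factor(example_f_obs, example_f_calc)

if __name__ == "__main__":
    print(f"R = {example_r:.3f}")
```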
In vitro protein trans-splicing assay
Purified His6–GB1–NpuN and NpuC–CFN–GB1–His6 precursor proteins were mixed in a 1:1 ratio (36 μM) at 25°C in splicing buffer [20 mM Tris/HCl, 300 mM NaCl and 10% (v/v) glycerol (pH 8.0)]. Splicing was initiated by the addition of 5 mM TCEP [tris-(2-carboxyethyl)phosphine], which marked the start of each time course. The reaction was quenched by dilution into one-sixth volume of 6× SDS/PAGE loading buffer. Splicing products were resolved by electrophoresis on a 15% SDS gel, and gels were stained with Coomassie Brilliant Blue. The scanned images were quantified using the program ImageJ (http://imagej.nih.gov/ij/). The percentage of protein trans-splicing was calculated as the ratio of the splicing product to the sum of proteins.
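The percentage calculation described above can be sketched as follows. This is an illustration under stated assumptions: the band names and intensities are hypothetical, "sum of proteins" is taken to mean the sum of all quantified bands in a lane, and no correction for molecular mass or staining efficiency is applied, since the source does not describe one.

```python
from typing import Mapping

def splicing_percentage(band_intensities: Mapping[str, float],
                        product: str = "GB1-GB1") -> float:
    """Percentage of trans-splicing in one gel lane.

    Computed as 100 * (intensity of spliced product) / (sum of all bands).
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * band_intensities[product] / total

# Hypothetical densitometry values for one lane (arbitrary units).
example_lane = {
    "GB1-GB1": 300.0,   # spliced product
    "GB1-IN": 50.0,     # N-terminal precursor
    "IC-GB1": 50.0,     # C-terminal precursor
}
example_percentage = splicing_percentage(example_lane)

if __name__ == "__main__":
    print(f"Splicing: {example_percentage:.1f}%")
```

Applying the same function to each time point of a gel gives the time course from which splicing rates can be compared between variants.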
In vivo protein trans-splicing assay
The in vivo splicing assay was conducted using the plasmid encoding the His6–GB1–NpuN fusion protein together with the plasmid encoding the NpuC–CFN–GB1–His6 fusion protein. Both plasmids were transformed into E. coli BL21 cells. The expression of NpuC–CFN–GB1–His6 was induced with 0.05% L-arabinose for 35 min, and the expression of the His6–GB1–NpuN protein was then induced with 0.4 mM IPTG. After overexpression at 37°C for 5 h (starting from the addition of L-arabinose), cells were harvested by centrifugation and resuspended in buffer A. The cells were then lysed by sonication on ice for 8 min. Cell lysates were centrifuged at 34500 g for 30 min and the supernatant was analysed by electrophoresis using a 15% SDS gel. Gels were stained with Coomassie Brilliant Blue.
DTT-induced N-terminal cleavage assay
To examine the N-terminal trans-cleavage, 36 μM N-terminal precursor (His6–GB1–NpuN) was mixed with 12 μM NpuC (36-amino-acid peptide) at 25°C in the cleavage buffer [20 mM Tris/HCl, 300 mM NaCl and 10% (v/v) glycerol (pH 8.0)]. Cleavage was induced by the addition of 200 mM DTT, which marked the start of each time course. The reaction was quenched by dilution into one-sixth volume of 6× SDS/PAGE loading buffer. Cleavage products were analysed using electrophoresis with a 15% SDS gel. Gels were stained with Coomassie Brilliant Blue.
Crystal structure of Npu DnaE Intein
The Npu DnaE intein consists of two separate domains: an N-terminal domain of 102 residues and a C-terminal domain of 36 residues. Because earlier attempts at co-crystallization of the two fragments of the Ssp DnaE split intein had failed, we established a co-expression system to obtain the two splicing domains simultaneously. This allowed the two domains to bind to each other and fold correctly in vivo. The plasmid that carries the two gene segments encoding NpuN and NpuC was equipped with two ribosome-binding sites. The N- and C-terminal domains of the Npu DnaE intein were thus co-expressed and assembled into a native complex in vivo. The N-terminal domain is preceded by a His tag to facilitate purification. Accordingly, the two domains co-purified as an NpuN–NpuC protein complex for crystallization.
The crystal structure of the Npu DnaE intein was determined by single wavelength anomalous diffraction and was refined to a resolution of 1.5 Å (PDB code 4LX3). The statistics of data collection and structure refinement are listed in Supplementary Table S1. The protein complex consists of nine curved β-strands (β1–β9) and two short α-helices (α1 and α2) from the two split domains NpuN and NpuC (Figure 2a). A pair of additional β-strands close to the C-terminal tail of NpuN is always present in the cis-structures of the split inteins (Figure 2b) [23,26,27]. This difference is probably due to the artificial fusion of the split inteins. The structure resembles a HINT (Hedgehog/intein)-type fold, which has been observed in all intein structures solved to date [35,36]. Although the two domains form a compact horseshoe structure, the C-terminal tail of NpuN (residues 95–102) is flexible, as this region lacks electron density. NMR spectroscopy shows that this region is flexible even in the fused intein. The two domains associate via two long anti-parallel β-strands (β5 and β7) involving a large number of hydrogen bonds and hydrophobic-packing interactions. The C-terminal β-strand of NpuC (β9) folds back on to the centre of the protein, mainly through hydrophobic interactions. Together with N-terminal residues from NpuN, this strand forms the catalytic core.
The structure of the Npu DnaE intein
In spite of large variations in primary structure of different inteins, all intein structures solved are very similar, indicating that the conserved HINT fold is required for protein splicing. It is therefore not surprising that the overall fold of the native Npu DnaE intein is similar to those reported for the artificially fused split inteins (Figure 2b) [25,26]. Nevertheless, we observe significant differences at the catalytic site residues, in particular for those at the C-terminal active site. These differences help clarify the mechanism that underlies the high splicing efficiency of the Npu DnaE intein.
The role of the conserved arginine and serine in the asparagine cyclization
Cyclization of the C-terminal asparagine is the third step in protein splicing, and involves a nucleophilic attack of the side chain amide nitrogen of the C-terminal asparagine (glutamine in some cases) on the carbonyl carbon of the downstream scissile peptide bond [1,37–39]. Although cyclization of asparagine in inteins is rapid, asparagine cyclization in model peptides or other proteins may take several days at 37°C and pH 7.4. Hence, inteins have evolved more efficient ways to promote asparagine cyclization. In most canonical inteins, the conserved penultimate histidine facilitates asparagine cyclization by stabilizing the intermediate through a hydrogen bond to the asparagine carbonyl oxygen and, thus, this residue is crucial for C-terminal cleavage. However, a number of non-canonical inteins, including the Npu DnaE intein and the Ssp DnaE intein, lack this penultimate histidine, and must therefore use a different mechanism to facilitate asparagine cyclization (see the Discussion section).
In the absence of the penultimate histidine in the Ssp DnaE intein, NArg73 might promote asparagine cyclization. The crystal structure shows that the NArg73-Nη atom forms a hydrogen bond with the C-terminal carboxy oxygen of CAsn36 (2.6 Å) in the excised Ssp DnaE intein (PDB code 1ZD7), and this interaction is believed to promote asparagine cyclization. Sequence alignment shows that NLys73 in the Npu DnaE intein is the residue that corresponds to NArg73 in the Ssp DnaE intein (Supplementary Figure S2). Although the structure of the C-terminal active core is well defined, the side chain of the C-terminal CAsn36 has two orientations (Figure 3, and Supplementary Figure S3 at http://www.biochemj.org/bj/461/bj4610247add.htm). In either orientation, the distance between Nη of NLys73 and the carbonyl oxygen of CAsn36 is too great to allow the formation of a hydrogen bond (8.6 Å or 10.6 Å) (Figure 3a) (PDB code 4LX3). Thus NLys73 is unlikely to stabilize the tetrahedral intermediate formed by CAsn36 during asparagine cyclization.
The structural details of Npu DnaE intein
To explore the role of NLys73 in the Npu intein, we performed protein splicing assays on the wild-type intein and the NLys73Ala mutant. Both precursor proteins contain GB1 as the extein (GB1–IN and IC–GB1), so that trans-splicing generates the product GB1–GB1 (G–G). The kinetics of the reaction show that the splicing efficiencies of the wild-type intein and the NLys73Ala mutant were very similar in vitro (Figure 4a). Furthermore, the in vivo splicing assay showed that the same product was formed by the NLys73Ala mutant (Supplementary Figure S4 at http://www.biochemj.org/bj/461/bj4610247add.htm). The NLys73Ala mutation thus does not influence splicing of the Npu intein, which confirms that NLys73 is not involved in asparagine cyclization in the Npu intein.
If NLys73 is not involved in asparagine cyclization, this raises the question as to which residue does contribute to catalysis. Although the structure of the C-terminal active core of the Npu intein is well defined, the C-terminal CAsn36 has two possible orientations (Figure 3c). This unique feature, absent from canonical inteins, probably contributes to asparagine cyclization in the Npu intein. The side chain of CAsn36 is oriented towards either the penultimate CSer35 or NArg50. The C-terminal carbonyl oxygen forms a hydrogen bond with the hydroxy group of CSer35 (2.8 Å) in one orientation, and a hydrogen bond with the side chain nitrogen of NArg50 (Nη) through an ordered water molecule (Wat18) in the other orientation (Figures 3c and 3d). This structural information suggests that CSer35 and/or NArg50 could be the residues that promote asparagine cyclization.
To test this hypothesis, trans-splicing assays were performed on the Npu intein carrying the CSer35Ala and/or NArg50Ala mutation. The kinetics of the reaction show that either the CSer35Ala or the NArg50Ala mutation decreased the splicing rate, whereas the double mutation of both CSer35 and NArg50 reduced splicing even further (Figures 4b and 4c). NArg50 and CSer35 thus play a synergistic role in facilitating asparagine cyclization in the Npu intein. The penultimate histidine in canonical inteins is hydrogen bonded to the carbonyl oxygen of asparagine, and stabilizes the negatively charged oxyanion in the tetrahedral intermediate [24,26]. In the case of the Npu DnaE intein, CSer35 and NArg50 could play a role similar to that of the penultimate histidine in other inteins. Thus the CAsn36 oxyanion is stabilized by hydrogen bonding to the side chain of CSer35 or NArg50, depending on the side-chain orientation of CAsn36 in the Npu DnaE intein (Figure 3d).
Time-dependent trans-splicing assay of the Npu DnaE intein
To explore whether this function of the CSer35 and NArg50 residues is present in other inteins as well, we carried out sequence alignments. Currently, 18 DnaE split inteins have been reported to lack the penultimate histidine. Sequence alignment shows that most of them (16 out of 18) have a conserved arginine at a position equivalent to NArg50 of the Npu intein, whereas the other two inteins have a glutamine at the corresponding position (Supplementary Figure S2). The residues corresponding to CSer35 of the Npu intein are primarily serine (14 out of 18 species), with four exceptions (alanine). Residues at the positions of NArg50 and CSer35 of the Npu intein are thus highly conserved in the split inteins that lack the penultimate histidine, although NArg50 is not part of a conserved motif. In addition, CSer35 is not conserved in the canonical inteins. Our sequence alignments therefore suggest that NArg50 and CSer35 are significant particularly in the DnaE split inteins. This finding also supports the hypothesis that their hydrogen bonding with the C-terminal asparagine facilitates asparagine cyclization.
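The conservation counts quoted above amount to tallying the residues found in one alignment column. The following minimal sketch reproduces that tally. The two columns are rebuilt from the reported counts only (16 Arg and 2 Gln; 14 Ser and 4 Ala) and are not the actual alignment of Supplementary Figure S2; the function name is hypothetical.

```python
from collections import Counter
from typing import Tuple

def most_common_residue(column: str) -> Tuple[str, float]:
    """Return the most frequent residue in an alignment column
    and the fraction of sequences that carry it."""
    if not column:
        raise ValueError("alignment column is empty")
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)

# Columns reconstructed from the counts given in the text (one letter
# per intein, order arbitrary); illustrative, not the real alignment.
arg50_column = "R" * 16 + "Q" * 2
ser35_column = "S" * 14 + "A" * 4

arg50_result = most_common_residue(arg50_column)
ser35_result = most_common_residue(ser35_column)

if __name__ == "__main__":
    for name, (res, frac) in (("NArg50", arg50_result),
                              ("CSer35", ser35_result)):
        print(f"{name} position: {res} in {frac:.0%} of sequences")
```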
The role of conserved arginine in the N-terminal cleavage
As NArg50 is also close to the N-terminal splicing core residues, we further analysed the structural details to investigate its role in N-terminal cleavage. The N-terminal splicing domain, which catalyses the N-terminal N–S acyl shift, is formed by a series of conserved residues, including NCys1 and NLeu2 in the A-block, NThr69 and NHis72 in the B-block, and CPhe16 and CAsp17 in the F-block (Figure 5a) [1,26]. These residues are connected through a hydrogen-bonding network, forming the N-terminal catalytic core. In the Npu intein, the crystal structure shows that NArg50 is very close to the F-block aspartate CAsp17.
The structural details of the Npu DnaE intein
The F-block aspartate is conserved and facilitates both the N–S acyl shift and trans-esterification [41–43]. In the course of protein splicing, the F-block aspartate sequentially forms hydrogen bonds with the N-terminal NCys1 and the C-extein Cys+1 at different steps in the reaction. Mutation of the F-block aspartate in the Mtu (Mycobacterium tuberculosis) RecA intein enhances C-terminal cleavage, and so contributes to low splicing efficiency. The solution NMR structure of the Mja (Methanococcus jannaschii) KlbA intein indicates that the carboxylate group of the F-block aspartate (Oγ) forms a hydrogen bond (3.1 Å) to the preceding scissile peptide bond. This hydrogen bond is proposed to lower the activation energy for the N–S acyl shift and so facilitate the initial nucleophilic attack. This hydrogen bond is also observed in the crystal structure of the Npu intein: the CAsp17 Oγ is at a distance of 2.6 Å from the carbonyl oxygen at the splice junction. However, this interaction is absent from the crystal cis-structure of pre-Ssp (3.8 Å) (PDB code 1ZDE). These different interactions might also correlate with the differences in splicing efficiency.
As the F-block aspartate exhibits different hydrogen bond connections in the Npu intein compared with the Ssp intein, we further analysed the structural details of this residue in the two inteins. The superimposition of their protein structures shows the different orientations of the side chain of CAsp17 in the Npu DnaE intein, the pre-Ssp DnaE intein (PDB code 1ZDE) and the ex-Ssp DnaE intein (PDB code 1ZD7) (Figure 5b). NArg50 forms a hydrogen bond with CAsp17 in the Npu intein, but not in the ex-Ssp intein (PDB code 1ZD7). The CAsp17 carboxylate oxygen is 2.8 Å away from the Nε atom and 3.6 Å from the Nδ atom of the NArg50 side chain in the Npu intein, whereas in the ex-Ssp intein (PDB code 1ZD7) these distances are 5.1 Å and 5.3 Å respectively (Figure 5b). The hydrogen bonding and the electrostatic interaction between NArg50 and CAsp17 in the Npu intein could drive the orientation of the side chain of CAsp17, and thus influence the N–S acyl shift.
The stabilization of the orientation of CAsp17 by NArg50 in the Npu intein implies that this arginine could also influence the N–S acyl shift in the Npu DnaE intein. We performed a DTT-induced in vitro N-terminal cleavage assay, using an N-terminal precursor protein containing a GB1 domain as the N-extein (GB1–IN). For the C-terminal protein, NpuC (36 amino acids) without a C-extein was used in order to eliminate protein splicing and to detect only the N-terminal cleavage. The N-terminal cleavage product (IN at 11.8 kDa) was generated more rapidly from the NArg50Ala mutant than from the wild-type intein, indicating that the NArg50Ala mutation increased the rate of N-terminal cleavage (Figure 6).
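The distance comparison above can be made explicit with a simple cutoff test. The sketch below is illustrative only: the 3.5 Å donor–acceptor cutoff is a commonly used rule of thumb adopted here as an assumption (the source does not state a cutoff), the distances are those reported in the text with the atom labels as given there, and geometry beyond distance (for example angles) is ignored.

```python
import math
from typing import Sequence

# Assumed donor-acceptor distance cutoff in angstroms (not from the source).
HBOND_CUTOFF_A = 3.5

def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two atoms given as (x, y, z) coordinates."""
    return math.dist(a, b)

def within_hbond_distance(d_angstrom: float,
                          cutoff: float = HBOND_CUTOFF_A) -> bool:
    """True if a distance is short enough for a hydrogen bond (distance-only)."""
    return d_angstrom <= cutoff

# Distances quoted in the text between the CAsp17 carboxylate oxygen
# and the two NArg50 side-chain nitrogens.
reported_distances = {
    "Npu: CAsp17 O to NArg50 N-epsilon": 2.8,
    "Npu: CAsp17 O to NArg50 N-delta": 3.6,
    "ex-Ssp: CAsp17 O to NArg50 N-epsilon": 5.1,
    "ex-Ssp: CAsp17 O to NArg50 N-delta": 5.3,
}

hbond_calls = {label: within_hbond_distance(d)
               for label, d in reported_distances.items()}

if __name__ == "__main__":
    for label, ok in hbond_calls.items():
        verdict = "within" if ok else "beyond"
        print(f"{label}: {reported_distances[label]} A ({verdict} cutoff)")
```

With this cutoff only the 2.8 Å contact in the Npu intein qualifies, in line with the statement that the hydrogen bond is present in the Npu intein and absent from the ex-Ssp intein. The `distance` helper is included for use with coordinates read from a structure file.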
Time-dependent N-cleavage of the wild-type Npu DnaE intein and the NArg50Ala mutant
Several structures of split inteins have been solved, but all of them represent fusions in a continuous sequence [25,26]. The overall structures of these fused split inteins are highly similar to those observed for naturally occurring cis-inteins. The HINT fold is required for protein splicing. Within this protein fold, several residues, including the A-block, B-block histidine and F-block aspartate, form the catalytic core and execute the four-step splicing reaction [1,2,44,45]. These residues are highly conserved in all intein sequences. Nevertheless, the significantly different activities of inteins from various species suggest that residues other than these conserved catalytic residues modulate intein efficacy. Sequence alignment indicates that the conserved non-catalytic residues are important for intein activity, although the role of these residues is not entirely clear.
In the present paper we report the first X-ray crystal structure of a naturally occurring split intein. Although the catalytic core residues in this intein are similar to those in other inteins, the native structure of the split intein allows a more precise analysis of the role of the non-catalytic residues. On the basis of our structural data and activity assays, the proposed role of NLys73 in promoting asparagine cyclization was excluded for the Npu intein. On the other hand, the side-chain orientations and the hydrogen-bond interactions suggest that CSer35 and NArg50 modulate splicing activity by facilitating the cyclization of asparagine.
As CSer35 and NArg50 together serve the function performed by the penultimate histidine in canonical inteins, this synergy may enhance the intein's efficiency. Interestingly, among the 18 inteins used for comparison, the four inteins with an alanine at the penultimate position are all low-activity inteins, ranking 13–16 among the 18 inteins, with the Ssp DnaE intein being one of them (Supplementary Figure S2). In addition to the role of conserved non-catalytic residues reported in the literature, the conserved CSer35 and NArg50 residues modulate protein trans-splicing by direct interactions with catalytic residues. This finding might explain the different splicing activities observed for the Npu DnaE and the Ssp DnaE inteins.
The enhanced rate of N-terminal cleavage of the NArg50Ala mutant implies that disruption of the hydrogen bond between NArg50 and CAsp17 could enhance the N–S acyl shift. This could be correlated with the lower splicing efficiency of the NArg50Ala mutant in the splicing assay (Figures 4b and 4c). It has been proposed that the co-operation and co-ordination of each step during protein splicing is crucial for protein splicing to proceed in an orderly fashion [44–47]. A suitably balanced N–S acyl shift rate is therefore required for the subsequent trans-esterification step, and an overly rapid N–S acyl shift prevents protein splicing by forming cleavage products. The Mtu RecA intein, for instance, contains a conserved F-block aspartate (Asp422). By altering its protonation status, Asp422 sequentially forms a hydrogen bond with Cys1 and C-extein Cys+1 during the first two steps of protein splicing. The mutation of this residue (D422G) results in a cleavage mutant (CM), which possesses much greater C-cleavage activity, as well as significantly higher activity when assayed for DTT-induced N-cleavage (Supplementary Figure S5 at http://www.biochemj.org/bj/461/bj4610247add.htm). Similar to the function of Asp422 in the RecA intein, CAsp17 in the Npu DnaE intein also determines the N–S acyl shift rate by hydrogen bonding to the carbonyl oxygen of the preceding scissile peptide bond at the splice junction. NArg50 could stabilize the side-chain orientation of CAsp17 through a hydrogen bond; the mutation of NArg50 therefore leads to enhanced N-cleavage, similar to the CM mutant of the RecA intein. The effect of the N–S acyl shift on protein splicing through the NArg50–CAsp17 interaction cannot be tested experimentally in isolation, since NArg50 also influences the C-terminal splicing reaction.
Interestingly, the two residues, NArg50 and CAsp17, do not form a hydrogen bond in the ex-Ssp intein (PDB code 1ZD7), and more cleavage products were generally formed in the Ssp intein than in the Npu intein.
It is presumed that the penultimate histidine is needed for protein splicing in most canonical inteins. However, some inteins that lack this penultimate histidine may utilize an alternative mechanism to perform the third reaction step [24,49]. Although the mutation of the penultimate histidine often leads to a decrease in activity, for inteins that naturally lack the penultimate histidine, introduction of a histidine residue does not always enhance activity [24,49]. In addition, even in some canonical inteins, the penultimate histidine may not actively participate in the cyclization of asparagine when the distance from the imidazole group to the C-terminal asparagine is too great [40,49]. Although the interactions involving conserved residues are crucial for protein splicing, subtle alterations in protein structure affect the side-chain interactions of these residues and hence alter intein activity.
In the present study we have determined the first crystal trans-structure of a naturally occurring split intein, the Npu DnaE intein. The structural analysis indicates that the C-terminal asparagine (CAsn36) can display two orientations of its side chain. With these different geometries, CAsn36 could interact through hydrogen bonds with either NArg50 or CSer35, which could therefore facilitate the cyclization of asparagine and thus assume the function of the penultimate histidine in canonical inteins. The in vitro splicing assay shows that NArg50 and CSer35 play a synergistic role in modulating trans-splicing. In addition, NArg50 couples with CAsp17 to modulate the participation of the NCys1 thiol in the N–S acyl shift and thereby prevents the N-cleavage side reaction. NArg50 and CSer35 are highly conserved in those DnaE inteins that lack the penultimate histidine. Taken together, our results add to our understanding of the mechanism of protein trans-splicing.
Yangzhong Liu conceived and designed the project; Qin Wu and Yuchuan Zheng performed the protein expression experiments and in vitro assays; Zengqiang Gao, Yong Wei and Yuhui Dong obtained the single crystal X-ray data and solved the protein structure; and all authors analysed and interpreted the data and contributed to the writing of the paper.
We thank Professor Hidde L. Ploegh (MIT, Cambridge, MA, U.S.A.) for critical reading of the paper and valuable comments before submission.
This work was supported by the National Science Foundation of China [grant numbers U1332210 and 21171156], the National Basic Research Program of China (973 Program) [grant number 2012CB932502], the China Postdoctoral Science Founded Projects [grant number 2013M 541831] and the Fundamental Research Funds for the Central Universities.
The structure factors of Npu DnaE intein have been deposited in the PDB under code 4LX3.