N-terminal signal peptides direct secretory proteins into the ER (endoplasmic reticulum) of eukaryotes or the periplasmic space of prokaryotes. A hydrophobic core (h-region) is important for signal sequence function; however, the mechanism of h-region action is not resolved. To gain new insight into signal sequences, bioinformatic analysis of h-regions from humans, Saccharomyces cerevisiae, Trypanosoma brucei and Escherichia coli was performed. Each species contains a unique set of peptide motifs (h-motifs) characterized by identity components (i.e. sequence of conserved amino acids) joined by spacers. Human h-motifs have four identity components, whereas those from the other species utilize three identity components. Examples of h-motifs are human Hs3 {L-x(2)-[AGILPV]-L-x(0,2)-L}, S. cerevisiae Sc1 [L-x(0,2)-S-x(0,3)-A], T. brucei Tb2 {L-x(1,2)-L-[AILV]} and E. coli Ec1 [A-x(0,2)-L-x(0,3)-A]. The physiological relevance of h-motifs was tested with a T. brucei microsomal system for translocation of a VSG (variant surface glycoprotein)-117 signal peptide. Disruption of h-motifs by scrambling of sequences in h-regions produced defective signal peptides, although the hydrophobicity of the peptide was not altered. We conclude that: (i) h-regions harbour h-motifs, and are not random hydrophobic amino acids; (ii) h-regions from different species contain unique sets of h-motifs; and (iii) h-motifs contribute to the biological activity of ER signal peptides. h-Regions are ‘scaffolds’ in which functional h-motifs are embedded. A hypothetical model for h-motif interactions with a Sec61p protein translocon is presented.

INTRODUCTION

Secretory and integral membrane proteins within the secretory system are important for organelle biogenesis and cell communication with the environment. In eukaryotes, targeting of these proteins to the appropriate intracellular location involves Sec61p translocon-mediated import of the polypeptides into the ER (endoplasmic reticulum). In bacteria, archaea and eukaryotes, signal sequences are important for secretion of proteins (reviewed in [1,2]). Newly discovered functions of signal sequences include interorganelle (ER to nucleus) protein transport [3].

ER signal sequences are composed of three subregions: (i) a positively charged n-region; (ii) a non-charged h-region; and (iii) a polar c-region. Mutations in the h-region abolish gating of the Sec61p translocon [4]; the h-region is therefore vital for the function of a signal sequence. Despite its importance in gating a protein-conducting channel, the h-region is widely believed to be formed from random hydrophobic amino acids [5], a proposal that is at odds with three observations. First, the composition of signal peptide subregions differs among species [6–8]. If h-regions were random hydrophobic residues, one would not expect signal sequences from different biological families to vary in amino acid composition (reviewed in [2,9]). Secondly, most proteins that perform their biological functions by interacting with other proteins recognize specific (or degenerate) peptide motifs. Given the importance of correctly selecting proteins for entry into the ER, it would be extraordinarily unusual if the Sec61p translocation pore were ‘gated’ by random hydrophobic peptides. Finally, signal peptides from one biological family are sometimes not recognized by translocons from other species [10–13]. If h-regions were random amino acids, a Sec61p translocon from one species would be expected to recognize signal peptides from all species. Given these observations, we hypothesized that h-regions are not random hydrophobic sequences.

Hydrophobicity of signal peptides contributes to their biological function [12,14–16], but does not appear to be sufficient to explain the activity of signal sequences [15,17]. For example, signal sequences from the protozoan parasite Trypanosoma brucei are not imported by the canine ER microsomal system. Yet some of the trypanosome signal sequences are as hydrophobic as non-trypanosomatid signal peptides that were imported into the canine ER [11]. In other biological systems, several investigators have shown that high hydrophobicity is not the most important feature of signal sequences [12,15,17].

To explore other properties of signal peptides that might influence their biological activity, we have employed bioinformatics and experimental approaches. Briefly, h-regions were predicted with PSORT II [18], and patterns of amino acids in the h-regions were extracted with PRATT [19]. For our purposes, the minimum length of a peptide motif that was deemed capable of ‘conveying biological information’ was set at three amino acids. This decision is based on the fact that a large number of proteins use short peptide-recognition motifs containing three conserved residues. Examples include oligosaccharyltransferase (NXS/T), casein kinase I (SXX[ST]) and retinoblastoma protein ([LI]XCX[DE]) [20].

In the present paper, we report the discovery of peptide motifs in h-regions (h-motifs) of eukaryotic (human, Saccharomyces cerevisiae and T. brucei) as well as prokaryotic (Escherichia coli) signal peptides. The h-motifs are conserved within a species, and are detectable in ~75–90% of signal sequences. We find that h-motifs contribute to biological activity of a signal peptide of the African trypanosome T. brucei.

MATERIALS AND METHODS

Signal sequences

Human and E. coli signal peptides were obtained from the SignalP server (http://www.cbs.dtu.dk/services/SignalP/) [21,22]. S. cerevisiae and T. brucei signal peptides were obtained from geneDB (http://www.genedb.org) [23]. For T. brucei, the sequences analysed had been experimentally documented to be residents of the secretory pathway. ‘Hypothetical proteins’ documented in the genome sequence were not used, since it is not easy to determine the N-terminus of such proteins in the absence of experimental data. Furthermore, one cannot be sure that those coding sequences are translated into proteins in vivo.

Bioinformatic analysis

H-regions of signal peptides were predicted with PSORT II (http://psort.nibb.ac.jp/form2.html) [18] and were analysed using PRATT (http://www.ebi.ac.uk/pratt/) [19] with the following initial parameters: C% (minimum percentage of sequences to match)=50; PL (maximum pattern length)=50; PN (maximum number of pattern symbols)=50; PX (maximum number of consecutive Xs)=5; FN (maximum number of flexible spacers)=5; FL (maximum flexibility)=2; FP (maximum flexibility product)=20; E (search greediness)=3. For each set of signal peptides, a refined set of conditions used for PRATT analysis is presented in the legend to Figure 1.
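Keeping the parameter sets explicit makes the refinements easy to audit. The sketch below is a hypothetical configuration record (the class and field names are ours, not PRATT's; PRATT itself is run through its own interface) holding the initial values above, with the dataset-specific settings from the Figure 1 legend written as overrides. Listing only the overrides shows that the refinements changed C% and FL and left the other parameters at their initial values.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class PrattSettings:
    """Hypothetical record of the PRATT parameters listed in the text."""
    c_percent: int = 50  # C%: minimum percentage of sequences to match
    pl: int = 50         # PL: maximum pattern length
    pn: int = 50         # PN: maximum number of pattern symbols
    px: int = 5          # PX: maximum number of consecutive Xs
    fn: int = 5          # FN: maximum number of flexible spacers
    fl: int = 2          # FL: maximum flexibility
    fp: int = 20         # FP: maximum flexibility product
    e: int = 3           # E: search greediness


INITIAL = PrattSettings()

# Settings from the Figure 1 legend, expressed as overrides of the initial run.
REFINED = {
    "human, first run": replace(INITIAL, c_percent=48, fl=6),
    "human, re-run of 50 peptides": replace(INITIAL, c_percent=25, fl=6),
    "S. cerevisiae": replace(INITIAL, c_percent=40, fl=6),
    "T. brucei": replace(INITIAL, fl=6),
    "E. coli": replace(INITIAL, fl=6),
}


def overrides(settings):
    """Return only the parameters that differ from the initial settings."""
    base = asdict(INITIAL)
    return {name: value for name, value in asdict(settings).items() if base[name] != value}


for dataset, settings in REFINED.items():
    print(f"{dataset}: {overrides(settings)}")
```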

Peptide motifs in h-regions of signal sequences

Figure 1
Peptide motifs in h-regions of signal sequences

(A) Human h-motifs. A total of 200 human h-regions were analysed with PRATT in an initial run using the parameters (see the Materials and methods section for details) C%=48, PL=50, PN=50, PX=5, FN=5, FL=6, FP=20, E=3. Thereafter, a minority of signal peptides (50) that did not contain major h-motifs were re-analysed with the following settings: C%=25, PL=50, PN=50, PX=5, FN=5, FL=6, FP=20, E=3. (B) h-Motifs obtained from analysis of 106 S. cerevisiae signal sequences. Settings used for PRATT analysis were C%=40, PL=50, PN=50, PX=5, FN=5, FL=6, FP=20, E=3. (C) T. brucei h-motifs obtained from analysis of 49 trypanosome signal sequences. Conditions for PRATT analysis were C%=50, PL=50, PN=50, PX=5, FN=5, FL=6, FP=20, E=3. These parameters produced motifs Tb1–Tb4 in 36 signal sequences. The remaining 13 sequences were analysed separately using the same PRATT settings to produce the minor motifs Tb5–Tb7. (D) h-Motifs found in 112 E. coli signal sequences. PRATT analysis was executed in an initial run with the following conditions: C%=50, PL=50, PN=50, PX=5, FN=5, FL=6, FP=20, E=3.


Description of peptide motifs

A motif consists of ICs (identity components) that are joined by linker amino acids (spacers). Each IC is either a single amino acid (fixed component) or a set of alternative residues (ambiguous component) (see Figure 1 for an illustration).
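To make the terminology concrete, the sketch below splits a motif written in Prosite-style notation into its ICs and spacers and labels each IC as fixed or ambiguous. The function name `parse_motif` is hypothetical, and it handles only the elements used in this paper (single residues, bracketed residue sets and `x` spacers). The braces and square brackets that enclose whole motifs in the text, as in {L-x(2)-[AGILPV]-L-x(0,2)-L}, are typographic delimiters and are assumed to have been removed before parsing.

```python
import re

# A motif element is a fixed residue ("L"), an ambiguous set ("[AGLV]"),
# or a spacer written "x", "x(n)" or "x(n,m)".
ELEMENT = re.compile(r"([A-Z])|\[([A-Z]+)\]|x(?:\((\d+)(?:,(\d+))?\))?")


def parse_motif(motif):
    """Split a Prosite-style motif into identity components and spacers."""
    ics, spacers = [], []
    for element in motif.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognized motif element: {element!r}")
        fixed, ambiguous, low, high = match.groups()
        if fixed:
            ics.append(("fixed", fixed))
        elif ambiguous:
            ics.append(("ambiguous", ambiguous))
        else:
            # Bare "x" is one residue; "x(n)" is exactly n; "x(n,m)" is n to m.
            low_n = int(low) if low is not None else 1
            high_n = int(high) if high is not None else low_n
            spacers.append((low_n, high_n))
    return ics, spacers


ics, spacers = parse_motif("L-x(2)-[AGILPV]-L-x(0,2)-L")  # human Hs3
print(len(ics), ics, spacers)
```

For Hs3 this gives four ICs (three fixed leucines and the ambiguous set [AGILPV]) and spacers of exactly two and of zero to two residues; the same function applied to Sc1, L-x(0,2)-S-x(0,3)-A, gives three fixed ICs.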

Primer construction for VSG (variant surface glycoprotein)-117 h-region mutants

PCR mutagenesis was used to create VSG-117 h-region mutants from the template pVSG-117, using variations of the forward primer 5′-TAATACGACTCACTATAGGGaggagggtttttaccATGGACTGCCATACAAAGGAGACACTGGGGTCACACAATGGAGGCGATCAACGATGttcACACTATCAttcttcTACttcATCACTCCAGCG-3′. Each primer contained a T7 promoter (upper case italicized) and a translational enhancer (lower case underlined) [24]. The VSG-117 coding region starts at primer position 36; positions 36–83 (upper case) encode the N-terminus of VSG-117, including the n-region. They are followed by primer positions 84–128, which encode the h-region of the VSG-117 protein (upper case, italicized and underlined). All VSG-117 h-region mutants contained changes within this part of the forward primer, as exemplified above by the (Phe)4-VSG-117₈₆ mutant (lower-case, italicized, underlined nucleotides within the h-region). All VSG-117 proteins were truncated to 86 amino acids (VSG-117₈₆) using the reverse primer 5′-cgaacaacgaaggggttcTTATAGTGCGTAGATCGTAGCTTCGTTTC-3′, which has a stop codon (upper case underlined) after the 27 nucleotides priming for nucleotides 295–321 (upper case italicized) of the pVSG-117 sequence.
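The layout of the forward primer can be checked by slicing it at the stated positions and translating the coding segment. In the sketch below, the boundaries of the T7 promoter (positions 1–20) and the enhancer (21–35) are our reading of the printed sequence rather than values stated in the text, and `region` and `translate` are hypothetical helper names. Case is ignored during translation because it only marks annotation in the printed primer.

```python
# Standard genetic code, generated from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Forward primer exactly as printed in the text (5' to 3').
FORWARD_PRIMER = (
    "TAATACGACTCACTATAGGGaggagggtttttaccATGGACTGCCATACAAAGGAGACACTGGGGTCACACAATGG"
    "AGGCGATCAACGATGttcACACTATCAttcttcTACttcATCACTCCAGCG"
)


def region(seq, start, end):
    """Return primer positions start..end, numbered from 1 and inclusive."""
    return seq[start - 1:end]


def translate(dna):
    """Translate complete codons from the first base, ignoring letter case."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


t7_promoter = region(FORWARD_PRIMER, 1, 20)
enhancer = region(FORWARD_PRIMER, 21, 35)
n_terminal = region(FORWARD_PRIMER, 36, 83)  # starts with the ATG at position 36
print(t7_promoter, enhancer, translate(n_terminal))
```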

Generation of VSG-117 mutant DNA

VSG-117 PCR products were generated in 100 μl volume reaction mixtures, purified and used for mRNA synthesis as described in [25].

Microsome preparation from bloodstream-form T. brucei

Microsomes were prepared by modification of a previous procedure [25]. Bloodstream-form T. brucei Lister 427 (10¹⁰ cells/ml) were resuspended in 5 ml of fresh homogenization buffer [25]. In a pre-chilled Dounce homogenizer, 2.5 ml of resuspended cells were lysed on ice with two rounds of 40 strokes of a tight-fitting pestle, with a 1 min break between rounds. The homogenates were pooled, divided into 1 ml aliquots in microcentrifuge tubes, and centrifuged at 400 g for 10 min at 4°C. The supernatants from all tubes were pooled, divided into 1 ml aliquots in new microcentrifuge tubes, and centrifuged at 13400 g for 20 min at 4°C. The supernatants from this second centrifugation step were recovered and pooled. The pellets from the second centrifugation step were each resuspended in 50 μl (total volume) of fresh rough microsome buffer (RMB) [250 mM sucrose, 50 mM Hepes/KOH, 50 mM potassium acetate, 1 mM DTT (dithiothreitol), 0.5 μg/ml Tos-Lys-CH₂Cl (tosyl-lysylchloromethane, ‘TLCK’) and 2.5 μg/ml leupeptin]. Microsome stocks were diluted, divided into 25 μl aliquots and quick-frozen [25].

In vitro translation and import of VSG into TbRMs (T. brucei rough microsomes)

RNA (250–500 ng) in DEPC (diethyl pyrocarbonate)-treated water (7 μl) was incubated at 65°C for 3 min. After cooling on ice for 1 min, 1.5 μl of an amino acid mixture lacking methionine and cysteine (Promega; 75 μM final concentration), 1.5 μl of [³⁵S]cysteine and [³⁵S]methionine (2.5 mCi total activity; MP Biomedicals) and 10 μl of rabbit reticulocyte lysate (Promega) were added to the RNA, bringing the final volume to 20 μl. This reaction mixture was incubated at 37°C for 1 h, and translation was stopped with cycloheximide (1 mM final concentration). The translation reaction was then divided into four 5 μl portions. Two of these aliquots were left untreated, while the remaining two were supplemented with 1 μl of TbRMs (1 μl=1 equivalent, which has a D₂₆₀ of 50). The four reaction aliquots were incubated at 37°C for 1.5 h and then transferred to ice. Next, one aliquot of each pair (translation mixture with or without TbRMs) was treated on ice for 1 h with a proteolysis mixture containing 3 M urea and 0.5 mg/ml PK (proteinase K) (final concentrations). PMSF (34 mM final concentration) was added to quench the proteolysis reaction. Proteins were precipitated in 60% (NH₄)₂SO₄ on ice for 15 min and centrifuged at 16100 g for 7 min at 4°C. The pellets were resuspended in 20 μl of 2.5× SDS sample buffer and resolved by SDS/PAGE in a Tris/Tricine gel system (16% resolving gel; 3% stacking gel). Radioactive polypeptides were detected using a phosphoimager, and bands were quantified.
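The volumes in this protocol can be cross-checked with simple arithmetic. The sketch below assumes that the four additions listed are the only components before cycloheximide is added (the cycloheximide volume is not given and is ignored). It confirms the 20 μl total and the 5 μl portions, and applies C₁V₁ = C₂V₂ to show that adding 1.5 μl to reach 75 μM in 20 μl implies an amino acid stock of about 1 mM; that stock concentration is inferred, not stated in the text.

```python
# Volumes (in microlitres) added to the translation reaction, as listed in the text.
components_ul = {
    "RNA in DEPC-treated water": 7.0,
    "amino acid mixture (-Met, -Cys)": 1.5,
    "[35S]Cys/[35S]Met": 1.5,
    "rabbit reticulocyte lysate": 10.0,
}

total_ul = sum(components_ul.values())


def stock_concentration(final_conc, final_volume, added_volume):
    """C1 = C2 * V2 / V1: stock concentration needed to reach final_conc."""
    return final_conc * final_volume / added_volume


# 75 uM final concentration from 1.5 ul of mixture in the full reaction volume.
amino_acid_stock_uM = stock_concentration(
    75.0, total_ul, components_ul["amino acid mixture (-Met, -Cys)"]
)
aliquot_ul = total_ul / 4  # the reaction is split into four equal portions

print(total_ul, amino_acid_stock_uM, aliquot_ul)
```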

Quantification of VSG import

Using Quantity One software (version 4.6.5, Bio-Rad Laboratories), phosphoimages were adjusted so that gel bands corresponding to PK-protected VSG-117₈₆ were visible. Background noise was then subtracted using the ‘filter wizard’ application. A TIFF image preserving these adjustments was acquired for visual presentation of the data; band intensities, however, were calculated from the original gel scan, which preserved the ‘raw data’. In each lane, the VSG-117₈₆ band was quantified by measuring its volume (counts/mm²). For the background signal, the volume of an area in each lane that best represented the average background pixel intensity was measured, using an area of the same dimensions as the box being quantified. A volume analysis report was then generated to calculate the ‘adjusted volume’ of each VSG-117₈₆ band (volume of the VSG-117₈₆ band minus the volume of the average background). VSG import percentages were obtained as follows: the ‘adjusted volume’ of the band detected in the presence of TbRMs and PK was divided by the ‘adjusted volume’ of the band detected in the presence of TbRMs without PK, and the quotient was multiplied by 100. Experiments were performed with groups of signal peptides. Each set-up contained unmutated VSG-117 as a control, and the import percentage reported for each signal peptide is expressed relative to the amount of PK-protected unmutated VSG-117.
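The quantification reduces to a few lines of arithmetic. The sketch below follows the definitions above (adjusted volume, then the ratio of the PK-treated to the untreated band in the presence of TbRMs). The final normalization to the unmutated VSG-117 control is our assumed reading of the last sentence, and the band volumes are hypothetical numbers, not data from this study.

```python
def adjusted_volume(band_volume, background_volume):
    """'Adjusted volume': band volume minus an equal-area background volume from the same lane."""
    return band_volume - background_volume


def percent_import(protected, total):
    """Adjusted volume with TbRMs + PK divided by adjusted volume with TbRMs, no PK, times 100."""
    return 100.0 * protected / total


def relative_to_control(mutant_percent, control_percent):
    """Assumed normalization: mutant import as a percentage of unmutated VSG-117 import."""
    return 100.0 * mutant_percent / control_percent


# Hypothetical band and background volumes (counts/mm^2), not data from this study.
mutant = percent_import(adjusted_volume(3000, 500), adjusted_volume(10500, 500))
control = percent_import(adjusted_volume(5500, 500), adjusted_volume(10500, 500))
print(mutant, control, relative_to_control(mutant, control))
```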

RESULTS

Hypothesis: h-regions are not random hydrophobic residues

H-regions of signal sequences are critical for ‘gating’ a Sec61p protein translocation pore at the ER [26–28]. Curiously, it has been proposed, on the basis of genetic selection experiments, that h-regions are random hydrophobic amino acids [5]. This conclusion is surprising for two reasons. First, a Sec61p protein translocation pore has significant specificity for signal peptides [29,30]. Protein–protein (or protein–peptide) interactions are typically governed by binding domains in the receptor and peptides (or motifs) in the ligand, and signal peptides are ligands for the Sec61p translocon [31]. Given the specificity of Sec61p for functional signal peptides [28], it seems counterintuitive that the translocon is gated by random hydrophobic amino acids. Secondly, signal sequences from one species appear, in general, to be incompatible with the translocation systems of other biological families [11–13,32–34], despite a well-publicized case of secretion of E. coli β-lactamase by Xenopus oocytes [35]. If h-regions were random, one would not expect the species-specificity in signal sequence organization and/or function that numerous investigators have reported [7,11–13,32–34].

A two-part hypothesis could explain most of the published data on this issue (above). First, h-regions may not be random; they may contain ‘cryptic’ information that is not discerned by direct protein sequence alignments, the strategy used previously to compare the peptides [5]. This theory would explain why a consensus sequence for ER signal peptides has not emerged as yet. Secondly, different species contain variants of the ‘cryptic’ information. This part of the theory provides a rationale for the general non-interchangeability of signal peptides as mentioned above. Our hypotheses can be tested by attempts to (i) identify the ‘cryptic information’ in signal peptides, and (ii) determine whether the ‘cryptic’ information varies in different biological families.

Discovery of cryptic peptide motifs in human signal peptides

Peptide motifs are one possible type of ‘cryptic’ information in signal sequences. To discover such motifs in human signal sequences, we focused on h-regions, since that ‘subdomain’ is essential for translocation of proteins into the ER [15,36–38]. Some 200 human h-regions were analysed using PRATT [19] to extract patterns of amino acids from the protein dataset.

Peptide motifs consist of ICs, which can be ‘fixed’ or ‘ambiguous’ and are separated by spacers. For example, in the motif L-[AGLV]-x(1,2)-L-x(0,1)-L (Prosite nomenclature) the three leucine residues are fixed ICs, whereas the bracketed amino acids ([AGLV]) are ambiguous. Spacers [e.g. x(1,2)], which indicate one or two unspecified residues, separate ICs. Motifs are characterized by three features: (i) composition of ICs; (ii) order of residues in the ICs; and (iii) length of spacers.
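Prosite-style motifs map directly onto regular expressions, which gives one simple way to test whether a given h-region contains a motif. The sketch below is an assumed conversion, not part of PRATT, and covers only the elements used here: single residues and bracketed sets carry over unchanged, while x, x(n) and x(n,m) become `.`, `.{n}` and `.{n,m}`. Other Prosite features, such as exclusions written {P} or the < and > anchors, are not handled, and the h-region searched is a hypothetical example.

```python
import re


def prosite_to_regex(motif):
    """Convert a Prosite-style motif of residues, [sets] and x spacers to a Python regex."""
    parts = []
    for element in motif.split("-"):
        spacer = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if spacer is None:
            # "L" and "[AGLV]" already have the same meaning in regex syntax.
            parts.append(element)
            continue
        low, high = spacer.groups()
        if low is None:
            parts.append(".")                   # x      -> any one residue
        elif high is None:
            parts.append(f".{{{low}}}")         # x(2)   -> exactly two residues
        else:
            parts.append(f".{{{low},{high}}}")  # x(1,2) -> one or two residues
    return "".join(parts)


pattern = prosite_to_regex("L-[AGLV]-x(1,2)-L-x(0,1)-L")
hit = re.search(pattern, "SSLAGLLSS")  # hypothetical h-region
print(pattern, hit.group() if hit else None)
```

`re.search` reports only the leftmost match, which is sufficient for deciding whether a motif is present.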

Human h-motifs have four ICs. The three major h-motifs were Hs1 {L-[AGLV]-x-L-x(0,1)-L}, Hs2 {L-[AGLV]-L-x(1,2)-L} and Hs3 {L-x(2)-[AGILPV]-L-x(0,2)-L} (Figure 1A). Remarkably, 75% of human signal peptides in the dataset contained either Hs1 or Hs2. The remaining 25% of the h-regions contained the minor motifs Hs4, Hs5, Hs6 or Hs7 (Figure 1A). The major h-motifs contain three leucine residues as fixed ICs, with leucine, alanine, valine, glycine, proline, phenylalanine and isoleucine as ambiguous ICs (Figure 1A). Spacer length varied from one to three amino acids. An alignment of 25 h-regions containing the motif L-[AGLV]-x-L-x(0,1)-L is presented in Supplementary Figure S1A (http://www.BiochemJ.org/bj/426/bj4260135add.htm).
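Coverage figures such as the 75% above count each signal peptide once, however many motifs it carries. A minimal sketch of that calculation, assuming motif hits have already been determined for every sequence; the function name, sequence identifiers and hits below are hypothetical:

```python
def coverage(hits, motifs):
    """Fraction of sequences whose h-region contains at least one of the given motifs."""
    wanted = set(motifs)
    covered = sum(1 for found in hits.values() if found & wanted)
    return covered / len(hits)


# Hypothetical motif hits per signal peptide (not data from this study).
hits = {
    "sp1": {"Hs1"},
    "sp2": {"Hs1", "Hs2"},  # counted once, even though it carries two motifs
    "sp3": {"Hs4"},
    "sp4": set(),
}
print(coverage(hits, ["Hs1", "Hs2"]))                   # major motifs Hs1/Hs2 only
print(coverage(hits, [f"Hs{i}" for i in range(1, 8)]))  # any of Hs1-Hs7
```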

Discovery of motifs in human signal peptides proves that h-regions are not formed by random hydrophobic amino acids as proposed previously [5]. The fixed components are predominantly leucine (see Hs1, Hs2 and Hs3) and occasionally phenylalanine, alanine and valine (see Hs4, Hs5, Hs6 and Hs7) (Figure 1A). The leucine-richness of the human motifs is consistent with evidence that this amino acid has a role in gating a vertebrate ER Sec61p translocon [12]. The use of phenylalanine as an ambiguous IC {as in L-[FILV]-L-x(0,2)-F} might be unique to human h-motifs, because phenylalanine was not detected in h-motifs of the other organisms (see Figures 1B–1D). Finally, polar amino acids (e.g. threonine, serine, tyrosine and tryptophan) are not used as ICs in human h-motifs.

S. cerevisiae h-motifs

Analysis of 106 signal peptides produced ten h-motifs in S. cerevisiae (Figure 1B). The four predominant peptide patterns (i.e. found in the largest number of unique signal sequences) were Sc1 [L-x(0,2)-S-x(0,3)-A], Sc6 [L-x(0,1)-L-x(0,3)-L], Sc4 [L-x(1,3)-S-x(0,3)-A] and Sc7 [L-x(0,1)-L-x(1,4)-L] (Figure 1B). In total, 90% of S. cerevisiae signal peptides contained at least one of the motifs described in Figure 1(B). An alignment of 25 h-regions containing a motif L-x(0,2)-S-x(0,3)-A is shown in Supplementary Figure S1(B).

S. cerevisiae h-motifs have three ICs. The fixed components are leucine, serine and alanine, whereas the ambiguous ICs are alanine, glycine, isoleucine, leucine and valine (AGILV) (Figure 1B). S. cerevisiae h-motifs generally differ from those found in human signal sequences (Figure 1A). First, S. cerevisiae h-motifs have three ICs, whereas human h-motifs have four. Secondly, the sets of amino acids used as fixed ICs generally differ between yeast and humans. Finally, the frequent use of serine as a fixed IC is surprising because that residue is absent from human h-motifs (Figure 1A).

T. brucei signal peptides contain conserved motifs

Trypanosomatid signal peptides are generally not compatible with the ER protein import systems of vertebrates [10,13]. One hypothesis to explain these data is that T. brucei signal sequences contain h-motifs that differ from those in vertebrates (see Figure 1). To test this theory, 49 T. brucei signal sequences were analysed.

Seven h-motifs were discovered in T. brucei, the most common of which are Tb4 [L-x(0,1)-L-x(0,5)-L], Tb2 {L-x(1,2)-L-[AILV]}, Tb3 {L-x(2,3)-L-[AILPV]} and Tb1 {L-L-x-[AILV]} (Figure 1C). The remaining signal peptides yielded three minor motifs, Tb5 [A-x(0,3)-T-x(0,2)-L], Tb6 [A-x(1,3)-V-x(0,3)-L] and Tb7 [A-x(0,6)-I-x(1,5)-A]. A total of 98% of T. brucei signal sequences contained at least one of the T. brucei h-motifs shown in Figure 1(C). An alignment of 37 sequences containing the motif L-L-[AGILPV] is presented in Supplementary Figure S1(C).

Several general features of T. brucei h-motifs are apparent. First, the T. brucei h-motifs have three ICs. Secondly, the fixed ICs in the major motifs were leucine. Thirdly, in the major motifs, the ambiguous ICs were alanine, isoleucine, leucine, proline and valine. Fourthly, in the minor h-motifs, alanine, threonine, valine and isoleucine emerged as fixed ICs. Fifthly, the minor motifs were alanine-rich and lacked ambiguous ICs. Surprisingly, Tb7 did not contain leucine, a residue which was present in all other T. brucei h-motifs. Sixthly, phenylalanine, cysteine and methionine, although hydrophobic, are not found in h-motifs of T. brucei.

We conclude that signal peptides of T. brucei have h-motifs. Furthermore, the set of trypanosome h-motifs is distinct from the set present in human and S. cerevisiae signal peptides (compare Figures 1A–1C).

h-Motifs in E. coli

A significant fraction of protein secretion in bacteria depends on N-terminal signal sequences and a SecY translocon (reviewed in [8,39]). In line with the failure of some E. coli signal peptides to function with eukaryotic translocons [12,32], we hypothesized that E. coli h-motifs may differ from human h-motifs. Our hypothesis was evaluated with 112 E. coli signal sequences.

A total of 11 h-motifs were discovered in E. coli (Figure 1D), including Ec1 [A-x(0,2)-L-x(0,3)-A], Ec2 [A-x(2,4)-L-x(0,5)-A], Ec3 [A-x(1,3)-L-x(0,5)-A], Ec4 {L-x(1,2)-A-[AGILV]}, Ec5 [L-x(0,1)-A-x(0,5)-L], Ec6 [L-x(1,2)-A-x(0,5)-L], Ec7 [L-x(0,1)-A-x(0,5)-A], Ec8 [L-x(3,5)-A-x(0,5)-A], Ec9 [L-x(0,1)-L-x(0,5)-A], Ec10 [L-x(2,4)-L-x(0,4)-A] and Ec11 [L-x(3,5)-L-x(0,5)-A] (Figure 1D).

E. coli h-motifs have three ICs and they are alanine-rich. The fixed identity residues are alanine and leucine (Figure 1D). Ambiguous ICs are rare in E. coli h-motifs. Finally, tri-leucine h-motifs were not detected in E. coli, unlike h-motifs from eukaryotes (see Figures 1A–1C). An alignment of 25 h-regions containing the motif A-x(0,2)-L-x(0,3)-A is presented in Supplementary Figure S1(D).

Factors besides peak hydrophobicity affect VSG import into trypanosome microsomes

An h-region is required for biological activity of an ER signal peptide (reviewed in [40]). Hydrophobicity of h-regions influences their activity [5,14,15,41–44], but signal sequences with similar hydrophobicities are not functionally equivalent ([11,12,33,34] and reviewed in [2,40]). Therefore we considered the possibility that signal sequence h-motifs, in addition to or instead of hydrophobicity, have an important role in the activity of signal peptides. Our hypothesis was examined with a T. brucei microsomal translocation system [25], and trypanosome VSG (VSG-117) as a substrate (VSG-117 is not imported into canine microsomes [11]). Amino acids (Leu4, Leu6, Leu8, Leu9 and Ala11, in the h-region numbering scheme) that constitute four h-motifs in VSG-117 are depicted in Figure 2. All four T. brucei h-motifs can be disrupted genetically in two or more ICs by simultaneous replacement of Leu4, Leu8, Leu9 and Ala11 with different amino acids.

T. brucei h-motifs in VSG-117 signal peptide

The importance of h-region hydrophobicity or motifs in ER protein import was tested with variants of VSG-117 in which Leu4, Leu8, Leu9 and Ala11 in the native sequence were all changed to glycine [named (Gly)4-VSG-117h] or serine [named (Ser)4-VSG-117h] (Figure 3A). The peak hydrophobicity of the VSG-117 h-region was reduced from 2.08 in unmutated VSG-117 (VSG-117h) to 0.43 [(Gly)4-VSG-117h] or 0.30 [(Ser)4-VSG-117h] (Figure 3C). Because of this decrease in hydrophobicity, import into TbRMs was expected to decrease for both (Gly)4-VSG-117₈₆ and (Ser)4-VSG-117₈₆. Surprisingly, 60% of (Ser)4-VSG-117₈₆ was imported into TbRMs, whereas translocation of (Gly)4-VSG-117₈₆ into TbRMs was undetectable (Figures 3B and 3C). Thus the hydrophobicities of (Gly)4-VSG-117 and (Ser)4-VSG-117 do not correlate with their signal peptide activity, indicating that the biological activity of a signal peptide is not strictly dependent on its hydrophobicity.
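The peak hydrophobicity values quoted here come from Kyte–Doolittle analysis with a seven-residue window in LASERGENE (Figure 3 legend). The sketch below computes a plain sliding-window average on the Kyte–Doolittle scale and takes the maximum; it is an assumed re-implementation, not LASERGENE's algorithm, and should not be expected to reproduce the published values exactly, since programs may scale or smooth the profile differently. The example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peak_hydrophobicity(sequence, window=7):
    """Highest mean Kyte-Doolittle score over all windows of the given length."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(scores) < window:
        raise ValueError("sequence is shorter than the window")
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


# Two hypothetical h-regions with identical composition (4 Leu, 6 Ser) but different order.
clustered = "LLLLSSSSSS"
spread = "LSSLSSLSSL"
print(peak_hydrophobicity(clustered), peak_hydrophobicity(spread))
```

Because the score is a windowed maximum, two h-regions with the same composition can differ in peak hydrophobicity (the clustered arrangement scores higher here), which is why a rearranged h-region need not keep exactly the same peak value as the original.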

h-Region variants of VSG-117 signal sequences are translocated to different extents into TbRMs

Figure 3
h-Region variants of VSG-117 signal sequences are translocated to different extents into TbRMs

(A) Sequences of h-regions for experiments described in (B) and (C). (B) Import of VSG-117₈₆, (Gly)4-VSG-117₈₆, (Ser)4-VSG-117₈₆, (Ala)7-VSG-117₈₆, (Ala)6-Ile12-VSG-117₈₆ and (Ala)4-VSG-117₈₆ into TbRMs. mRNAs of VSG-117 variants were translated in rabbit reticulocyte lysate and imported into TbRMs, and import was validated by resistance to PK digestion. Reaction mixtures were precipitated with ammonium sulfate, resolved by Tris/Tricine SDS/PAGE and detected using a phosphoimager and Quantity One software (Bio-Rad Laboratories). Gels pictured are representative of multiple experiments with similar results. Regions of the gels that contained the results for VSG-117 are presented. Lanes 1: no TbRM, no PK; lanes 2: no TbRM, with PK; lanes 3: with TbRM, no PK; lanes 4: with TbRM, with PK. Brackets denote lanes that were compared with each other to obtain the percentage of VSG imported into TbRMs. (C) Peak hydrophobicity (black bars) and percentage import (grey bars) of VSG-117 variants. Peak hydrophobicity values were obtained by Kyte–Doolittle hydrophobicity analysis (window=seven amino acids) using LASERGENE (DNASTAR) (version 4.0.3). Percentages of VSG imported into TbRMs were calculated from data in (B) by dividing the amount of residual VSG detected after protease treatment (lane 4) by the amount of VSG detected after addition of TbRMs without PK treatment (lane 3) and multiplying the quotient by 100. The quantified import percentages may not appear to match the gel images, because quantification used the raw data, whereas the images presented were adjusted to reveal the protease-protected bands, an adjustment that saturates the undigested control band.


Further evidence that the physiological activity of a signal peptide does not always correlate with the hydrophobicity of a sequence was obtained by studying a VSG-117 variant named (Ala)7-VSG-117h. In this h-region, all seven hydrophobic amino acids in the VSG-117 h-region (i.e. Met3, Leu4, Leu6, Leu8, Leu9, Ala11 and Ile12) were changed to alanine (Figure 3A). (Ala)7-VSG-117h has a peak hydrophobicity more than twice that of (Ser)4-VSG-117h (Figure 3C). Unexpectedly, in contrast with (Ser)4-VSG-117₈₆, (Ala)7-VSG-117₈₆ was not imported into TbRMs (Figures 3B and 3C).

We attempted to produce ‘gain-of-function’ mutants by re-introducing amino acids from unmutated VSG-117 into the non-functional (Ala)7-VSG-117h signal peptide. Either (i) Ile12 alone or (ii) Met3, Leu6 and Ile12 together were introduced into (Ala)7-VSG-117h, resulting in h-regions named (Ala)6-Ile12-VSG-117h and (Ala)4-VSG-117h respectively (Figure 3A). Changing Ala12 to Ile12 in (Ala)7-VSG-117h raised the peak hydrophobicity from 0.89 in (Ala)7-VSG-117h to 1.21 in (Ala)6-Ile12-VSG-117h. Replacing Ala3, Ala6 and Ala12 in (Ala)7-VSG-117h with Met3, Leu6 and Ile12 increased the peak hydrophobicity from 0.89 in (Ala)7-VSG-117h to 1.41 in (Ala)4-VSG-117h (Figure 3C). Interestingly, (Ala)6-Ile12-VSG-117 was not imported into TbRMs, whereas 82% of (Ala)4-VSG-117 was translocated into TbRMs (Figures 3B and 3C).

We conclude that h-region hydrophobicity does not reliably predict the translocon-gating activity of an N-terminal peptide at a trypanosome ER. Therefore some other property of the h-region is also very important for signal sequence activity of a hydrophobic N-terminal peptide. This conclusion is in line with those of other investigators who have evaluated the importance of hydrophobicity in the function of ER signal sequences [17,43].

The sequence of amino acids in the h-region influences VSG import into TbRM

We observed that h-region hydrophobicity alone could not explain the failure of T. brucei microsomes to import several trypanosome signal sequences (Figure 3). A hypothesis to explain these data is that the order of amino acids (i.e. peptide motifs) in an h-region is important for signal peptide activity.

Scrambling the sequence of an h-region is the best way to test whether h-motifs embedded in a signal sequence are biologically relevant, because the hydrophobicity of the h-region remains virtually unchanged. It is important, however, that scrambled h-regions lose the original h-motifs without acquiring new motifs that were absent from the original sequence. With these criteria in mind, we considered a scrambled version of the unmutated VSG-117 h-region. Unfortunately, that h-region contains multiple h-motifs and several leucine residues, and it proved impossible to scramble it without generating a new T. brucei h-motif. Consequently, we could not use the native VSG-117 h-region for these experiments.
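The scrambling criteria described above (unchanged amino acid composition, loss of the original motifs and no gain of new ones) can be checked mechanically. In the sketch below, the regular expressions are our transcription of the T. brucei motifs Tb1–Tb7 listed in the Results (Prosite x(n,m) written as `.{n,m}`), the function names are hypothetical, and the toy sequences are illustrative rather than VSG-117 h-regions.

```python
import re
from collections import Counter

# T. brucei h-motifs Tb1-Tb7 from the Results, transcribed as Python regular expressions.
TB_MOTIFS = {
    "Tb1": r"LL.[AILV]",
    "Tb2": r"L.{1,2}L[AILV]",
    "Tb3": r"L.{2,3}L[AILPV]",
    "Tb4": r"L.{0,1}L.{0,5}L",
    "Tb5": r"A.{0,3}T.{0,2}L",
    "Tb6": r"A.{1,3}V.{0,3}L",
    "Tb7": r"A.{0,6}I.{1,5}A",
}


def motifs_in(sequence):
    """Names of the T. brucei motifs found anywhere in the sequence."""
    return {name for name, pattern in TB_MOTIFS.items() if re.search(pattern, sequence)}


def check_scramble(original, scrambled):
    """Collect the facts needed to judge a scramble against the stated criteria."""
    return {
        "same_composition": Counter(original) == Counter(scrambled),
        "original_motifs": motifs_in(original),
        "scrambled_motifs": motifs_in(scrambled),
    }


def is_acceptable(report):
    """Same composition, the original carries a motif, and the scramble carries none."""
    return (report["same_composition"]
            and bool(report["original_motifs"])
            and not report["scrambled_motifs"])


# Hypothetical toy h-regions, not the VSG-117 constructs.
report = check_scramble("SSATLSS", "STSLSAS")
print(report, is_acceptable(report))
```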

An ideal substrate to test the functional relevance of h-motifs would be a functional h-region containing only one h-motif that could be scrambled without creating new T. brucei h-motifs. (Ala)4-VSG-117h and (Ser)4-VSG-117h proved excellent for this purpose. (Ala)4-VSG-117h contains a single Tb5 motif [A-x(0,3)-T-x(0,2)-L] (Figure 1C), which could be scrambled without generating new motifs. Similarly, (Ser)4-VSG-117h contains a variant of Tb5 in which the alanine residue is replaced by serine [i.e. S-x(0,3)-T-x(0,2)-L], and scrambling of (Ser)4-VSG-117h did not yield new T. brucei motifs. Both h-regions were therefore suitable for testing our h-motif hypothesis.
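The motif notation used here follows PROSITE-style conventions: x is any residue, x(m,n) is a spacer of m to n residues, and square brackets list alternative residues (in this paper the outer brackets or braces around a whole motif simply delimit it). A minimal Python sketch, assuming only that notation, shows how a motif such as Tb5 can be translated into a regular expression and searched in an h-region. The function names and the test sequence are hypothetical illustrations, not the authors' software or a real VSG-117 h-region.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a PROSITE-style motif body (e.g. 'A-x(0,3)-T-x(0,2)-L')
    into a Python regular expression. Outer delimiters must be stripped first."""
    parts = []
    for element in motif.split("-"):
        spacer = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if spacer:
            low, high = spacer.groups()
            if low is None:            # plain 'x': exactly one residue
                parts.append(".")
            elif high is None:         # 'x(n)': exactly n residues
                parts.append(f".{{{low}}}")
            else:                      # 'x(m,n)': m to n residues
                parts.append(f".{{{low},{high}}}")
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)      # allowed residues; same syntax in regex
        elif element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")  # excluded residues
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)      # a single conserved residue (an IC)
        else:
            raise ValueError(f"unrecognised motif element: {element!r}")
    return "".join(parts)


def motif_starts(motif: str, h_region: str) -> list[int]:
    """0-based start positions of every motif occurrence, overlaps included."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(h_region)]


TB5 = "A-x(0,3)-T-x(0,2)-L"
TB5_SER = "S-x(0,3)-T-x(0,2)-L"

print(motif_to_regex(TB5))                 # A.{0,3}T.{0,2}L
print(motif_starts(TB5, "MVAATLGSAYVA"))   # hypothetical h-region: [2, 3]
```

The lookahead wrapper is a deliberate choice: it reports every start position, so overlapping occurrences of a motif with variable spacers are not hidden by a single greedy match.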

To test whether h-motifs are required for N-terminal hydrophobic peptides to operate as signal peptides, import into TbRM of VSG bearing (Ala)4-VSG-117h or (Ser)4-VSG-117h was compared with that of VSG containing scrambled (Ala)4-VSG-117h or scrambled (Ser)4-VSG-117h (Figure 4). Both (Ala)4-VSG-117h and (Ser)4-VSG-117h contain Tb5 (or its serine variant) at positions 4, 5 and 6 of the h-region. Rearranging (Ala)4-VSG-117h reduced its peak hydrophobicity by approx. 0.3 unit, whereas scrambled (Ser)4-VSG-117h differed from (Ser)4-VSG-117h by 0.01 unit (Figure 4C). In the scrambled h-regions, the motif residues at positions 4, 5 and 6 (Ala4, Thr5 and Leu6 in (Ala)4-VSG-117h) are replaced by Thr4, Ala5 and Met6 in scrambled (Ala)4-VSG-117h and by Thr4, Ser5 and Met6 in scrambled (Ser)4-VSG-117h. In a control variant of (Ala)4-VSG-117h, the tyrosine at position 10 was exchanged with the alanine at position 9, producing (Ala)4-Tyr9-VSG-117h (Figure 4); the two h-regions have essentially identical hydrophobicity (Figure 4C), and (Ala)4-Tyr9-VSG-117h retains the Tb5 h-motif (residues 4, 5 and 6). We emphasize that in all of these new constructs the amino acid composition of each h-region was not altered, in order to keep the overall hydrophobicity of the peptide unchanged. Consequently, differences in signal sequence activity between the scrambled and unscrambled pairs of h-regions can be attributed to the presence or absence of h-motifs.
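The two design criteria above (unchanged amino acid composition, and loss of the original motif without gaining another) can be checked mechanically. The Python sketch below assumes hand-written regular expressions for Tb5 and its serine variant only, and uses hypothetical native and scrambled sequences; a real check would need the full set of T. brucei h-motifs from Figure 1C, which is not reproduced here.

```python
import re
from collections import Counter

# Lookahead patterns so that overlapping occurrences are also found.
# Only Tb5 [A-x(0,3)-T-x(0,2)-L] and its serine variant are encoded here.
MOTIFS = {
    "Tb5": re.compile(r"(?=A.{0,3}T.{0,2}L)"),
    "Tb5 (Ser variant)": re.compile(r"(?=S.{0,3}T.{0,2}L)"),
}


def motifs_present(seq: str) -> set[str]:
    """Names of the encoded motifs that occur at least once in seq."""
    return {name for name, pattern in MOTIFS.items() if pattern.search(seq)}


def check_scramble(native: str, scrambled: str) -> dict:
    """Compare a native h-region with a scrambled version of it."""
    before, after = motifs_present(native), motifs_present(scrambled)
    return {
        # Same residues in the same numbers, so mean hydrophobicity is unchanged.
        "same_composition": Counter(native) == Counter(scrambled),
        "motifs_lost": before - after,
        "motifs_gained": after - before,
    }


native = "MVAATLGSAYVA"      # hypothetical h-region carrying Tb5
scrambled = "MLVGTSAAYAVA"   # hypothetical rearrangement of the same residues
print(check_scramble(native, scrambled))
```

A valid scramble under these assumptions is one where `same_composition` is true, `motifs_lost` contains the motif under test, and `motifs_gained` is empty.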

Figure 4
Scrambling of h-motifs inhibits VSG translocation into TbRM

(A) Sequences of h-regions used for experiments described in (B) and (C). For scrambled h-regions, the sequences were checked computationally to ensure that a new h-motif had not been inadvertently created when the amino acid residues were rearranged. (B) mRNAs of VSGs were translated in a rabbit reticulocyte lysate, and imported into TbRM. After PK digestion, the reaction mixture was precipitated with ammonium sulfate, resolved by Tris/Tricine SDS/PAGE, and analysed using a phosphoimager and Quantity One software. Gels pictured are representative of multiple experiments with similar data. Lanes 1, no TbRM, no PK; lanes 2, no TbRM, with PK; lanes 3, with TbRM, no PK; lanes 4, with TbRM, with PK. Brackets specify lanes that were compared directly to obtain the percentage of VSG imported into TbRM for each experiment. h-Regions studied were VSG-11786, (Ala)4-VSG-11786, scrambled (Ala)4-VSG-11786, scrambled (Ser)4-VSG-11786 and (Ala)4-Tyr9-VSG-11786. (C) Peak hydrophobicity (black bars) and percentage import (grey bars) of VSG-117s. Peak hydrophobicity values were obtained by Kyte–Doolittle hydrophobicity analysis (window=seven amino acids), using LASERGENE (version 4.0.3). Percentages of VSG imported into TbRM are from (B); they were calculated by dividing the amount of residual VSG detected after protease treatment by the amount of VSG detected after addition of TbRM without protease treatment, and multiplying the proportion by 100.
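Peak hydrophobicity in (C) is the maximum of a Kyte–Doolittle sliding-window average. LASERGENE's exact windowing and any scaling are not described here, so the Python sketch below is an approximation that assumes a simple unweighted seven-residue window over the published Kyte–Doolittle values; it is not expected to reproduce the numbers in the figure. It also illustrates why scrambling leaves the mean hydrophobicity of an h-region unchanged while allowing the peak value to shift.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Mean Kyte-Doolittle value of every complete window along seq."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def peak_hydrophobicity(seq: str, window: int = 7) -> float:
    """Highest window average: depends on the order of residues."""
    return max(hydropathy_profile(seq, window))


def mean_hydrophobicity(seq: str) -> float:
    """Average over the whole sequence: depends only on composition."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


native = "MVAATLGSAYVA"      # hypothetical h-region
scrambled = "MLVGTSAAYAVA"   # same composition, different order
print(mean_hydrophobicity(native), mean_hydrophobicity(scrambled))
print(peak_hydrophobicity(native), peak_hydrophobicity(scrambled))
```

The two means agree up to floating-point rounding, whereas the peaks can differ, which is consistent with a scrambled h-region showing a small change in peak hydrophobicity despite an identical composition.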

In microsomal VSG import assays, a drastic reduction in signal peptide activity of the scrambled signal peptides was observed. First, 86% of (Ala)4-VSG-11786 was imported into TbRM, whereas scrambled (Ala)4-VSG-11786 was not translocated into TbRM (Figures 4B and 4C). Secondly, approx. 60% of (Ser)4-VSG-11786 was imported into TbRM but scrambled (Ser)4-VSG-11786 was not translocated into the microsomes (Figures 4B and 4C). In the positive control, (Ala)4-Tyr9-VSG-11786, where two amino acids were switched in the h-region without eliminating the Tb5 motif, VSG was imported into TbRM (Figures 4B and 4C).
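The import percentages above are a ratio of band intensities: protease-protected VSG in the presence of TbRM, divided by total VSG in the presence of TbRM without protease, multiplied by 100 (see the Figure 4 legend). A minimal Python sketch of that calculation follows; the intensity values are hypothetical and are not data from Figure 4.

```python
def percent_import(protected: float, total_with_tbrm: float) -> float:
    """Percentage of translated VSG protected from proteinase K by TbRM.

    protected       -- band intensity with TbRM and PK (lanes 4 in Figure 4B)
    total_with_tbrm -- band intensity with TbRM, no PK (lanes 3 in Figure 4B)
    """
    if total_with_tbrm <= 0:
        raise ValueError("reference band intensity must be positive")
    if protected < 0:
        raise ValueError("band intensities cannot be negative")
    return 100.0 * protected / total_with_tbrm


# Hypothetical phosphoimager intensities (arbitrary units).
print(percent_import(protected=4300, total_with_tbrm=5000))  # 86.0
```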

Together, these data strongly indicate that the sequence of amino acids in an h-region is an important determinant of signal peptide activity. In our test h-region (Ala)4-VSG-117h, the Tb5 motif is essential for the hydrophobic N-terminal peptide to act as a signal sequence.

The VSG-117 h-region does not require leucine residues for biological activity

Human h-regions are leucine-rich [6,9,45] (Figure 1A), and leucine is important for an E. coli or S. cerevisiae signal peptide to be imported into canine microsomes [12,33]. Leucine is found in all but one of the h-motifs identified in T. brucei (Figure 1C). However, the importance of leucine in trypanosome signal peptide function is not known.

Data from (Gly)4-VSG-11786, scrambled (Ala)4-VSG-11786 and scrambled (Ser)4-VSG-11786 showed that leucine is not sufficient for efficient VSG import into TbRM: each contains a leucine residue at position 3 of the h-region, yet none was imported into TbRM (Figures 3B, 3C, 4B and 4C). From these observations, we hypothesized that, although leucine is not sufficient for signal peptide activity, it could enhance the efficiency of h-region signal peptide activity at TbRM. To investigate this idea, we made variants of (Ala)4-VSG-117h in which its single leucine residue, at position 6, was changed to another hydrophobic amino acid: phenylalanine, valine or isoleucine. The resulting h-region variants were (Ala)4-Phe6-VSG-117h, (Ala)4-Val6-VSG-117h and (Ala)4-Ile6-VSG-117h respectively (Figure 5). Since (Ala)4-VSG-117h, (Ala)4-Phe6-VSG-117h, (Ala)4-Val6-VSG-117h and (Ala)4-Ile6-VSG-117h differ only in one non-polar amino acid at position 6, their h-region peak hydrophobicity values were almost identical (Figure 5C). Therefore any difference in the extent of VSG import among these h-regions cannot be attributed to peak hydrophobicity.

Figure 5
Effect of replacing Leu6 in (Ala)4-VSG-117 h-region with phenylalanine, valine or isoleucine on VSG signal sequence function

(A) Sequences of h-regions used in experiments described in (B) and (C). (B) Import of (Ala)4-VSG-11786, (Ala)4-Phe6-VSG-11786, (Ala)4-Val6-VSG-11786 and (Ala)4-Ile6-VSG-11786 into TbRMs. mRNAs of VSG-117 variants were translated in rabbit reticulocyte lysate. Assays for VSG-117 import into TbRMs were performed as described in the legend to Figure 4. Gels pictured are representative of multiple experiments with similar results. Lanes 1, translated VSG-117 with TbRM, no PK; lanes 2, translated VSG-117 with TbRM and treated with PK. Brackets mark pairs of bands that were compared directly in order to obtain the percentages of VSG-117 imported into TbRMs. (C) Peak hydrophobicity (black bars) and percentage import (grey bars) of VSG-117s.

h-Region variants at Leu6 of (Ala)4-VSG-117 were imported into TbRM with different efficiencies. Whereas 59% of (Ala)4-VSG-117 was translocated into the microsomes, 72, 46 and 49% of (Ala)4-Phe6-VSG-117, (Ala)4-Val6-VSG-117 and (Ala)4-Ile6-VSG-117 respectively were imported into TbRM (Figures 5B and 5C).

We conclude that Leu6 in the (Ala)4-VSG-117 h-region is not essential for signal peptide activity. However, the amino acids that replaced leucine are not functionally equivalent; the hierarchy of their translocation effectiveness is phenylalanine>leucine>valine≈isoleucine. These findings are consistent with the bioinformatic finding that not all h-motifs in T. brucei contain leucine (e.g. Tb7) (Figure 1C).

DISCUSSION

Signal sequences from different species contain unique sets of peptide motifs

In the course of their physiological actions, signal sequences are recognized by specific proteins including SRP54p, Sec62p and Sec61p [29,46]. The h-region is essential for signal sequence ‘gating’ of a Sec61p translocon at the ER (reviewed in [47,48]). Moreover, not every N-terminal hydrophobic peptide can gate a Sec61p translocon at the ER [11–13,32,33]. Despite a requirement for specific protein interactions in their biological functions, h-regions are thought to consist of random hydrophobic amino acids [5]. The parameters that confer translocon-gating activity on a select group of hydrophobic N-terminal peptides are not completely understood (reviewed in [1,2]).

We propose an ‘h-motif hypothesis’ to explain how signal sequences are organized. The hypothesis explains the absence of a ‘consensus sequence’ for ER signal peptides, even though their biological functions call for specific interactions with proteins (e.g. Sec61p). We propose, first, that h-regions of signal sequences contain peptide motifs (h-motifs). A single h-region may contain multiple h-motifs. Since h-motifs are not identical in different proteins or even within a single h-region with multiple motifs, a traditional consensus sequence for signal sequences cannot be derived from sequence alignments. Secondly, we suggest that each h-motif can independently gate a Sec61p translocon at the ER membrane. Thirdly, each species has h-motifs that are unique ‘as a set’ for the biological family. Finally, we infer that h-motifs are optimized to function best within that species. The concept of a single receptor (e.g. Sec61p) binding to multiple peptide motifs is not new; adaptor protein complexes or protein kinases, for example, recognize different peptide patterns as ligands (reviewed in [49,50]). If our h-motif hypothesis is correct, new approaches will be needed to characterize ER signal peptides, because a ‘consensus sequence’ cannot be discovered from an alignment of all signal sequences [7–9].

We have attempted to decipher peptide motifs in signal sequences by using tools that were not available when signal peptides were originally described. We performed bioinformatic analysis of h-regions, since, of the regions of a signal peptide, the h-region alone is essential for translocation of proteins into the ER [12,28,33]. Furthermore, we analysed signal peptides from different species, because the h-motif hypothesis predicts that signal sequences from different biological families may have different motifs, which could also explain (at least in part) why, except in rare cases [35], signal peptides from different species are not functionally interchangeable [11,12,32,33].

Amino acids in h-regions are not random, because conserved peptide motifs (h-motifs) exist (Figure 1). Differences in h-motifs are discerned from consideration of four features: (i) the number of ICs (i.e. conserved amino acids); (ii) the constituent amino acids; (iii) the sequence of amino acids; and (iv) the spacers that separate ICs. For example, human h-motifs have four ICs {e.g. L-[AGLV]-x-L-x(0,1)-L} (Figure 1A), whereas all other species examined in the present study have three ICs [e.g. L-x(0,4)-S-x(0,4)-A from S. cerevisiae] (Figure 1B).

In general, each species has a unique set of h-motifs, and the most frequent motifs are not identical among humans, S. cerevisiae, E. coli and T. brucei (see Figure 1). The composition of h-motifs is typically species-specific. For example, phenylalanine is found in human, but not T. brucei, h-motifs (compare Figures 1A and 1C). Similarly, serine is an IC only in S. cerevisiae h-motifs (Figure 1B). Finally, tri-leucine motifs were present in different variations in eukaryotes, but were not found in E. coli h-motifs (Figure 1). Tri-leucine might be a universal eukaryotic h-motif, if the spacer regions overlap in different species. For example, a T. brucei tri-leucine motif L-L-[AGILPV] could be included in a subset of the motifs from S. cerevisiae that satisfy the pattern L-x(0,1)-L-x(0,3)-L. However, not all tri-leucine motifs overlap. For example, the motif L-x(0,1)-L-x(0,3)-L does not overlap with L-x(4,5)-L-[AGILV], because the allowed spacings between the first two leucine residues (zero or one residue versus four or five residues) have no value in common.
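The overlap argument above can be made explicit by treating a motif as a list of allowed residues at each IC plus an allowed spacer range between consecutive ICs. Under the simplifying assumption that two motifs can share an occurrence only if their ICs align one-to-one, they overlap when every IC pair shares at least one residue and every pair of spacer ranges intersects. This Python sketch encodes only the three patterns quoted in the text; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    ics: tuple[frozenset[str], ...]        # allowed residues at each IC
    spacers: tuple[tuple[int, int], ...]   # (min, max) gap between consecutive ICs


def make_motif(ics: list[str], spacers: list[tuple[int, int]]) -> Motif:
    if len(spacers) != len(ics) - 1:
        raise ValueError("need exactly one spacer between each pair of ICs")
    return Motif(tuple(frozenset(s) for s in ics), tuple(spacers))


def can_share_occurrence(a: Motif, b: Motif) -> bool:
    """True if one peptide could satisfy both motifs with aligned ICs."""
    if len(a.ics) != len(b.ics):
        return False
    residues_ok = all(x & y for x, y in zip(a.ics, b.ics))
    spacers_ok = all(max(lo1, lo2) <= min(hi1, hi2)
                     for (lo1, hi1), (lo2, hi2) in zip(a.spacers, b.spacers))
    return residues_ok and spacers_ok


tb_lla = make_motif(["L", "L", "AGILPV"], [(0, 0), (0, 0)])   # L-L-[AGILPV]
sc_lll = make_motif(["L", "L", "L"], [(0, 1), (0, 3)])        # L-x(0,1)-L-x(0,3)-L
l45 = make_motif(["L", "L", "AGILV"], [(4, 5), (0, 0)])       # L-x(4,5)-L-[AGILV]

print(can_share_occurrence(tb_lla, sc_lll))  # True
print(can_share_occurrence(sc_lll, l45))     # False: first spacers 0-1 vs 4-5
```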

An h-motif could be viewed as a trio (or quartet) of positions with appropriate spacing in the h-region that must be occupied by amino acids with certain characteristics in order for a Sec61p (or SecYEG) translocon to be gated. The implication of the findings above is that ER signal peptides may not generally be interchangeable between species, because the sets of h-motifs in different species are not identical. This expectation agrees with reports from many investigators that signal sequences from one organism are often not functional in other species [11–13,32,33]. However, in those rare instances where h-motifs overlap, for example the tri-leucine motifs (see the previous paragraph), a signal peptide may function, even if sub-optimally, in different biological families [35].

h-Motifs are important for biological functions of signal peptides

Detection of h-motifs using bioinformatics approaches (Figure 1) does not prove their biological relevance. Therefore we sought to test the physiological function of h-motifs experimentally. One approach is to mutate h-motif ICs to different residues and measure the effect on protein import into ER microsomes. However, such substitutions must take into account the properties of the replacement amino acids, in particular their hydrophobicity. In vertebrates, high hydrophobicity of signal sequences correlates with their co-translational import into the ER [14,37]. In the yeast S. cerevisiae, where co-translational and post-translational pathways coexist, high hydrophobicity of a signal peptide correlates with steering of the signal sequence towards the co-translational pathway for entry into the ER [44]. For these reasons, it was important to design h-regions whose h-motifs could be eliminated without significant changes in the hydrophobicity of the signal peptide.

A stringent test of the biological importance of h-motifs is to scramble the h-region sequence and then evaluate whether the resulting (scrambled) h-region is translocation-competent. Scrambling the sequence maintains the amino acid composition and introduces minimal changes in the hydrophobicity of the h-region. The challenge with this approach is to make sure that the scrambled h-region does not acquire new h-motifs that were absent from the native sequence. In our studies, it was impossible to use the native VSG-117 sequence, because it contains multiple motifs that could not be scrambled without introducing (by accident) new T. brucei h-motifs. Fortunately, (Ala)4-VSG-117h could be used for the present study because it contains only the Tb5 h-motif [A-x(0,3)-T-x(0,2)-L], which was successfully scrambled without introducing new T. brucei h-motifs. The related h-region (Ser)4-VSG-117h contains a variant of the Tb5 motif in which the alanine is replaced with serine [S-x(0,3)-T-x(0,2)-L]. This h-region was also scrambled without introducing new h-motifs into the signal peptide.

In ER protein translocation studies, VSGs containing scrambled h-regions were much less active than those containing the unscrambled versions, (Ser)4-VSG-117h and (Ala)4-VSG-117h. Scrambled (Ala)4-VSG-117h had a peak hydrophobicity that was essentially identical with that of (Ala)4-VSG-117h. However, 80% of (Ala)4-VSG-11786 was translocated into TbRM, whereas import of scrambled (Ala)4-VSG-11786 was undetectable (Figures 4B and 4C). Similarly, whereas 65% of (Ser)4-VSG-11786 was imported into TbRM, its scrambled variant was not imported into the microsomes (Figure 4C). These data make a compelling case that h-motifs are important determinants of the protein translocation activity of a signal peptide.

Collectively, our data support the concept that h-motifs confer signal peptide activity on hydrophobic N-terminal peptides. The physiological importance of h-motifs documented here indicates that the primary structure (i.e. sequence) of the h-region plays a role in signal peptide activity. As an extension of these principles, translocation-competent h-regions need not be very hydrophobic; they can be active as long as the h-motifs are present (Figures 1, 3 and 4). However, some h-regions may be very hydrophobic if the ICs of their h-motifs are hydrophobic amino acids (e.g. leucine), as in human signal sequences (Figure 1A). For human h-regions, leucine richness likely increases the probability that an h-motif occurs in the peptide.

Hypothesis: h-motifs interact with TM (transmembrane domain) 2b and/or TM7 of Sec61α

If h-motifs are important for biological activity of signal sequences, how might they interact with the Sec61p receptor/translocon at the ER membrane? We do not focus on h-region interactions with SRP54 because not all proteins that bind to that protein are imported into the ER lumen [51]. Furthermore, signal peptides that are defective for translocation into the ER can bind SRP54 [28,52]. Two facts may inform our discussion of Sec61p interactions with an h-region. First, signal peptides can form α-helices [53]. Secondly, the receptor for signal peptides, Sec61p/SecY, may use TM2b and TM7 to bind signal sequences [29,31]. However, the parts of an h-region that bind directly to Sec61p are not known. Similarly, amino acid residues of Sec61p that bind to signal peptides have not been identified. We used a helical-wheel model of a VSG-117 h-region to suggest possible interactions with TM7 and TM2b of trypanosome Sec61p [31,54].

h-Motif ICs in Tb1–Tb5 (Figure 1C) correspond to positions 4, 5, 6, 8, 9 and 11 in a VSG-117 h-region. In the hypothetical helical-wheel model, ICs at positions 4, 8 and 11 are placed on one side of an h-region α-helix, whereas ICs at positions 5, 6 and 9 are on the other side of the helix (Figure 6). Interestingly, transmembrane helices TM2b and TM7 of Sec61p (Figures 6B and 6C) are predicted to lie on different sides of a helix formed by a signal peptide [31]. h-Motif IC amino acids on opposite sides of the α-helix might make three contacts with TM2b on one side of the h-region helix, and three other contacts with TM7 on the opposite side (Figure 6C). We suggest that a single h-motif makes only three (or four, in the case of human h-motifs) obligatory contacts with TM2b and/or TM7 of Sec61p.
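The two faces described above follow from ideal α-helix geometry: with about 3.6 residues per turn, each residue is rotated about 100° relative to the previous one. A short Python sketch, assuming that idealised geometry and placing position 1 at 0° (Protean may orient the wheel differently, but relative angles are unaffected), computes the wheel angle of each IC position and the angular separation between the two groups named in the text. Under this idealisation the two groups sit roughly 140° apart, so "opposite sides" is approximate.

```python
import math


def wheel_angle(position: int, degrees_per_residue: float = 100.0) -> float:
    """Angle (degrees) of a residue on an ideal helical wheel; position 1 at 0."""
    return ((position - 1) * degrees_per_residue) % 360.0


def circular_mean(angles: list[float]) -> float:
    """Mean direction of a set of angles, handling wrap-around at 360."""
    x = sum(math.cos(math.radians(a)) for a in angles)
    y = sum(math.sin(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(y, x)) % 360.0


def separation(a: float, b: float) -> float:
    """Smallest angle between two directions on the wheel."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


face_1 = [wheel_angle(p) for p in (4, 8, 11)]   # 300, 340, 280 degrees
face_2 = [wheel_angle(p) for p in (5, 6, 9)]    # 40, 140, 80 degrees
print(face_1, face_2)
print(separation(circular_mean(face_1), circular_mean(face_2)))  # roughly 140 degrees apart
```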

Figure 6
Helical-wheel analysis of VSG-117 h-region and its proposed interaction with Sec61p of T. brucei

(A) Helical-wheel model of VSG-117 h-region was generated with Protean (version 4.0.3; part of LASERGENE). Arrows point to ‘ICs’ (IC1, IC2, IC3, IC4, IC5 and IC6) in the h-region (see Figure 1C). (B) Illustrative view (top) of a VSG-117 signal peptide positioned between TM2b and TM7 of T. brucei Sec61α (TbSec61α). A homology model of TbSec61 translocon based on the X-ray structure of Sec61p from Methanococcus jannaschii [31] was constructed by Clyde F. Hardin (personal communication). Location of TM2b (blue) and TM7 (yellow) relative to a trypanosome signal peptide is presented, similar to that proposed for a mammalian signal sequence [31]. The h-region of the trypanosome signal peptide is shown in red and black. (C) Proposed interaction of ICs from an h-region with unidentified amino acids in TM2b (blue) and TM7 (green) of TbSec61α (not to scale; see text for details).

How might the model for h-region–Sec61p interactions explain the apparent selectivity of the protein translocation machinery of a given species for signal peptides from that biological family? We propose that Sec61p proteins from different species have co-evolved to function optimally with native signal sequences by maximizing interactions with h-regions from that species. This co-evolution of h-regions and the signal sequence-binding site of Sec61p could occur by optimizing the interacting amino acids in TM2b and TM7 with ICs in the h-region. In cases where the sets of amino acids are sterically incompatible, the Sec61p translocon may not be gated by the signal peptide.

h-Motifs can predict compatibility of signal peptide use between organisms

Some apicomplexan and cryptophyte proteins are imported into canine microsomes [55–61], unlike trypanosome proteins [11,13]. These data raised the possibility that, among eukaryotes that are distantly related phylogenetically, signal sequences from some phyla may be more compatible with the vertebrate ER protein import apparatus than others. To explore this concept, we tested whether human h-motifs (Figure 1A) could be used to predict whether signal sequences from apicomplexans (Plasmodium falciparum, Plasmodium lophurae, Toxoplasma gondii) and a cryptophyte (Guillardia theta) enter a vertebrate ER: signal sequences imported into canine ER are predicted to contain human h-motifs (Figure 1A).

From apicomplexan and cryptomonad sequences [55–61], a database of 23 signal sequences was compiled, and the h-regions were determined using SignalP 3.0 [21]. When the database was queried for human h-motifs (Figure 1A) using 3of5 pattern search [62], ten out of the 23 h-regions contained at least one human h-motif. Altogether, human h-motifs were present 15 times in these ten signal peptides (see Supplementary Figure S2 at http://www.BiochemJ.org/bj/426/bj4260135add.htm). Specifically, P. falciparum and T. gondii possessed motifs Hs1, Hs4, Hs5 and Hs7 (Supplementary Figure S2), whereas G. theta, a cryptophyte, contained all the human h-motifs except Hs5 (Supplementary Figure S3 at http://www.BiochemJ.org/bj/426/bj4260135add.htm). Although P. falciparum Exp-1 and SERP I, P. lophurae HsRP, T. gondii NTPase, T. gondii GRA4 and G. theta UGG-transferase enter canine microsomes [55,56,60], only Exp-1 and NTPase contained known human h-motifs (Figure 1A). We envisage that these proteins possess other human h-motifs that were not discovered in our analysis because they belong to a minority group of human h-motifs. This would be understandable, since the signal peptides are most likely optimized to function best in apicomplexans and cryptophytes.
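The tally reported above (h-regions containing at least one human h-motif, and the total number of motif occurrences) can be outlined with a regular-expression search. The sketch below is not the 3of5 program: it assumes that each distinct start position of a motif counts as one occurrence (the original counting rule is not stated), uses two human motif patterns quoted in this paper, and runs on three hypothetical h-regions rather than the apicomplexan and cryptophyte dataset.

```python
import re

# Human h-motif patterns quoted in this paper, rewritten as lookahead regexes
# so that overlapping occurrences are counted separately.
HUMAN_MOTIFS = {
    "Hs3 L-x(2)-[AGILPV]-L-x(0,2)-L": re.compile(r"(?=L.{2}[AGILPV]L.{0,2}L)"),
    "L-[AGLV]-x-L-x(0,1)-L": re.compile(r"(?=L[AGLV].L.{0,1}L)"),
}


def tally(h_regions: dict[str, str]) -> tuple[int, int]:
    """Return (h-regions with at least one motif, total motif occurrences)."""
    regions_with_hit = 0
    occurrences = 0
    for seq in h_regions.values():
        # findall on a lookahead-only pattern returns one '' per start position.
        hits = sum(len(p.findall(seq)) for p in HUMAN_MOTIFS.values())
        occurrences += hits
        regions_with_hit += int(hits > 0)
    return regions_with_hit, occurrences


toy = {  # hypothetical h-regions, not from the dataset described above
    "seq1": "LSSALLGLAALL",
    "seq2": "SSTTSSGG",
    "seq3": "LAVLALAA",
}
print(tally(toy))  # (2, 3)
```

Reporting both numbers mirrors the distinction drawn in the text between how many h-regions carry a motif and how often motifs occur overall.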

Abbreviations

     
  • ER, endoplasmic reticulum
  • IC, identity component
  • PK, proteinase K
  • TbRM, Trypanosoma brucei rough microsome
  • TM, transmembrane domain
  • VSG, variant surface glycoprotein

AUTHOR CONTRIBUTION

Josh Duffy developed the in vitro system for VSG-117 import into trypanosome microsomes. He designed and performed experiments to test the importance of h-motifs in the function of signal sequences and analysed data that he obtained. He also wrote parts of the manuscript. Bhargavi Patham assembled datasets of signal sequences and performed bioinformatic analysis to discover h-motifs in h-regions of ER signal peptides. She wrote parts of the manuscript. Kojo Mensa-Wilmot sought and guided the development of the microsomal system for import of trypanosome VSG-117. He envisioned h-motifs in ER signal sequences, directed all bioinformatic analysis and guided the design of experiments. He analysed data and wrote the manuscript.

FUNDING

The study was performed with support from the National Institutes of Health [grant number AI070141].

References

1
Martoglio
B.
Dobberstein
B.
Signal sequences: more than just greasy peptides
Trends Cell Biol.
1998
, vol. 
8
 (pg. 
410
-
415
)
2
Zheng
N.
Gierasch
L. M.
Signal sequences: the same yet different
Cell
1996
, vol. 
86
 (pg. 
849
-
852
)
3
Dultz
E.
Hildenbeutel
M.
Martoglio
B.
Hochman
J.
Dobberstein
B.
Kapp
K.
The signal peptide of the mouse mammary tumor virus Rem protein is released from the endoplasmic reticulum membrane and accumulates in nucleoli
J. Biol. Chem.
2008
, vol. 
283
 (pg. 
9966
-
9976
)
4
Oliver
J.
Jungnickel
B.
Gorlich
D.
Rapoport
T.
High
S.
The Sec61 complex is essential for the insertion of proteins into the membrane of the endoplasmic reticulum
FEBS Lett.
1995
, vol. 
362
 (pg. 
126
-
130
)
5
Kaiser
C. A.
Preuss
D.
Grisafi
P.
Botstein
D.
Many random sequences functionally replace the secretion signal sequence of yeast invertase
Science
1987
, vol. 
235
 (pg. 
312
-
317
)
6
Von Heijne
G.
Abrahmsen
L.
Species-specific variation in signal peptide design: implications for protein secretion in foreign hosts
FEBS Lett.
1989
, vol. 
244
 (pg. 
439
-
446
)
7
Von Heijne
G.
Signal sequences: the limits of variation
J. Mol. Biol.
1985
, vol. 
184
 (pg. 
99
-
105
)
8
Rusch
S. L.
Kendall
D. A.
Protein transport via amino-terminal targeting sequences: common themes in diverse systems
Mol. Membr. Biol.
1995
, vol. 
12
 (pg. 
295
-
307
)
9
Izard
J. W.
Kendall
D. A.
Signal peptides: exquisitely designed transport promoters
Mol. Microbiol.
1994
, vol. 
13
 (pg. 
765
-
773
)
10
Button
L. L.
Wilson
G.
Astell
C. R.
McMaster
W. R.
Recombinant Leishmania surface glycoprotein GP63 is secreted in the baculovirus expression system as a latent metalloproteinase
Gene
1993
, vol. 
134
 (pg. 
75
-
81
)
11
Al-Qahtani
A.
Teilhet
M.
Mensa-Wilmot
K.
Species-specificity in endoplasmic reticulum signal peptide utilization revealed by proteins from Trypanosoma brucei and Leishmania
Biochem. J.
1998
, vol. 
331
 (pg. 
521
-
529
)
12
Zheng
T.
Nicchitta
C. V.
Structural determinants for signal sequence function in the mammalian endoplasmic reticulum
J. Biol. Chem.
1999
, vol. 
274
 (pg. 
36623
-
36630
)
13
Ramirez
M. I.
Boscardin
S. B.
Han
S. W.
Paranhos-Baccala
G.
Yoshida
N.
Kelly
J. M.
Mortara
R. A.
Da Silveira
J. F.
Heterologous expression of a Trypanosoma cruzi surface glycoprotein (gp82) in mammalian cells indicates the existence of different signal sequence requirements and processing
J. Eukaryotic Microbiol.
1999
, vol. 
46
 (pg. 
557
-
565
)
14
Ryan
P.
Robbins
A.
Whealy
M.
Enquist
L. W.
Overall signal sequence hydrophobicity determines the in vivo translocation efficiency of a herpesvirus glycoprotein
Virus Genes
1993
, vol. 
7
 (pg. 
5
-
21
)
15
Tomilo
M.
Wilkinson
K. S.
Ryan
P.
Can a signal sequence become too hydrophobic?
J. Biol. Chem.
1994
, vol. 
269
 (pg. 
32016
-
32021
)
16
Von Heijne
G.
Signal sequences are not uniformly hydrophobic
J. Mol. Biol.
1982
, vol. 
159
 (pg. 
537
-
541
)
17
Matoba
S.
Ogrydziak
D. M.
Another factor besides hydrophobicity can affect signal peptide interaction with signal recognition particle
J. Biol. Chem.
1998
, vol. 
273
 (pg. 
18841
-
18847
)
18
Nakai
K.
Kanehisa
M.
A knowledge base for predicting protein localization sites in eukaryotic cells
Genomics
1992
, vol. 
14
 (pg. 
897
-
911
)
19
Jonassen
I.
Efficient discovery of conserved patterns using a pattern graph
Comput. Appl. Biosci.
1997
, vol. 
13
 (pg. 
509
-
522
)
20
Puntervoll
P.
Linding
R.
Gemund
C.
Chabanis-Davidson
S.
Mattingsdal
M.
Cameron
S.
Martin
D. M.
Ausiello
G.
Brannetti
B.
Costantini
A.
, et al. 
ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins
Nucleic Acids Res.
2003
, vol. 
31
 (pg. 
3625
-
3630
)
21
Bendtsen
J. D.
Nielsen
H.
von Heijne
G.
Brunak
S.
Improved prediction of signal peptides: SignalP 3.0
J. Mol. Biol.
2004
, vol. 
340
 (pg. 
783
-
795
)
22
Nielsen
H.
Brunak
S.
von Heijne
G.
Machine learning approaches for the prediction of signal peptides and other protein sorting signals
Protein Eng.
1999
, vol. 
12
 (pg. 
3
-
9
)
23
Hertz-Fowler
C.
Peacock
C. S.
Wood
V.
Aslett
M.
Kerhornou
A.
Mooney
P.
Tivey
A.
Berriman
M.
Hall
N.
Rutherford
K.
, et al. 
GeneDB: a resource for prokaryotic and eukaryotic organisms
Nucleic Acids Res.
2004
, vol. 
32
 (pg. 
D339
-
D343
)
24
Teilhet
M.
Rashid
M. B.
Hawk
A.
Al-Qahtani
A.
Mensa-Wilmot
K.
Effect of short 5′ UTRs on protein synthesis in two biological kingdoms
Gene
1998
, vol. 
222
 (pg. 
91
-
97
)
25. Patham B., Duffy J., Lane A., Davis R. C., Wipf P., Fewell S. W., Brodsky J. L., Mensa-Wilmot K. Post-translational import of protein into the endoplasmic reticulum of a trypanosome: an in vitro system for discovery of anti-trypanosomal chemical entities. Biochem. J. 2009, vol. 419 (pg. 507–517)
26. McGeoch D. J. On the predictive recognition of signal peptide sequences. Virus Res. 1985, vol. 3 (pg. 271–286)
27. Haeuptle M. T., Flint N., Gough N. M., Dobberstein B. A tripartite structure of the signals that determine protein insertion into the endoplasmic reticulum membrane. J. Cell Biol. 1989, vol. 108 (pg. 1227–1236)
28. Jungnickel B., Rapoport T. A. A posttargeting signal sequence recognition event in the endoplasmic reticulum membrane. Cell 1995, vol. 82 (pg. 261–270)
29. Plath K., Mothes W., Wilkinson B. M., Stirling C. J., Rapoport T. A. Signal sequence recognition in posttranslational protein transport across the yeast ER membrane. Cell 1998, vol. 94 (pg. 795–807)
30. Pilon M., Romisch K., Quach D., Schekman R. Sec61p serves multiple roles in secretory precursor binding and translocation into the endoplasmic reticulum membrane. Mol. Biol. Cell 1998, vol. 9 (pg. 3455–3473)
31. Van den Berg B., Clemons W. M., Jr, Collinson I., Modis Y., Hartmann E., Harrison S. C., Rapoport T. A. X-ray structure of a protein-conducting channel. Nature 2004, vol. 427 (pg. 36–44)
32. Pines O., Lunn C. A., Inouye M. Defective Escherichia coli signal peptides function in yeast. Mol. Microbiol. 1988, vol. 2 (pg. 209–217)
33. Bird P., Gething M. J., Sambrook J. Translocation in yeast and mammalian cells: not all signal sequences are functionally equivalent. J. Cell Biol. 1987, vol. 105 (pg. 2905–2914)
34. Laforet G. A., Kaiser E. T., Kendall D. A. Signal peptide subsegments are not always functionally interchangeable: M13 procoat hydrophobic core fails to transport alkaline phosphatase in Escherichia coli. J. Biol. Chem. 1989, vol. 264 (pg. 14478–14485)
35. Wiedmann M., Huth A., Rapoport T. A. Xenopus oocytes can secrete bacterial β-lactamase. Nature 1984, vol. 309 (pg. 637–639)
36. Hatsuzawa K., Tagaya M., Mizushima S. The hydrophobic region of signal peptides is a determinant for SRP recognition and protein translocation across the ER membrane. J. Biochem. (Tokyo) 1997, vol. 121 (pg. 270–277)
37. Bird P., Gething M. J., Sambrook J. The functional efficiency of a mammalian signal peptide is directly related to its hydrophobicity. J. Biol. Chem. 1990, vol. 265 (pg. 8420–8425)
38. Hikita C., Mizushima S. Effects of total hydrophobicity and length of the hydrophobic domain of a signal peptide on in vitro translocation efficiency. J. Biol. Chem. 1992, vol. 267 (pg. 4882–4888)
39. Driessen A. J., Manting E. H., van der Does C. The structural basis of protein targeting and translocation in bacteria. Nat. Struct. Biol. 2001, vol. 8 (pg. 492–498)
40. Hegde R. S., Bernstein H. D. The surprising complexity of signal sequences. Trends Biochem. Sci. 2006, vol. 31 (pg. 563–571)
41. Ryan P., Edwards C. O. Systematic introduction of proline in a eukaryotic signal sequence suggests asymmetry within the hydrophobic core. J. Biol. Chem. 1995, vol. 270 (pg. 27876–27879)
42. Doud S. K., Chou M. M., Kendall D. A. Titration of protein transport activity by incremental changes in signal peptide hydrophobicity. Biochemistry 1993, vol. 32 (pg. 1251–1256)
43. Chou M. M., Kendall D. A. Polymeric sequences reveal a functional interrelationship between hydrophobicity and length of signal peptides. J. Biol. Chem. 1990, vol. 265 (pg. 2873–2880)
44. Ng D. T., Brown J. D., Walter P. Signal sequences specify the targeting route to the endoplasmic reticulum membrane. J. Cell Biol. 1996, vol. 134 (pg. 269–278)
45. Von Heijne G. On the hydrophobic nature of signal sequences. Eur. J. Biochem. 1981, vol. 116 (pg. 419–422)
46. Mothes W., Jungnickel B., Brunner J., Rapoport T. A. Signal sequence recognition in cotranslational translocation by protein components of the endoplasmic reticulum membrane. J. Cell Biol. 1998, vol. 142 (pg. 355–364)
47. Johnson A. E., van Waes M. A. The translocon: a dynamic gateway at the ER membrane. Annu. Rev. Cell Dev. Biol. 1999, vol. 15 (pg. 799–842)
48. Osborne A. R., Rapoport T. A., van den Berg B. Protein translocation by the Sec61/SecY channel. Annu. Rev. Cell Dev. Biol. 2005, vol. 21 (pg. 529–550)
49. Heilker R., Spiess M., Crottet P. Recognition of sorting signals by clathrin adaptors. BioEssays 1999, vol. 21 (pg. 558–567)
50. Traub L. M. Sorting it out: AP-2 and alternate clathrin adaptors in endocytic cargo selection. J. Cell Biol. 2003, vol. 163 (pg. 203–208)
51. Abell B. M., Pool M. R., Schlenker O., Sinning I., High S. Signal recognition particle mediates post-translational targeting in eukaryotes. EMBO J. 2004, vol. 23 (pg. 2755–2764)
52. Belin D., Bost S., Vassalli J. D., Strub K. A two-step recognition of signal sequences determines the translocation efficiency of proteins. EMBO J. 1996, vol. 15 (pg. 468–478)
53. Laxma Reddy G., Nagaraj R. Circular dichroism studies on the signal sequence of E. coli alkaline phosphatase indicate the presence of both α-helix and β-structure in hydrophobic environments. FEBS Lett. 1986, vol. 202 (pg. 349–352)
54. Meyer T. H., Menetret J. F., Breitling R., Miller K. R., Akey C. W., Rapoport T. A. The bacterial SecY/E translocation complex forms channel-like structures similar to those of the eukaryotic Sec61p complex. J. Mol. Biol. 1999, vol. 285 (pg. 1789–1800)
55. Gould S. B., Sommer M. S., Hadfi K., Zauner S., Kroth P. G., Maier U. G. Protein targeting into the complex plastid of cryptophytes. J. Mol. Evol. 2006, vol. 62 (pg. 674–681)
56. Karsten V., Hegde R. S., Sinai A. P., Yang M., Joiner K. A. Transmembrane domain modulates sorting of membrane proteins in Toxoplasma gondii. J. Biol. Chem. 2004, vol. 279 (pg. 26052–26057)
57. Sinai A. P., Joiner K. A. The Toxoplasma gondii protein ROP2 mediates host organelle association with the parasitophorous vacuole membrane. J. Cell Biol. 2001, vol. 154 (pg. 95–108)
58. Ragge K., Arnold H. H., Tummler M., Knapp B., Hundt E., Lingelbach K. In vitro biosynthesis and membrane translocation of the serine rich protein of Plasmodium falciparum. Mol. Biochem. Parasitol. 1990, vol. 42 (pg. 93–100)
59. Howard R. F., Schmidt C. M. The secretory pathway of Plasmodium falciparum regulates transport of p82/RAP1 to the rhoptries. Mol. Biochem. Parasitol. 1995, vol. 74 (pg. 43–54)
60. Gunther K., Tummler M., Arnold H. H., Ridley R., Goman M., Scaife J. G., Lingelbach K. An exported protein of Plasmodium falciparum is synthesized as an integral membrane protein. Mol. Biochem. Parasitol. 1991, vol. 46 (pg. 149–157)
61. Feder R., Blobel G. In vitro biosynthesis and core glycosylation of the histidine-rich protein of Plasmodium lophurae. Mol. Biochem. Parasitol. 1983, vol. 9 (pg. 351–362)
62. Seiler M., Mehrle A., Poustka A., Wiemann S. The 3of5 web application for complex and comprehensive pattern matching in protein sequences. BMC Bioinform. 2006, vol. 7 (pg. 144)

Author notes

1 These authors contributed equally to this work.

2 Present address: Department of Medical Education, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, 5001 El Paso Drive, El Paso, TX 79905, U.S.A.