Common and unique features of glycosylation and glycosyltransferases in African trypanosomes

Eukaryotic protein glycosylation is mediated by glycosyl- and oligosaccharyl-transferases. Here, we describe how African trypanosomes exhibit both evolutionary conservation and significant divergence compared with other eukaryotes in how they synthesise their glycoproteins. The kinetoplastid parasites have conserved components of the dolichol-cycle and oligosaccharyltransferases (OSTs) of protein N-glycosylation, and of glycosylphosphatidylinositol (GPI) anchor biosynthesis and transfer to protein. However, some components are missing, and they process and decorate their N-glycans and GPI anchors in unique ways. To do so, they appear to have evolved a distinct and functionally flexible glycosyltransferase (GT) family, the GT67 family, from an ancestral eukaryotic β3GT gene. The expansion and/or loss of GT67 genes appears to be dependent on parasite biology. Some appear to correlate with the obligate passage of parasites through an insect vector, suggesting they were acquired through GT67 gene expansion to assist insect vector (tsetse fly) colonisation. Others appear to have been lost in species that subsequently adopted contaminative transmission. We also highlight the recent discovery of a novel and essential GT11 family of kinetoplastid parasite fucosyltransferases that are uniquely localised to the mitochondria of Trypanosoma brucei and Leishmania major. The origins of these kinetoplastid FUT1 genes, and additional putative mitochondrial GT genes, are discussed.


Introduction
The protozoan parasite Trypanosoma brucei causes African animal trypanosomiasis (or nagana) in cattle and human African trypanosomiasis (HAT) in humans. These diseases are generally fatal if not treated and the available therapeutics, while improving for HAT, are far from optimal. Currently, with tsetse fly control and test-and-treat surveillance, reported cases of HAT in sub-Saharan Africa are, thankfully, low. However, the disease burden in cattle significantly affects economic output and agricultural productivity [1].
Many African trypanosome species, for example, T. brucei spp., T. congolense and T. suis, have complex life cycles that involve obligate differentiation events between proliferative (colonising) and non-proliferating (transmissible) stages to occupy and pass between their mammalian hosts and tsetse fly vectors. Others have lost the ability to infect tsetse vectors and transmit by either venereal (T. equiperdum) or contaminative routes via haematophagous fly bites (T. evansi); T. vivax, which has a limited life cycle in tsetse, is also transmitted primarily by venereal and haematophagous fly bite routes.
The bloodstream form (BSF) trypomastigote and procyclic form (PCF) promastigote lifecycle stages that occupy mammalian host and insect vector niches, respectively, exhibit dramatic changes in cellular metabolism, morphology and cell surface molecular architecture. The latter is illustrated for T. brucei in Figure 1.
Figure 1. Summary of the glycan and GT repertoires of T. brucei. The tsetse midgut-dwelling procyclic form (PCF) cells express GPI-anchored and N-glycosylated procyclin glycoproteins with simple Man5GlcNAc2 oligomannose N-glycans and GPI anchors with extensively modified GPI-anchor glycans. The mammalian host-dwelling bloodstream form (BSF) cells express variant surface glycoproteins (VSGs) that can contain oligomannose (triantennary Man9GlcNAc2 to Man5GlcNAc2), paucimannose (biantennary Man5GlcNAc2 to Man3GlcNAc2) and small complex N-glycans. Some VSGs are also O-glycosylated, as indicated. In addition, other flagellar pocket and endosomal/lysosomal glycoproteins, such as p67, bear giant poly-LacNAc-containing N-glycans in the BSF lifecycle stage. In contrast, the BSF GPI-anchor sidechains are smaller than those of PCF cells, containing up to 6 Gal residues.
Eukaryotic protein N-glycosylation involves the transfer of an oligosaccharide from a lipid-linked oligosaccharide (LLO), made by the dolichol-cycle, to NXS/T acceptor sequons in proteins sequestered into the endoplasmic reticulum (ER). This transfer is mediated by an oligosaccharyltransferase (OST) [2,3]. GPI membrane anchors are also pre-assembled in the ER and transferred en bloc via GPI-transamidase in the lumen of the ER to a subset of proteins bearing a C-terminal peptide extension [4]. The core N-glycan and GPI structures are then processed in the ER and in the Golgi apparatus to mature structures. Most of the genes encoding GTs involved in the dolichol-cycle (ALG genes) and GPI precursor biosynthesis (GPI genes), which belong to the GT-C fold class [5], have been found in the T. brucei genome by predicted amino acid sequence homology [6]. In a few cases, the functionalities of ALG and GPI GT genes have been experimentally confirmed [7][8][9][10]. However, the dolichol-cycle glucosyltransferases ALG6, ALG8 and ALG10 are notably missing, as are homologues of GTs that elaborate N-glycans and GPI-side chains in other organisms. This is despite many of the processed parasite N-glycan structures being identical with those of higher eukaryotes. Thus, for example, one cannot easily find orthologues of the genes GnTI and GnTII (which add GlcNAc to the 3- and 6-branches of the conserved Man3GlcNAc2 core and thus initiate the formation of complex N-glycans) or of B4GalT that makes Galβ1-4GlcNAc (LacNAc) structures.
Such processing GTs are generally type-2 membrane proteins localised to the Golgi apparatus and are defined by their nucleotide sugar donor, the structure of their aglycone acceptor, the anomericity of the transferred sugar (α or β) and the inter-sugar glycosidic linkage (e.g. 1-2, 1-3, 1-4 or 1-6). For example, a UDP-Gal : βGlcNAc β1-4 Gal-transferase makes Galβ1-4GlcNAcβ1-O-R from a UDP-Gal donor and a GlcNAcβ1-O-R acceptor. The genes encoding GTs often exhibit significant expansion through evolutionary pressure to catalyse a repertoire of related glycosidic linkages [11]. Most GTs are classified based on their 3D fold topology, where GT-As have two closely adjoining β/α/β Rossmann domains whilst GT-Bs consist of two facing β/α/β Rossmann domains that are linked flexibly [11]. Overall fold classification is not a predictor of the catalytic mechanism, as both inverting (β) and retaining (α) GTs have been characterised with GT-A and GT-B topologies. Phylogenetic analyses of GT-A fold GTs indicate that inverting and retaining mechanisms emerged independently many times during evolution [12]. The GTs are further categorised into distinct GT families in the Carbohydrate-Active enZymes (CAZy) database, based on protein sequence and structural similarities [13]. Experimental data on GT family members with respect to their inverting or retaining transfer mechanisms and nucleotide sugar specificities allow conservative predictions on the specificities of unstudied fellow GT family members. The recent application of GT sequence deep mining and machine learning approaches is showing promise in predicting the mechanistic function of GTs based on alterations to their common core [14]. Nevertheless, mechanistic predictions from primary protein sequence remain tentative.
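The four properties that define a processing GT (donor, acceptor, anomericity and linkage) are enough to write down both the systematic name of the enzyme and its product. The following is a minimal sketch of that bookkeeping, not an established tool: the class name GlycosylTransfer and its methods are hypothetical, and the string parsing only handles simple linear acceptors written with the non-reducing end first.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GlycosylTransfer:
    """One GT activity, described by the four properties named in the text."""
    donor: str     # nucleotide sugar donor, e.g. "UDP-Gal"
    acceptor: str  # acceptor glycan, non-reducing end first, e.g. "GlcNAcβ1-O-R"
    anomer: str    # anomericity of the transferred sugar: "α" or "β"
    linkage: str   # inter-sugar glycosidic linkage, e.g. "1-4"

    def sugar(self) -> str:
        # The transferred sugar is the part after the nucleotide: "UDP-Gal" -> "Gal".
        return self.donor.split("-", 1)[1]

    def acceptor_terminus(self) -> str:
        # Terminal acceptor residue, anomer first: "GlcNAcβ1-O-R" -> "βGlcNAc".
        m = re.match(r"([A-Za-z]+)([αβ])\d-", self.acceptor)
        if m is None:
            raise ValueError(f"cannot parse acceptor {self.acceptor!r}")
        return m.group(2) + m.group(1)

    def name(self) -> str:
        # Systematic name in the "donor : acceptor linkage sugar-transferase" style.
        return (f"{self.donor} : {self.acceptor_terminus()} "
                f"{self.anomer}{self.linkage} {self.sugar()}-transferase")

    def product(self) -> str:
        # The new sugar is written in front of the acceptor's non-reducing end.
        return f"{self.sugar()}{self.anomer}{self.linkage}{self.acceptor}"


b4galt = GlycosylTransfer("UDP-Gal", "GlcNAcβ1-O-R", "β", "1-4")
print(b4galt.name())     # UDP-Gal : βGlcNAc β1-4 Gal-transferase
print(b4galt.product())  # Galβ1-4GlcNAcβ1-O-R
```

The same record describes any of the activities discussed in this review; for instance, a GDP-Fuc donor, a Galβ1-3GlcNAc acceptor, α anomericity and a 1-2 linkage give the product Fucα1-2Galβ1-3GlcNAc.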
The CAZy database contains over a hundred sequence-based GT families and many or all families are encoded by most organisms. However, this is not the case in T. brucei where several common GT families are missing and where a particular GT family (GT67), unique to kinetoplastids, has emerged and expanded. There are twenty of these genes in the T. brucei genome and, so far, the functions of five of them have been studied [15][16][17][18][19][20].
In this review we summarise our current knowledge of protein glycosylation in T. brucei, discuss the kinetoplastid-specific GT67 family, and highlight recent discoveries on a novel kinetoplastid-specific mitochondrial fucosyltransferase (FUT) and other putative mitochondrial GTs.

Glycan structures in BSF T. brucei
In the mammalian host, the proliferative BSFs reside within the bloodstream, lymphatics and sub-cutaneous and adipose tissue niches [21,22]. BSF trypanosomes survive by expressing around 5 million variant surface glycoprotein (VSG) homodimers tethered to the cell surface via GPI anchors [23,24]. The VSGs produce a dense, yet mobile [23], proteinaceous coat that protects the plasma membrane from components of the innate immune response, such as complement, whilst enabling the diffusion of small nutrient molecules for uptake into the cell via transmembrane transporters [23,25,26]. These VSG molecules are immunogenic and the parasite survives the adaptive immune response by antigenic variation, whereby parasites express alternative VSGs from a large repertoire of genes [27]. VSGs are classified on amino acid sequence motifs, and these sub-types generally share glycosylation features, such as the attachment of one, two or three N-linked oligomannose, paucimannose or complex N-glycans and GPI-anchor sidechains of between zero and five α-linked Gal residues and zero or one β-linked Gal residue (Figure 1) [25,26,28]. The N-glycosylation of VSG is an important modification which insulates the protein core from intermolecular interactions with adjacent surface proteins, enabling dense packing to occur at a level approaching the molecular crowding threshold [29]. Additionally, O-glycosylation of certain VSGs has been identified as a further mechanism by which African trypanosomes generate additional antigenic variation. Here, a serine residue at the top of the VSG molecule bears an αGlc residue that can be further modified by 1 or 2 hexose residues to generate heterogeneity that delays the onset of a sterilising host immune response [30].
There are several other less-abundant glycoproteins expressed by BSF T. brucei, but only the flagellar pocket VSG-like transferrin receptor (TfR) and the lysosomal/endosomal p67 hydrolase have been analysed in any detail for carbohydrate structure. The TfR contains a VSG-type GPI anchor [31], albeit on only one of its two subunits, and both TfR and BSF p67 contain oligomannose, paucimannose [32] and poly-N-acetyllactosamine, i.e. poly-Galβ1-4GlcNAc (poly-LacNAc), containing complex N-glycans [33]. The latter include the largest neutral N-glycan structures yet described in eukaryotes [34]. Thus, in contrast with their relatively short GPI sidechain glycans, BSF T. brucei can express extremely large complex N-glycans. The poly-LacNAc N-glycans have been suggested to play a role in endocytosis [35], but their exact function is unknown.

Glycan structures in PCF T. brucei
The PCF cell surface contains a partially characterised high-molecular mass glycoconjugate [36], abundant GPI-anchored glycoproteins called procyclins [37] and free GPI glycolipids [38,39]. The procyclins are composed of rod-like polyanionic dipeptide (EP) or pentapeptide (GPEET) repeats with or without a single triantennary Man5GlcNAc2 N-linked oligomannose glycan [40] and without or with threonine phosphorylation [41], respectively. Both types of procyclin share the largest and most complex GPI-anchor sidechains characterised to date. These glycans are composed of branched poly-LacNAc and poly-lacto-N-biose (LNB; Galβ1-3GlcNAc) containing structures terminating in βGal [40,42] that can be further modified by α2-3-linked sialic acid residues by the action of cell surface GPI-anchored trans-sialidase [43]. Surface sialylation with host blood meal-derived sialic acids plays a role in efficient tsetse fly colonisation [44] whilst the rod-like procyclins are thought to shield susceptible surface proteins from proteolytic attack in the tsetse midgut [45]. Therefore, in contrast with the densely packed, proteinaceous VSG coat of BSF cells, the PCF cells express a surface glycocalyx composed of elaborate GPI sidechain glycans overlaid by polyanionic peptidic rods and interlaced with high-molecular mass glycoconjugates. Significantly, while wild-type PCF parasites express extremely complex GPI sidechains, they only express simple oligomannose N-glycans [28,40] (Figure 1).
Of note, the tsetse midgut PCF of T. congolense expresses a different family of glycoproteins, called glutamic acid and alanine-rich glycoproteins (GARPs), in place of procyclins. The GARPs are also GPI-anchored molecules, but with small GPI sidechains, no N-linked glycans and Gal and Man containing glycans linked through phosphate to Thr residues [46].

Glycan structures in other lifecycle stages
In T. brucei, the other lifecycle stages are less accessible than BSF and PCF but T. brucei epimastigote forms are known to express a GPI-anchored alanine-rich protein called BARP [47,48] and metacyclic trypomastigote forms express a related GPI-anchored metacyclic invariant surface protein (MISP) [49]. However, as for most BSF and PCF glycoproteins, there are no structural data on their GPI sidechains and/or N-linked glycans.

Conserved and divergent aspects of protein N-glycosylation and protein quality control in T. brucei
Eukaryotic OSTs are generally hetero-oligomeric complexes where the catalytic-subunit (STT3) is associated with seven or eight additional subunits. These OSTs generally transfer Glc3Man9GlcNAc2 from the mature LLO of the dolichol-cycle to NXS/T sequons in the lumen of the ER [2,3,50]. The transferred glycans then undergo processing by ER glucosidase I and II and ER α-mannosidase I to generate oligomannose structures. When these arrive in the Golgi apparatus, they can be further processed to complex and hybrid structures through the action of Golgi α-mannosidase II and a variety of GTs [3,51].
In T. brucei this canonical pattern of protein N-glycosylation is modified, as first noted by Bangs, Englund and colleagues who observed that some N-glycosylation sites in T. brucei VSGs are occupied by endoglycosidase-H resistant N-glycans immediately after VSG synthesis in the ER [52]. The anomalies of protein N-glycosylation in T. brucei are as follows. Firstly, its OST activity is provided by STT3 gene products alone [53]. Secondly, the largest LLO made by the parasite is Man9GlcNAc2 [8,54]. Thirdly, its three STT3 genes encode OSTs with significantly different donor and acceptor specificities [8,10,53,55,56]. The consequence of this is a radically different N-glycosylation system, whereby N-glycosylation efficiency is very high and the type of mature glycans on specific N-glycosylation sites is primarily controlled by the net charge around the glycosylation site. Thus, TbSTT3A prefers sequons in an acidic environment and specifically transfers biantennary Man5GlcNAc2 from the LLO Man5GlcNAc2-PP-dolichol, and TbSTT3B acts on all remaining sequons and specifically transfers triantennary Man9GlcNAc2 from the LLO Man9GlcNAc2-PP-dolichol. Since the organism does not contain a Golgi α-mannosidase II activity, the consequence is that only TbSTT3A acidic sequon sites are destined to be processed to paucimannose and/or complex N-glycans whereas TbSTT3B sequon sites are destined to contain only oligomannose structures. Interestingly, neither TbSTT3A nor TbSTT3B appears to be essential for BSF parasites in vitro, but both are essential in vivo [56]. The role of TbSTT3C is unclear as it has not been detected at the protein level in BSF or PCF cells. However, its effects on protein glycosylation when transferred to yeast suggest it has a hybrid specificity, preferring acidic sequons like TbSTT3A but transferring Man9GlcNAc2 like TbSTT3B [57].
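The charge-based rule described above can be illustrated with a short sketch that finds NXS/T sequons and assigns each to TbSTT3A or TbSTT3B. It is a deliberately crude model and not a validated predictor: the function names, the five-residue flanking window and the "net charge below zero means acidic" threshold are all assumptions made for illustration, as is the conventional exclusion of proline at the X position, which the text does not state.

```python
import re

# N-X-S/T sequon. Excluding proline at X is the conventional eukaryotic rule
# and is assumed here; the text itself only writes "NXS/T".
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")

ACIDIC, BASIC = set("DE"), set("KR")


def net_charge(peptide: str) -> int:
    """Crude net charge: +1 for each K/R, -1 for each D/E."""
    return sum((aa in BASIC) - (aa in ACIDIC) for aa in peptide)


def predict_sites(protein: str, flank: int = 5):
    """Return (position, OST, transferred glycan, expected mature class) per sequon.

    'flank' (residues either side of the sequon) and the charge threshold
    are hypothetical parameters chosen only for this illustration.
    """
    results = []
    for m in SEQUON.finditer(protein):
        i = m.start()
        window = protein[max(0, i - flank): i + 3 + flank]
        if net_charge(window) < 0:
            # Acidic environment: TbSTT3A transfers biantennary Man5GlcNAc2.
            results.append((i, "TbSTT3A", "Man5GlcNAc2",
                            "paucimannose and/or complex"))
        else:
            # All remaining sequons: TbSTT3B transfers triantennary Man9GlcNAc2.
            results.append((i, "TbSTT3B", "Man9GlcNAc2", "oligomannose"))
    return results


# Two made-up peptides, one acidic and one basic around the sequon.
print(predict_sites("AEDDENATEEDA"))
print(predict_sites("AKRGKNVSRKGA"))
```

The point of the sketch is the branching logic: because there is no Golgi α-mannosidase II, the glycan class at a site is effectively fixed at the moment of transfer by which OST acts on it.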
The preponderance of oligomannose N-glycans in PCF glycoproteins, versus both oligomannose and paucimannose/complex N-glycans in BSF glycoproteins, is easily understood when the expression levels of TbSTT3A and TbSTT3B are compared at the protein level (Figure 3). Thus, TbSTT3B is highly expressed in PCF and BSF, whereas TbSTT3A is only highly expressed in BSF cells.
Since the T. brucei LLOs do not contain glucose, the parasites do not have an ER glucosidase I. However, they do have UDP-Glc: glycoprotein glucosyltransferase (UGGT), ER glucosidase II and calreticulin so that newly synthesised glycoproteins in the ER can undergo quality control cycles of N-glycan α-glucosylation (via UGGT), attempts at protein folding via calreticulin and its associated oxidoreductases, and de-glucosylation (via ER glucosidase II). The T. brucei UGGT shares a typical GT24 family domain at its C-terminus [58] but the activity of TbUGGT has diverged from a strict specificity for Man9GlcNAc2 to a broad specificity for any N-glycan structure containing an intact A-branch [59]. In common with other organisms, TbUGGT plays a role in protection from heat shock, such that BSF TbUGGT null mutants cannot tolerate a 37-40°C temperature shift [59]. However, unlike other organisms such as Schizosaccharomyces pombe, the TbUGGT null mutant does not up-regulate chaperones such as Grp78 or BiP upon heat shock or following tunicamycin treatment. It appears, therefore, that the accumulation of unfolded proteins upon ER stress is not sensed in BSF T. brucei, a conclusion also reached by Tiengwe et al. [60]. This 'chaperones always on' condition is likely a consequence of the extremely high glycoprotein flux required to export VSG in BSF T. brucei to form a dense surface coat. Consistent with this proposal of a T. brucei-specific adaptation, the T. cruzi UGGT null mutant does up-regulate ER chaperones Grp78 and BiP [61].
An interesting phenomenon is that both BSF and PCF T. brucei show plasticity in protein N-glycosylation when challenged with toxic lectins or other carbohydrate-binding agents. For example, PCF cells express small hybrid N-glycans in place of oligomannose glycans when challenged with Concanavalin-A [54] and BSF cells alter the expression of their TbSTT3 genes and create TbSTT3 chimeric genes when challenged with lectins and other agents [62][63][64].

Divergent and convergent evolution of GT67 family GTs
As mentioned previously, despite conserved glycan structural motifs between T. brucei and other eukaryotes one cannot find orthologues of the genes GnTI and GnTII (which add GlcNAc to the 3- and 6-branches of the conserved Man3GlcNAc2 core and thus initiate the formation of complex N-glycans), or of B3GnTI that makes GlcNAcβ1-3Gal structures, or β3GalTI that makes Galβ1-3GlcNAc structures, or GCNT2 that makes GlcNAcβ1-6Gal structures. Each of these belongs to a distinct CAZy GT family. Instead, the genes encoding these five activities belong to the kinetoplastid-unique GT67 family [13]. This GT-A fold family has three motifs that are very similar to the (I/L)RXXWG, (F/Y)(V/L/M)XXXDXD, (E/D)D(A/V)(Y/F)XGX(C/S) motifs conserved among members of the mammalian β3GT superfamily [16]. The comparable motifs in the T. brucei genes are WG, Y(I,V,F)XKXDDD, and ED(A/V/I/L/M)(M/L)X(G/A). The GT67 family, therefore, appears to have diverged from the normal eukaryotic β3GT lineages and then expanded and evolved such that GT67 GTs can take the place of GT13 [17], GT14 [16], GT16 [18], GT31 [19] and GT49 [15] (and probably more) GT family members.
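The motif notation used above maps directly onto regular expressions, which makes it easy to check a candidate sequence for the three T. brucei GT67 motifs. The sketch below is illustrative only: it takes X as any residue and each bracketed group as a set of alternatives, the helper names motif_to_regex and scan are hypothetical, and the test sequence is made up rather than taken from any real GT67 gene. Short motifs such as WG will of course match many unrelated proteins, so a hit is not evidence of family membership on its own.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate e.g. "(I/L)RXXWG" into the regex "[IL]R..WG".

    Bracketed groups become character classes (any separator inside the
    brackets is ignored), X becomes "any residue", and stray characters
    such as hyphens are skipped.
    """
    parts = []
    for group, single in re.findall(r"\(([^)]*)\)|([A-Z])", motif):
        if group:
            parts.append("[" + re.sub(r"[^A-Z]", "", group) + "]")
        elif single == "X":
            parts.append(".")
        else:
            parts.append(single)
    return "".join(parts)


# The three T. brucei GT67 motifs as written in the text.
TB_GT67_MOTIFS = ["WG", "Y(I,V,F)XKXDDD", "ED(A/V/I/L/M)(M/L)X(G/A)"]


def scan(sequence: str, motifs=TB_GT67_MOTIFS) -> dict:
    """Map each motif to the 0-based start of its first match, or None."""
    hits = {}
    for motif in motifs:
        m = re.search(motif_to_regex(motif), sequence)
        hits[motif] = m.start() if m else None
    return hits


toy = "MKLRAAWGSTYVAKADDDQQEDAMSG"  # hypothetical sequence, for illustration
print(motif_to_regex("(F/Y)(V/L/M)XXXDXD"))
print(scan(toy))
```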
Phylogenetic analysis of the GT67 gene family reveals that it separates into two distinct clades, one for the Leishmania (not discussed here but reviewed in [65]) and one for the trypanosomatids [66], indicative of the disparate evolutionary pressures exerted on these parasite groups.
Fifteen of the twenty T. brucei GT67 gene products remain uncharacterised, and specificity predictions based on sequence analyses alone are of limited value. In contrast, the elegant interspecies comparisons of GT67 family members amongst trypanosomatids performed by Pereira and Jackson, referred to by them as UDP-dependent GTs (UGTs) [66], provide some useful clues to TbGT function. Thus, we might assume that GT genes in T. brucei that are shared with T. vivax are more likely to encode those required for the synthesis of BSF structures, whereas those not shared with T. vivax (which does not have an obligate transmission through the tsetse fly) are more likely to encode those required for the synthesis of PCF structures.
We should note that while these inferred likelihoods of GT requirements in BSF and PCF lifecycle stages are useful guides, they do not preclude predominantly BSF- or PCF-expressed GT activities appearing in the other lifecycle stage. For example, under selective pressure from the lectin Concanavalin-A [54] or mutagenesis of the TbALG12 gene [7], PCF cells stop making oligomannose N-glycans and the action of TbGnTI or TbGnTII on the paucimannose structures that replace them can be detected. Conversely, similar GPI-anchor sidechains to those found in abundance on PCF procyclins can be found on certain substrates, like the ESAG6 subunit of the transferrin receptor, in BSF cells when steric constraints around the GPI anchor are relaxed [33].
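The inference from shared versus unshared genes amounts to a simple presence/absence rule, sketched below. This is only a restatement of the heuristic in code, with a hypothetical function name, and it carries the same caveats as the text: it yields a likelihood, not an assignment. The species sets are the lineage distributions reported later in this review for lineages 1, 2 and 7.

```python
REFERENCE = "T. b. brucei"
OUTGROUP = "T. vivax"  # no obligate cyclical passage through the tsetse fly


def likely_stage(present_in: set) -> str:
    """Apply the shared-with-T.-vivax heuristic to one GT67 lineage."""
    if REFERENCE not in present_in:
        return "not in T. b. brucei"
    if OUTGROUP in present_in:
        # Shared with T. vivax: more likely needed for BSF structures.
        return "more likely BSF"
    # Absent from T. vivax: more likely needed for PCF structures.
    return "more likely PCF"


# Species distributions as described in the lineage sections of this review.
lineages = {
    "lineage 1": {"T. vivax", "T. b. brucei", "T. b. gambiense", "T. evansi"},
    "lineage 2": {"T. vivax"},
    "lineage 7": {"T. b. brucei", "T. b. gambiense", "T. evansi"},
}

for name, species in lineages.items():
    print(name, "->", likely_stage(species))
```

Note that the rule labels lineage 7 as PCF-associated even though it is retained in T. evansi, which has no PCF stage; that tension is discussed under lineage 7.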
The phylogenetic analysis of Pereira and Jackson revealed seven distinct trypanosomatid GT gene lineages within the GT67 family. Here, we have performed a similar analysis and included T. evansi [67], which does not differentiate to the PCF stage due to the absence of a mitochondrial genome, and T. b. gambiense, the human-pathogenic subspecies closely related to T. b. brucei (Figure 2). We overlay and interpret this phylogenetic analysis with the available data on T. brucei GT67 functions [15][16][17][18][19] and on TbGT, TbOST and glycan-processing enzyme protein expression data from quantitative proteomics [68]. The latter are expressed graphically in Figure 3 using the tool described in [69].

Lineage 1
There is a single gene of this lineage in T. vivax which seems to have undergone expansion in T. b. brucei to generate the TbGT4 (three copies) and TbGT16 (two copies) gene sub-families. These sub-families are also present in T. b. gambiense and T. evansi. The greater expression at the protein level of this lineage in BSF compared with PCF cells (Figure 3) suggests they may play a BSF-specific function. Given the notable absence of UDP-Gal : βGlcNAc β1-4 Gal-transferases needed to synthesise the abundant N-linked poly-LacNAc structures in BSF T. brucei [34], we postulate that one or more of these lineage 1 genes may encode GTs with GT7 β4GalTI-like activity.

Lineage 2
This is a T. vivax-specific gene family and structural data on T. vivax glycans are scant [70], making inferences about the activity of the GTs they encode difficult. Furthermore, sequence analysis indicates they may be gene fragments rather than functional sequences. We postulate that these gene products, if functional, may encode activities similar to their closest relatives in lineage 3.

Lineages 3 and 4
Lineage 3 contains the gene encoding TbGT11, which has been experimentally shown to be in the Golgi and to perform the same function as GT13 family GnTI in other eukaryotes [17]. In other words, it adds βGlcNAc in 1-2 linkage to the 3-arm of the conserved Man3GlcNAc2 core. However, TbGnTI does so with unique acceptor specificity: whereas canonical GT13 GnTIs work on Man5GlcNAc2, which is then processed to GlcNAc1Man3GlcNAc2 by Golgi α-mannosidase II (the latter is absent from T. brucei), GT67 TbGnTI works directly on Man3GlcNAc2.
Lineage 4 contains the gene encoding TbGT15, a Golgi enzyme that performs the same function as GT16 family GnTII, adding βGlcNAc in 1-2 linkage to the 6-arm of the conserved Man3GlcNAc2 core [16]. However, whereas canonical GT16 GnTIIs work on GlcNAc1Man3GlcNAc2, GT67 TbGnTII can operate with nothing or anything substituting the 3-arm αMan residue. This also means that TbGnTI and TbGnTII can operate independently of each other, unlike the strict GnTI followed by Golgi α-mannosidase II followed by GnTII sequence in other eukaryotes [71].
Figure 2 (partial caption). Lineages within the GT67 family are according to [66] and the T. brucei GT sub-families within those lineages (e.g. TbGT1 to TbGT15) are according to [15,18]. Those TbGTs that appear in proteomics data are shown in Figure 3.
In mice, homozygous null mutants of GnTI (Mgat1−/−) or GnTII (Mgat2−/−) do not survive beyond embryonic day 10 or postnatal week 1, respectively [72,73]. In contrast, T. brucei BSF TbGnTI and TbGnTII null mutants both survive in culture and in mice. However, the absence of either leads to compensation by extension and elaboration of the opposing arm. Thus, deletion of TbGnTI increases the decoration of the 6-arm [17] and deletion of TbGnTII increases the decoration of the 3-arm [16]. Attempts to inhibit the formation of complex N-glycans entirely in BSF T. brucei by generating a double null mutant for TbGnTI and TbGnTII have been unsuccessful (unpublished data), suggesting that complex N-glycans per se are likely to be essential in BSF T. brucei. Consistent with this, we were also unable to make a BSF TbSTT3A null mutant [74], even though we could knock down TbSTT3A substantially by RNAi in a heterozygote [56]. These data also suggest that some expression of TbSTT3A, and therefore at least some capacity to make complex and/or paucimannose N-glycans, is essential.
Finally, the lineage 3 and 4 TbGnTI and TbGnTII enzymes are expressed predominantly in BSF cells, consistent with their requirement for TbSTT3A co-expression to make complex N-glycans (Figure 3).
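The difference between the strict canonical order (GnTI, then Golgi α-mannosidase II, then GnTII) and the independent action of TbGnTI and TbGnTII can be made concrete with a toy model of acceptor rules. This is a simplified sketch under stated assumptions: the NGlycan record tracks only the mannose count and whether each arm carries a β1-2 GlcNAc, the function names are hypothetical, and each enzyme simply returns the glycan unchanged when it is not an acceptor.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NGlycan:
    mannoses: int              # 5 for Man5GlcNAc2, 3 for Man3GlcNAc2
    arm3_glcnac: bool = False  # β1-2 GlcNAc on the 3-arm
    arm6_glcnac: bool = False  # β1-2 GlcNAc on the 6-arm


# Canonical eukaryotic enzymes: each step needs the product of the previous one.
def canonical_gnti(g):
    ok = g.mannoses == 5 and not g.arm3_glcnac
    return replace(g, arm3_glcnac=True) if ok else g


def canonical_man2(g):
    ok = g.mannoses == 5 and g.arm3_glcnac
    return replace(g, mannoses=3) if ok else g


def canonical_gntii(g):
    ok = g.mannoses == 3 and g.arm3_glcnac and not g.arm6_glcnac
    return replace(g, arm6_glcnac=True) if ok else g


# GT67 enzymes: both act directly on the Man3GlcNAc2 core, in either order.
def tb_gnti(g):
    ok = g.mannoses == 3 and not g.arm3_glcnac
    return replace(g, arm3_glcnac=True) if ok else g


def tb_gntii(g):
    # The state of the 3-arm is irrelevant to TbGnTII.
    ok = g.mannoses == 3 and not g.arm6_glcnac
    return replace(g, arm6_glcnac=True) if ok else g


man3 = NGlycan(mannoses=3)
print(canonical_gntii(man3))  # unchanged: no 3-arm GlcNAc, so not an acceptor
print(tb_gntii(man3))         # 6-arm GlcNAc added directly
print(tb_gnti(tb_gntii(man3)) == tb_gntii(tb_gnti(man3)))  # order does not matter
```

In this picture, removing either TbGnTI or TbGnTII still leaves the other arm free to be modified, which is consistent with the viability of the single null mutants and the compensation described above.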

Lineages 5 and 6
There is interplay between TbGT8 and TbGT10 in that elimination of either not only reduces -4GlcNAcβ1-6(-4GlcNAcβ1-3)Galβ1-branch points but also elicits compensatory linear glycosylation by the other [17], reminiscent of the compensatory redundancy between TbGnTI and TbGnTII described above.
Analysis of the BSF TbGT10 null mutant [18] showed that impairment of Galβ1-4GlcNAcβ1-6Galβ1-4GlcNAc poly-LacNAc synthesis also perturbs the proteolytic processing of the essential [75] lysosomal/endosomal glycoprotein p67. While this had a minor growth phenotype in culture, these parasites were still infectious to mice. Thus, there does not appear to be any crucial role for Galβ1-4GlcNAcβ1-6Galβ1-4GlcNAc poly-LacNAc synthesis, or indeed for wild-type p67 processing, in BSF T. brucei.

Lineage 7
Lineage 7 has undergone significant expansion in the genomes of T. brucei, T. b. gambiense and T. evansi but it is absent from T. vivax. A single lineage 7 member (TbGT3) has been analysed to date and shown to be expressed in both BSF and PCF cells (Figure 3). In BSF cells it plays some (undefined) role in glycoprotein processing, as judged by wheat germ agglutinin (WGA) lectin blotting [19]. However, it is well characterised in PCF T. brucei as a UDP-Gal : βGlcNAc-GPI β1-3 Gal-transferase elaborating GPI-anchor sidechains [19]. TbGT3 is, therefore, functionally related to GT31 family B3GALT1 that also makes LNB (Galβ1-3GlcNAc) structures.
The absence of lineage 7 in T. vivax, the defined function of TbGT3 in PCF cells and the higher protein expression of the lineage 7 TbGT2 sub-family (encoded by seven closely related genes) in PCF cells (Figure 3), suggest that at least some lineage 7 GTs may be primarily involved in the decoration of the PCF stage GPI-anchor sidechains. We further postulate that lineage 7 PCF expression-specific TbGT2 sub-family genes may encode the elusive UDP-Gal : βGlcNAc-GPI β1-4 Gal-transferases required for the -6Galβ1-4GlcNAcβ1-poly-LacNAc chains found in PCF GPI-anchor sidechains.
The presence of lineage 7 genes in T. evansi, which does not have a PCF stage, runs counter to this argument but may be a function of the phylogenetic proximity of T. evansi to T. brucei, such that lineage 7 GTs have yet to be lost from the T. evansi genome, and/or it may reflect some need for lineage 7 GTs in T. evansi (but not T. vivax) BSF cells.
The lineage 7 TbGT1 sub-family has been detected by proteomics in both BSF and PCF cells [68], but the absolute expression levels are very low (Figure 3), and their specificities and functions are unknown.
Taken together, these data suggest the evolution of a large trypanosome GT67 gene family to generate the UDP-GlcNAc/UDP-Gal glycosyltransferase repertoire necessary for the biosynthesis of substantial parts of their uniquely complex N-linked and GPI-anchor side chain glycans. This serves as a prime example of convergent evolution, whereby the GT67 enzymes exhibit functional relatedness with several families of metazoan GTs yet derive from an ancestrally distinct β3GT. Our assessments of lineages 3, 4, 5 and 6 indicate that they encode BSF-specific trypanosomatid GTs necessary for complex N-linked glycan synthesis in BSF cells. Furthermore, we propose that life cycle stage-dependent β1-4 Gal-transferases are encoded by lineages 1 (in BSF cells) and 7 (in PCF cells). Reverse genetics experiments are required to confirm this hypothesis.

GT11 family FUT1
The discovery that the biosynthesis of the nucleotide sugar GDP-fucose was required for parasite growth in T. brucei [76] and in Leishmania major [77], suggested that these organisms contain one or more essential FUT genes. A single GT11 family FUT, TbFUT1, was identified in the T. brucei genome and shown to be essential in BSF and PCF cells [78]. Similarly, the L. major orthologue (LmjFUT1) is essential for cell growth [79]. The most curious feature of TbFUT1 and LmjFUT1 is their localisation to the parasite mitochondria [78,79]. Recombinant TbFUT1 has GDP-Fuc : βGal α1-2-fucosyltransferase activity, common amongst GT11 family enzymes, with an apparent preference for acceptor substrates containing a terminal LNB (Galβ1-3GlcNAc) motif [78]. The origin of FUT1 is unlike that of the GT67 family which expanded from a common eukaryotic β3GT ancestor. Instead, phylogenetic analysis of FUT1 genes in kinetoplastids indicates it was inherited from a bacterial FUT1 by horizontal gene transfer via a nucleocytoplasmic large DNA virus [80]. The fact that FUT1 is both mitochondrial and essential in both organisms studied to date, and displays highly conserved sequence similarity and genomic synteny across the kinetoplastea, suggests that the αFucT activity it encodes may play a common and crucial role in mitochondrial function in this group of organisms.
Exactly what role(s) mitochondrial fucosylation might play in kinetoplastids remains to be determined. Whereas recombinant TbFUT1 and LmjFUT1 have been shown to fucosylate exogenous peptide and/or glycan substrates, their native mitochondrial substrates have not yet been identified. Phenotypic analysis of a TbFUT1 conditional null mutant in BSF cells reveals that depleting TbFUT1 causes a loss of mitochondrial membrane potential, linked to the activity of the FoF1-ATP synthase [78]. BSF T. brucei lack the proton pumping respiratory chain complexes normally expressed in mitochondria; instead the FoF1-ATP synthase works in reverse mode such that catalytic rotation of the F1 moiety via ATP hydrolysis actively pumps protons out of the mitochondrial matrix via the membrane multimeric Fo c-ring pore. The effect of TbFUT1 depletion in insect stage PCF appears to be different. Here, cell death upon TbFUT1 conditional depletion takes longer to manifest than in BSF cells [78]. However, the PCF form of T. brucei uses FoF1-ATP synthase more conventionally, generating ATP from the mitochondrial proton gradient, in conjunction with a fully expressed and functional respiratory chain composed of complexes I-IV [81]. Thus, the differences in the kinetics of cell death upon TbFUT1 depletion in the two lifecycle stages may be dependent on their different mitochondrial bioenergetics and/or protein expression requirements. Furthermore, an LmjFUT1-null segregant mutant (Δfut1 s) exhibits impaired mitochondrial function, including a reduced mitochondrial membrane potential, abnormalities in mitochondrial structure and impaired biosynthesis of the kinetoplast DNA network [79]. Together, these findings implicate a general requirement of FUT1 for normal mitochondrial function in kinetoplastids. Work is ongoing to tease apart the functions of TbFUT1 in BSF and PCF T. brucei and LmjFUT1 in L. major.

Putative mitochondrial GT25 family enzymes
Since LNB is a substrate for recombinant TbFUT1 [78], we postulate that there may be other GTs that assemble Galβ1-3GlcNAc on mitochondrial acceptor molecules (protein, lipid or other) on which TbFUT1 can act. These putative preceding GTs could be cytosolic, such that Galβ1-3GlcNAc primed molecules could be imported into the mitochondrion for fucosylation, or they could be mitochondrial themselves. Significantly, a GT25 family enzyme (TbGTX) is predicted to be mitochondrial by mitochondrial import analysis [82] and both TbGTX and another GT25 family member (TbGTZ) have been localised to the mitochondrion in T. brucei by C-terminal GFP tagging and fluorescence microscopy [83]. Our own C-terminal epitope tagging studies also localise TbGTX and TbGTZ to the mitochondrion and RNAi studies in BSF cells show that, like TbFUT1, they are both essential (unpublished data). We also find proteomic evidence for the expression of TbGTX in both BSF and PCF cells (Figure 3). Thus, we hypothesise that TbGTX, TbGTZ and TbFUT1 may be involved in the biosynthesis of Fucα1-2Galβ1-3GlcNAc trisaccharide motifs attached to mitochondrial molecules in T. brucei. Further studies are required to test this.
Phylogenetic analysis indicates that syntenic orthologues of TbGTX and TbGTZ occur in most kinetoplastids, with the exception that GTZ is absent from the ancestral, free-living and bacteriophagic kinetoplastid Bodo saltans [84]. This might suggest that the GTX sequence has undergone duplication and evolution in the parasitic kinetoplastids. Given the similarity of TbGTX and TbGTZ to bacterial GT25 family members, we suggest that they may have been inherited in a similar manner to TbFUT1, yielding enzymes with conserved function but unique localisation.

Base J glucosyltransferase
Kinetoplastids, uniquely, contain small amounts of β-D-glucopyranosyloxymethyluracil (base J) in their DNA. The synthesis of base J involves the formation of 5-hydroxymethyluracil (hmU) and its subsequent glucosylation. The nuclear-localised, base J-specific GT (JGT) [85] has a GT-A fold; however, it is not currently assigned to any CAZy family. Base J does not appear to be essential for trypanosome survival [85], but it appears to play a regulatory role in Pol II-mediated polycistronic gene regulation [86].

UDP-sugar: polypeptide GTs
In T. cruzi, a small gene family (TcOGNT1, TcOGNT2 and TcOGNTL), belonging to the CAZy GT60 family, has been described [87,88]. TcOGNT1 has been shown to be a Golgi-located UDP-GlcNAc : polypeptide αGlcNAc-transferase, making GlcNAcα1-O-Thr linkages in the abundant surface GPI-anchored mucin-like molecules of this parasite [87,88]. In these mucin-like glycoproteins, the O-linked GlcNAc residue can be variously substituted with βGal and β-galactofuranose (βGalf) residues, where terminal βGal residues can be capped with α2-3 linked sialic acid via trans-sialidase activity [89]. However, no such mucin-like molecules or similar O-linked glycans have been found in T. brucei, raising the question of what the GT60 TbOGNT1 gene product, which is expressed at low levels in BSF and PCF cells (Figure 3), might be doing in this organism. One possibility worth exploring is whether it might be involved in the formation of the novel Glcα1-O-Ser linkage observed in several T. brucei VSGs [30], for which no GT gene has been assigned thus far. Similarly, the TbGT(s) responsible for adding up to two more hexoses to Glcα1-O-Ser remain to be identified.

α-Galactosyltransferases
Since GTs within a given CAZy GT family generally encode either inverting (β) or retaining (α) GTs, and since all GT67 family members thus far have proven to be β-glycosyltransferases, we most likely need to look outside of the GT67 group for BSF-specific αGal-transferases. Such αGal-transferases must exist to cap small complex N-glycans with Galα1-3Gal motifs, as found in some VSGs [90], and to decorate the BSF GPI anchors [24,91] (Figure 1). Searches for the former using CAZy GT6, GT8 and GT77 family sequences fail to return convincing hits. There are no precedents for αGal-transferases that decorate GPI anchors, nor do the trypanosome genomes contain anything like PGAP4, which encodes a mammalian Golgi UDP-GalNAc : GPI β1-4GalNAc-transferase [92]. Although a putative UDP-Gal : GPI αGalT activity was previously partially purified from T. brucei whole cell lysates [93], the protein and gene sequences were not identified. Thus, all of the trypanosome αGalT genes remain to be identified and may constitute new GT families.

Summary
In this review, we have discussed:
• The inversion of glycosylation complexity between the BSF cells (simple GPI sidechains and complex N-glycans) and PCF cells (complex GPI sidechains and simple N-glycans) (Figure 1).
• How the selection of complex and/or simple N-glycans is made through the expression of TbSTT3A and/or TbSTT3B OSTs, and how ER quality control has been adapted in T. brucei to cope with its prodigious flux of surface VSG molecules.
• The acquisition of an ancestral eukaryotic β3GT gene and its expansion to create the kinetoplastid-specific GT67 family that has acquired numerous UDP-Gal/GlcNAc βGal/GlcNAc-transferase functions, providing a clear example of GT convergent evolution.
• The speculation that certain GT67 sub-families may encode the elusive UDP-Gal : βGlcNAc β1-4 Gal-transferases required for poly-LacNAc synthesis in BSF and PCF T. brucei.
• The identification of essential mitochondrial GTs of bacterial origin in the kinetoplastids, with some speculation as to their function.
• The existence of a glucosyltransferase to make the kinetoplastid-specific modified DNA nucleotide base J.
• The T. brucei GT67, GT25 and GT60 family TbGTs still looking for a function, and glycosidic linkages in T. brucei still looking for the TbGTs that catalyse them (including several αGal-transferases).
These elements indicate the progress that has been made in recent years in understanding protein glycosylation and glycosylation machinery in trypanosomes, progress that has both unearthed potential drug targets and led to the discovery of novel biology in T. brucei and its pathogenic relatives.