The demonstration more than a decade ago that glycoproteins could be produced in Escherichia coli cells equipped with the N-linked protein glycosylation machinery from Campylobacter jejuni opened the door to using simple bacteria for the expression and engineering of complex glycoproteins. Since that time, metabolic engineering has played an increasingly important role in developing and optimizing microbial cell glyco-factories for the production of diverse glycoproteins and other glycoconjugates. It is becoming clear that future progress in creating efficient glycoprotein expression platforms in bacteria will depend on the adoption of advanced strain engineering strategies such as rational design and assembly of orthogonal glycosylation pathways, genome-wide identification of metabolic engineering targets, and evolutionary engineering of pathway performance. Here, we highlight recent advances in the deployment of metabolic engineering tools and strategies to develop microbial cell glyco-factories for the production of high-value glycoprotein targets with applications in research and medicine.
Protein glycosylation is the process of covalently linking a complex carbohydrate known as a glycan to an amino acid side chain of an acceptor protein . It is one of the most common post-translational protein modifications and is present in all domains of life including bacteria  and archaea . In the case of bacteria, the existence of enzymatic pathways for attaching glycans to asparagine residues (N-linked glycosylation) or to serine or threonine residues (O-linked glycosylation) is firmly established and reviewed extensively [4–7]. For both types, two distinct modes of glycan attachment have been observed in bacteria (Figure 1): (i) en bloc transfer where the glycan is first assembled on a lipid carrier molecule as a lipid-linked oligosaccharide (LLO) and then transferred from the lipid to an acceptor protein  and (ii) sequential glycosylation whereby the glycan is assembled directly on the protein by successive addition of monosaccharides by glycosyltransferase (GT) enzymes, typically occurring in the cytoplasm [8–12]. For the former mode, glycan assembly is catalyzed by GTs on the cytoplasmic side of the inner membrane, while glycan transfer to protein is catalyzed by an integral membrane protein known as an oligosaccharyltransferase (OST) on the periplasmic side of the inner membrane. Importantly, both sequential and en bloc glycosylation mechanisms from diverse bacterial species have been at least partially if not completely reconstituted in laboratory strains of Escherichia coli [14–17], thereby providing important experimental proof-of-concept that this genetically tractable host organism can be leveraged for engineering novel glycoproteins and gaining access to unexplored regions of glycoprotein structural space as described in the sections that follow.
Figure 1. Protein glycosylation pathways reconstituted in E. coli.
Before discussing specific examples, it is important to consider the distinct advantages that arise from producing glycoproteins in E. coli compared with eukaryotic expression hosts. For one, the viability of E. coli (or most bacteria for that matter), which do not natively possess glycosylation pathways, does not depend on protein glycosylation. This is not the case in eukaryotes where glycosylation is generally important for growth and development and, in the specific cases of N-glycosylation and O-mannosylation, essential for viability . Hence, manipulation of these native glycosylation pathways (i.e. genetic deletion or overexpression of glycosyltransferase genes) in eukaryotic cells can severely compromise the overall fitness of the host [19,20]. Another major advantage that stems from the absence of native glycosylation systems is that E. coli provides a ‘clean chassis’ in which orthogonal protein glycosylation pathways can be installed without interference from endogenous GTs, thereby leading to more uniformly glycosylated protein products [21–23].
It should also be pointed out, however, that the absence of natural glycosylation pathways also presents significant engineering challenges that relate to: (i) assembling functional pathways of GTs that catalyze the biosynthesis of a desired glycan either on a lipid carrier or on a protein target; (ii) accumulating sufficient levels of necessary building blocks (i.e. nucleotide-activated sugars, acceptor proteins); and (iii) conjugating glycan to protein with high transfer efficiency so as to maximize major performance indices, namely titer, yield, and productivity. In this review, we discuss how an emerging marriage between bacterial glycoprotein engineering and contemporary metabolic engineering is furnishing an armamentarium of new tools and strategies for overcoming the challenges associated with developing industrially competitive microbial cell glyco-factories. Of particular interest are recent advances and opportunities in the construction of artificial glycosylation pathways, genome-wide identification of metabolic engineering targets, creation of synthetic circuits for fine-tuning and controlling glyco-enzyme expression, and multiplex genome editing and evolutionary engineering for optimizing desirable glyco-phenotypes. Just as metabolic engineering has been instrumental in developing microorganisms for the production of various chemicals and materials to achieve a sustainable chemical industry, we anticipate that metabolic engineering of designer glycosylation pathways will similarly advance bacterial production platforms for rapidly and economically supplying structurally uniform glycoprotein therapeutics and vaccines for a wide range of human and animal diseases.
Design and assembly of orthogonal glycosylation pathways
The first bacterial glycoprotein engineering platform was described by Aebi and co-workers, who functionally transferred the entire protein glycosylation (pgl) locus discovered in Campylobacter jejuni, which encodes a complete N-glycosylation pathway, into a laboratory strain of E. coli. The OST in this pathway, C. jejuni PglB (CjPglB), catalyzed en bloc transfer of the preassembled C. jejuni glycan from the native lipid carrier in E. coli, undecaprenyl pyrophosphate (Und-PP), to an asparagine residue of an acceptor protein (Figure 1a). This pioneering system has since been leveraged as a model to mechanistically dissect the bacterial glycosylation mechanism, which shares important functional similarities with the eukaryotic process, and as a starting point for further glycoprotein engineering. For example, the compatibility of PglB with different glycan substrates has been explored by heterologous expression of diverse lipopolysaccharide biosynthesis pathways that also proceed via an Und-PP intermediate. These studies revealed that PglB has a relaxed specificity towards its glycan substrate [25,26], minimally requiring an acetamido group at the C2 position of the reducing-end sugar. The substrate protein preferences for PglB have similarly been explored using E. coli cells engineered with the pgl locus. These studies identified a short consensus sequence (sequon) D/E-X1-N-X2-S/T (where X1, X2 ≠ Pro) that is strictly required for substrate recognition by CjPglB and that can be engineered into flexible secondary structures within a protein [27,28] or at the N- or C-termini of the protein, enabling glycosylation of a wide range of heterologous proteins by simple introduction of a glycosylation tag. An important prerequisite for the glycosylation of these proteins is targeting to the periplasmic space, although substrate proteins that subsequently localize to the outer membrane or extracellular medium are also suitable candidates for PglB-mediated glycosylation.
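The sequon rule stated above lends itself to a simple pattern search. The following is a minimal sketch, assuming only the consensus D/E-X1-N-X2-S/T with X1, X2 ≠ Pro as given in the text; the function name `find_sequons` and the example peptide are hypothetical and are not taken from any of the cited studies. A match indicates only that the primary sequence satisfies the consensus, not that the site would be glycosylated in vivo, since periplasmic targeting and local structure also matter.

```python
import re

# CjPglB sequon as stated in the text: D/E-X1-N-X2-S/T, with X1, X2 != Pro.
# The lookahead wrapper lets overlapping sequons all be reported.
CJPGLB_SEQUON = re.compile(r"(?=([DE][^P]N[^P][ST]))")


def find_sequons(protein_seq):
    """Return (asn_position, sequon) pairs; positions are 1-based."""
    seq = protein_seq.upper()
    hits = []
    for match in CJPGLB_SEQUON.finditer(seq):
        # match.start() is the 0-based index of D/E; Asn is two residues later,
        # so its 1-based position is start + 3.
        hits.append((match.start() + 3, match.group(1)))
    return hits


# Hypothetical acceptor: an arbitrary peptide carrying a C-terminal DQNAT tag.
example = "MKTAYIAKQRGSDQNAT"
print(find_sequons(example))
```

Scanning a candidate acceptor protein in this way is a quick check on whether an engineered glycosylation tag has been introduced as intended and whether any additional, unintended sequons are present.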
Importantly, these mechanistic studies provided the requisite insight for making customized recombinant bacterial glycoproteins, where potentially any protein of interest could be modified with any glycan moiety at a desired position simply by co-expression of CjPglB, a pathway for assembling sugars of interest on Und-PP, and the desired acceptor protein modified to contain one or more glycosylation sequons. Indeed, one of the first applications to leverage the unique ability of PglB to tolerate a broad range of glycan substrates and transfer these to a short, specific motif embedded in heterologous proteins was the production of glycoconjugate vaccines (for reviews, see refs [30,31]). Specifically, by transferring the locus encoding an O-polysaccharide antigen or a capsular polysaccharide into E. coli cells carrying a copy of CjPglB and an appropriate antigenic carrier protein, glycoproteins can be produced that elicit an immune response to the corresponding polysaccharide epitope found on the surface of a pathogenic bacterium [32–36], and some of these have even advanced into human clinical trials [37–39]. It is also possible to transfer CjPglB and a carrier protein into a pathogenic host strain, resulting in rerouting of the pathogen's naturally produced Und-PP-linked polysaccharide onto the carrier [40–42], which is a viable alternative in cases where the genetic locus encoding the polysaccharide pathway is undetermined or the locus is recalcitrant to cloning.
The aforementioned examples take advantage of glycan structures that are built through the enzymatic potential of a single organism, either in the native host or through heterologous transfer of intact gene clusters to E. coli. In a notable departure from this approach, several groups have reported the rational design of artificial operons comprising GTs from different organisms that co-ordinate the assembly of desired glycan structures, including those produced by human cells, on Und-PP. For example, engineering of a heterologous pathway comprising GTs from four different bacterial species enabled production of a glycoprotein bearing the human Lewis X (Lex) antigen. Along similar lines, our group has engineered a synthetic pathway comprising four GTs that enabled the production of the eukaryotic mannose3-N-acetylglucosamine2 (Man3GlcNAc2) glycan, which serves as the core structure common to all eukaryotic N-glycans including those produced in humans. Unlike previous studies that relied on bacterial GTs for N-glycan biosynthesis, we expressed four GTs that were sourced from Saccharomyces cerevisiae and functionally reconstituted in the cytoplasm of E. coli. To increase the pool of GDP-Man required by the yeast mannosyltransferases, Alg1 and Alg2, we performed genome editing using the λ Red recombineering system to delete the gene encoding GDP-mannose dehydratase (GMD), which converts GDP-Man to GDP-4-keto-6-deoxymannose in the first step of GDP-L-fucose (GDP-Fuc) synthesis. In cells producing Und-PP-linked Man3GlcNAc2, CjPglB was capable of transferring the trimannosyl core glycan to a range of acceptor proteins including bovine RNase A and a human Fc domain when the native acceptor site in these glycoproteins was changed to a bacterial sequon (DQNAT). Importantly, this was the first report to describe the engineering of an orthogonal eukaryotic N-glycosylation pathway in E. coli.
While the initial product yields were low (∼50 μg/l), this has now been improved through metabolic engineering strategies discussed below.
To date, the majority of PglB-mediated glycoprotein engineering studies employ a two- or even three-plasmid system to transfer all of the requisite machinery along with the acceptor protein into E. coli. A consequence of this multi-plasmid approach is the metabolic burden placed on the host that can cause slow growth, plasmid loss, and product degradation by up-regulation of stress responses. To remedy this issue, efforts to integrate PglB into the genome have been described, albeit with glycosylation efficiency that is well below 50%. We recently performed a systematic glyco-recoding of the E. coli genome using a targeted genome editing strategy to site-specifically insert PglB and the glycosylation pathway enzymes in place of native oligosaccharide operons, namely the O-polysaccharide antigen and enterobacterial common antigen, that are unnecessary for viability or glycosylation. By recombineering the complete N-glycosylation machinery to these two genomic loci, our host strain only required transformation with a single plasmid encoding the acceptor protein. As a result, cells reached significantly higher final densities and volumetric productivity was greatly increased. Moreover, glycosylation efficiency in these strains was at or near 100%, rivaling or exceeding the best glycosylation that could be obtained from plasmid-based systems.
While PglB-mediated glycosylation remains the archetypical glycoprotein engineering platform, several non-canonical modes of glycosylation have been functionally reconstituted in E. coli. Most extensively studied among these is periplasmic O-oligosaccharyltransferase (O-OST)-mediated glycosylation, which is mechanistically similar to bacterial N-glycosylation including glycan assembly on Und-PP and en bloc transfer by the O-OST to acceptor proteins (Figure 1b). The first periplasmic O-glycoprotein engineering platforms in E. coli involved the O-OSTs PilO from Pseudomonas aeruginosa and PglL from Neisseria meningitidis. In contrast to CjPglB, which prefers glycans that have an acetamido group at the C2 position of the reducing-end sugar and that lack a β1–4 linkage between the first two sugars, PglL appears able to transfer a wider range of sugars, suggesting that it may be useful for a broader range of applications. For example, direct transfer of PglL along with a suitable carrier protein into attenuated pathogens resulted in O-linked glycoproteins bearing antigenic polysaccharide structures including some that were known to be incompatible with the more restrictive PglB enzyme [51,52]. Importantly, several of the O-glycoprotein conjugates evoked specific and protective immune responses in laboratory animals. These same authors identified a minimal optimal O-linked recognition (MOOR) motif comprising eight residues that, when flanked by additional hydrophilic sequences, was able to direct PglL-mediated glycosylation to heterologous proteins. It is worth mentioning that the O-OSTs from diverse Bacteroidetes appear to modify proteins at a minimal 3-residue motif, D-(S/T)-(A/L/V/I/M/T), which is enticing from a glycoprotein engineering standpoint; however, the broad utility of these particular enzymes will depend on their glycan substrate specificity, which has yet to be determined.
Perhaps the simplest platform for glycoprotein engineering is based on sequential N-glycosylation (Figure 1c) whereby E. coli cells expressing the N-glycosyltransferase from Actinobacillus pleuropneumoniae (ApNGT) use nucleotide-activated glucose (Glc) to glycosylate asparagines of target proteins expressed in the cytoplasm. A useful feature of ApNGT is its relaxed substrate specificity, requiring only a minimal N-X-S/T sequon that allows glycosylation of heterologous glycoproteins such as human erythropoietin at their native N-glycosylation sites. While a single N-linked Glc is of limited utility on its own, several groups have used this monosaccharide for priming the biosynthesis of glycan polymers including dextran and polysialic acid that might find use in vaccine applications or half-life extension of protein drugs.
Cytoplasmic pathways for sequential O-glycosylation in E. coli have also been described (Figure 1d). In one instance, the complete pathway for stepwise glycosylation of Streptococcus parasanguinis Fap1 was transferred to E. coli, including an initiating O-glycosyltransferase (OGT) complex comprising GtfA and GtfB that generates O-linked GlcNAc and several additional GTs that elaborate the glycan. Efforts to re-engineer the glycan have yet to be described, but a 25-residue glycosylation sequence has been identified that might permit glycosylation of heterologous proteins bearing this motif. In a related example, human UDP-GalNAc/polypeptide UDP-GalNAc transferase (GalNAc-T), an OGT that transfers N-acetylgalactosamine (GalNAc) from UDP-GalNAc to certain serines or threonines, was expressed in the cytoplasm of E. coli to execute the initial step of human, mucin-type O-glycosylation. Because nonpathogenic E. coli strains produce UDP-GlcNAc, but not UDP-GalNAc, co-expression of the UDP-GlcNAc C-4 epimerase WbpP from P. aeruginosa was required to convert UDP-GlcNAc to UDP-GalNAc. In cells producing UDP-GalNAc, GalNAc-T was able to glycosylate several fusion proteins containing GalNAc-T recognition sequences derived from eukaryotic O-glycoproteins including the human mucin-based peptide, CDNKPAPGSTAPPAA, and the Sus scrofa plasminogen-based peptide, PPPTSGPT. The authors subsequently used the O-linked GalNAc as a handle for in vitro chemical modification with polyethylene glycol (PEG), yielding a glyco-PEGylated protein product. While not the focus of the study, one can envision the installation of GalNAc as a starting point for the bacterial biosynthesis of more elaborate mucin-type glycans, including the tumor-associated T-antigen disaccharide (Galβ1,3GalNAcα) and sialic acid-containing versions thereof.
A key feature of this approach is the ability to form fully human glycan linkages, which is not currently possible with the native bacterial NGT and OGT enzymes described above and thus limits their biotechnological utility.
Optimization of glycosylation pathway performance
In tandem with the creation of orthogonal glycosylation pathways in E. coli, significant efforts have been made towards optimizing the overall glycolipid and glycoprotein yield, glycosylation efficiency (or site occupancy), and product homogeneity of these platforms. These can be broadly categorized into efforts that (i) improve bioprocess conditions through manipulation of media compositions, temperature, and pathway induction; (ii) optimize host strains and pathway enzyme expression levels to increase flux towards protein, lipid, and glycan synthesis; and (iii) engineer enzymes that constitute the pathway. Each of these approaches has been shown to improve glycosylation pathway performance by relieving bottlenecks in specific steps, paving the way to a combined, holistic approach to make new bacterial glycosylation pathways commercially viable.
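The performance indices named above can be made concrete with a small calculation. The sketch below assumes that glycosylation efficiency (site occupancy) is taken as the glycosylated fraction of total acceptor protein, estimated for instance from the signal intensities of the glycosylated and aglycosylated forms on a blot; the function names and the numbers in the usage lines are hypothetical and do not come from the studies discussed here.

```python
def glycosylation_efficiency(glycosylated_signal, aglycosylated_signal):
    """Fraction of acceptor protein carrying the glycan (site occupancy).

    Inputs are assumed to be background-corrected signals (e.g. densitometry)
    for the glycosylated and aglycosylated forms of a single-site acceptor.
    """
    total = glycosylated_signal + aglycosylated_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return glycosylated_signal / total


def glycoprotein_titer(total_acceptor_mg_per_l, efficiency):
    """Glycosylated product titer from total acceptor titer and efficiency."""
    return total_acceptor_mg_per_l * efficiency


# Hypothetical values, for illustration only.
eff = glycosylation_efficiency(3.0, 1.0)
print(eff, glycoprotein_titer(20.0, eff))
```

Separating the two quantities makes explicit that total glycoprotein yield can rise through higher acceptor expression alone, without any change in glycosylation efficiency.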
In the case of optimizing bioprocess conditions, many relatively straightforward approaches have been reported to improve glycosylation pathway performance. For example, during the production of a glycoconjugate vaccine, it was found that biomass production was particularly sensitive to induction of the CjPglB OST, revealing a clear need for separation of biomass growth and glycosylation pathway expression. In addition, a 30–50-fold improvement in glycosylation of carrier proteins (up to 24 mg/l of glycosylated carrier protein) was achieved by optimizing a fed-batch reactor feeding strategy and medium composition. A subsequent study provided further evidence that induction timing of CjPglB is critical for glycosylation efficiency, with a delay between PglB induction and carrier protein expression proving to be optimal. It was suggested that functional expression and proper insertion of CjPglB at the time of carrier protein translocation may allow better glycan transfer, possibly before carrier folding. These authors also showed that optimizing Mg2+ in the media had a small effect on glycosylation efficiency. Along similar lines, the use of autoinduction instead of IPTG-based induction modestly improved CjPglB-mediated glycosylation of a different target protein with either C. jejuni or Haemophilus influenzae glycans. It is widely recognized that the use of autoinduction media helps overcome the inherent instability of inducible strains, thereby reducing metabolic burden. This burden can cause the levels of endogenously produced nucleotide-activated sugars to be limiting, in which case supplementation of the media with the corresponding sugar can help to alleviate the bottleneck. In certain instances, supplementation is required because the sugar is natively absent, such as with sialic acid in K-12 strains of E. coli.
Supplementing bacterial growth medium with monosaccharides also presents opportunities for metabolic labeling with azido-sugars, resulting in the incorporation of azide functional groups into newly synthesized polysaccharides. For example, metabolic labeling of Helicobacter pylori with an acetylated azido GlcNAc analog was used to identify glycoproteins in this bacterium  while labeling of Bacteroides fragilis with an acetylated azido GalNAc analog enabled imaging of bacterial polysaccharides and tracking of host–microbiota interactions . Such labeling strategies could also be used in high-throughput screens to optimize protein glycosylation efficiency in E. coli.
In addition to bioprocess considerations, optimization of host strains and pathway enzyme expression levels offers a powerful route to improving glycosylation pathway performance. This can be accomplished by adding, deleting, up-regulating, or knocking down the expression of host genes. A simple yet effective strategy is to rationally identify host enzymes that are known to catabolize desired substrates and delete the genes encoding these enzymes, as illustrated by deletion of gmd to increase GDP-Man pools in E. coli, discussed above. Other examples include deletion of waaL, which encodes the E. coli O-antigen ligase, to eliminate unwanted transfer of Und-PP-linked substrates to lipid A-core, and deletion of the E. coli nanA gene to remove sialic acid aldolase activity and ensure that sialic acid supplemented in the media is not catabolized and instead remains available for activation by the CMP-sialic acid synthetase and subsequent incorporation into polysialic acid. The same rational approach has been used to identify target genes whose overexpression improves glycosylation. For instance, overexpression of the gene encoding the initiating glycosyltransferase WecA has been reported to improve glycosylation efficiency with the C. jejuni glycan by PglB, while overexpression of the genes encoding phosphomannomutase (ManB) and mannose-1-phosphate guanylyl transferase (ManC) in the GDP-Man pathway helped to increase production of the Man3GlcNAc2 glycan by nearly 50-fold to 13.9 μg/l. As discussed above, placement of glycosylation enzymes including CjPglB into the genome has been shown to improve glycosylation pathway performance by reducing burden on the host [47,48]. Codon optimization has also been used to improve expression of PglB, which positively affected glycosylation efficiency with the C. jejuni glycan in E. coli.
In a more nuanced approach, a recent study by our group has shown that manipulating expression level by promoter engineering, in combination with overexpression of ManB and ManC, led to production of up to 10 times more glycosylated acceptor protein (to ∼14 mg/l). This gain represented a two-orders-of-magnitude improvement compared with our earlier report and rivaled the yield (25 mg/l) reported for E. coli-based production of carrier proteins glycosylated with bacterial polysaccharides.
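The scale of this improvement can be checked with simple arithmetic. The sketch below uses only figures quoted in this review (an initial yield of ∼50 μg/l, an improved yield of ∼14 mg/l, and a 25 mg/l benchmark for glycoconjugate carrier proteins); the variable and function names are illustrative, and because the quoted yields are approximate the result should be read as an order-of-magnitude estimate.

```python
import math


def fold_change(new_mg_per_l, old_mg_per_l):
    """Ratio of two titers expressed in the same units."""
    return new_mg_per_l / old_mg_per_l


earlier_titer = 50e-3    # ~50 ug/l, converted to mg/l
improved_titer = 14.0    # ~14 mg/l
benchmark_titer = 25.0   # mg/l, glycoconjugate carrier protein benchmark

fold = fold_change(improved_titer, earlier_titer)
orders_of_magnitude = math.log10(fold)
fraction_of_benchmark = improved_titer / benchmark_titer

print(f"{fold:.0f}-fold increase ({orders_of_magnitude:.1f} orders of magnitude)")
print(f"{fraction_of_benchmark:.0%} of the benchmark yield")
```

Under these figures the gain is roughly 280-fold, which is consistent with the two-orders-of-magnitude description, and the improved yield is a little over half of the benchmark value.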
To identify less obvious metabolic engineering targets that affect glycosylation, several groups have turned to computer-aided flux balance analysis and high-throughput experimental approaches. For example, we used a constraint-based model of E. coli metabolism in combination with heuristic optimization to design gene knockout strains that overproduced the C. jejuni glycan precursor . Predicted knockouts were subsequently constructed in E. coli and experimentally verified to improve glycan production. In a related example, shotgun proteomics was combined with metabolic network analysis to forward engineer E. coli for improved glycosylation of the C. jejuni AcrA glycoprotein . This approach revealed that enhancing flux through the glyoxylate cycle, for example by up-regulating isocitrate lyase expression, could increase the amount of glycosylated protein by up to 3-fold. The same group described an inverse metabolic engineering strategy in which screening of genomic DNA libraries was used to identify genes (malQ, rseP, ptsA, rffE, and dxs) whose overexpression in E. coli improved production of the heterologously expressed C. jejuni glycan and overall glycoprotein yield, albeit without significantly enhancing glycosylation efficiency .
Efforts to improve glycosylation pathway performance have more recently expanded to include enzyme engineering strategies. Both rational design and random mutagenesis-based approaches have proved to be particularly useful for tailoring important properties of glycosylation pathway enzymes including catalytic activity, specificity, and selectivity, thereby relieving system constraints that limit bacterial glycoprotein engineering platforms. To address the restricted sequon specificity of CjPglB, we employed a rational approach whereby structure-guided mutagenesis of CjPglB active-site residues was used to generate a combinatorial library that was screened using a high-throughput replica-plating strategy termed glycoSNAP (glycosylation of secreted N-linked acceptor proteins) . Library screening yielded mutants of CjPglB with broadly relaxed specificity, which probably arose from a more open peptide-binding pocket. Interestingly, when these mutations were mapped onto Campylobacter lari PglB, the relaxed acceptor sequon specificity was maintained, suggesting that these mutations are broadly applicable to many homologs. This structure-guided PglB engineering approach was complemented with the metagenomic mining of CjPglB homologs that also exhibited relaxed sequon specificity, in particular the PglB enzyme from Desulfovibrio gigas . Additional efforts for structure-guided engineering of CjPglB uncovered mutants with 16-fold increased activity to a non-native polysaccharide substrate .
As a complement to structure-guided approaches, directed evolution has been used to engineer specific traits of glycosylation enzymes. The success of directed evolution for glyco-enzymes hinges on the non-trivial task of developing genetic screens or selections that enable isolation of rare mutants from large populations (10^10–10^12 clones) by creating a crucial genotype-to-glycophenotype linkage. In seminal work, Withers and co-workers engineered the sugar-transfer activity of the sialyltransferase CstII from C. jejuni using a high-throughput assay that relied on the fact that fluorescently labeled acceptor sugars can freely diffuse across the bacterial cell membrane, but the transfer products are selectively trapped in cells. Using this assay, a combinatorial library containing more than a million CstII variants was screened via fluorescence-activated cell sorting, and a mutant exhibiting 400-fold higher catalytic activity was discovered. Similar mutagenesis and screening approaches have since been developed to evolve other GTs such as the Neisseria meningitidis serogroup B polysialyltransferase and the C. jejuni β-1,3-galactosyltransferase CgtB.
Directed evolution in the context of PglB-mediated glycosylation has been performed using enzyme-linked immunosorbent assay (ELISA)-based detection of periplasm-extracted glycoproteins in 96-well plates, uncovering inactivating mutations in CjPglB and corroborating the necessity of a strictly conserved WWDYG motif . However, the ELISA-based screening approach, while effective, was limited to relatively small libraries containing ∼5000 or fewer clones and thus incapable of deeply penetrating protein sequence space as is desirable in directed evolution experiments. Another concern with extract- or cell lysis-based screens [68,75] is their potential to be confounded by the prevalence of glycan intermediates that have not been transferred to proteins (for example, LLOs in bacterial cell membranes), increasing the likelihood for false-positive hits. These issues can be overcome using genetic assays where final glycosylated products are effectively segregated from their precursors such as in cell-surface display of glycoconjugates [29,44,65], glycophage display [76,77], and glycoSNAP , with the latter two methods having already shown promise for screening rationally designed combinatorial libraries of acceptor sequons and glycosylation enzymes.
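The limitation imposed by a throughput of about 5000 clones can be illustrated with a standard sampling estimate. The sketch below assumes an idealized library in which every variant is equally represented and clones are sampled at random with replacement, so that screening n clones from N variants is expected to cover a fraction 1 − (1 − 1/N)^n of the library; the function names are hypothetical, and real libraries are typically biased, which makes these figures optimistic.

```python
import math


def library_size(randomized_positions, alphabet=20):
    """Number of protein variants when positions are fully randomized."""
    return alphabet ** randomized_positions


def expected_coverage(clones_screened, variants):
    """Expected fraction of distinct variants sampled at least once."""
    return 1.0 - (1.0 - 1.0 / variants) ** clones_screened


def clones_for_coverage(target_fraction, variants):
    """Clones needed to reach a target expected coverage (variants > 1)."""
    return math.ceil(math.log(1.0 - target_fraction) / math.log1p(-1.0 / variants))


# ~5000 clones, as in the ELISA-based screen, against fully randomized sites.
for positions in (2, 3, 4):
    n_variants = library_size(positions)
    print(positions, n_variants, round(expected_coverage(5000, n_variants), 3))
```

Under these assumptions, 5000 clones sample essentially all variants of a two-residue library, a little under half of a three-residue library, and only a few percent of a four-residue library, which is why higher-throughput screens or selections are needed to explore larger regions of sequence space.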
In contrast with the growing suite of genetic screens that have been reported, genetic selections for interesting glyco-phenotypes have proved more challenging because these require associating the glycosylation state of a protein with survival of a host that does not natively produce such protein products. At present, only one such selection has been described for bacterial glycosylation and is based on the commonly observed phenomenon of glycan-induced alterations in protein folding [78,79]. Specifically, TEM-1 β-lactamase was fused to a glycoprotein substrate in cells carrying the C. jejuni glycosylation machinery such that changes in periplasmic folding and stability caused by N-glycosylation of the substrate resulted in measurable differences in bacterial resistance to β-lactam antibiotics . The selection was then used to explore the effects of CjPglB-mediated glycosylation on in vivo protein folding and stability and could potentially enable the isolation of modified proteins with advantageous folding properties. However, there have been no reports of using the assay for high-throughput screening of combinatorial libraries. Finally, it is worth mentioning that the application of screens and selections for general protein folding and stability [81,82], independent from glycosylation, could be used to overcome the well-documented insolubility and aggregation issues associated with heterologous expression of glycosylation enzymes in E. coli. Such solubility-enhanced, aggregation-resistant enzymes are prized for glycoprotein engineering applications.
Cell-free metabolic engineering of glycoprotein biosynthesis
Cell-free metabolic engineering (CFME) is emerging as a complementary method to in vivo approaches for accelerating the design, build, and test (DBT) cycles of biochemical pathways. From a metabolic engineering perspective, cell-free systems decouple cell survival and engineering goals, offering a unique opportunity to redirect nearly all carbon flux towards producing desired products. Prokaryote-based cell-free systems were introduced over a century ago, but have only recently been applied to glycan and glycoprotein biosynthesis with an eye towards maximizing product homogeneity and yield as well as characterizing the relevant enzymatic reactions. The two major types of CFME approaches are based on: (i) purified enzymes and (ii) crude extracts (Figure 2).
Figure 2. CFME of glycoprotein biosynthesis.
The principle of purified enzyme CFME is to express, purify¹ [...]
¹ Owing to a lack of robust membrane protein purification protocols, several early studies used crude-membrane extracted OSTs instead of purified enzymes in cell-free reactions. However, we consider these reactions as purified systems because the OST has been overexpressed and isolated from cells.
Purified enzyme CFME is also a powerful method to construct pathways for glycan assembly on lipid carriers, which can later be used as a substrate by OSTs. Since glycan synthesis often depends on the concomitant expression of several GTs, sometimes derived from different organisms, the ability to monitor each of their activities both independently and in concert can speed up the DBT process towards pathway assembly. This approach has been exemplified in the construction of a multi-step enzymatic pathway for the biosynthesis of the C. jejuni N-linked glycan [96–98] and eukaryotic GlcNAc2Man1–5 [99,100] on lipid carriers, which are then transferred onto peptide or protein acceptors using their respective OSTs. Finally, two-stage glycosylation strategies that involve the in vivo production of an initial glycoprotein, whose glycans are then remodeled in vitro using purified GTs, have enabled the synthesis of glycoconjugates with complex glycan structures including Lex antigen  as well as Man9GlcNAc2 and biantennary complex-type N-glycans .
In contrast with purified enzyme CFME, crude extract systems bypass laborious protein production and purification by employing cell extracts that contain the relevant enzymes. The extract is made by simply collecting the supernatant from cell lysates enriched with protein synthesis machinery. Currently, the best-characterized and most widely adopted crude extract CFME system is derived from E. coli; it can activate long enzymatic pathways of more than eight enzymes, with productivities exceeding 100 mg/l-h and at scales of 100 l [102–105]. The application of E. coli crude extract CFME to glycoengineering has been limited by the absence of endogenous glycosylation machinery. This limitation was initially overcome by augmenting commercial E. coli S30 extracts with purified C. jejuni N-linked glycosylation components, yielding glycoproteins at ∼1 mg/ml. Our more recent work reports the development of a fully integrated E. coli cell-free glycoprotein synthesis (CFGpS) technology. CFGpS enables a one-pot reaction for efficient and site-specific glycosylation of target proteins, requiring only the addition of DNA to the reaction. This platform is highly modular, allowing the interchange of multiple distinct OSTs and structurally diverse LLOs, including the eukaryotic N-glycan Man3GlcNAc2.
The ability to rapidly synthesize and characterize OSTs would greatly benefit the study and development of glycosylation systems. The bottleneck of expressing these enzymes, which are usually integral transmembrane proteins, in a cell-free system has recently been overcome. By supplying E. coli extracts with optimal concentrations of nanodisc lipid scaffolds, soluble expression of functional OSTs from Campylobacter species has been achieved. The full potential of cell-free strategies for characterizing GTs has recently been showcased by the glycosylation sequence characterization and optimization by rapid expression and screening (GlycoSCORES) technology, a high-throughput screening platform that couples crude extract-based glycosylation with self-assembled monolayers for matrix-assisted laser desorption/ionization (SAMDI) MS. Automation of this platform should accelerate the pace of characterizing relevant enzymes. Data gathered through this strategy can be used to generate cell-free kinetic models [66,109,110] and can be incorporated into cell-free machine learning algorithms [111,112] to guide the design of glycoprotein and glycan biosynthesis pathways.
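To illustrate what a very simple cell-free kinetic model of this kind might look like, the following Python sketch integrates a single Michaelis-Menten rate law for OST-catalyzed glycan transfer from an LLO donor to an acceptor protein. It is a minimal sketch under stated assumptions and not a model taken from the cited studies: the function name, the one-substrate rate law, the 1:1 consumption of LLO and acceptor, and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_glycosylation(acceptor_uM, llo_uM, ost_uM, kcat_per_min, km_uM,
                           minutes, dt=0.01):
    """Return glycoprotein formed (uM) after `minutes` of reaction.

    Assumed (hypothetical) model:
        d[glycoprotein]/dt = kcat * [OST] * [A] / (Km + [A])
    where [A] is the unglycosylated acceptor. One LLO is consumed per
    glycosylation event, so the reaction stops when either pool is empty.
    Integration uses a fixed-step forward Euler scheme.
    """
    glyco_uM = 0.0
    for _ in range(int(round(minutes / dt))):
        if acceptor_uM <= 0.0 or llo_uM <= 0.0:
            break  # a substrate pool is exhausted
        rate = kcat_per_min * ost_uM * acceptor_uM / (km_uM + acceptor_uM)
        # Never convert more than is left of either substrate.
        converted = min(rate * dt, acceptor_uM, llo_uM)
        acceptor_uM -= converted
        llo_uM -= converted
        glyco_uM += converted
    return glyco_uM


# Illustrative values only: 10 uM acceptor, 2 uM LLO (limiting), 0.5 uM OST.
product_uM = simulate_glycosylation(10.0, 2.0, 0.5, 1.0, 5.0, minutes=60.0)
fraction_glycosylated = product_uM / 10.0
```

In this toy setting the LLO donor is limiting, so the glycosylated fraction plateaus at the ratio of LLO to acceptor. Fitting parameters such as `kcat_per_min` and `km_uM` to screening data is the kind of step that would be needed before a model like this could inform pathway design.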
The identification of several distinct N- and O-linked protein glycosylation systems in a variety of bacterial species and their functional transfer to genetically tractable E. coli cells has greatly enhanced the potential of bacterial systems for the generation of therapeutically relevant glycoconjugates. These bacterial glycoprotein engineering platforms have been employed to generate well-defined therapeutic compounds including the first conjugate vaccines produced entirely in bacteria, two of which have recently undergone successful phase I clinical trials [37–39].
Owing to their versatility and ease of manipulation, bacteria are an ideal host for the production of a diverse array of structurally defined polysaccharides and glycoconjugates that will be of interest as medical and industrial products. Furthermore, the low cost of goods associated with the culturing of bacterial strains, especially E. coli, opens up this technology to a far wider range of laboratories than existing chemical/chemoenzymatic synthesis methods or mammalian cell culture approaches. The findings from a recent report commissioned by the National Academy of Sciences state that ‘glycans play roles in almost every biological process and are involved in every major disease’ and further assert that ‘the development of transformative methods for the facile synthesis of carbohydrates and glycoconjugates should be a high priority’ (National Research Council 2012). Bacterial glycoengineering represents an emerging field with the potential to play a major role in meeting these goals.
While recent work shows great promise for the design and construction of orthogonal glycosylation pathways in E. coli, there is still a critical need to bring the major performance indices, namely titer, yield, and productivity, to production scale by tackling bottlenecks in cell metabolism, pathway enzyme expression, and catalytic transformation of precursors to products. The marriage of metabolic engineering with bacterial cell-based and cell-free glycoprotein production systems is helping to overcome these bottlenecks in several ways: fine-tuning and optimizing metabolic fluxes to increase the yield and concentration of desired glyco-products while limiting the formation of unwanted byproducts; creating novel glycosylation enzymes by rational design and random mutagenesis-based strategies and integrating these engineered catalysts into synthetic gene regulatory circuits; increasing the tolerance of cells to the burden of expressing multi-enzyme glycosylation pathways; and developing cost-effective fermentation strategies and downstream processes. Complementing these experimental strategies are computational methods for characterizing glycosylation phenotypes from a holistic perspective and predicting gene targets throughout the whole cellular network that contribute to the enhanced production of glycoprotein targets. It is anticipated that the continued deployment of advanced metabolic engineering tools and strategies will increase the efficiency of strain development, leading to a veritable transformation of bacteria into living glyco-factories for scalable, bottom-up production of a wide array of complex glycoconjugates with applications in research and medicine.
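For readers less familiar with the three performance indices named above, the following Python sketch shows how titer, yield, and productivity are conventionally computed from a single production run. The function name, argument names, choice of units, and the numbers in the usage lines are illustrative assumptions rather than values from any study discussed here.

```python
def performance_indices(product_mg, volume_l, substrate_g, hours):
    """Compute the three standard bioprocess performance indices.

    titer        : product concentration at harvest (mg/l)
    yield        : product formed per substrate consumed (mg/g)
    productivity : volumetric production rate (mg/l-h)
    """
    titer = product_mg / volume_l
    product_yield = product_mg / substrate_g
    productivity = titer / hours
    return titer, product_yield, productivity


# Hypothetical run: 50 mg glycoprotein from a 0.5 l culture that consumed
# 10 g of carbon source over 20 h.
titer, product_yield, productivity = performance_indices(50.0, 0.5, 10.0, 20.0)
```

The three indices can move independently: a pathway change that raises titer by extending the run time may leave productivity unchanged or lower it, which is one reason all three are usually tracked together during strain development.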
The identification of several distinct N- and O-linked protein glycosylation systems in a variety of bacterial species and their functional transfer to genetically tractable E. coli cells has greatly enhanced the potential of bacterial systems for the generation of glycoconjugates.
Owing to their versatility and ease of manipulation, bacteria are an ideal host for the production of a diverse array of structurally defined polysaccharides and glycoconjugates that will be of interest as medical and industrial products.
The marriage of metabolic engineering with bacterial cell-based and cell-free glycoprotein production systems is increasing the efficiency of strain development, leading to a veritable transformation of bacteria into living glyco-factories for scalable, bottom-up production of a wide array of complex glycoconjugates.
ApNGT, N-glycosyltransferase from Actinobacillus pleuropneumoniae
CFGpS, cell-free glycoprotein synthesis
CFME, cell-free metabolic engineering
DBT, design, build, and test
ELISA, enzyme-linked immunosorbent assay
GlycoSNAP, glycosylation of secreted N-linked acceptor proteins
ManC, mannose-1-phosphate guanylyltransferase
This work was supported by the Defense Threat Reduction Agency [grant 11631647 to M.P.D.] and the National Science Foundation [grants # CBET-1159581, CBET-1264701, CBET-1402843, CBET-1605242, and MCB-1413563 to M.P.D.]. T.J. was supported by a Royal Thai Government Fellowship. C.J.G. was supported by a National Science Foundation Graduate Research Fellowship [grant # DGE-1144153].
M.P.D. has a financial interest in Glycobia, Inc. and Versatope, Inc. M.P.D.'s interests are reviewed and managed by Cornell University in accordance with their conflict of interest policies.