Facilitated by advances in the separation sciences, mass spectrometry and informatics, glycoproteomics, the analysis of intact glycopeptides at scale, has recently matured, enabling new insights into the complex glycoproteome. Diverse quantitative glycoproteomics strategies are now available that map monosaccharide compositions of N- and O-linked glycans to discrete sites of proteins within complex biological mixtures with considerable sensitivity, quantitative accuracy and coverage. Alongside these, developments supporting the advancement of structure-focused glycoproteomics, a recognised frontier in the field, have emerged. Technologies capable of providing site-specific information on the glycan fine structures in a glycoproteome-wide context are indeed necessary to address many pending questions in glycobiology. In this review, we first survey the latest glycoproteomics studies published in 2018–2020, their approaches and their findings, and then summarise important technological innovations in structure-focused glycoproteomics. Our review illustrates that quantitative glycoproteomics is increasingly used to profile the N-glycoproteome to tackle diverse biological questions, whereas the O-glycoproteome remains comparatively under-explored despite the emergence of new O-glycan-selective mucinases and other innovative tools aiding O-glycoproteome profiling. Excitingly, new strategies compatible with structure-focused glycoproteomics, including novel chemoenzymatic labelling, enrichment, separation, and mass spectrometry-based detection methods, are rapidly emerging, revealing glycan fine structural details including bisecting GlcNAcylation, core and antenna fucosylation, and sialyl-linkage information with protein site resolution. Glycoproteomics has clearly become a mainstay within the glycosciences that continues to reach a broader community. It transpires that structure-focused glycoproteomics holds considerable potential to aid our understanding of systems glycobiology and unlock secrets of the glycoproteome in the immediate future.
Introduction
The human genome encodes a plethora of enzymes, receptors, and transporters that concertedly facilitate and regulate protein glycosylation, the covalent attachment of complex carbohydrates (glycans) to polypeptides. Glycosylation is a prevalent and structurally diverse type of protein modification central to many biological processes including cellular development [1], lectin-mediated cell–cell communication [2], adhesin-based host-microbial interactions [3,4], and immunity [5–7]. Glycans also aid the folding and impact the solubility and integrity of glycoproteins, as well as regulate their susceptibility to proteolytic cleavage to mention a few of their modulatory functions [8–10].
Dysregulation of the glycosylation machinery often leads to aberrant protein glycosylation. Altered glycophenotypes can arise from and, in other cases, lead to a broad range of pathologies including cancer [11–13], infection and inflammation [14–17], and schizophrenia [18]. Furthermore, glycans are functional effectors of genetic and epigenetic disease risk [19]. Consequently, unravelling the dynamics and the glycan remodelling associated with changes to the cellular milieu under normal physiology and during disease conditions has been a long-standing goal of glycobiologists; such advances are required to promote the next generation of glycoprotein-based diagnostics and therapeutics [20,21].
Glycoproteomics, the system-wide study of glycans linked to proteins or peptides, is a powerful analytical discipline capable of identifying and quantifying physiological or disease-driven aberrations in protein glycosylation arising from changes in the glycoprotein level, site occupancy (macro-heterogeneity) and/or glycosylation pattern at individual sites (micro-heterogeneity) [22]. Mass spectrometry (MS)-based analysis of intact glycopeptides, as opposed to the study of detached glycans (glycomics) or formerly glycosylated peptides (site mapping), is an attractive analytical strategy since only intact glycopeptide (or glycoprotein) analysis is able to provide direct evidence for the site-specific glycosylation of proteins. Intact glycopeptide profiling, however, remains analytically challenging when applied to crude biological mixtures given that the chemical structure of the glycan, the peptide modification site, and the identity of the peptide carrier all need to be correctly identified and quantified for each glycopeptide.
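To make the distinction between macro- and micro-heterogeneity concrete, the following minimal Python sketch computes both for a single glycosylation site. It is an illustration only: the function names, the glycoform labels and the intensity values are hypothetical, and it assumes that summed signal intensities of the glycosylated and unoccupied peptide forms can be compared directly, which in practice is only an approximation.

```python
def site_occupancy(glycoform_intensities, unoccupied_intensity):
    """Macro-heterogeneity: fraction of the site that carries any glycan.

    glycoform_intensities: dict mapping a glycan composition label to the
    summed intensity of the corresponding glycopeptide (hypothetical input).
    unoccupied_intensity: intensity of the same peptide without a glycan.
    """
    occupied = sum(glycoform_intensities.values())
    total = occupied + unoccupied_intensity
    if total <= 0:
        raise ValueError("no signal observed for this site")
    return occupied / total


def glycoform_distribution(glycoform_intensities):
    """Micro-heterogeneity: relative abundance of each glycoform at the site."""
    occupied = sum(glycoform_intensities.values())
    if occupied <= 0:
        raise ValueError("no glycosylated signal observed for this site")
    return {glycan: intensity / occupied
            for glycan, intensity in glycoform_intensities.items()}


# Hypothetical intensities for one site (arbitrary units).
site = {
    "HexNAc2Hex5": 6.0e6,
    "HexNAc4Hex5NeuAc2": 3.0e6,
    "HexNAc4Hex5dHex1NeuAc2": 1.0e6,
}
occupancy = site_occupancy(site, unoccupied_intensity=1.0e7)
distribution = glycoform_distribution(site)
```

A change in `occupancy` between two conditions would point to altered macro-heterogeneity, whereas a shift in `distribution` at constant occupancy would point to altered micro-heterogeneity.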
The field of glycoproteomics has for these reasons received considerable attention over the past years. As previously reviewed [22–34], diverse LC–MS/MS-driven glycoproteomics tools and strategies capable of quantitatively mapping monosaccharide compositions of N- and O-linked glycans to peptides and, in favourable cases, to discrete sites of proteins in complex biological mixtures are now available. This has led to a body of literature exploring the glycoproteome in various biological systems, some with considerable coverage, as demonstrated by studies detecting thousands of intact N- and O-glycopeptides within a single glycoproteomics experiment [35–43]. While the identification accuracy of some of the reported glycopeptides from such studies may be questioned due to the analytical challenges outlined above [44–46], these examples illustrate that the technology is now getting ready to address pending questions in systems glycobiology.
A recognised limitation, however, is that most glycoproteomics strategies, at present, do not provide information on the glycan structure beyond the generic monosaccharide composition, i.e. hexose (Hex), N-acetylhexosamine (HexNAc), deoxyhexose (dHex), and N-acetylneuraminic acid (NeuAc), or even, in some cases, simply consider the glycans as a largely undefined protein modification based on their molecular mass. The structural elucidation of peptide-linked glycans beyond their monosaccharide compositions is often inferred from our growing knowledge of the related yet species-specific biosynthetic pathways responsible for glycoprotein formation that exist across the domains of life [47,48]. This lack of concrete evidence for the glycan fine structure is an important shortcoming given that glycobiological processes are often governed by specific rather than generic glycan structures and substructures (glycoepitopes) occupying distinct sites on their carrier proteins. Detailed site-specific glycan profiling of the glycoproteome, herein referred to as structure-focused glycoproteomics, is capable of discriminating isomeric monosaccharide residues, e.g. N-acetyl-β-d-glucosamine (β-linked GlcNAc) versus N-acetyl-α-d-galactosamine (α-linked GalNAc), key topological differences, e.g. antenna (α1,2/3/4-) versus core (α1,6-) fucosylation and triantennary (β1,4/6-) versus bisecting (β1,4-) GlcNAc-containing glycans, and isomeric glycosidic linkage types, e.g. α2,3- versus α2,6-sialylation. It is consequently considered one of the next important frontiers in glycoproteomics.
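The reason a mass-based composition cannot resolve such isomers can be shown with a short calculation. The Python sketch below, given here as an illustration rather than as any published tool, sums standard monoisotopic residue masses for the four generic monosaccharide classes; the function name and the example composition are our own assumptions.

```python
# Monoisotopic residue masses (Da) of the generic monosaccharide classes.
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. mannose, galactose, glucose
    "HexNAc": 203.07937,  # GlcNAc and GalNAc share this mass
    "dHex": 146.05791,    # e.g. fucose
    "NeuAc": 291.09542,   # N-acetylneuraminic acid
}


def glycan_residue_mass(composition):
    """Return the summed residue mass (Da) of a monosaccharide composition.

    composition: dict such as {"Hex": 5, "HexNAc": 4}; keys must be generic
    classes present in RESIDUE_MASS.
    """
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


# A disialylated, monofucosylated biantennary composition (illustrative).
core_fucosylated = {"Hex": 5, "HexNAc": 4, "dHex": 1, "NeuAc": 2}
# Antenna fucosylation changes the topology but not the composition.
antenna_fucosylated = dict(core_fucosylated)

mass_core = glycan_residue_mass(core_fucosylated)
mass_antenna = glycan_residue_mass(antenna_fucosylated)
```

Because both structures reduce to the same dictionary, `mass_core` and `mass_antenna` are identical; distinguishing them requires evidence beyond the precursor mass, such as diagnostic fragment ions or chromatographic separation.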
This review will first provide an overview of the most recent glycoproteomics studies published in the period 2018–2020, their approaches and their findings, and then summarise important technological developments supporting the emergence of structure-focused glycoproteomics.
Recent N- and O-glycoproteomics studies (2018–2020)
Our survey of the most recent literature identified a total of 28 studies applying LC–MS/MS-based glycoproteomics to biological samples (Figure 1). While our previous review revealed that glycoproteomics studies carried out in the period 2014–2016 typically reported between 100 and 2000 unique (non-redundant) intact N- and O-linked glycopeptides [22], studies published since 2018 now routinely identify and quantify thousands of glycopeptides, with some reporting more than 10 000 or even 20 000 intact glycopeptides [49,50].
Glycoproteome coverage reported by recent glycoproteomics studies published in the period 2018–2020 arranged by the studied species and publication year. The studies included in this overview all performed large-scale glycoproteomics analysis of complex biological samples from unaltered or only mildly manipulated biological sources and reported a significant number of intact glycopeptides. Studies not matching these criteria and studies reanalysing existing glycoproteomics data were left out. We apologise for any omission of studies published in this period that we may have failed to identify in our survey of the literature. The coverage as measured by the reported unique (non-redundant) intact N-glycopeptides (yellow bars) and O-glycopeptides (blue bars) has been plotted for each study. Details of these 28 glycoproteomics studies including their full references are provided in Table 1.
Biological sample (species) | Approach: 1) Mass spectrometer 2) MS/MS method (energy level appliedᵃ) 3) Data acquisition 4) Data analysis 5) Quantification | Glycoproteome coverageᵇ: 1) Glycopeptides 2) Glycosites 3) Glycoproteins | Main findings/novelty | Reference |
---|---|---|---|---|
Sera from PCa patients and healthy individuals (human) | 1) Q-Exactive 2) SCE-HCD (25 ± 10% NCE) 3) DDA 4) Mascot v2.4 5) Label-free (XICs) | 1) 3447(N) 2) 970(N) 3) 257(N) | • Site-specific identification and quantification of the sialylated N-glycoproteome in sera from PCa patients and healthy donors | Bollineni et al. [51] |
Breast cancer PANC1 and PKM2 knockout cell lines (human) | 1) Orbitrap Fusion Lumos Tribrid 2) EThcD (SA 33% NCE) 3) DDA 4) Byonic v2.9.38 in Proteome Discoverer v2.1 5) DiLeu | 1) 1067(N) 2) 311(N) 3) 205(N) | • Identification of site-specific N-glycosylation differences in PKM2 knockout breast cancer and parental cells | Chen et al. [39] |
Urine from patients with PCa and BPH (human) | 1) Q-Exactive HF 2) HCD (29% NCE) 3) DDA 4) Byonic v2.6.46 5) TMT | 1) 954(N)/965(O) 2) 1310(N) 3) 729(N)/160(O) | • Identification of a panel of 56 N-glycopeptides able to discriminate PCa and BPH patients | Kawahara et al. [52] |
Resting and activated primary T cells (human) | 1) LTQ-Orbitrap Elite 2) HCD (27 eV)-pd-ETD (SA 35 eV) and inclusion list-based HCD (27 eV)/ETD (SA 35 eV)/CID (35 eV) 3) DDA 4) Byonic v2.0 in Proteome Discoverer v1.4 5) Label-free (XIC) | 1) 2219(O) 2) 851(O) 3) 1045(O) | • Identification and subcellular localisation of O-GlcNAcylated proteins expressed by T cells • Found changes in the site-specific O-GlcNAcylation during T cell activation | Woo et al. [53] |
Tumour and adjacent control tissues from patients with kidney cancer, T cells and sera (human) | 1) Q-Exactive HF and Orbitrap Fusion Lumos Tribrid 2) ETD/HCD (36% NCE) 3) DDA 4) Sequestᶜ in Proteome Discoverer v2.2 5) Label-free (spectral counting) | 1) 4827(O) 2) 3055(O) 3) 1060(O) | • Specific enrichment of O-glycopeptides in complex clinical samples using OpeRATOR • Identified site-specific O-glycosylation differences between kidney tumour and adjacent tissues | Yang et al. [40] |
Serum from patients with (non-) crescentic IgA nephropathy and healthy donors (human) | 1) Q-Exactive HF and Orbitrap Fusion Lumos Tribrid 2) HCD, EThcD 3) DDA 4) Byonic v2.10.21 5) Dimethyl labelling | 1) 499(O) 2) 173(O) 3) 49(O) | • Identification of site-specific changes in O-glycosylation associated with crescentic IgA nephropathy | Zhang et al. [54] |
Breast cancer MCF-7 cell line and matching MCF-7 stem cells (human) | 1) Q-Exactive 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) GPSeekerᶜ 5) Diethyl labelling | 1) 2558(N) 2) 727(N) 3) 640(N) | • Identified differences in the site-specific N-glycosylation between cancer cells and cancer stem cells | Wang et al. [55] |
HepG2 liver cancer cell line and LO2 normal liver cell line (human) | 1) Q-Exactive 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) GPSeekerᶜ 5) Dimethyl labelling | 1) 5405(N) 2) 1218(N) 3) 1077(N) | • Structure-focused glycoproteomics study revealing site-specific N-glycan differences in HepG2 and LO2 | Xiao and Tian [56] |
Serum from healthy donors (human) | 1) Orbitrap Fusion Tribrid 2-3) ETD (SA 25% NCE)/SCE-HCD (27 ± 5% NCE)-DDA; SCE-HCD (27 ± 5% NCE)-DIA 4) Sequest TH, ptmRS in Proteome Discoverer v1.4, v2.2 and Spectronaut v11.0 with a large O-glycopeptide spectral library 5) Label-free (XIC) | 1) 269(O) 2) 253(O) 3) 159(O) | • Amongst the first DIA-based glycoproteomics studies • Deep profiling of the serum O-glycoproteome without glycopeptide enrichment | Ye et al. [57] |
Urinary exosomes from male donors (human) | 1) Orbitrap Fusion Lumos Tribrid 2) SCE-HCD (35 ± 5% NCE)-pd-EThcD (SA 15% NCE) 3) DDA 4) Protein Prospector v5.22.1, pGlyco v2.0 5) Label-free (XIC) | 1) 3144(N) 2) 604(N) 3) 378(N) | • N-glycoproteome and site-specific micro-heterogeneity of human urinary exosomes | Brown et al. [42] |
DG75 Burkitt's lymphoma cells (human) | 1) Orbitrap Fusion and Lumos Tribrid 2) HCD (25% for MS2, 35% NCE for MS3) 3) DDA 4) pGlyco v2.0 5) TMT | 1) 5367(N) 2) 855(N) 3) 528(N) | • Optimised strategy for quantitative site-specific glycopeptide analysis • Site-specific glycosylation changes upon inhibition of the fucosylation process | Fang et al. [58] |
Tumour and control tissues from high-grade serous ovarian carcinoma patients (human) | 1) Q-Exactive and Orbitrap Fusion Lumos Tribrid 2) HCD (38% NCE) 3) DDA 4) GPQuest v2.1 5) TMT | 1) 3202(N) 2) 1690(N) 3) Not reported | • Differences in site-specific N-glycosylation and abundance of glycosylation enzymes in tumour and control tissues | Hu et al. [59] |
PCa and BPH tissues (human) | 1) Q Exactive HF-X 2) HCD (35% NCE) 3) DDA 4) Byonic v2.6.46 5) TMT | 1) 7447(N)/522(O) 2) 1085(N)/308(O) 3) 540(N)/178(O) | • The use of integrated N- and O-glycome and glycoproteome profiling revealed key glycan changes associated with PCa progression | Kawahara et al. [43] |
Oncogene-transformed MCF10A breast epithelial cells (human) | 1) Orbitrap Fusion Lumos Tribrid 2) HCD (30% NCE)-pd-AI-ETD 3) DDA 4) Byonicᶜ 5) Label-free (XIC) | 1) 2459(N) 2) 785(N) 3) 480(N) | • Oncogene-induced remodelling of the surface glycoproteome and the global cellular glycoproteome | Leung et al. [60] |
Urine from healthy donors (human) | 1) Q-Exactive 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) GPSeekerᶜ 5) Label-free (spectral counting) | 1) 2986(N) 2) 419(N) 3) 327(N) | • Site-specific N-glycopeptide analysis of human urinary glycoproteome with putative N-glycan structural information | Shen et al. [61] |
Serum from healthy donors (human) | 1) Q-Exactive 2) HCD (33% NCE) 3) DDA 4) pMatchGlyco v1.2 (spectral library of de-N-glycopeptides) 5) Label-free (XIC) | 1) 22 677(N) 2) 1036(N) 3) 526(N) | • Site-specific N-glycosylation of serum glycoproteins with very high glycoproteome coverage | Shu et al. [50] |
ADR-resistant MCF-7/ADR cell line and matching MCF-7/ADR cancer stem cell line (human) | 1) Q-Exactive 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) GPSeekerᶜ 5) Diethyl labelling | 1) 4016(N) 2) 1102(N) 3) 1014(N) | • Differences in site-specific N-glycosylation between cancer cells and cancer stem cells | Xu et al. [62] |
Plasma from papillary thyroid carcinoma patients and healthy donors (human) | 1) Orbitrap Fusion Lumos Tribrid 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) pGlyco v2.2.0 5) Qualitative | 1) 1644(N) 2) 862(N) 3) 369(N) | • Profiling of the medium- and low-abundant plasma glycoproteome (reported circulating glycoproteins down to 0.5 ng/ml levels) | Zhang et al. [63] |
Liver tumour and para-cancerous tissues from HCC patients (human) | 1) Orbitrap Fusion Lumos Tribrid 2) HCD (37% NCE) 3) DDA 4) GPQuest v2.0 5) TMT | 1) 4741(N) 2) 1184(N) 3) 894(N) | • Differences in site-specific N-glycosylation in HCC patients with low and high levels of α-fetoprotein | Zhao et al. [64] |
Urine from healthy male and female donors (human) | 1) Q-Exactive 2) HCD (30% NCE) 3) DDA 4) Byonicᶜ 5) Label-free (XIC) | 1) 1300(O) 2) Not reported 3) 435(O) | • Gender differences identified in the urine O-glycoproteome | Zhao et al. [65] |
Milk from healthy donor (human) | 1) Orbitrap Fusion Lumos Tribrid 2) HCD (30% NCE)-pd-EThcD (SA 27% NCE) 3) DDA, SIM, PRM (30% NCE) 4) Byonic v3.2.0 in Proteome Discoverer v2.2 5) Label-free (XIC) | 1) 1697(N) 2) 191(N) 3) 110(N) | • Dynamics in the site-specific N-glycosylation over the lactation period | Zhu et al. [66] |
CHO-K1 subclone (hamster) | 1) Q-Exactive 2) HCD (32% NCE) 3) DDA 4) GPQuest v2.0 5) Label-free (spectral counting) | 1) 10 338(N) 2) 1162(N) 3) 530(N) | • Macro- and micro-heterogeneity of the N-glycoproteome of the lysate and culture media of CHO-K1 cells | Yang et al. [49] |
Brain tissue (mouse) | 1) Orbitrap Fusion Lumos Tribrid 2) HCD (28% NCE)-pd-AI-ETD 3) DDA 4) Byonicᶜ 5) Qualitative | 1) 5662(N) 2) 1545(N) 3) 771(N) | • Application of AI-ETD to complex N-glycopeptide mixtures • Site-specific identification of isomeric glycans using oxonium ions • Creative visualisation of glycoproteomics data | Riley et al. [41] |
L6 myoblasts cell line and myotubes (rat) | 1) Orbitrap Fusion Tribrid 2) HCD (40% NCE)-pd-EThcD/CID (35% NCE) 3) DDA 4) Byonicᶜ in Proteome Discoverer v2.1.0.801 5) TMT | 1) 2751(N) 2) Not reported 3) Not reported | • Longitudinal monitoring of site-specific N-glycosylation during myogenesis and muscle development | Blazev et al. [67] |
Brain tissue from AD APP/PS1 transgenic and wild-type models (mouse) | 1) Triple-TOF 4600 and Orbitrap Fusion Tribrid 2) SCE-HCD (30 ± 10% NCE) 3) DDA 4) pGlyco v2.0 5) Label-free (n/a) | 1) 3524(N) 2) 1493(N) 3) 722(N) | • Differences in site-specific N-glycosylation in brain tissue from AD and wild type mice | Fang et al. [68] |
Heart tissue (mouse) | 1) Orbitrap Fusion Lumos Tribrid 2) SCE-HCD (20% and 33% NCE) 3) DDA 4) GPQuestᶜ 5) Label-free (XIC) | 1) 2925(N) 2) 1016(N) 3) 864(N) | • Site-specific changes in N-glycosylation during the loss of regenerative capacity of neonatal mouse hearts early in life | Li et al. [69] |
CHO-K1 cells with FUT8 knockout and wild type (hamster) | 1) Q-Exactive and Orbitrap Fusion Lumos Tribrid 2) HCD (32% NCE) 3) DDA 4) GPQuest v2.0 5) Label-free (spectral counting) | 1) 2634(N) 2) 459(N) 3) 243(N) | • Global alteration in the N-glycosylation including the lack of core fucosylation in CHO-K1 cells upon FUT8 disruption | Yang et al. [70] |
Arabidopsis thaliana microsomes from wild-type and β-1,2-xylosyltransferase mutant (plant) | 1) Orbitrap Fusion Lumos Tribrid 2) HCD (30% NCE)-pd-ETD or ETD 3) DDA 4) Byonic v2.6 in Proteome Discoverer v2.1 5) Label-free (XIC) | 1) 1110(N) 2) 492(N) 3) 324(N) | • Mapping of the structural heterogeneity of N-glycosylation in Arabidopsis • Absence of Xyl in xylt mutant plants | Zeng et al. [71] |
Table 1. The table provides information on the biological samples, experimental conditions, the glycoproteome coverage and key findings of each study plotted in Figure 1.
ᵃ As stated in the respective papers.
ᵇ The glycoproteome coverage is here indicated as the reported unique (non-redundant) intact glycopeptides, glycosylation sites and glycoproteins. (N) refers to N-glycosylation and (O) refers to O-glycosylation.
ᶜ Software version not reported.
It is widely recognised that the glycoproteome coverage depends on the complexity and dynamic range of the biological sample(s) being studied, the amount of protein starting material (often high µg/low mg protein levels), and, notably, relates directly to the accuracy and confidence by which the reported glycopeptides are identified [22]. Regardless of these constraints, the recent glycoproteomics studies have clearly reached a greater glycoproteome coverage over a relatively short period, indicating that the glycoproteomics technologies are rapidly maturing. We therefore decided to carefully mine these recent glycoproteomics papers to explore how the studies were technically performed, how data were collected and how the data provided new insight into the glycoproteome (Table 1).
With the exception of a single study exploring the plant N-glycoproteome [71], these recent glycoproteomics papers all investigated tissues, bodily fluids (e.g. sera) or cell lines of human origin or from rodent model systems derived under different (patho)physiological conditions. This illustrates that glycoproteomics is currently used predominantly to address glycobiological questions related to human health and disease.
The published studies either used glycoproteomics to qualitatively or quantitatively map a particular glycoproteome of interest, e.g. the serum O-glycoproteome [57] or alternatively performed comparative glycoproteomics to elucidate the site-specific N- or O-glycosylation changes between two or more patient groups e.g. patients with prostate cancer (PCa) and benign prostatic hyperplasia [52] or over a period of interest, e.g. longitudinal profiling during lactation [66] or neonatal heart development [69].
In the surveyed studies, data were most frequently collected using data-dependent acquisition (DDA) employing higher-energy collisional dissociation (HCD), stepped collision energy (SCE)-HCD, electron-transfer dissociation (ETD) and/or electron-transfer/higher-energy collision dissociation (EThcD) MS/MS acquisition strategies (discussed below). Our survey also illustrated that targeted re-isolation and orthogonal fragmentation of glycopeptide ions, triggered by diagnostic oxonium ions arising from HCD-MS/MS and commonly referred to as product-dependent (pd) acquisition, are increasingly performed [60,66,67,71]. Data-independent acquisition (DIA), albeit less common, has also been used to profile mixtures of glycopeptides [57,72–76]. Various search engines, most commonly the commercially available Byonic [77,78], and academic tools including GPQuest [79] and GPSeeker [56], were used to identify glycopeptides from the obtained raw data.
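The logic behind product-dependent acquisition can be sketched in a few lines of Python. This is a simplified illustration and not the decision logic of any instrument vendor or of the surveyed studies: the function names, the 10 ppm tolerance, the 5% relative intensity cut-off and the requirement of two matching ions are all assumptions chosen for the example. The oxonium ion m/z values are the commonly used singly charged monoisotopic values.

```python
# Commonly used diagnostic oxonium ions (m/z, singly charged).
OXONIUM_IONS = {
    "HexNAc": 204.0867,
    "HexNAc fragment": 138.0550,
    "NeuAc": 292.1027,
    "NeuAc - H2O": 274.0921,
    "HexHexNAc": 366.1395,
}


def matched_oxonium_ions(peaks, tol_ppm=10.0, min_rel_intensity=0.05):
    """Return names of oxonium ions found in an HCD-MS/MS peak list.

    peaks: list of (mz, intensity) tuples.
    tol_ppm and min_rel_intensity are illustrative thresholds (assumptions).
    """
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    matched = []
    for name, target in OXONIUM_IONS.items():
        tolerance = target * tol_ppm / 1e6
        for mz, intensity in peaks:
            if (abs(mz - target) <= tolerance
                    and intensity >= min_rel_intensity * base_peak):
                matched.append(name)
                break
    return matched


def should_trigger(peaks, min_matches=2, **kwargs):
    """Decide whether to re-isolate the precursor for ETD/EThcD (sketch)."""
    return len(matched_oxonium_ions(peaks, **kwargs)) >= min_matches


# Hypothetical peak lists for a glycopeptide and a non-glycosylated peptide.
glyco_spectrum = [(138.0551, 4e4), (204.0866, 1e5), (366.1398, 3e4), (500.3, 2e4)]
peptide_spectrum = [(175.119, 1e5), (204.134, 5e4)]
```

With these inputs, `should_trigger(glyco_spectrum)` evaluates to true and `should_trigger(peptide_spectrum)` to false, since the peak near m/z 204.13 in the second spectrum lies outside the mass tolerance of the HexNAc oxonium ion. Requiring more than one diagnostic ion is one possible way to reduce false triggers from near-isobaric peptide fragments.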
Regarding the sample preparation, most studies used trypsin digestion or, less commonly, combinations of proteases e.g. trypsin and Glu-C, Lys-C or chymotrypsin, and employed some form of glycoprotein or glycopeptide enrichment prior to the LC–MS/MS detection. In fact, glycoproteomics is rarely applied directly to complex biological samples without any type of enrichment [57]. Hydrophilic interaction liquid chromatography (HILIC), lectin affinity chromatography e.g. concanavalin A (ConA), Vicia villosa agglutinin (VVA), Jacalin, wheat germ agglutinin (WGA), Ricinus communis agglutinin (RCA), and less common strategies including mixed anion-exchange chromatography, TiO2, and chemical and chemoenzymatic methods, were used to enrich glycopeptides. HILIC-based solid phase extraction (SPE) represents one of the least biased glycopeptide enrichment methods given the considerable (local) hydrophilicity shared by most N-glycopeptides [80], but may not quantitatively capture the less hydrophilic (shorter) O-glycopeptides and truncated N-glycopeptides [81–83]. To increase the glycoproteome coverage, some studies used multi-step enrichment protocols including sequential glycoprotein enrichment using Jacalin affinity chromatography followed by glycopeptide enrichment using HILIC [54] or employed sequential glycopeptide enrichment using HILIC and multi-lectin affinity (ConA, WGA and RCA 120) [39], while others applied parallel glycopeptide enrichment using ConA and mixed mode strong anion exchange-HILIC [60], or employed glycopeptide prefractionation using high-pH reversed-phase LC after the glycopeptide enrichment step [42].
The enriched glycopeptides were, as expected, most commonly separated and detected using reversed-phase nano-scale LC–MS/MS on various Orbitrap platforms including the Q-Exactive or Tribrid systems. Only a few studies used other online separation techniques including HILIC-LC, which reportedly was able to separate sialoglycopeptide isomers in complex mixtures [56]. Although such instruments are frequently used in proteomics, we did not identify any recent glycoproteomics studies using Q-ToF instrumentation or indeed the more recent timsToF for large-scale glycopeptide detection.
While the pre-2016 glycoproteomics studies were found to be rather qualitative in their approaches by simply reporting lists of identified glycopeptides as reviewed [22], our survey of the post-2018 glycoproteomics papers shows that glycopeptide quantification is now commonly applied. Approaches include label-free quantification by XIC or spectral counting [50,65], isotopic dimethyl [54] or diethyl [55] labelling as well as TMT [52,64] and DiLeu [39] labelling, which collectively enable different types of glycopeptide quantitation strategies all regularly used in the field. While glycoproteome mapping studies typically quantify the relative abundance of all glycoforms observed at each glycosylation site to establish a stoichiometric site-specific glycoprofile, comparative glycoproteomics studies often compare the relative quantity of individual glycopeptides or the glycoprofile at each site between conditions. The glycosylation site occupancy levels are unfortunately still rarely reported in glycoproteomics given the loss of this information in studies using glycopeptide enrichment.
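The site-specific glycoprofile described above amounts to normalising the signal of each glycoform against the summed signal of all glycoforms observed at the same site. The following is a minimal sketch of that normalisation, assuming that XIC areas have already been extracted and assigned; the function name, the input layout and the example values are hypothetical and are not taken from any of the surveyed studies. Because only glycosylated peptides enter the calculation, the result reflects the glycoform distribution and not the site occupancy.

```python
from collections import defaultdict


def site_glycoprofile(xic_areas):
    """Convert glycopeptide XIC areas into relative abundances per site.

    xic_areas maps (protein, site, glycan_composition) -> XIC area.
    Returns {(protein, site): {glycan_composition: fraction}} where the
    fractions at each site sum to 1.
    """
    # First pass: total signal observed at each glycosylation site.
    totals = defaultdict(float)
    for (protein, site, _glycan), area in xic_areas.items():
        totals[(protein, site)] += area

    # Second pass: express each glycoform as a fraction of its site total.
    profile = defaultdict(dict)
    for (protein, site, glycan), area in xic_areas.items():
        total = totals[(protein, site)]
        if total > 0:
            profile[(protein, site)][glycan] = area / total
    return dict(profile)


# Hypothetical example values (arbitrary units).
example = {
    ("P1", 85, "HexNAc4Hex5NeuAc2"): 6.0e6,
    ("P1", 85, "HexNAc4Hex5NeuAc1"): 3.0e6,
    ("P1", 85, "HexNAc2Hex5"): 1.0e6,
    ("P1", 120, "HexNAc2Hex8"): 2.0e6,
}
profile = site_glycoprofile(example)
```

Comparative studies would then contrast such per-site profiles, or the underlying individual glycopeptide intensities, between conditions.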
In addition to the above-mentioned glycoproteomics studies performed on glycoprotein samples from unaltered biological sources, the SimpleCell technology, which manipulates cells to homogenously express truncated O-GalNAc glycans (Tn and sialyl-Tn (sTn)) on mucins and mucin-like domains due to the inactivation of the chaperone COSMC, represents an important innovation relevant for glycoproteomics [84–88]. The SimpleCell technology has not only opened exciting avenues to study the challenging mucin glycobiology including how O-glycans impact proteolytic cleavage events, the substrate-specificity of GalNAc-transferases and the role of O-GalNAc glycans in cell–cell interactions and cell differentiation [89–93], but has also improved our ability to profile the critically underexplored O-glycoproteome. Glycoproteomics efforts using the SimpleCell technology to study mucin-type glycosylation in different human and other mammalian cell lines have led to an impressive database (http://www.glycoproteomics.somee.com/), comprising a total of 18 397 unique O-glycopeptides spanning 10 197 O-glycan sites from 2039 O-glycoproteins within the human O-glycoproteome supported by glycopeptide MS/MS evidence.
Innovations in structure-focused glycoproteomics
The following section summarises recent innovations within the emerging field of structure-focused glycoproteomics including improvements in the sample preparation, glycopeptide separation, and LC–MS/MS-based detection. The discussed innovations are summarised in Figure 2. Advances in glycopeptide data analysis and novel search strategies including, amongst several others, MSFragger-Glyco and O-Pair Search [94,95] will not be covered in this review. Readers are instead referred to recent review literature on this topic [96,97].
Technical innovations supporting the advancement of structure-focused glycoproteomics.
The advances and the structural information they provide have been separated into sample preparation including protein digestion, glycoepitope labelling and enrichment, online separation and LC–MS/MS detection of intact N-glycopeptides (yellow) and O-glycopeptides (blue). See key for symbol and linkage nomenclature [105].
Innovations in sample preparation
New glycoproteomics-friendly proteases
Bacterial proteases capable of cleaving mucin-type O-glycan motifs and thereby aiding the profiling of densely glycosylated mucin domains and non-mucin O-glycoproteins often left undigested with conventional proteases are rapidly emerging. The commercial OpeRATOR originally derived from the mucin-degrading Akkermansia muciniphila, and the secreted protease of C1 esterase inhibitor (StcE) from Escherichia coli represent prominent examples of recently reported mucinases [98–100].
OpeRATOR cleaves N-terminal to O-glycosylated Ser/Thr in Pro-rich sequences thereby generating peptides carrying N-terminal O-glycosylation for easy site assignment [40,101]. However, site assignment based on the OpeRATOR cleavage pattern alone or with HCD-MS/MS is not ideal given that more than half of all OpeRATOR-generated glycopeptides reportedly contain multiple O-glycans [46] including adjacent O-glycan sites prone to be missed by OpeRATOR [94]. In contrast, EThcD-MS/MS provides, in favourable cases, unambiguous site identification of OpeRATOR-generated O-glycopeptides [46]. Studies exploring the substrate requirements of OpeRATOR indicated that it 1) does not cleave non-glycosylated Ser/Thr [101], 2) preferentially cleaves core 1-type O-glycans and, albeit at a much slower rate, core 3-type O-glycans [40,102,103], and 3) shows poor or no activity against sialylated O-glycans and core 2-type structures [103].
The mucin-selective StcE cleaves the Xxx-Ser/Thr peptide bonds in consensus sequences containing Ser/Thr (P2) — Xxx (P1) — Ser/Thr (P1′) motifs where Xxx is any amino acid residue [104]. Unlike P1 and P1′, glycosylation of Ser/Thr in the P2 position is a strict requirement for StcE activity [104]. StcE displays activity towards motifs carrying various O-glycans including the di-sialyl-T antigens (di-sT) and core 2-type glycopeptide structures and thus complements OpeRATOR as a tool for mucin glycoprofiling.
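The StcE consensus described above can be expressed as a simple in silico digestion rule. The sketch below is illustrative only: it assumes that the positions of the glycosylated residues are already known, treats the P2 glycosylation as the sole requirement, and ignores cleavage kinetics and any preference for particular glycan structures. The function names and the example sequence are hypothetical.

```python
def stce_cleavage_sites(sequence, glycosylated):
    """Return candidate StcE cleavage positions in a protein sequence.

    sequence     : one-letter amino acid string
    glycosylated : set of 0-based indices of O-glycosylated Ser/Thr
    A returned index i means the bond between residue i-1 (P1) and
    residue i (P1') is a candidate, i.e. Ser/Thr*(P2)-Xxx(P1)-Ser/Thr(P1')
    with P2 glycosylated.
    """
    sites = []
    for i in range(2, len(sequence)):
        p2, p1_prime = i - 2, i
        if (sequence[p2] in "ST" and p2 in glycosylated
                and sequence[p1_prime] in "ST"):
            sites.append(i)
    return sites


def digest(sequence, sites):
    """Split the sequence at the given cleavage positions."""
    bounds = [0] + sorted(sites) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical mucin-like stretch with glycosylated Thr at indices 1 and 6.
seq = "GTPSAATAS"
sites = stce_cleavage_sites(seq, {1, 6})
peptides = digest(seq, sites)
```

In this toy example the rule places cleavages before Ser at index 3 and Ser at index 8, whereas the same sequence with no glycosylated P2 residue remains uncut.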
Other bacterial serine and zinc metalloproteinases have also expanded the toolkit for O-glycoproteome analysis [99]. Pic from E. coli and BT4244 from Bacteroides thetaiotaomicron preferentially cleave N-terminal to glycosylated Ser/Thr (P1′) carrying truncated Tn/T or Tn/T/sT O-glycans, respectively. In contrast, the M60-like protease AM0627 from A. muciniphila preferentially cleaves peptide bonds between two adjacent O-glycosylated Ser/Thr residues [99]. Finally, AM0908 and AM1514 are also M60-like proteases expressed by A. muciniphila that were found to cleave N-terminal to glycosylated Ser/Thr (P1′) with a preference for unoccupied Ser/Thr motifs in the P1 position [99].
While the above mucinases specifically act on glycoproteins containing mucin domains, ZmpC from Streptococcus pneumoniae, which reportedly cleaves peptide bonds in the vicinity of O-glycosylation sites, acts on both mucin and non-mucin proteins [99]. The O-glycopeptidase CpaA expressed by several Acinetobacter strains cleaves N-terminal to glycosylated Ser/Thr in Pro (P1) — Ser/Thr (P1′) motifs irrespective of the sialylation status of the P1′-localised O-glycan substrate on both mucin and non-mucin O-glycoproteins [100].
While the clear substrate preference of most of the newly discovered proteases including OpeRATOR and StcE limits their use for global (unbiased) glycoproteome profiling, they have already proven valuable in studies aiming to map particular subsets of the glycoproteome. Notably, none of the proteases appears to be fully specific for a single glycan structure or glycoepitope, emphasising the importance of downstream glycan characterisation if used for structure-specific glycoproteomics workflows. Collectively, these new proteases have considerable potential to advance the field of glycoproteomics; their advent promises to accelerate particularly the analysis of the under-studied O-glycoproteome in the immediate future.
Glycoepitope labelling
Chemoenzymatic and metabolic labelling methods enable visualisation and detection of glycoepitopes of interest in a cellular or tissue setting. These technologies that have been developed over the past two decades are gaining popularity and have a considerable potential for advancing structure-focused glycoproteomics [106]. While metabolic labelling approaches aim to metabolically incorporate functional and easy-to-detect tags into newly synthesised glycoproteins (or other glycoconjugates) in living cellular systems, chemoenzymatic labelling approaches focus on incorporating such probes after glycoprotein formation without making use of the metabolic pathways [106,107].
Recognised limitations of metabolic labelling include the potential of adversely perturbing the living system being explored, in particular if artificial metabolic precursors are supplemented at relatively high (micromolar) concentrations [108]. Furthermore, incomplete and unspecific incorporation of the supplemented probes is also commonly experienced [109,110], including the off-target labelling of cysteine residues by some metabolic labelling reagents [111–113]. However, a more specific labelling approach designed to target O-GalNAcylated proteins was recently reported [114]. In that study, precision labelling of O-GalNAcylated proteins was performed by supplementing membrane-permissive caged N-(S)-azidopropionylgalactosamine (GalNAzMe)-1-phosphate probes to cells transfected with a pyrophosphorylase AGX1 mutant in order to biosynthesise uridine diphosphate (UDP)-GalNAzMe donor molecules for downstream GalNAzMe-based labelling of target O-glycoproteins. UDP-GalNAzMe-mediated labelling of O-GalNAc proteins can be further enhanced by transfecting cells with an engineered ‘bump-and-hole’ GalNAc-T2 double mutant that contains an enlarged active site and exhibits increased activity for UDP-GalNAzMe as compared with UDP-GalNAc [114–117].
Diverse chemoenzymatic labelling methods that employ various glycosyltransferases and unnatural monosaccharide analogues have been developed over the past decade [107,118–121]. For example, the bacterial β-(1,4)-N-acetylgalactosaminyltransferase CgtA from Campylobacter jejuni that targets NeuAcα2,3Gal-R substrates has been used to label HEK293T cell surface N- and O-glycoproteins with N-azidoacetylgalactosamine (GalNAz), an unnatural analogue of GalNAc [120]. GalNAz incorporation enables convenient click chemistry-driven biotinylation, which, in turn, facilitates the enrichment of labelled targets using streptavidin agarose or detection by western blot, fluorescence microscopy, or LC–MS/MS. Recently, CgtA was used to label NeuAcα2,3Gal-containing glycopeptides derived from human serum glycoproteins with GalNAz [122]. The informative GalNAz reporter ions at m/z 245.09 generated upon SCE-HCD-MS/MS enabled the detection and sialyl-linkage annotation of 1236 N-glycopeptides from 161 human serum glycoproteins. In another study, cell surface-located sT O-glycoproteins expressed by the MCF7 breast cancer and HT29 colon cancer cell lines were labelled, captured and detected using cytidine-5′-monophosphate-NeuAc-biotin and recombinant human ST6GalNAc-IV expressed in baculovirus insect cells [123].
Other approaches have specifically targeted the disease-relevant O-GalNAc glycoepitope (Tn antigen) [124,125]. For example, a study showed that α-linked GalNAc (Tn) and β-linked GlcNAc O-glycopeptides can be discriminated by selective oxidation of the O-GalNAc residues by galactose oxidase from Dactylium dendroides facilitating Tn glycopeptide enrichment using hydrazide solid-phase capture and methoxyamine (CH3ONH2) mediated release [126]. This selective labelling strategy adds a 27 Da mass tag (CH3-O-N=R with an associated water loss (-H2O) upon derivatisation) to all Tn glycopeptides, which, in turn, enables their detection using diagnostic oxonium ions at m/z 171.0765 (144.0656 + 27.0109) and m/z 195.0764 (168.0655 + 27.0109) upon HCD-MS/MS. A similar oxidation-based labelling approach was recently used to enrich O-glycopeptides carrying core 1-type O-glycans including Tn, sTn, T, sT, and di-sT antigens [127]. In that enrichment study, which also employed hydrazide capture and methoxyamine release, tryptic glycopeptides derived from human serum glycoproteins were treated with PNGase F and chemically desialylated, before the exposed Gal and GalNAc residues were oxidised by D. dendroides galactose oxidase.
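The 27 Da mass tag and the shifted oxonium ions quoted above can be reproduced from monoisotopic element masses. The sketch below assumes that the tag is the net result of the galactose oxidase step (formal loss of two hydrogens on conversion of the C6 hydroxyl into an aldehyde) followed by oxime formation with methoxyamine (addition of CH3ONH2 with loss of water); this decomposition is our reading of the chemistry rather than a statement from the cited study, and the variable names are illustrative.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as a dict."""
    return sum(MONO[element] * count for element, count in formula.items())


METHOXYAMINE = {"C": 1, "H": 5, "N": 1, "O": 1}  # CH3ONH2
WATER = {"H": 2, "O": 1}
TWO_HYDROGENS = {"H": 2}

# Assumed net change: oxidation (-2 H) plus oxime formation
# (+ methoxyamine - water).
mass_tag = (mono_mass(METHOXYAMINE)
            - mono_mass(WATER)
            - mono_mass(TWO_HYDROGENS))

# HexNAc oxonium ions quoted in the text, before derivatisation.
hexnac_oxonium = [144.0656, 168.0655]
tagged_oxonium = [mz + mass_tag for mz in hexnac_oxonium]
```

Under these assumptions the computed tag agrees with the reported 27.0109 Da, and the tagged ions fall at m/z 171.0765 and 195.0764.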
OpeRATOR was also recently used for the digestion and release of resin-conjugated labelled Tn O-glycoproteins expressed by human Jurkat cells and in pancreatic cancer sera using a method called EXoO-Tn [128]. In short, the EXoO-Tn method involves: (i) trypsinization and desialylation of the extracted proteins, (ii) Tn glycopeptide enrichment using VVA lectin, (iii) isotopic labelling of the Tn glycopeptides with heavy Gal(13C6) by the recombinant human C1GalT1 glycosyltransferase in complex with its chaperone C1GalT1C1 using an artificial UDP-Gal(13C6) nucleotide sugar donor, (iv) digestion by OpeRATOR, and finally (v) LC–MS/MS-based detection of the Gal(13C6)-capped Tn glycopeptides using diagnostic oxonium ions at m/z 372.2 (13C6 variant of HexHexNAc) that can discriminate labelled Tn glycopeptides from endogenous T glycopeptides. This approach enabled the identification of 1011 Tn glycopeptides from 480 glycoproteins expressed by Jurkat cells while considerably fewer Tn glycoproteins were identified from pancreatic cancer sera.
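The discrimination between Gal(13C6)-capped Tn glycopeptides and endogenous T glycopeptides rests on a mass shift of six 13C atoms in the HexHexNAc oxonium ion. The sketch below illustrates how such a check might look on a centroided MS/MS peak list; the 20 ppm tolerance, the function names and the class labels are assumptions made for illustration and do not describe the software used in the cited study.

```python
C13_MINUS_C12 = 1.0033548     # mass difference between 13C and 12C (Da)
HEXHEXNAC_OXONIUM = 366.1395  # unlabelled HexHexNAc oxonium ion (m/z)
LABELLED_OXONIUM = HEXHEXNAC_OXONIUM + 6 * C13_MINUS_C12


def has_peak(peaks, target_mz, tol_ppm=20.0):
    """True if any (m/z, intensity) peak lies within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm * 1e-6
    return any(abs(mz - target_mz) <= tol for mz, _intensity in peaks)


def classify_spectrum(peaks):
    """Classify a spectrum by which HexHexNAc oxonium ion it contains."""
    heavy = has_peak(peaks, LABELLED_OXONIUM)
    light = has_peak(peaks, HEXHEXNAC_OXONIUM)
    if heavy and light:
        return "mixed"
    if heavy:
        return "labelled Tn"
    if light:
        return "endogenous T"
    return "neither"


# Hypothetical centroided peak lists: (m/z, intensity).
spectrum_a = [(204.0867, 5.0e4), (372.1590, 1.0e4)]
spectrum_b = [(204.0867, 5.0e4), (366.1400, 1.0e4)]
```

The computed labelled ion lies at approximately m/z 372.16, consistent with the nominal value of 372.2 given above.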
Furthermore, selective isotopic labelling and quantification of O-GlcNAcylated glycoproteins was recently achieved by employing UDP-GalNAz, a recombinant mutant of bovine β1,4-galactosyltransferase 1 (GalT1Y289L) and the isotope-tagged cleavable linker (isoTCL) [129] or an isotope-coded photocleavable probe [130]. Both strategies facilitate selective tagging, enrichment, and isotopic labelling of O-GlcNAcylated glycoproteins. Others have used wild-type bovine GalT1 and natural UDP-Gal to label β-linked GlcNAc O-glycopeptides derived from glycoproteins expressed by MCF-7 cells, which were then selectively oxidised by D. dendroides galactose oxidase ahead of enrichment by hydrazide capture [131]. The resulting O-GlcNAcGal glycopeptides could be distinguished from the endogenous O-GalNAcGal glycopeptides by their different oxonium ion ratios (discussed below).
Finally, sialyl linkage information, important to understand many processes related to sialoglycobiology [132,133], can be obtained at the glycoproteome-wide level by applying linkage-selective labelling approaches. For example, α2,6-linked sialic acid residues can be selectively labelled using ethanol creating a 28 Da mass tag by the esterification with C2H5OH (and an associated water loss) while α2,3-sialic acid residues can be selectively labelled using ethylenediamine generating a 42 Da mass tag by the amidation with C2H4(NH2)2 (and an associated water loss) of immobilised glycoproteins [134]. In that study, the derivatised sialoglycoproteins were characterised using on-bead digestion and HILIC-SPE glycopeptide enrichment before the labelled sialoglycopeptide isomers were detected by the chemically unique oxonium ions arising from the derivatised α2,3-(m/z 334.1/316.1) and α2,6-linked (m/z 302.1) sialoglycopeptides upon HCD-MS/MS.
Enrichment strategies
The enrichment methods used in glycoproteomics have been comprehensively reviewed [135]. While some enrichment strategies including HILIC-SPE aim to capture the entire complement of glycopeptides to provide a quantitative view of the glycoproteome [80], structure-specific enrichment approaches aim to capture only a subset of glycopeptides carrying specific glycan structures. In the latter cases, information of the glycan structure can either be inferred from the capture, or, more appropriately if used for structure-focused glycoproteomics, be supported by direct mass spectral evidence of the captured glycopeptides/glycoproteins using downstream detection.
Natural and genetically engineered lectins form a diverse class of glycan-binding proteins that can be used to enrich subsets of glycoepitopes carried by glycoproteins, glycopeptides or other glycoconjugates. For example, BTL from Bryothamnion triquetrum [136], PhoSL from Pholiota squarrosa [137,138] and SL2-1 [139] from Streptomyces rapamycinicus are known to recognise α1,6- (core) fucosylated N-glycans. Thus, binding to any of these lectins indicates that the captured glycoproteins carry core fucosylation, an observation that ideally should be supported by MS or other types of orthogonal evidence. Lectins that recognise other N-glycan structural features include Phaseolus vulgaris erythroagglutinin (PHA-E) and Calystegia sepium lectin (Calsepa) that bind bisecting β1,4-GlcNAc structures, Maackia amurensis lectin (MAL) that recognises α2,3-sialic acid residues and Sambucus nigra agglutinin (SNA) and Sambucus sieboldiana agglutinin (SSA) that both bind α2,6-sialic acid residues [140–142]. Furthermore, the recombinant human macrophage galactose-like C-type lectin (MGL)-Fc chimeric protein was recently used to enrich Tn glycopeptides from ovarian cancer cells and tissues [143]. Finally, engineered lectins displaying enhanced or tailored recognition of particular glycan epitopes including, amongst others, core fucose recognised by the recombinant N224Q mutant of the Aleuria aurantia lectin (AAL), and O-GlcNAc recognised by the Agrocybe aegerita lectin (AANL) mutant AANL6 have the potential of expanding the enrichment tool box for structure-focused glycoproteomics [144–149].
Catalytically inactive hydrolases have also proven valuable for structure-specific glycopeptide enrichment. Recently, a mutant form of the Clostridium perfringens O-GlcNAcase (CpOGAD298N), which retained the binding but not the hydrolytic properties towards O-GlcNAc epitopes [150], was used to enrich β-linked GlcNAcylated O-glycoproteins from HeLa cell lysates and Drosophila embryo lysates [151]. Furthermore, the commercially available catalytically inactive mutant form of the OpeRATOR OgpAH205A/E206A (GlycOCATCH, Genovis AB., Sweden) was shown to bind O-glycopeptides suggesting that this pseudo-protease may serve as another enrichment tool for mucin-type glycosylation [103]. The catalytically-inactive StcEE447D conjugated to beads was also able to enrich MUC16 from an OVCAR-3 cell line [104]. Both the StcEE447D mutant and a BT4244E575A mutant form of the B. thetaiotaomicron BT4244 metalloproteinase (discussed above) were found to be valuable for the detection of mucins using western blotting and for their direct visualisation in tissues [99].
Furthermore, recombinant antibodies displaying affinity towards glycoepitopes are relevant tools for structure-focused glycoproteomics [152–157]. For example, the recombinant mouse anti-Tn IgM ReBaGs6 was used to immunoprecipitate Tn-containing glycoproteins from Tn-positive Colo205 SimpleCells lysates for downstream mass spectrometry identification [154]. Smart anti-glycan reagents (SAGRs) represent another exciting type of emerging affinity tools [156,157]. SAGRs are chimeric proteins comprising murine IgG Fc and the single-chain variable lymphocyte receptors (VLRs) secreted by Petromyzon marinus VLRB+ lymphocytes (B-like cell) upon immunisation with different glycoconjugates, cells and tissues of interest. This innovative technology enables SAGRs to recognise glycan determinants with considerable affinity. Finally, the emergence of artificial glycan-directed antibodies generated by glycopeptide imprinting represents a new interesting avenue that may contribute to structure-focused glycoproteomics in the future [158,159].
Improvements in glycopeptide separation
Glycopeptide separation prior to mass spectrometry detection, commonly performed using reversed-phase nano-scale LC, is an important component of most, if not all, glycoproteomics experiments. Current strategies for the isomeric separation of glycopeptides were recently reviewed [160].
The glycopeptide retention behaviour on reversed-phase LC columns, which is governed primarily by the peptide hydrophobicity, and, secondly, by the properties of the conjugated glycan(s) i.e. size, shape and monosaccharide composition particularly the sialic acid content, was recently studied to generate better prediction tools utilising retention time as orthogonal information for glycopeptide identification and characterisation [161–163]. While typically not allowing for separation of glycopeptide isomers, a recent study found that reversed-phase LC may separate isomeric N- and O-glycopeptides at elevated temperatures [164].
Whilst commonly used for glycomics [165,166], porous graphitised carbon (PGC) LC is still under-utilised in glycoproteomics despite its potential for isomeric separation of relatively small N- and O-glycopeptides generated by nonspecific proteolysis [167–170] or by trypsin [171]. Larger glycopeptides bind irreversibly to PGC LC columns [172,173]. Similar to the PGC-based separation of sialoglycans [174–176], PGC LC reportedly retains α2,3-sialoglycopeptides better than isomeric α2,6-sialoglycopeptides [177], a separation feature that appears to be temperature-dependent [178]. Alternatively, HILIC LC can be used for the separation of intact glycopeptide isomers [56,179] and glycoproteins [180]. HILIC LC coupled with multiple reaction monitoring was used to profile isomeric α2,3- and α2,6-sialoglycopeptides derived from human prostate-specific antigen [181], the latter isomer displaying a greater LC retention [182]. Others have used HILIC LC to separate topological N-glycopeptide isomers of human hemopexin in order to assign the position of the fucose residues on N-glycans (core versus outer arm) and to elucidate the glycan arm position of α2,6-sialic acid residues [183]. A study comparing different HILIC stationary phases recently reported that HALO penta-HILIC provides better isomeric separation of N-glycopeptides than the more widely used ZIC-HILIC phase [184].
Furthermore, isomeric α2,3- and α2,6-sialoglycopeptides possess different electrophoretic mobilities facilitating their separation using capillary electrophoresis (CE)-ESI-MS [185,186]. Recent improvements in the CE hyphenation methods have increased the sensitivity and reproducibility of glycopeptide analysis by CE-MS [187,188] and opened interesting avenues for CE-based glycoproteomics [189]. In fact, CE-MS even showed the potential to separate intact glycoproteoforms [190–193]. LC and CE methods for intact glycoprotein separation were recently reviewed [194].
Aiming to benefit from their orthogonal separation properties, studies employing multi-dimensional separation of glycopeptides are emerging. For example, an online two-dimensional PGC and reversed-phase LC setup was used to separate glycopeptides from simple mixtures [195–198]. Another study used offline reversed-phase LC prefractionation coupled with CE to profile N-glycopeptides generated from human α-1-acid glycoprotein [199]. The potential of multi-dimensional approaches to separate isomeric N-glycopeptides was demonstrated using offline HILIC LC coupled with PGC LC to fractionate a simple mixture of bovine RNase B glycopeptides [200] as well as offline reversed-phase LC prefractionation coupled with online HALO penta-HILIC LC to separate glycopeptides from HepG2 and LO2 cell lysates [56].
Finally, ion mobility spectrometry (IMS), a gas-phase alternative to the chromatographic and electrophoretic separation of glycopeptides, is an emerging separation technique in structure-focused glycoproteomics. When coupled with mass spectrometry, IMS-MS is evidently capable of separating isomeric glycopeptide ions based on their differential conformational size and shape resulting in different drift times [201,202]. Direct IMS-MS analysis showed the potential to separate α-GlcNAc and α-GalNAc-containing glycopeptide epimers [203,204], isomeric O- and N-glycopeptides differing only in the position of the glycosylation site [204–206], α- and β-GalNAc anomeric glycopeptides [204], and epimeric O-glycopeptides carrying either Gal or Glc residues or alternative Glc or Man residues on the same site [207]. The detection of CID-MS/MS-generated B3-type ions (m/z 657.24, corresponding to NeuAcα2,3Galβ1,4GlcNAc or NeuAcα2,6Galβ1,4GlcNAc fragments) by IMS-MS enabled the discrimination of isomeric N- and O-sialoglycopeptides; these two types of B3 ions display NeuAc linkage-specific drift times [206,208]. IMS-MS has also demonstrated a potential to separate intact glycoproteins [209], an observation that deserves to be explored further as the field moves towards top-down glycoprotein analysis.
Improvements in LC–MS/MS detection
Glycoproteomics has recently witnessed many key advances in mass spectrometry-based analysis of glycopeptides [25,32,34]. As discussed below, most advances relevant to structure-focused glycoproteomics involve improvements in the dissociation methods and intelligent glycopeptide-centric LC–MS/MS data acquisition methods, both of which have led to a greater volume and quality of the spectral data generated from glycoproteome profiling experiments.
Hybrid-type fragmentation methods yield high content glycopeptide spectra
Hybrid-type fragmentation methods that combine multiple dissociation methods include EThcD-MS/MS [210–213], which is widely used in glycoproteomics (see Table 1 for examples), and the less common AI-ETD-MS/MS method restricted to a few specialist labs [41,60]. Fragmentation methods that employ multiple energy schemes to target more complete analyte dissociation, most prominently the SCE-HCD-MS/MS method, are also frequently used in large-scale glycopeptide profiling studies [37,51,56,214,215]. The benefits of hybrid-type fragmentation methods for glycoproteomics were recently summarised [216].
Confident and detailed glycopeptide characterisation from MS/MS spectral evidence relies on the presence of oxonium ions (internal glycan fragments), B-type ions (non-reducing end glycan fragments), Y-type ions (intact glycopeptide fragments), and b-/y- and c-/z-type ions (peptide fragments) [217,218]. The peptide fragments may be present with and without the conjugated glycan moieties. These different ion types inform on different parts of the glycopeptide molecule. The SCE-HCD- and EThcD-MS/MS methods have been purpose-built and tailored to intact glycopeptide analysis and are now widely recognised to be well-suited for N- and O-glycopeptide analysis at scale [37,54,219]. Relative to the conventional dissociation methods including resonance activated (ion trap) or beam-type (Q-ToF) CID-, HCD- and ETD-MS/MS, the SCE-HCD- and EThcD-MS/MS fragmentation methods yield more informative and content-rich fragment mass spectra containing both peptide and glycan fragments required for unambiguous or more complete glycopeptide spectral assignment. Glycopeptide-friendly variants of the more common fragmentation methods available on most instruments have also been developed and include the low-energy beam-type CID- and HCD-MS/MS methods that generate spectra rich in oxonium and B-type ions useful in structure-focused glycopeptide analysis (discussed below).
Finally, ultraviolet photodissociation (UVPD)-MS/MS, which utilises high-energy laser photons for the activation and fragmentation of ions represents a less conventional fragmentation method of glycopeptides [220–222]. HCD-triggered-UVPD-MS/MS and EThcD-MS/MS were recently tested for their ability to locate O-GlcNAc sites on peptides from recombinant human and Drosophila melanogaster glycoproteins [223]. In that study, 193 nm UVPD-MS/MS showed a considerable potential for accurate site allocation by generating a range of O-GlcNAc-containing peptide backbone fragments (a-/x- and c-/z-type ions) and in fact outperformed EThcD-MS/MS with respect to the spectral quality and the sequence coverage of the identified glycopeptides [223]. In another study, high-energy UV (213 nm) combined with low-energy infrared photodissociation (10.6 µm) (HiLoPD) was applied to an O-GalNAc-peptide from human erythropoietin [224]. The authors demonstrated that HiLoPD showed a higher backbone ion retention efficiency of GalNAc than UVPD-MS/MS. Despite these promising reports, future experiments are required to evaluate the potential of UVPD-MS/MS for large-scale glycopeptide analysis.
Diagnostic ions for glycoform-specific assignment
Similar to the use of diagnostic ions to identify glycan fine structures in LC–MS/MS-based glycomics [175,176,225], diagnostic ions are important for the glycoform-specific assignment of glycopeptide spectra. Diagnostic ions have been used to discriminate isomeric glycopeptide features such as O-GlcNAc versus O-GalNAc glycopeptides [226,227], sialyl linkage of both N- and O-sialoglycopeptides [228], core versus antenna fucosylated N-glycopeptides [229–231], antenna versus bisecting GlcNAcylated N-glycopeptides [230,232], and glycopeptides containing various other N- and O-glycan motifs [231,233–235]. Some examples of diagnostic ions for the glycoform-specific assignment of glycopeptides are briefly discussed below.
The ratio between the signal intensities of two groups of two HexNAc-specific oxonium ions i.e. (m/z 138 + m/z 168) and (m/z 126 + m/z 144), referred to as the GlcNAc/GalNAc ratio, generated using ion trap CID- or HCD-MS/MS, is reportedly able to discriminate between GlcNAc- and GalNAc-containing N- and O-glycopeptides [226,227]. In support, multiple glycoproteomics studies have applied this approach to distinguish extended α-linked GalNAc and β-linked GlcNAc O-glycopeptides [36,131,236,237], glycopeptides carrying either a single O-GlcNAc or O-GalNAc residue [238] or, more crudely, to simply classify O- and N-glycan types on peptides [41,197]. Furthermore, α2,3- and α2,6-sialyl linkage information was obtained by determining the signal intensity ratio of the HCD-MS/MS-derived oxonium ions specific for LacNAc (m/z 204 + m/z 366) and NeuAc (m/z 274 + m/z 292) following an elaborate signal normalisation process that considers both the glycan composition and the LacNAc structure (Galβ1,3GalNAc or Galβ1,4GlcNAc) [228].
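The GlcNAc/GalNAc ratio defined above is straightforward to compute once the four oxonium ion intensities have been extracted from a spectrum. The following sketch assumes that intensities are supplied keyed by nominal m/z; the function name and the example intensities are hypothetical. No classification cutoff is applied, since suitable cutoffs depend on the instrument and collision energy and are not specified here.

```python
def glcnac_galnac_ratio(intensities):
    """Compute (I138 + I168) / (I126 + I144) from oxonium ion intensities.

    intensities maps nominal oxonium m/z (126, 138, 144, 168) to signal
    intensity; missing ions are treated as zero. Returns None when the
    denominator is zero, because the ratio is then undefined.
    """
    numerator = intensities.get(138, 0.0) + intensities.get(168, 0.0)
    denominator = intensities.get(126, 0.0) + intensities.get(144, 0.0)
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical oxonium ion intensities (arbitrary units).
spectrum = {138: 800.0, 168: 400.0, 126: 100.0, 144: 200.0}
ratio = glcnac_galnac_ratio(spectrum)
```

In practice the ratio would be calibrated against glycopeptide standards of known HexNAc identity acquired under the same fragmentation conditions.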
Different diagnostic ion signal intensity ratios were also reported to inform on structural features of the glycan antenna. For example, core (α1,6-) and antenna (α1,2/3/4-) fucosylation can be differentiated based on the intensity ratios of B- or Y-type ions generated upon low-energy beam-type CID-MS/MS as demonstrated on glycopeptides derived from human prostate-specific antigen and α-1-acid glycoprotein [229]. Moreover, diagnostic ions for HexNAc1Hex1Fuc1 (m/z 512) and HexNAc1Hex1Fuc1NeuAc1 (m/z 803) were reported to indicate antenna-localised fucosylation [230]. The authors noted that these diagnostic ions were also present, albeit at low levels, in spectra arising from core fucosylated glycopeptides [229,230]. Conversely, core fucosylation can reportedly be identified based on characteristic Y1-type ions (peptide + GlcNAc1Fuc1) generated at relatively high dissociation energies [229].
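The m/z values of such diagnostic ions follow directly from the monosaccharide composition. The short Python sketch below illustrates the arithmetic for singly protonated B-type (oxonium) ions and for Y-type ions that retain the peptide. The monoisotopic residue masses are standard values; the function names and the 1000.0000 Da peptide mass in the usage example are hypothetical and serve only to show the calculation.

```python
from typing import Dict

# Monoisotopic masses (Da) of glycosidically linked monosaccharide residues.
RESIDUE_MASS: Dict[str, float] = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}
PROTON_MASS = 1.007276  # Da


def glycan_residue_mass(composition: Dict[str, int]) -> float:
    """Sum of residue masses for a composition such as {'HexNAc': 1, 'Hex': 1}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def oxonium_mz(composition: Dict[str, int]) -> float:
    """m/z of the singly protonated B-type (oxonium) ion of the given composition."""
    return glycan_residue_mass(composition) + PROTON_MASS


def y_ion_mz(peptide_mass: float, composition: Dict[str, int], charge: int = 1) -> float:
    """m/z of a Y-type ion: neutral peptide plus the retained glycan residues."""
    return (peptide_mass + glycan_residue_mass(composition) + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # Antenna-fucosylation diagnostic ions named in the text (nominal m/z 512 and 803).
    print(round(oxonium_mz({"HexNAc": 1, "Hex": 1, "Fuc": 1}), 2))
    print(round(oxonium_mz({"HexNAc": 1, "Hex": 1, "Fuc": 1, "NeuAc": 1}), 2))
    # Core-fucosylation Y1-type ion (peptide + GlcNAc1Fuc1) for a hypothetical peptide.
    print(round(y_ion_mz(1000.0, {"HexNAc": 1, "Fuc": 1}, charge=1), 4))
```

Because the calculation works at the level of compositions, it cannot by itself distinguish isomeric fragments; it only indicates where in the spectrum to look for the reported diagnostic signals.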
Furthermore, characteristic Y-type ions (i.e. peptide + HexNAc3Hex1 and peptide + HexNAc3Hex1Fuc1) generated via low-energy HCD-MS/MS can be used to indicate bisecting GlcNAc-containing glycopeptides as demonstrated for different mammalian model glycoproteins and glycoproteins from rat renal tissues [230,232]. B- and Y-type ions generated upon EThcD-MS/MS carried out with low supplemental energy were also found to be useful to establish the glycan topology as was shown for extended O-sialoglycopeptides obtained from human urine [234,235]. Diagnostic B-type ions generated by low-energy HCD-MS/MS were also found to reveal various structural motifs of the glycan antenna of glycopeptides derived from recombinant human glycoproteins and HEK293 secretomes including outer arm N,N-diacetyllactosamine (LacdiNAc), sialylated LacdiNAc, fucosylated LacdiNAc, sialylated GalNAc, and polyLacNAc amongst other motifs [231]. Furthermore, a Hex2HexNAc1 (m/z 528.19) to Hex1HexNAc1 (m/z 366.14) intensity ratio above 0.3 upon SCE-HCD-MS/MS was recently shown to indicate the presence of α-galactosyl epitopes (i.e. Galα1,3Galβ1,4GlcNAc-R) carried by complex-type N-glycopeptides as demonstrated for mouse spleen and lung tissues and plasma [233]. The absence of a particular Y-type ion (peptide + HexNAc2Hex4) characteristic of hybrid- or oligomannosidic-type N-glycans represents complementary evidence of α-galactosylation. Finally, negative-mode detection in LC–MS/MS-based glycopeptide analysis may enhance the information on the fine structure of peptide-linked glycans by generating diverse glycan fragment ions upon ion trap CID-, HCD-MS/MS and CID-MS3 [239].
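As an illustration of the α-galactosylation criterion described above, the following Python sketch evaluates the Hex2HexNAc1 (m/z 528.19) to Hex1HexNAc1 (m/z 366.14) intensity ratio against the reported cut-off of 0.3 [233]. The 0.02 m/z matching tolerance, the use of the most intense peak within that window, the function names and the two example spectra are assumptions made here for illustration; the complementary check for the absence of the peptide + HexNAc2Hex4 Y-type ion is not implemented.

```python
from typing import Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

HEX2_HEXNAC1_MZ = 528.19   # Hex2HexNAc1 oxonium ion
HEX1_HEXNAC1_MZ = 366.14   # Hex1HexNAc1 oxonium ion
ALPHA_GAL_CUTOFF = 0.3     # ratio cut-off reported in the text [233]


def peak_intensity(peaks: Sequence[Peak], target_mz: float, tol: float = 0.02) -> float:
    """Intensity of the most intense peak within `tol` of `target_mz` (0.0 if none)."""
    matches = [intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol]
    return max(matches, default=0.0)


def alpha_gal_ratio(peaks: Sequence[Peak], tol: float = 0.02) -> Optional[float]:
    """Return I(528.19) / I(366.14), or None if the m/z 366.14 ion is not observed."""
    reference = peak_intensity(peaks, HEX1_HEXNAC1_MZ, tol)
    if reference == 0:
        return None
    return peak_intensity(peaks, HEX2_HEXNAC1_MZ, tol) / reference


def has_alpha_gal_evidence(peaks: Sequence[Peak], cutoff: float = ALPHA_GAL_CUTOFF) -> bool:
    """True if the ratio is defined and exceeds the cut-off."""
    ratio = alpha_gal_ratio(peaks)
    return ratio is not None and ratio > cutoff


# Hypothetical spectra, not taken from the cited study.
spectrum_a = [(204.09, 900.0), (366.14, 1000.0), (528.19, 450.0)]
spectrum_b = [(204.09, 900.0), (366.14, 1000.0), (528.19, 100.0)]

if __name__ == "__main__":
    print(has_alpha_gal_evidence(spectrum_a), has_alpha_gal_evidence(spectrum_b))
```

In practice, such a ratio would be treated as one line of evidence alongside the Y-type ion pattern rather than as a stand-alone classifier.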
Lastly, albeit not the focus of this review, it is worthwhile to highlight just a few exciting advances in glycopeptide-search algorithms and machine learning tools that have opened interesting avenues for more efficient classification of N- and O-glycopeptides and their isomeric structural features [96,97]. Developments in such informatics areas have for example enabled the crude classification of N- and O-glycopeptides from human serum based on the intensities of various oxonium ions generated by HCD-MS/MS [240]. Machine learning-based classification was also applied to discriminate core- versus antenna-fucosylation using B- and Y-type ions generated by ion trap CID-MS/MS as demonstrated for N-glycopeptides from human plasma proteins [241,242]. Finally, a new computational algorithm named O-GlycoProteome Analyzer was shown to be able to identify mucin-type O-glycosylation (e.g. core 1 versus core 2) in human urine and plasma using the GlcNAc/GalNAc ratio (see above) and other B- and Y-type diagnostic ions generated by HCD-MS/MS [243].
Collectively, the above studies reporting on diverse improvements in the LC–MS/MS-based detection of glycopeptides illustrate the immense potential of using advanced fragmentation methods and diagnostic ions in innovative ways to elucidate glycan fine structural features at the glycopeptide level. It is clear that further optimisation of the dissociation methods including the applied energy setups, activation times and triggering mode(s) as well as comprehensive benchmarking and standardisation across instruments and informatics improvements are still required to enhance the utility of diagnostic ions for large-scale glycopeptide characterisation in future glycoproteomics experiments. Efforts to reduce the co-isolation of glycopeptide precursors and attempts to limit the in-source fragmentation of labile glycopeptide features, both recognised as significant analytical challenges in glycoproteomics [58,227,244,245], are examples of other important areas that should receive attention as the glycoproteomics methods develop.
Conclusion
This review has compiled and surveyed a considerable body of recent literature, some papers literally ‘hot-off-the-press’, employing quantitative glycoproteomics to study the complexity and dynamics of the human and other mammalian glycoproteomes with unprecedented sensitivity and coverage. The increasing publication activity and quality in this area of science from an expanding group of research laboratories around the world is highly encouraging and indicates that the glycoproteomics technologies are reaching (and are actively being applied by) the community. As abundantly exemplified in Table 1, quantitative glycoproteomics is now sufficiently mature to tackle a range of complex biological questions from new angles. Human health and disease appear to be the principal application areas of the recent glycoproteomics studies, but given the ubiquitous nature of glycoproteins, the technologies discussed herein can readily, or with some adjustments, be applied more widely to other areas of the life sciences.
As the sample handling, instrumentation and bioinformatics solutions for glycopeptide analysis continue to evolve, it is expected that studies achieving an even greater glycoproteome coverage will soon emerge allowing us to access the low-to-medium abundance glycoproteins in complex biological specimens. The innovative use of new mucin-friendly proteases to generate O-glycopeptides with favourable MS properties is particularly exciting since such novel approaches will dramatically benefit studies seeking to improve the coverage of the hitherto ‘dark’ mucin O-glycoproteome.
While such advances promise to reveal many of the secrets hidden in the low abundance glycoproteome, it is crucial that the technology improvements will eventually also open avenues for glycopeptide profiling directly from biological samples without the need for glycopeptide enrichment or other types of sample perturbation. Methods compatible with minimal sample handling and processing are required to generate workflows with higher sample throughput and reproducibility, and, importantly, to establish a fully quantitative view of the glycoproteome including information on the site occupancy, which is currently not captured by most existing methods.
Moreover, ensuring high identification accuracy of the reported glycopeptides is imperative. Automated identification of intact glycopeptides is a comparatively complex and highly error-prone process relative to the analyte identification in most other areas of proteomics [25,45,97]. Robust probability-based false discovery rate-driven search engines and algorithms for intact glycopeptides are still required to propel the field forward. Adding to the informatics advances needed to achieve such important improvements, ‘glycomics-assisted glycoproteomics’ represents a complementary strategy that can guide the glycoproteomics data analysis to lower the glycopeptide false discovery rates while simultaneously addressing the complexity of the glycome [35,43,67,246,247].
Another significant limitation, indeed the focus of this review, is that current glycoproteomics methods still only allow assignment of generic monosaccharide compositions of glycans rather than the glycan fine structural features to discrete protein sites. It is widely recognised that knowledge of the glycan fine structures is a prerequisite to understand the effect of glycosylation on protein function. Technologies that can uncover the intricate details of the heterogeneous glycan structures decorating each site of proteins in mixtures, herein referred to as ‘structure-focused glycoproteomics’, have remained a holy grail of glycobiology [248].
As discussed in this review, the field has recently witnessed many exciting innovations supporting the advancement of structure-focused glycoproteomics. Importantly, these developments target different parts of the glycoproteomics workflow, spanning sample preparation (including new glycoform-selective proteases and chemoenzymatic labelling and enrichment methods), innovations within the separation sciences involving improved LC, CE and IMS approaches, and improvements in diverse areas of mass spectrometry-based detection. The emerging O-glycan-specific mucinases and chemoenzymatic labelling approaches, and advances in the acquisition and fragmentation schemes yielding high content fragment mass spectra appear to dominate the innovation space and are likely to contribute significantly to glycoform-focused O- and N-glycoproteome profiling in the near future. Surprisingly, we did not identify any large-scale glycopeptide studies using sequential exoglycosidase treatment to obtain information on the terminal glycoepitopes, a strategy often used in glycomics [249]. This is just one example illustrating that not all methodological avenues have yet been explored in the context of structure-focused glycoproteomics.
Notwithstanding these exciting developments, structure-focused glycoproteomics remains in its infancy and the field still faces significant challenges ahead. Similar to the existing composition-focused glycopeptide mapping technologies, structure-focused glycoproteomics should ideally move towards unbiased profiling and quantitation of the glycoproteome. This is particularly important when the technology is applied in ‘discovery-mode’ as opposed to targeted glycoproteomics studies where a specific glycopeptide form or a class of glycoepitopes are profiled to address a specific research question. Most innovations discussed herein including the developments of new labelling, digestion and enrichment methods target specific subsets of the glycoproteome and are thus incompatible with unbiased quantitative glycoproteome profiling. In contrast, LC–MS/MS methods that can elucidate subtle differences of all introduced glycopeptide forms either by improved separation and/or via mass spectral evidence may in our view have a stronger potential to facilitate fully quantitative yet still structure-centric glycoproteome profiling in the future.
Finally, it is clear that the field of structure-focused glycoproteomics should already now begin to prepare for a downstream integration with the other ‘omics disciplines including glycomics, transcriptomics, and metabolomics, while simultaneously paying close attention to, and aligning with, the exciting advances emerging in the area of top-down analysis of intact glycoproteins [250–254].
Glycoscientists have for years recognised that no single technology can provide a holistic and comprehensive understanding of the structural and functional heterogeneity inherently associated with protein glycosylation. While a full implementation of structure-focused glycoproteomics holds tremendous potential to enhance our insight into pending glycobiological questions, it is clear that such technologies are but a few amongst the many tools that glycoscientists will need to draw on to explore the fascinating world of glycobiology in the years ahead.
Perspectives
Importance: Glycoproteomics is a powerful quantitative technology that enables scientists to study the entire complement of glycoproteins expressed by cells, tissues or organisms at a given time, space and condition. Detailed site-specific characterisation of glycan structures in a glycoproteome-wide context, herein referred to as ‘structure-focused glycoproteomics’, is considered one of the next frontiers to be targeted to advance the field further and to address many pending questions in glycobiology.
Current understanding and challenges: Our survey has revealed a rapidly growing body of literature employing quantitative glycoproteomics to study the dynamics and complexity of the glycoproteome with unprecedented sensitivity and coverage, demonstrating that the technology is maturing and ready to be applied to biology. Notably, many innovations spanning new chemoenzymatic labelling, digestion, enrichment, separation and mass spectrometry detection strategies compatible with structure-focused glycoproteomics have recently emerged, revealing glycan fine structural details with protein site resolution. Challenges in the field include a high false discovery rate of glycopeptides reported in glycoproteomics papers, a lack of standardisation of the data collection, analysis, interpretation, reporting and sharing, and incomplete integration with other ‘omics disciplines (e.g. glycomics and transcriptomics) and top-down MS approaches.
Future directions: Our review illustrates that structure-focused glycoproteomics holds a considerable potential to become a key tool in the emerging area of systems glycobiology and unlock secrets of the glycoproteome in years to come.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Open Access
Open access for this article was enabled by the participation of Macquarie University in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with CAUL.
Funding
A.C. was supported by an international Macquarie University Research Excellence Scholarship (iMQRES). R.K. was supported by an Early Career Fellowship from the Cancer Institute NSW. M.T.-A. was supported by a Macquarie University Safety Net Grant.
Author Contributions
A.C., R.K. and M.T.-A. wrote the review.
Abbreviations
- AANL: Agrocybe aegerita lectin
- AD: Alzheimer's disease
- ADR: Adriamycin
- AI-ETD: activated-ion electron transfer dissociation
- BPH: benign prostatic hyperplasia
- CE: capillary electrophoresis
- CID: collision-induced dissociation
- ConA: concanavalin A
- CSF: cerebrospinal fluid
- Da: Dalton
- DDA: data-dependent acquisition
- DIA: data-independent acquisition
- di-sT: di-sialyl-T antigen
- ETD: electron-transfer dissociation
- EThcD: electron-transfer/higher-energy collision dissociation
- Fuc: L-fucose
- Gal: D-galactose
- GalNAc: N-acetyl-D-galactosamine
- GalNAz: N-azidoacetylgalactosamine
- GlcNAc: N-acetyl-D-glucosamine
- HCC: hepatocellular carcinoma
- HCD: higher-energy collision dissociation
- Hex: hexose
- HexNAc: N-acetylhexosamine
- HILIC: hydrophilic interaction liquid chromatography
- IMS: ion mobility spectrometry
- LacdiNAc: N,N-diacetyllactosamine
- LC: liquid chromatography
- MS: mass spectrometry
- MS/MS: tandem mass spectrometry
- NCE: normalised collision energy
- NeuAc: N-acetyl-D-neuraminic acid
- NeuGc: N-glycolyl-D-neuraminic acid
- PCa: prostate cancer
- pd: product-dependent
- PGC: porous graphitised carbon
- PRM: parallel reaction monitoring
- Pro: proline
- Q-ToF: quadrupole time-of-flight
- RCA: Ricinus communis agglutinin
- SA: supplemental activation
- SAGR: smart anti-glycan reagent
- SCE: stepped collision energy
- Ser: serine
- SIM: selected ion monitoring
- SPE: solid phase extraction
- sT: sialyl-T antigen
- sTn: sialyl-Tn antigen
- Thr: threonine
- TMT: tandem mass tag
- UDP: uridine diphosphate
- UVPD: ultraviolet photodissociation
- VVA: Vicia villosa agglutinin
- WGA: wheat germ agglutinin
- XIC: extracted ion chromatogram
- ZIC-HILIC: zwitterionic-hydrophilic interaction liquid chromatography