What are all these hundreds of different bacterial species doing in and on us? What interactions occur between the host and the microbes, and between the microbes themselves? By studying proteins, metaproteomics tries to find preliminary answers to these questions. There is daunting complexity around this; in fact, many of these proteins have never been studied before. This article is an introduction to the field of metaproteomics in the context of the human microbiome. It summarizes where we are and what we have learnt so far. The focus will be on faecal proteomics as most metaproteomics research has been conducted on that sample type. Metaproteomics has made major advances in the past decade, but new sample preparation strategies, improved mass spectrometric analysis and, most importantly, data analysis and interpretation have the potential to pave the way for large-cohort metaproteomics.

Introduction

What we know of the co-inhabitants living on our inner and outer surfaces has largely been assessed via genomic approaches. The 16S rRNA gene has been used intensively to determine the phylogenetic composition of microbiomes, whereas examining the whole gene content has been exploited to unravel the encoded functions of microbiomes. Both of these approaches, especially with the trend of combining genome-wide sequencing data from the host with microbiome data, have generated massive amounts of data. While these approaches are building our knowledge about the basis of life, i.e. DNA, they do not take into account the pathways that are actually activated at the time of measurement. An understanding of what microbes do and how the host responds, and vice versa, is more readily available through transcriptomics and proteomics, the latter being the focus of this article. Several publications have revealed the difference between the development of a microbial community and the proportion of functions contributed by the microbial community members, i.e. some species may replicate better under certain conditions but some may contribute more actively to the local metabolism [1]. In terms of the types and numbers of samples examined and the availability of data analysis tools, genomics is far ahead: the number of publications on human metagenomics is almost 10-fold higher than that on functional expression, as represented by metaproteomics and metatranscriptomics. What the articles using meta-omic approaches have in common is their focus on the microbiome of the intestine. In the case of proteomic analysis of human microbiomes, individual studies on saliva [2–4], cerumen (earwax) [5] and urine [6] have been conducted. In general, far fewer studies have been performed in humans than on environmental samples, i.e. the range of accessible environmental ecosystems is wider than that of human microbiomes. 
Furthermore, the focus in environmental metaproteomics tends to be explorative [7]. Nonetheless, the studies on the human intestinal metaproteome have revealed not only that the composition of the intestinal microbiome is very individual specific, but that this also applies to the proteins that are present in these microbiomes [1,8]. Proteins have been called, with good reason, a cell's “workhorses”, and this holds equally for the microbes on human surfaces.

At present, several thousand proteins have been identified from human microbiomes; the total diversity of proteins acting between humans and microbes can only be extrapolated. With the average genome encoding 2000 proteins per species and 500 to 1000 different species per individual intestine, one would expect to find something in the order of one million different proteins there, not taking into account strain effects which may introduce even more diversity. Obviously, this complexity has not yet been resolved. The progress made during the last decade in revealing the diversity of the intestinal microbes can easily be demonstrated by looking back at the early metaproteomic studies. In the first study of proteins from infant faecal samples, published in 2007, a few hundred protein spots were visualized on a 2D gel, but only very few proteins were actually identified [9]. One decade later, several thousand proteins are typically identified [8,10,11], but this is still far from the potentially several hundreds of thousands of proteins, i.e. there are still many more to be explored. Following a short description of the proteomic principle, the following sections will show where we stand right now and what our goals should be when looking at microbes through proteomics, starting with the examination of whole human microbiomes through metaproteomics.
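The extrapolation above can be retraced with a few lines of arithmetic. The sketch below uses only the figures given in the text (2000 proteins per genome, 500 to 1000 species); the value of 5000 identified proteins is an assumed stand-in for "several thousand" and is not taken from any particular study.

```python
# Back-of-the-envelope estimate of protein diversity in one intestinal microbiome.
PROTEINS_PER_GENOME = 2000      # average number of proteins per species (from the text)
SPECIES_RANGE = (500, 1000)     # species per individual intestine (from the text)
IDENTIFIED = 5000               # assumption: stand-in for "several thousand" identified proteins


def expected_proteins(proteins_per_genome, n_species):
    """Expected number of distinct proteins, ignoring strain-level variation."""
    return proteins_per_genome * n_species


low, high = (expected_proteins(PROTEINS_PER_GENOME, n) for n in SPECIES_RANGE)
fraction_seen = IDENTIFIED / low  # optimistic: uses the lower bound as denominator

print(f"Expected distinct proteins: {low:,} to {high:,}")
print(f"Share identified (assumed {IDENTIFIED:,} proteins): {fraction_seen:.1%}")
```

Even against the lower bound, the assumed number of identifications covers well under one percent of the expected diversity, which is the gap the text refers to.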

Proteomic principles

The study of proteins of human microbes is based on the use of mass spectrometry. Recent developments have been reviewed in detail (for a recent comprehensive introduction on mass spectrometry see [12]) and here only the principle will be introduced in brief. The most prominent choices for each step in the proteomics workflow in the context of human microbiome research are presented in more detail in [13] and will be discussed with respect to present achievements in metaproteomics in the next section. Before covering the typical (meta-)proteomics pipeline, I want to stress some crucial facts that are important in planning and interpreting (meta-)proteomics experiments. Each discipline wants to have as little bias as possible. For (meta-)proteomics, this means that one should have the representative proteins under study as completely available as possible for analysis, acquire these proteins in terms of coverage as accurately as possible and have the right tools available to analyse and interpret the data.

First, proteins need to be extracted from the matrix in which they are present. As proteins are too large for the subsequent mass spectrometric analysis, they need to be digested into peptides, which are then loaded onto a reverse-phase chromatographic column from which they elute according to their hydrophobicity. Once they have reached the end of the column, the peptides are ionized by high voltage and introduced into the mass spectrometer, typically a hybrid instrument in which a quadrupole is combined with an Orbitrap. There, the intact ions are measured first, followed by the fragments of the most intense ions. At the end of a mass spectrometric run, one receives a long list of up to thousands of mass-to-charge (m/z) ratios with their corresponding intensities. As the separation capability of the chromatographic system and the fragmentation speed of the mass spectrometer are limited, fractionation of the proteins/peptides increases the overall number of identified proteins. For each of the above steps, there are several options for arriving at this list of m/z ratios with intensity values. The next step typically comprises taking the genome of the analysed organism, digesting its predicted proteins in silico with the help of search engines (e.g. Sequest or Andromeda, which are two of the most common), and matching the resulting theoretical peptides to the experimental m/z values obtained in the liquid chromatography−mass spectrometry (LC−MS) run. This is straightforward when the object of study is a model organism whose exact genome is known; the only additional complexity is possible post-translational modification (PTM). When looking at microbes from ecosystems, however, this is less trivial. Even if one studies a single organism, it is rather often the case that the exact strain has not been sequenced; the available genome might be from a related strain and there might be proteins missing which are unique to the strain under study. 
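The in silico digestion and matching step can be illustrated with a minimal sketch. It is not how Sequest or Andromeda are implemented: real search engines score fragment spectra and allow missed cleavages and modifications, whereas this example only compares precursor m/z values. The protein sequence, the minimum peptide length and the 10 ppm tolerance are assumptions chosen for illustration.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # mass of the charge carrier


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    # Very short peptides are rarely informative, so they are dropped.
    return [p for p in peptides if len(p) >= min_len]


def peptide_mz(peptide, charge=2):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def match_precursors(observed_mz, peptides, charge=2, tol_ppm=10.0):
    """Pair each observed m/z with every theoretical peptide within tolerance."""
    hits = []
    for mz in observed_mz:
        for pep in peptides:
            theo = peptide_mz(pep, charge)
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                hits.append((mz, pep))
    return hits


PROTEIN = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical sequence
peptides = tryptic_digest(PROTEIN)
# Two made-up measurements: one close to a theoretical peptide, one unrelated.
observed = [peptide_mz("QISFVK", 2) + 0.001, 500.0]
print(peptides)
print(match_precursors(observed, peptides))
```

With a single known genome the list of theoretical peptides is small and well defined; the difficulty described in the text arises when this list has to be built from thousands of partly unknown genomes.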
Therefore, the “search space” when analysing microbiomes cannot readily be predicted. Common strategies are either (a) sequencing the same microbiome and using this as the sequence collection or (b) taking unrelated metagenomes from published data and adding sequences from microbes expected to be in the respective environment. When an experimental spectrum is matched to a peptide sequence from the sequence collection (a peptide spectral match (PSM), which is usually not a perfect match), statistical analysis is applied to assign a probability to the result. Usually samples from different experimental conditions are compared based on quantitative measures. A widely used approach is to count the number of PSMs per peptide, protein or taxonomic or functional class; this is called spectral counting. The next step is to make sense of the data. A classic proteomics experiment would involve applying statistical tests to compare the conditions being studied, but such tests are not usually applied in the study of proteins of microbial consortia. Typically, unsupervised techniques are used to introduce some structure into the complex data. Functional classification systems such as KEGG (the Kyoto Encyclopedia of Genes and Genomes) [14] or eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) [15] are used to group the hundreds or even thousands of proteins into more comprehensive “bins” such as ‘Carbohydrate Metabolism’ or ‘Translation’. One further step involves the definition of ecosystem-specific functional groups such as the gut-specific metabolic modules (GMMs) [1]. Assigning a taxonomic unit such as phylum or family to peptides and proteins can be informative, provided it is done with care. The method of choice is the lowest common ancestor strategy, which takes into account all possible origins for a proteomic unit, ideally the PSM, and reports the lowest taxonomic unit shared by all hits. 
However, the strength of a metaproteomics approach is that it categorizes the functions of the proteins under investigation. As there are obviously many steps in metaproteomics analysis, different pipelines have been described e.g. the MetaProteomeAnalyser (MPA) [16] which starts with protein identification and ends with a functional and taxonomic classification. A comprehensive overview of data analysis approaches and tools is given in [17].
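The lowest common ancestor strategy and spectral counting described above can be combined in a short sketch. The lineages are simplified to four ranks (domain, phylum, family, genus), and the peptide-to-lineage table and the PSM list are hypothetical; tools such as the MPA use full taxonomies and their own data structures.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the longest prefix (root first) shared by all lineages."""
    common = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return tuple(common)


# Hypothetical table: every lineage in which a peptide sequence occurs.
PEPTIDE_ORIGINS = {
    "TAYIAK": [
        ("Bacteria", "Bacteroidetes", "Bacteroidaceae", "Bacteroides"),
        ("Bacteria", "Bacteroidetes", "Prevotellaceae", "Prevotella"),
    ],
    "QISFVK": [
        ("Bacteria", "Firmicutes", "Lachnospiraceae", "Blautia"),
    ],
    "LGLIEVQ": [
        ("Bacteria", "Firmicutes", "Lachnospiraceae", "Blautia"),
        ("Bacteria", "Bacteroidetes", "Bacteroidaceae", "Bacteroides"),
    ],
}

# Hypothetical PSM list: one entry per matched spectrum.
psms = ["TAYIAK", "TAYIAK", "QISFVK", "LGLIEVQ", "TAYIAK"]


def spectral_counts_by_taxon(psms, origins):
    """Count PSMs per lowest common ancestor (spectral counting per taxon)."""
    counts = Counter()
    for peptide in psms:
        lca = lowest_common_ancestor(origins.get(peptide, []))
        counts[lca or ("unassigned",)] += 1
    return counts


for taxon, n in spectral_counts_by_taxon(psms, PEPTIDE_ORIGINS).items():
    print(" > ".join(taxon), n)
```

A peptide shared between two phyla is reported only at domain level, which is why taxonomic assignments from peptides have to be interpreted with care.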

Proteomics of entire human microbiomes: where we are

As mentioned above, most human microbiome metaproteomics studies involve intestinal samples, mostly faecal samples, due to their more ready availability compared to those from other intestinal areas (Table 1). Essentially none of the metaproteomics studies is a purely biological description; in most cases they also present new ways of approaching the proteins in faecal samples, from sample preparation [8] up to data analysis [18]. The biological findings are based mostly on the identification of the most abundant proteins in faecal samples from healthy individuals [10,19]. Energy conversion, especially carbohydrate and amino-acid metabolism, and pathways regulating the redox potential have been shown to be prominent in the human intestine. Proteins from intestinal microbiomes have also been used to couple the presence of microbes with their activity [1,11,20]. Although these results have to be viewed with caution due to biases on both sides, they highlight the current trend towards obtaining a systemic clarification of human-associated microbiomes. Metaproteomics research in the context of disease started with investigations into inflammatory bowel disease (IBD) [21,22]. The most intriguing finding of these was that although no differentiation between healthy individuals and IBD patients could be made based on microbial composition, one could distinguish patients from controls by their metaproteomics profile [22]. Other disease states that have been explored by metaproteomics include familial type 1 diabetes [23], cystic fibrosis [24], and adiposity [20]. A proteomics investigation conducted on mucosal lavage samples revealed that there are sophisticated metabolic networks in place which are obviously not similar across the gut and require further study [25].

Table 1
Overview of prominent studies on human microbiome metaproteomics
Microbiome under study | Main findings | Reference
INTESTINAL MICROBIOME*
Infants
2 infants | Change of faecal protein profile over the first 100 days of life | Klaassens et al. [9]
4 preterm infants | Stronger time effect than individual effect in first three months | Xiong et al. [11]
Healthy adults
3 individuals, 3 time points over one year | Individual effect stronger than time effect; establishment of 1D-LC-MS/MS pipeline; grouping of proteins according to peptide relation | Kolmeder et al. [8]
16 individuals, 3 time points over six weeks | Individual effect stronger than time effect from peptide to COG level; no direct effect of probiotic intervention | Kolmeder et al. [1]
Diseases and special phenotypes
Adiposity: 13 non-overweight, 16 overweight individuals | Separation of protein profiles of obese and non-obese individuals; correlation of serum marker and proteins | Kolmeder et al. [20]
Cystic fibrosis: 15 children and unaffected siblings (age 1.6–15.6) | Potential human and bacterial faecal biomarkers for cystic fibrosis | Debyser et al. [24]
Type 1 diabetes: 36 samples from 18 individuals | Human and bacterial protein differences between cases and healthy controls | Heintz-Buschart et al. [23]
Methodology-orientated studies
One healthy volunteer | Under-representation of several Bacteroides peptides when using only bacterial pellet (by differential centrifugation) and not un-separated faecal material | Tanca et al. [31]
2 preterm infants | Sequential filtering of faecal suspension yielded improved protein identification and quantitation | Xiong et al. [53]
ORAL MICROBIOMES
Salivary supernatant from 6 healthy subjects | Dominance of metabolic bacterial enzymes | Jagtap et al. [2]
Saliva swabs from 8 healthy volunteers, 2 time points (waking and postprandial) | High protein diversity in saliva; effect of sampling time-point | Grassl et al. [3]
* All studies based on faecal samples.

As mentioned above, even fewer studies have focused on human microbiomes beyond the intestine. One example is the analysis of the cerumen of five individuals [5]. Of the 2013 identified proteins, only 14% were of bacterial origin, with RNA and nucleotide binding, and antioxidant and catalytic activity being their main molecular functions. Another body compartment, the oral cavity, has been slightly better researched, e.g. via saliva samples. When saliva from ten orally healthy individuals and ten patients with periodontitis or dental caries was analysed, around 4200 proteins were found, half of which were human and the other half bacterial [4]. Saliva samples from healthy individuals could be differentiated from those of patients with periodontitis on the basis of human proteins. There are two studies on the proteome of the human urine microbiome associated with application of a catheter [6,26] and another study on female genital tract cervicovaginal lavage [27]. All three studies highlight some proteins that might be relevant in these ecosystems and linked to the inflamed state.

The aforementioned studies have been described only very briefly because, more often than not, a protein or a group of proteins has been found to differ between conditions without giving a clear explanation of the observed differences in phenotype. In the original research papers, the discussion of individual observations tends to be either rather brief or very long because results backed by hard statistics are missing. In addition, reviews are plentiful, but the number of research articles still needs a steep increase [28–30].

I will therefore describe the methodological status quo. Possible strategies for resolving the remaining bottlenecks will be considered later in this article. When discussing faecal proteomics, one has to keep in mind that the complexity of the proteins contained therein is too great for all of them to be identified in a single protein-extraction and LC−MS/MS run. There will always be a proportion of proteins that is not analysed; this fraction is omitted to reduce the complexity of the sample. The fractionation of the sample can be done either right from the start or in the downstream processes. In order to concentrate on microbial proteins, some investigators have prepared bacterial pellets for subsequent protein extraction. Most commonly, differential centrifugation (applying several steps of low-spin centrifugation) has been used to remove most of the eukaryotic cells and fibers. Density centrifugation is a ‘cleaner’ way of obtaining a bacterial pellet [22]. As an alternative to concentrating on the bacterial fraction, unfractionated faecal material has also been used for protein extraction. One study showed that this approach introduces fewer biases in the proteins derived from the phylum Bacteroidetes, one of the four major phyla in the human intestine [31].

The next step is protein extraction. Several approaches have been used to extract proteins from faeces or bacterial pellets, ranging from mechanical lysis without the addition of solvents, or in solvent-containing buffers in combination with sonication, to freeze drying combined with thermal treatment. A very recent study compared several of these strategies and recommended that ultra-sonication should be used in combination with SDS (sodium dodecyl sulfate) [32]. To reduce the complexity of proteins and peptides reaching the mass spectrometer, the most common approach is to use a 1D gel to separate the proteins by size and then introduce the peptides from several fractions of the gel, one by one, into the mass spectrometer. Another common approach has been to use ion-exchange chromatography to produce several fractions for individual analysis in the mass spectrometer. The LC−MS/MS setup used for peptide analysis has seen hardly any variation. NanoLC is used regularly, and basically the only differences between mass spectrometers stem from the continuous upgrades in the capabilities of the mass analyzers. Generally, tandem mass spectrometers have become faster at producing MS/MS spectra, potentially leading to more proteins being identified. However, as the downstream bioinformatics analysis is basically different in each study, this does not necessarily lead to higher numbers of proteins being identified. The next weak spot in a metaproteomics analysis is assigning the measured spectra to peptides and their respective proteins. Matched metagenomes have been applied, i.e. metagenomes from the very same samples on which a metaproteomics analysis has been performed. It has been noted that the method used for processing the metagenomic data impacts the metaproteomics results [33]. Most often, un-matched metagenomes have been used in combination with individual representatives of the intestinal microbiota. 
The increase in available human metagenomes and individual representatives of the human microbiota has been one of the driving forces in metaproteomics research in the past decade. While the first paper on faecal proteomics did not have any metagenome available, there are now 10 million genes derived from faecal metagenomics in databases [34]. Ten million protein sequence entries is 100−1000 times more than what is contained in a sequence collection when studying a single organism. Initially, however, the approaches for interrogating proteins in metaproteomics were simply taken from studies on single organisms. It has been shown that this method reaches its limits in metaproteomics studies and that the sensitivity depends strongly on the applied search parameters and the size of the database [35]. Strategies have been described to approach a large search space [18,36,37] and the importance of having the correct database has been exemplified in [38]. Pipelines for protein interrogation have been proposed [39] but no state-of-the-art approach exists yet.
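The text does not name the statistical method used to assign confidence to PSMs. One widely used option in proteomics is the target-decoy approach, and the sketch below assumes it purely for illustration: spectra are searched against the real (target) sequences plus reversed or shuffled (decoy) sequences, and the share of decoy hits above a score cutoff estimates the false discovery rate (FDR). The scores and the FDR limits are made up.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as (#decoys) / (#targets) above the cutoff.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up search result: (score, is_decoy)
example = [(10, False), (9, False), (8, False), (7, True), (6, False)]

print(score_cutoff_at_fdr(example, max_fdr=0.25))
print(score_cutoff_at_fdr(example, max_fdr=0.20))
```

Under this assumption, a larger sequence collection offers more opportunities for random high-scoring decoy matches, so a stricter score cutoff is needed to hold the same FDR and fewer true PSMs pass it. This is one way to picture the loss of sensitivity with growing database size mentioned above.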

Emerging technologies and ways to explore more: where we would like to be

Metaproteomic research of human microbiomes has been ongoing for a mere decade. Pioneering works established a set of routine analyses although the number of analysed samples has been only several dozen per study. One of the observed bottlenecks has been the depth of observed proteins and ways of interpreting the measured proteins. Recent research is still oriented towards advancing methodologies. Generally, rodents have been used to investigate the possibilities available with these new methods. One of these methods is to apply metabolic in vivo labelling to track the newly synthesized proteins in the intestinal environment [40,41]. Rats consuming either a high-fat or normal chow diet were supplemented with isotopically labelled spirulina as the only nitrogen source, and by comparing light (14N) and heavy (15N) isotopes, differences were observed in protein expression between the two feeding types over time [40]. The rationale for using stable isotopes is that they offer higher sensitivity for detecting statistically significant differences, as quantification relies not on spectral counts but on comparing the chromatographic peaks of light and heavy peptides. In a study conducted in a mouse model of IBD using the same stable isotopic labelling technique, the authors claimed that the application of stable isotopes made it possible to probe deeper than spectral counting, which would require five counts per peptide [41]. The same paper provides an example of another way to narrow down the proteins under observation, i.e. the addition of activity-based probes after the proteins have been extracted from the sample. Active cysteines were labelled with a biotinylated glycine-chloromethyl ketone (BioGlyCMK) probe and passed through avidin columns so that those faecal proteins with an active cysteine were enriched. 
The authors highlighted that there was only a 51% overlap with the unenriched sample, emphasizing the great potential of this enrichment approach for identifying more proteins from faecal samples. Therefore, the possibility of targeting different characteristics of proteins and enriching them may well reveal new perspectives on intestinal samples.
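Why light (14N) and heavy (15N) versions of the same peptide appear as separate chromatographic peaks can be shown with a small calculation. The sketch assumes complete labelling, i.e. every nitrogen atom in the heavy peptide is 15N, which is an idealization; the peptide sequence and the peak areas are hypothetical.

```python
import math

# Mass difference between 15N and 14N in Da (15.0001089 - 14.0030740).
DELTA_15N = 0.9970349

# Nitrogen atoms per residue: one backbone N plus side-chain nitrogens.
EXTRA_SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_count(peptide):
    """Total number of nitrogen atoms in an unmodified peptide."""
    return sum(1 + EXTRA_SIDE_CHAIN_N.get(aa, 0) for aa in peptide)


def heavy_mz_shift(peptide, charge=2):
    """m/z offset of the fully 15N-labelled peptide relative to the light one."""
    return nitrogen_count(peptide) * DELTA_15N / charge


def log2_ratio(light_area, heavy_area):
    """Relative quantification from the two chromatographic peak areas."""
    return math.log2(light_area / heavy_area)


peptide = "TAYIAK"  # hypothetical peptide
print(nitrogen_count(peptide), round(heavy_mz_shift(peptide, 2), 4))
print(log2_ratio(light_area=4.0e6, heavy_area=1.0e6))  # made-up peak areas
```

Because the shift depends on the number of nitrogen atoms, it differs from peptide to peptide, and the ratio is derived from continuous peak areas rather than from integer spectral counts.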

Proteomics studies of individual representatives of the human microbiome should not be overlooked when it comes to driving the capabilities of metaproteomics studies. Perhaps the best-studied proteome is that of Lactobacillus rhamnosus GG. Both its response to challenges expected in the intestine, such as the presence of bile acids, and its general repertoire of secreted and surface proteins, through which it might interact with the host, have been studied [42–44]. A regular gut inhabitant, Akkermansia muciniphila, has recently been described and its protein repertoire identified, which further emphasized its role in mucin degradation [45,46]. A secreted protein from A. muciniphila was able to improve metabolism in mice [47] and while the effects in humans still need to be demonstrated, this is a good example of the direction in which microbiome research is going. A good example of researchers first predicting the gene content and protein localizations, and then performing proteomics to search for the actual localization of the proteins is seen in the case of Veillonella parvula, which can be found in multiple human microbiomes (mouth, gastro-intestinal tract, lungs and vagina) [48]. Of the 1844 predicted proteins in this bacterium, 10% were identified as outer membrane proteins. These might have a role in the opportunistic pathogenic nature of this human commensal organism. Looking at individual species and their protein expression under specific conditions is important when it comes to resolving the complex pathway links within human microbiomes. Although methods applied in ‘classical’ proteomics are used in microbiome proteomics, the potential to adopt other techniques exists. One method that may increase the likelihood of identifying more proteins is data-independent acquisition. 
By applying this technique in metaproteomics, more of the information acquired in the mass spectrometric run could be utilized, since normally only a fraction of the actual data can be used. Data-independent acquisition produces fragments of each observed ion; it is not limited to the several tens of thousands of most intense ions, as is the case in the routinely applied data-dependent acquisition. In addition, the use of different enzymes for protein cleavage may increase the range of proteins that can be identified [49]. At present trypsin is routinely used, and this introduces a bias against proteins with a low frequency of arginine and lysine residues (the residues after which trypsin cleaves). By using enzymes that have a different cleavage site, e.g. Glu-C, the range of potentially identified proteins presumably would increase.
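The effect of enzyme choice can be made concrete with a toy comparison. The sketch below is a simplification: trypsin is modelled as cleaving after K and R (ignoring the proline exception) and Glu-C as cleaving after E only, although Glu-C may also cleave after D depending on the buffer. The sequence and the detectable length window of 6 to 30 residues are assumptions for illustration.

```python
def cleave(protein, after_residues):
    """Split a protein C-terminal to every residue in after_residues."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in after_residues:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def detectable_coverage(protein, peptides, min_len=6, max_len=30):
    """Fraction of the sequence covered by peptides within the length window."""
    covered = sum(len(p) for p in peptides if min_len <= len(p) <= max_len)
    return covered / len(protein)


# Hypothetical protein with a single lysine and no arginine.
PROTEIN = "MSTLAGVDEIAGSLPAVTNEGLQAWSPLDAVEGGSTLPAEVAK"

tryptic = cleave(PROTEIN, "KR")   # simplified trypsin rule
gluc = cleave(PROTEIN, "E")       # simplified Glu-C rule

print("trypsin:", tryptic, round(detectable_coverage(PROTEIN, tryptic), 2))
print("Glu-C:  ", gluc, round(detectable_coverage(PROTEIN, gluc), 2))
```

For this lysine-poor sequence trypsin leaves one peptide that is too long for the assumed window, whereas the Glu-C rule yields several peptides of suitable length.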

Once the screening era in metaproteomics research has come to fruition, the focus will move to individual proteins. To date, a single study including patients with Crohn's disease applied the so-called selected reaction monitoring (SRM) approach [22]. In this work, a mass spectrometry-based method, similar to that regularly applied in human proteomics [50], was used to examine several proteins of interest, with their quantities being determined from the abundance of peptides predicted ab initio.

There are obviously ongoing methodological developments and new methods to be adapted in the field of human microbiome proteomics to drive this field forward. But where do we actually ultimately want to end up? Ideally, we would like to have proteins (concrete proteins down to the isoform level, including post-translational modifications) capable of discriminating between different health states/conditions and knowing how they contribute to the observed phenotype. What do we need to achieve that goal?

  1. An established protocol for retrieving proteins from human microbiome samples which has been tested with a critical number of samples of different consistencies, rather than the somewhat anecdotal reports on fewer than 100 different biological samples per study that represent the current situation.

  2. Further ways to optimize how we target more proteins at a time and have approaches for covering sub-setting when reaching the limits. These will both relate to advances in mass spectrometers and chemistries for protein enrichment.

  3. A straightforward data-analysis pipeline that uses the mass-spectrometric raw data to identify peptides while taking into account the special requirements of metaproteomics concerning the huge number of proteins to be expected. To achieve this goal, a sensitive and accurate strategy has to be found. This analytical pipeline would also apply appropriate statistical analysis and deal with functional annotation such as that available for human and model organisms [51]. In addition, this pipeline would be able to deal with post-translational modifications. Furthermore, a functional grouping would be applied which would be microbiome specific. The aforementioned GMMs represent a start. The routinely used functional classification systems KEGG and COG (Clusters of Orthologous Groups) are based on a collection of microbes, many of which are not very likely to be found in human microbiomes. Ideally, statistical analysis could be applied to individual proteins, as one loses resolution when “binning” the proteins into groups; binning has nevertheless been the practice to date because the coverage of proteins is usually not yet sufficiently high.

  4. Our knowledge of the physiology of human microbes has to increase tremendously. This is not something the metaproteomics community can achieve alone and will need input from the whole microbiome field.

  5. With items 1–4 in place, one can then proceed to study large cohorts (>100 individuals per test group). If such a study produces lists of 2500 proteins per sample with a protein coverage of >50% per sample and if we know the function(s) of all these proteins in a metabolic network, we will have come a long way.

  6. Expand the focus from faecal samples onto different areas of the intestine in order to recognize that there are spatial differences, both from mucus to lumen and distal versus proximal (with a potential gradient in both directions) and to reach to other body sites in addition to those studied currently.
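The loss of resolution caused by “binning”, mentioned in item 3, can be illustrated with a small sketch. Spectral counts of individual proteins are summed per functional category; the protein names, the counts and the protein-to-category mapping are hypothetical, and the category names are the examples used earlier in this article.

```python
from collections import defaultdict


def bin_counts(protein_counts, protein_to_bin):
    """Sum spectral counts of proteins per functional category."""
    bins = defaultdict(int)
    for protein, count in protein_counts.items():
        bins[protein_to_bin.get(protein, "Unassigned")] += count
    return dict(bins)


# Hypothetical mapping of proteins to functional categories.
PROTEIN_TO_BIN = {
    "protA": "Carbohydrate Metabolism",
    "protB": "Carbohydrate Metabolism",
    "protC": "Translation",
}

# Hypothetical spectral counts in two conditions.
condition_1 = {"protA": 30, "protB": 10, "protC": 20}
condition_2 = {"protA": 10, "protB": 30, "protC": 20}

bins_1 = bin_counts(condition_1, PROTEIN_TO_BIN)
bins_2 = bin_counts(condition_2, PROTEIN_TO_BIN)

print(bins_1)
print(bins_2)
print("Bins identical:", bins_1 == bins_2)
```

Although protA and protB change threefold in opposite directions, both conditions look identical at the category level, which is the reason for wanting statistics on individual proteins once coverage allows it.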

Concluding remarks

In the non-metaproteomics community, metaproteomics has acquired the somewhat negative connotation of being low resolution. Several thousands of identified proteins as output from a metaproteomics analysis may appear to be a small number compared to the several hundred thousands of metagenomics reads produced in next generation sequencing setups. At first glance, that might be true, but the actual deficits in human microbiome research at present lie elsewhere. Fifty percent of microbes have probably not been described yet as they have never been cultured [52]; and even for the other 50%, we know very little. For example, what effect do they have when they occur in a certain microbial consortium? Furthermore, if we consider the contribution of the host genotype, we are basically clueless. Only if we combine tools studying genes, transcripts, proteins and metabolites, study microbial consortia and their environmental compounds as well as individual cells of these consortia, will we find answers to the currently unresolved questions (Figure 1). How many compensating mechanisms exist within microbiota? How much do gene variants affect homeostasis of the ecosystem? A single amino-acid change can make a difference (e.g. virulence factors of pathogens). The “Nobel Prize” may await those solving the puzzle of which proportions of the components of the ‘environment–host genes–microbiome’ triangle are responsible for a certain phenotype. We must also realize that not everything which is possible actually needs to be done.

Figure 1.
Synopsis of analysing host−microbiome relations.

In the active manipulation of the host−microbe ecosystem, we need to appreciate the different hierarchical levels: the composition (phylogeny) of the microbiome, transcriptional and translational regulation, and the input nutrients and their output metabolites. Combining the host and microbe sides would help to clarify the whole, although it has to be appreciated that each alone is already complex; inter-regulation of many pathways makes it currently impossible to understand individual pathways. 2000 represents the average number of proteins encoded by a bacterial genome; Y = placeholder for number of bacterial species present in the microbiome; X = placeholder for number of splice variants; PTMs, post-translational modifications.

The last 10 years have seen a significant improvement in the resolution of mass spectrometers and the introduction of several innovative strategies for analysing human microbiomes by proteomics, alongside the expansion of genetic information in the form of human metagenomes and genomes of human commensals. With the ongoing efforts to harmonize sample preparation and data analysis, a focus on bacterial and host physiology, and the application of these strategies to large cohorts, I am looking forward to the next 10 years.

Summary
  • The sequencing efforts of the past decade have helped in providing background information for identifying proteins from human microbiome samples and several strategies for protein analysis have been suggested; these provide a good starting point for further developments.

  • Streamlining of efforts is presently ongoing but further work will be required to harmonize sample preparation and data analysis.

  • A prerequisite for examining large cohorts is that research includes the study of bacterial physiology, to decrease the high number of proteins with unknown functions.

Abbreviations

     
  • eggNOG, evolutionary genealogy of genes: Non-supervised Orthologous Groups

  • GMM, gut-specific metabolic modules

  • IBD, inflammatory bowel disease

  • KEGG, Kyoto Encyclopedia of Genes and Genomes

  • LC, liquid chromatography

  • MS, mass spectrometry

  • m/z, mass-to-charge ratio

  • PSM, peptide spectral match

  • PTM, post-translational modification

Acknowledgment

I thank Dr. Ewen Mac Donald for taking care of the English in this text.

Competing Interests

The Author declares that there are no competing interests associated with this manuscript.

References
1. Kolmeder, C.A., Salojärvi, J., Ritari, J., de Been, M., Raes, J., Falony, G. et al. (2016) Faecal metaproteomic analysis reveals a personalized and stable functional microbiome and limited effects of a probiotic intervention in adults. PLoS ONE 11, e0153294
2. Jagtap, P., McGowan, T., Bandhakavi, S., Tu, Z.J., Seymour, S., Griffin, T.J. et al. (2012) Deep metaproteomic analysis of human salivary supernatant. Proteomics 12, 992–1001
3. Grassl, N., Kulak, N.A., Pichler, G., Geyer, P.E., Jung, J., Schubert, S. et al. (2016) Ultra-deep and quantitative saliva proteome reveals dynamics of the oral microbiome. Genome Med. 8, 44
4. Belstrøm, D., Jersie-Christensen, R.R., Lyon, D., Damgaard, C., Jensen, L.J., Holmstrup, P. et al. (2016) Metaproteomics of saliva identifies human protein markers specific for individuals with periodontitis and dental caries compared to orally healthy controls. PeerJ 4, e2433
5. Feig, M.A., Hammer, E., Völker, U. and Jehmlich, N. (2013) In-depth proteomic analysis of the human cerumen—a potential novel diagnostically relevant biofluid. J. Proteomics 83, 119–129
6. Fouts, D.E., Pieper, R., Szpakowski, S., Pohl, H., Knoblach, S., Suh, M.-J. et al. (2012) Integrated next-generation sequencing of 16S rDNA and metaproteomics differentiate the healthy urine microbiome from asymptomatic bacteriuria in neuropathic bladder associated with spinal cord injury. J. Transl. Med. 10, 174
7. Wang, D.-Z., Kong, L.-F., Li, Y.-Y. and Xie, Z.-X. (2016) Environmental microbial community proteomics: status, challenges and perspectives. Int. J. Mol. Sci. 17, 1275
8. Kolmeder, C.A., de Been, M., Nikkilä, J., Ritamo, I., Mättö, J., Valmu, L. et al. (2012) Comparative metaproteomics and diversity analysis of human intestinal microbiota testifies for its temporal stability and expression of core functions. PLoS ONE 7, e29913
9. Klaassens, E.S., de Vos, W.M. and Vaughan, E.E. (2007) Metaproteomics approach to study the functionality of the microbiota in the human infant gastrointestinal tract. Appl. Environ. Microbiol. 73, 1388–1392
10. Verberkmoes, N.C., Russell, A.L., Shah, M., Godzik, A., Rosenquist, M., Halfvarson, J. et al. (2009) Shotgun metaproteomics of the human distal gut microbiota. ISME J. 3, 179–189
11. Xiong, W., Brown, C.T., Morowitz, M.J., Banfield, J.F. and Hettich, R.L. (2017) Genome-resolved metaproteomic characterization of preterm infant gut microbiota development reveals species-specific metabolic shifts and variabilities during early life. Microbiome 5, 72
12. Pottiez, G. (2015) Mass Spectrometry: Developmental Approaches to Answer Biological Questions. Springer
13. Kolmeder, C.A. and de Vos, W.M. (2014) Metaproteomics of our microbiome — developing insight in function and activity in man and model systems. J. Proteomics 97, 3–16
14. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. and Tanabe, M. (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114
15. Powell, S., Forslund, K., Szklarczyk, D., Trachana, K., Roth, A., Huerta-Cepas, J. et al. (2014) eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 42, D231–D239
16. Muth, T., Behne, A., Heyer, R., Kohrs, F., Benndorf, D., Hoffmann, M. et al. (2015) The MetaProteomeAnalyzer: a powerful open-source software suite for metaproteomics data analysis and interpretation. J. Proteome Res. 14, 1557–1565
17. Heyer, R., Schallert, K., Zoun, R., Becher, B., Saake, G. and Benndorf, D. (2017) Challenges and perspectives of metaproteomic data analysis. J. Biotechnol. 261, 24–36
18. Rooijers, K., Kolmeder, C., Juste, C., Doré, J., de Been, M., Boeren, S. et al. (2011) An iterative workflow for mining the human intestinal metaproteome. BMC Genomics 12, 6
19. Kolmeder, C. (2015) Metaproteomics of the Human Intestinal Tract to Assess Microbial Functionality and Interactions with the Host. Dissertationes Schola Doctoralis Scientiae Circumiectalis, Alimentarie, Biologicae
20. Kolmeder, C.A., Ritari, J., Verdam, F.J., Muth, T., Keskitalo, S., Varjosalo, M. et al. (2015) Colonic metaproteomic signatures of active bacteria and the host in obesity. Proteomics 15, 3544–3552
21. Erickson, A.R., Cantarel, B.L., Lamendella, R., Darzi, Y., Mongodin, E.F., Pan, C. et al. (2012) Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn's disease. PLoS ONE 7, e49138
22. Juste, C., Kreil, D.P., Beauvallet, C., Guillot, A., Vaca, S., Carapito, C. et al. (2014) Bacterial protein signals are associated with Crohn's disease. Gut 63, 1566–1577
23. Heintz-Buschart, A., May, P., Laczny, C.C., Lebrun, L.A., Bellora, C., Krishna, A. et al. (2016) Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes. Nat. Microbiol. 2, 16180
24. Debyser, G., Mesuere, B., Clement, L., Van de Weygaert, J., Van Hecke, P., Duytschaever, G. et al. (2016) Faecal proteomics: a tool to investigate dysbiosis and inflammation in patients with cystic fibrosis. J. Cyst. Fibros. 15, 242–250
25. Li, X., LeBlanc, J., Elashoff, D., McHardy, I., Tong, M., Roth, B. et al. (2016) Microgeographic proteomic networks of the human colonic mucosa and their association with inflammatory bowel disease. Cell Mol. Gastroenterol. Hepatol. 2, 567–583
26. Lassek, C., Burghartz, M., Chaves-Moreno, D., Otto, A., Hentschker, C., Fuchs, S. et al. (2015) A metaproteomics approach to elucidate host and pathogen protein expression during catheter-associated urinary tract infections (CAUTIs). Mol. Cell. Proteomics 14, 989–1008
27. Kalyoussef, S., Nieves, E., Dinerman, E., Carpenter, C., Shankar, V., Oh, J. et al. (2012) Lactobacillus proteins are associated with the bactericidal activity against E. coli of female genital tract secretions. PLoS ONE 7, e49506
28. Herbst, F.-A., Lünsmann, V., Kjeldal, H., Jehmlich, N., Tholey, A., von Bergen, M. et al. (2016) Enhancing metaproteomics—the value of models and defined environmental microbial systems. Proteomics 16, 783–798
29. Xiong, W., Abraham, P.E., Li, Z., Pan, C. and Hettich, R.L. (2015) Microbial metaproteomics for characterizing the range of metabolic functions and activities of human gut microbiota. Proteomics 15, 3424–3438
30. Lee, P.Y., Chin, S.-F., Neoh, H.-m. and Jamal, R. (2017) Metaproteomic analysis of human gut microbiota: where are we heading? J. Biomed. Sci. 24, 36
31. Tanca, A., Palomba, A., Pisanu, S., Addis, M.F. and Uzzau, S. (2015) Enrichment or depletion? The impact of stool pretreatment on metaproteomic characterization of the human gut microbiota. Proteomics 15, 3474–3485
32. Zhang, X., Li, L., Mayne, J., Ning, Z., Stintzi, A. and Figeys, D. (2017) Assessing the impact of protein extraction methods for human gut metaproteomics. J. Proteomics Epub ahead of print
33. Cantarel, B.L., Erickson, A.R., Verberkmoes, N.C., Erickson, B.K., Carey, P.A., Pan, C. et al. (2011) Strategies for metagenomic-guided whole-community proteomics of complex microbial environments. PLoS ONE 6, e27173
34. Li, J., Jia, H., Cai, X., Zhong, H., Feng, Q., Sunagawa, S. et al. (2014) An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841
35. Muth, T., Kolmeder, C.A., Salojärvi, J., Keskitalo, S., Varjosalo, M., Verdam, F.J. et al. (2015) Navigating through metaproteomics data: a logbook of database searching. Proteomics 15, 3439–3453
36. Jagtap, P., Goslinga, J., Kooren, J.A., McGowan, T., Wroblewski, M.S., Seymour, S.L. et al. (2013) A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. Proteomics 13, 1352–1357
37. Tanca, A., Palomba, A., Deligios, M., Cubeddu, T., Fraumene, C., Biosa, G. et al. (2013) Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture. PLoS ONE 8, e82981
38. Pible, O. and Armengaud, J. (2015) Improving the quality of genome, protein sequence, and taxonomy databases: a prerequisite for microbiome meta-omics 2.0. Proteomics 15, 3418–3423
39. Zhang, X., Ning, Z., Mayne, J., Moore, J.I., Li, J., Butcher, J. et al. (2016) MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota. Microbiome 4, 31
40. Oberbach, A., Haange, S.-B., Schlichting, N., Heinrich, M., Lehmann, S., Till, H. et al. (2017) Metabolic in vivo labeling highlights differences of metabolically active microbes from the mucosal gastrointestinal microbiome between high-fat and normal chow diet. J. Proteome Res. 16, 1593–1604
41. Mayers, M.D., Moon, C., Stupp, G.S., Su, A.I. and Wolan, D.W. (2017) Quantitative metaproteomics and activity-based probe enrichment reveals significant alterations in protein expression from a mouse model of inflammatory bowel disease. J. Proteome Res. 16, 1014–1026
42. Koskenniemi, K., Laakso, K., Koponen, J., Kankainen, M., Greco, D., Auvinen, P. et al. (2011) Proteomics and transcriptomics characterization of bile stress response in probiotic Lactobacillus rhamnosus GG. Mol. Cell. Proteomics 10, M110.002741
43. Koponen, J., Laakso, K., Koskenniemi, K., Kankainen, M., Savijoki, K., Nyman, T.A. et al. (2012) Effect of acid stress on protein expression and phosphorylation in Lactobacillus rhamnosus GG. J. Proteomics 75, 1357–1374
44. Espino, E., Koskenniemi, K., Mato-Rodriguez, L., Nyman, T.A., Reunanen, J., Koponen, J. et al. (2015) Uncovering surface-exposed antigens of Lactobacillus rhamnosus by cell shaving proteomics and two-dimensional immunoblotting. J. Proteome Res. 14, 1010–1024
45. Derrien, M., Vaughan, E.E., Plugge, C.M. and de Vos, W.M. (2004) Akkermansia muciniphila gen. nov., sp. nov., a human intestinal mucin-degrading bacterium. Int. J. Syst. Evol. Microbiol. 54, 1469–1476
46. Ottman, N., Davids, M., Suarez-Diez, M., Boeren, S., Schaap, P.J., Martins Dos Santos, V.A.P. et al. (2017) Genome-scale model and omics analysis of metabolic capacities of Akkermansia muciniphila reveal a preferential mucin-degrading lifestyle. Appl. Environ. Microbiol. 83, e01014-17
47. Plovier, H., Everard, A., Druart, C., Depommier, C., Van Hul, M., Geurts, L. et al. (2017) A purified membrane protein from Akkermansia muciniphila or the pasteurized bacterium improves metabolism in obese and diabetic mice. Nat. Med. 23, 107–113
48. Poppleton, D.I., Duchateau, M., Hourdel, V., Matondo, M., Flechsler, J., Klingl, A. et al. (2017) Outer membrane proteome of Veillonella parvula: a diderm firmicute of the human microbiome. Front. Microbiol. 8, 1215
49. Glatter, T., Ludwig, C., Ahrné, E., Aebersold, R., Heck, A.J.R. and Schmidt, A. (2012) Large-scale quantitative assessment of different in-solution protein digestion protocols reveals superior cleavage efficiency of tandem Lys-C/trypsin proteolysis over trypsin digestion. J. Proteome Res. 11, 5145–5156
50. Kusebauch, U., Campbell, D.S., Deutsch, E.W., Chu, C.S., Spicer, D.A., Brusniak, M.-Y. et al. (2016) Human SRMAtlas: a resource of targeted assays to quantify the complete human proteome. Cell 166, 766–778
51. Tyanova, S., Temu, T., Sinitcyn, P., Carlson, A., Hein, M.Y., Geiger, T. et al. (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740
52. Lagkouvardos, I., Overmann, J. and Clavel, T. (2017) Cultured microbes represent a substantial fraction of the human and mouse gut microbiota. Gut Microbes 8, 493–503
53. Xiong, W., Giannone, R.J., Morowitz, M.J., Banfield, J.F. and Hettich, R.L. (2014) Development of an enhanced metaproteomic approach for deepening the microbiome characterization of the human infant gut. J. Proteome Res. 14, 133–141

Author notes

* Present address: Max Planck Institute for Developmental Biology, Germany.