The principle of monoclonality with regard to bacterial infections was considered immutable until about 30 years ago. This view, espoused by Koch for acute infections, has proven inadequate for chronic infections, as persistence requires multiple forms of heterogeneity within the bacterial population. This understanding of bacterial plurality emerged from a synthesis of what were then novel technologies in molecular biology and imaging science. These technologies demonstrated that bacteria have complex life cycles, form polymicrobial ecologies, and evolve in situ via the horizontal exchange of genic characters. Thus, there is an ongoing generation of diversity during infection that results in far more complex microbial communities than previously envisioned. This perspective is based on the fundamental tenet that the bacteria within an infecting population display genotypic diversity, including gene possession differences, which result from horizontal gene transfer mechanisms including transformation, conjugation, and transduction. This understanding is embodied in the concepts of the supragenome/pan-genome and the distributed genome hypothesis (DGH). These paradigms have fostered research in diverse areas of bacterial ecology, including host–bacterial interactions covering the gamut of symbiotic relationships: mutualism, commensalism, and parasitism. With regard to the human host, within each of these symbiotic relationships all bacterial species possess attributes that contribute to colonization and persistence; those species/strains that are pathogenic also encode traits for invasion and metastasis. Herein we provide an update on our understanding of bacterial plurality and discuss potential applications in diagnostics, therapeutics, and vaccinology based on perspectives provided by the DGH with regard to the evolution of pathogenicity.
Established paradigms of monoclonality may have adequately described a subset of acute bacterial infections; however, they have been found lacking when dealing with persistent and chronic bacterial infections, including biofilms. Monoclonality implies that colonizing bacteria all belong to the same genotype, but this is an oversimplified view of most bacterial infections, especially those that are chronic. Traditional diagnostic methods that relied on bacterial cultures helped to entrench these paradigms in medical microbiology. However, newer diagnostic and research tools, such as polymerase chain reaction (PCR), advanced microscopic techniques for metabolic profiling, and whole-genome sequencing (WGS), have demonstrated extensive phenotypic and genotypic diversity among bacterial strains within a population or species [1–3]. Genotypic diversity within infecting bacterial populations suggests these populations are in fact polyclonal, rather than monoclonal; i.e. multiple independent strains of the same bacterial species are present in these populations at the same time [1,3–7]. This diversity also implies that even mono-species bacterial populations are neither phenotypically nor genotypically clonal.
Genetic diversity within a prokaryotic species incorporates two distinct phenomena: (1) genetic heterogeneity, i.e. individuals in a population possess different alleles (variant forms of the same gene); and (2) genomic plasticity, i.e. individuals in a population possess different genes [1,8]. The rubric of genomic plasticity thus teaches that the genome of one strain within a species cannot account for the entire set of genes for the species as a whole, and that studies of individual isolates will greatly underestimate the biological properties of the species. Therefore, the genomes of multiple independent isolates need to be fully characterized to even estimate the total biological complexity of a bacterial species [8–16]. Collectively, the genes contained in the genomes of all individuals within a species comprise the supragenome (or pan-genome) of that species [8,9,17]. Multiple mathematical models have been developed to estimate the species-level supragenome based on the sequencing of a limited number of strains chosen to be unclustered with respect to geographic location and phenotypic properties [13,14,18,19]. The concepts of genomic plasticity and the supragenome/pan-genome are integral components of the distributed genome hypothesis (DGH), which posits that not all individuals in a population have the same set of genes, and that no one individual strain has all the genes of a species. Individual members of the population have ‘access to’ a species supragenome from which genetic material can be taken up and recombined as a means of generating genomic diversity [1,2,7]. This genomic plasticity, propagated by horizontal gene transfer (HGT), is highly conserved among bacterial species.
There are multiple examples wherein HGT has been shown to be the mechanism that has provided for bacterial adaptation, survival, and evolution through the transfer of genes that provide for adaptation to various environmental pressures including host immune surveillance, antibiotic resistance, and changes in pH [20–26]. Obviously, vertical transmission of the core genome is required to maintain species viability and to provide a framework within which HGT mechanisms operate.
In this review, we provide an updated perspective on bacterial genomic plasticity, and on how HGT mechanisms together with a large, species-level supragenome provide for bacterial adaptation. In addition, we show how these new understandings provide for targeted clinical interventions that have the potential to fundamentally change how we approach microbial infections, such that it should be possible to ‘surgically’ eliminate highly pathogenic strains without undue damage to the host's microbiome.
Bacterial genomic plasticity
Bacterial genomic plasticity is characterized by each strain in a population possessing a unique set of distributed/accessory genes from the population supragenome [1,7,13,15]. We have demonstrated high degrees of genomic plasticity within multiple species, which underlies the observation that these species have both pathogenic and commensal strains. These species include Haemophilus influenzae [10,14,19], Pseudomonas aeruginosa [12,27,28], Streptococcus pneumoniae [11,18–20,29], and Moraxella catarrhalis [30,31], among others. Genomic diversity results from a balance between genome expansion (addition of genes) and genome reduction (deletion of genes) [14,15]. Our analyses comparing type strains of both H. influenzae (Rd) and S. pneumoniae (Sp6) to clinical isolates of their respective species revealed only minor changes in genome size, but on average more than 200 INDELS per genome-pair comparison, indicating that very extensive recombination is occurring [14,29]. Genomic plasticity is generated via three types of horizontal gene transfer (HGT) processes: (1) transformation; (2) conjugation; and (3) bacteriophage-mediated transduction (Figure 1). A large proportion of novel bacterial genes arise from DNA duplication and other DNA-modifying processes, or via acquisition from mobile genetic elements (e.g. transposons, insertion elements, etc.) and transferable plasmids, all of which provide genomic fodder for the evolution of novel functions as they are not associated with gene loss.
Vertical gene transfer is the standard mechanism by which a mother cell replicates its entire DNA complement and passes identical (or nearly identical) copies of each chromosome and plasmid/episome to each daughter cell during cell division (an episome is usually described as a plasmid that is able to integrate into the main bacterial chromosome). In contrast, the HGT processes involve unidirectional gene movement between two, often unrelated, bacterial cells. One or more blocks of donor chromosomal DNA (range: several hundred bases to >100 Kb), and/or plasmid and episomal DNAs in the case of conjugation, are transferred into the recipient cell, resulting in either the partial replacement of the recipient bacterium's chromosome or the acquisition of a new replicon [1,7]. Mell et al., using high molecular mass DNA for in vitro transformation of H. influenzae, found that the mean recombination tract length was 8.1 ± 4.5 Kb. Hiller et al., in a study of S. pneumoniae evolution in situ during a polyclonal pediatric infection, found 23 transformation events ranging in size from 0.4 to 235 Kb, with a mean size of 28 Kb and a median size of 13 Kb.
It is important to understand that the cell from which the transferred DNA comes does not necessarily have to be viable at the time of transfer (and never is in the case of transduction). Each of the HGT mechanisms can occur between different strains of the same species and also between related species. Conjugation, however, particularly when it involves episomal transfers, can also occur between very divergent species that exist in different phyla [32,33]. Transformation rates decline rapidly with increased levels of genetic heterogeneity as they rely on homologous recombination machinery within the recipient cell [34–37]. Transduction is based on the host-range of the infecting temperate bacteriophage and is, therefore, usually confined to a single species or closely related sister species.
Some bacterial species use a single HGT process, while others use two or even all three. For example, Staphylococcus aureus principally relies on transduction and accordingly has a much smaller pan-genome than many other bacterial species [19,38]. In contrast, Escherichia coli uses both mating and transduction [40,41], whereas H. influenzae utilizes all three HGT processes [37,42–44]. Mating and transformation are active processes, requiring that the donor and recipient organisms live in close proximity to one another. Both processes require substantial energetic expenditures by one or the other of the participating bacteria. Thus, either the donor or the recipient bacteria, and indirectly the bacterial population engaging in these processes, must receive an evolutionary advantage from them, or the processes would not persist; any mechanism that requires energetic expenditure must, by definition, provide an evolutionary advantage for its continued propagation. In the case of mating, the primary energy expenditure is by the male (DNA-donating) bacterium. This process has been described as parasitic and as an example of selfish genes ensuring their own propagation. This is because the genes for the mating apparatus are parts of transposable elements (either chromosomally or episomally located) that collectively ensure their own horizontal propagation, while also transferring physically adjacent genes that have been selected for their ability to provide survival advantages to the host bacteria in stressful environments. Thus, mating often also results in the transfer of genes that enhance the survival of the recipient, demonstrating that a single process can have multiple ecological outcomes. In the mating process, two live bacteria are joined temporarily by a pilus (or similar structure) through which one bacterium sends a copy of its DNA (chromosomal and/or episomal) into the other bacterium.
Therefore, the evolutionary advantage accrues both to the donor (selfish gene), as it propagates itself, and to the recipient, as it gains genes enabling survival, such as those encoding resistance to antibiotics and heavy metals. For these reasons, we have referred to these HGT processes as population-level virulence (or survival) traits. Similarly, the benefit of transformation accrues to the recipient and the population.
In the case of competence and transformation, the energy expenditure is by the recipient bacterium as it takes up DNA from its environment (competence) and homologously recombines the exogenous DNA into its chromosome (transformation). This changes the genotype and phenotype of the cell, which may provide a selective advantage in a stressful environment. Multiple transfers distributed around the chromosome may occur during a single competence event [7,20,46–48]. Competence, the metabolic state of being able to take up foreign DNA from the environment into the cell, is triggered by nutrient limitation or other stress conditions as part of the bacterial cell's SOS response. One of the molecular triggers of competence among the Streptococci, and related gram-positive bacteria, is the production of a quorum-sensing pheromone, competence stimulating peptide (CSP). CSP is a seventeen amino acid peptide that serves as an intercellular activating signal leading to the expression of the ComABCDE regulon, which encodes genes that control and produce the cellular machinery necessary for competence and transformation. In S. pneumoniae there are several CSP alleles that divide the species into specificity groups, as each CSP has a corresponding receptor (ComD) which is specific for a given CSP peptide. In some cases, particularly among the Streptococci and Vibrio spp., competence is also an auto-parasitic process, as the first bacteria in a population to become competent kill their neighbors to ensure a source of DNA for transformation [49–52]. Finally, viral transduction results when a lysogenic (temperate) bacterial virus, or bacteriophage, excises itself from the host genome and inadvertently takes some of the host's genes along with its own genes and then reinserts these genes into the recipient's genome when it establishes a lysogenic state [1,7,15].
Through transduction (specialized or generalized), bacteriophages play a major role in HGT, especially regarding antibiotic resistance, virulence factors, and invasion-related functions [13,15]. Viral transduction is a passive process for both the donor and recipient bacteria (in that it results from viral infection), and it can result in bacterial species living in different environments exchanging genetic material [7,13,15].
DNA exchange via these HGT mechanisms provides for diversity generation within a bacterial population; thus, these mechanisms provide the same advantages as sex in eukaryotic organisms and can be thought of as analogous processes [53,54]. Recently Colnaghi et al. demonstrated that eukaryotic sex developed to provide the same population-level benefits as HGT in prokaryotes, including protection from Muller's ratchet, because as genome size increased there was a need for increases in recombination length. Without the ability to rapidly generate diversity, all individuals within the population would have the same fitness level with respect to environmental challenges, which would decrease a population's chance of survival during times of environmental change. Moreover, the lack of HGT would result in the bacterial equivalent of in-breeding, as there would be no mechanism to replace mutant loss-of-function alleles; this would, according to Muller's ratchet, result in the population's eventual extinction caused by the accumulation of slightly deleterious mutations via genetic drift. The universality of the need for genetic recombination was recently reinforced when it was found that there is extensive HGT among metazoan ‘asexual’ bdelloid rotifers. The authors of this work, and those commenting on its implications, have clearly and unequivocally construed HGT as the equivalent of sex among these metazoans, and used it to explain their long-term survival, as it creates diversity that prevents the evolutionary bottlenecks associated with the lack of sex [57,58]. Thus, bacterial HGT mechanisms help to ensure that the population survives periods of environmental and nutritional challenge, both by providing an ‘evolutionary shortcut’ that enables organisms to quickly adapt to a changing or new environment and by providing for replacement of mutant alleles [56,59].
HGT can also produce major lifestyle changes for recipient cells and permit radiation into different ecological niches, such as occurs when commensal organisms within a holobiont [60–62] acquire virulence genes and become pathogens. Taken together, these observations suggest that evolutionary pressures select for mechanisms that generate diversity.
A species’ supragenome contains three different sets of genes: (1) the core genes, which are found in all strains of the species; (2) the distributed/accessory genes, which are found in a subset of strains of the species; and (3) the shell/unique genes, which are found in only a very small fraction of the strains of the species. Thus, in addition to the set of core genes, each strain has its own unique set of noncore genes from the supragenome [1,9,63]. The core genome consists of ‘essential’ genes responsible for the basic aspects of a species’ metabolism and major phenotypic traits [13,15], including genes for housekeeping functions, such as energy production, amino acid metabolism, nucleotide metabolism, lipid transport, and translational machinery. In contrast, the noncore (distributed, accessory, or adaptive) genome includes genes encoding supplementary or modified biochemical functions that may be useful in contexts other than basic survival, such as adaptation to new environments, antibiotic resistance, or colonization of new environments and hosts [13,15]. Being noncore implies that these genes can be deleted from the genome, but such deletions may result in the loss of important phenotypic traits, such as the ability to grow on nontraditional nutrients and substrates, virulence, and antibiotic resistance [13,15]. Early studies demonstrated that the majority of noncore genes within a species supragenome have been evolving with the core genes of that species and that many of the unannotated distributed genes are associated with survival in different environments [25,65]. The noncore genes also include those with parasitic functions, the selfish genes. These include genes that promote their own transfer and propagation as well as those that run a ‘protection racket’, the toxin–antitoxin genes.
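The three-way partition described above can be made concrete with a small sketch. This is an illustration only and not the procedure used in the cited studies: the strain names, gene identifiers, and the `unique_max_fraction` cutoff are all hypothetical, and real analyses first have to cluster orthologous genes across assembled genomes, which is not shown here.

```python
from collections import Counter


def classify_genes(strain_genes, unique_max_fraction=0.05):
    """Partition a toy supragenome into core, distributed, and unique genes.

    strain_genes: dict mapping strain name -> set of gene identifiers.
    unique_max_fraction: hypothetical cutoff; a noncore gene carried by at
        most this fraction of strains (and always a gene carried by a single
        strain) is labelled unique.
    """
    n_strains = len(strain_genes)
    # Count how many strains carry each gene.
    counts = Counter(gene for genes in strain_genes.values() for gene in genes)
    unique_cutoff = max(1, int(unique_max_fraction * n_strains))
    core, distributed, unique = set(), set(), set()
    for gene, n in counts.items():
        if n == n_strains:
            core.add(gene)          # present in every strain
        elif n <= unique_cutoff:
            unique.add(gene)        # present in very few strains
        else:
            distributed.add(gene)   # present in a subset of strains
    return core, distributed, unique


# Made-up strains and gene names, for illustration only.
toy = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g2", "g6"},
    "strainD": {"g1", "g2", "g3"},
}
core, distributed, unique = classify_genes(toy)
supragenome = core | distributed | unique
print(sorted(core), sorted(distributed), sorted(unique))
print(f"core fraction of supragenome: {len(core) / len(supragenome):.2f}")
```

In this toy set the supragenome (six genes) is larger than any single strain's genome (at most four genes), which is the point the DGH makes at species scale.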
WGS of multiple strains of multiple bacterial species has demonstrated that, with each new strain sequenced, multiple new genes are found. The number of novel genes added per genome ranges from many hundreds at the start of a pan-genome project to just a handful as the growth becomes asymptotic [9–12,9–14,16,19–20,31,68]. These early observations led to the recognition that in most cases thousands of genomes would be needed to fully describe the supragenomes/pan-genomes of many bacterial species. Recently Park et al. performed pan-genome analyses on seven pathogenic bacterial species using data downloaded from some 27 000 genomes from the NCBI prokaryotic genome database. They calculated pan-genome sizes of 22 000 genes for S. aureus and 128 193 genes for E. coli, with the others being intermediate. It is interesting to note that, based on these recent estimates, even the smallest of these bacterial pan-genomes contains as many genes as the human genome.
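The declining yield of novel genes per added genome can be illustrated with a short sketch. It assumes gene content is already available as one set of gene identifiers per genome; the function names and the toy data are hypothetical, and exhaustive averaging over all genome orderings is only feasible for very small examples (real projects sample random orderings instead).

```python
from itertools import permutations


def novel_genes_per_genome(genomes):
    """Count previously unseen genes contributed by each genome, in order.

    genomes: sequence of sets of gene identifiers.
    Returns (list of novel-gene counts, total number of distinct genes).
    """
    seen = set()
    novel_counts = []
    for genes in genomes:
        new = set(genes) - seen
        novel_counts.append(len(new))
        seen |= new
    return novel_counts, len(seen)


def mean_novel_curve(genomes):
    """Average the novel-gene counts over every ordering of the genomes."""
    totals = [0] * len(genomes)
    n_orders = 0
    for order in permutations(genomes):
        counts, _ = novel_genes_per_genome(order)
        totals = [t + c for t, c in zip(totals, counts)]
        n_orders += 1
    return [t / n_orders for t in totals]


# Made-up gene sets, for illustration only.
toy_genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g6"},
    {"g1", "g2", "g3"},
]
counts, pan_size = novel_genes_per_genome(toy_genomes)
print(counts, pan_size)
print(mean_novel_curve(toy_genomes))
```

The averaged curve removes the dependence on the order in which genomes happen to be sequenced; its values sum to the total number of distinct genes observed.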
Our laboratory's recent sequencing of more than 2000 H. influenzae clinical isolates has demonstrated that we have yet to fully characterize this species’ supragenome, although we are getting very close: our recent analyses have not identified any novel MLST types (unpublished observations), and the number of novel genes identified with each additional strain sequenced is now in the single digits. This analysis includes specimens from all six permanently inhabited continents, and from patients with every known disease type with which H. influenzae has been associated. We therefore think the probability is high that very few widely divergent genotypes remain uncharacterized, and that the number of additional novel genes to be identified is relatively small compared with the number that has been found. With this knowledge, the best means to approximate a bacterial species’ supragenome in terms of gene numbers are mathematical models, including the Pan-Genome Model [13,18] and the Finite Supragenome Model [14,16,19], both of which have proven to be quite accurate in predicting the total number of genes within a species using data from WGS of a few independent isolates.
As no two strains in a species contain the same complement of genes (per the DGH), collectively the species’ supragenome/pan-genome is often quite large and can actually exceed the number of genes in mammalian genomes. For example, as noted above, there are over 100 000 genes in the E. coli supragenome. However, it is also easy for the prokaryotic cell to undergo genomic deletions, disposing of unnecessary or deleterious genes. The size of a species’ supragenome/pan-genome, relative to its core genome size, is highly variable among bacterial species. In an analysis of 295 species-specific supragenome projects published from 2005 to 2019, the supragenome was found to be substantially larger than the core genome in essentially all cases, with the core genome making up from <20% to >60% of the supragenome (Figure 2). It is interesting to note that both genome and supragenome size are associated with a species’ biology. Free-living environmental bacterial species tend to have the largest genomes, 4–12 megabases (Mb); commensal and pathogenic bacteria have intermediate-size genomes, 1.5–4 Mb; and obligate and intracellular pathogens have the smallest genomes, 0.6–1.5 Mb. These rules, however, are not hard and fast, as not all pathogens have a reduced genome size.
In summary, the DGH provides a theoretical framework for understanding bacterial genomic plasticity, and the supragenome/pan-genome is the functional construct that embodies the genic diversity that the DGH predicts. The DGH states that chronic bacterial pathogens utilize a survival strategy wherein a majority of their genes are distributed among a population and are not found in all members of a species. Thus, there exists a supragenome at the population level which is greater than the genome of any one organism. This distribution of genes among a population serves as a population-level virulence factor that provides for improved population survival: continual HGT allows rapid adaptation to environmental conditions through the reassortment of genes (and alleles). Moreover, the set of genes in a species’ supragenome can expand through the introduction of genes via inter-species exchange or via gene duplication and the evolution of paralogous functions within a species. This is not to say that all genes and all different gene combinations will lead to correspondingly large phenotypic differences, as multiple genes can often provide the same functionality; moreover, it is quite possible that some genes that enter a genome via HGT mechanisms may not be functional in that genome due to the need for other corresponding gene products or signaling pathways that are not present in the new host. With those caveats, the shuffling of noncore genes and of alleles (different forms of the same gene) of core genes generates new combinations that are subsequently subjected to the forces of selection. Genomic plasticity represents a successful strategy for bacteria to adapt, survive, and evolve.
Research applications based on the supragenome
In the 15 years since the initial characterizations of bacterial supra/pan-genomes in both Gram-negative and Gram-positive species [9,10] the integration of the DGH with our parallel increases in understanding of microbial ecology has revolutionized thinking about many aspects of bacteriology. These include: (1) basic bacterial biology, e.g. their possession of a life cycle, evolution, population genetics, and taxonomy/phylogeny; (2) host–bacterial interactions both at the species and microbiome levels encompassing bacterial colonization, persistence and invasion into host tissues; and (3) clinical applications, e.g. the diagnosis, treatment, prevention, and epidemiology of bacterial infections [1,7,13,15,63]. It is important in these contexts to make the distinction between the bacterial meta-genome which refers to the collective genomes of multiple species assemblages, i.e. microbiomes, and the supra/pan-genome which refers to the collective genome of a single species, or population of that species within a particular microbiome. Here, we discuss several research applications from the current literature using these perspectives as theoretical bases.
Supragenome-wide association studies
The DGH teaches that each individual strain within a species, and even within a polyclonal bacterial population, possesses a unique complement of distributed genes. This means that each strain differs in gene possession from all other strains. Thus, it follows in the case of pathogens that each strain has a unique set of heritable traits with regard to antigen presentation to the host; virulence, including antibiotic resistances and serum resistances; and tissue tropisms, all of which contribute to a strain's pathogenicity by affecting its ability to colonize, persist in the face of antimicrobial therapy, invade cells and tissues, metastasize to distant sites via systemic spread, and evade or disarm various aspects of the host's innate and adaptive immune systems [1,7,64]. Based on the DGH we developed a comparative and functional genomics program that has provided the data establishing the veracity of the postulate that the genetic determinants of virulence and antibiotic resistance are unique to each strain for multiple pathogenic bacterial species. These include the Gram-negative pathogens H. influenzae [14,19,62,65,70–73], M. catarrhalis [30,31], P. aeruginosa [12,27,28], and Burkholderia cenocepacia; and the Gram-positive pathogens S. pneumoniae [19–20,29,74–77], S. aureus, and Gardnerella vaginalis. This information can be exploited for the development of targeted prevention and treatment strategies [62,65,71].
We first proposed the use of SGWAS (SupraGenome-Wide Association Studies) for the identification of bacterial virulence and tropism genes over a decade ago. SGWAS, as with GWAS, can analyze large numbers of genetic variants (including gene possession variants) to test for a statistical association between each variant and a phenotype of interest. Care must be taken to apply methodologies that account for multiple comparisons, such as the Bonferroni correction, which controls the family-wise error rate, or the Benjamini–Hochberg method, which controls the false discovery rate. Since that time, we and others have used SGWAS as a valuable tool to identify specific bacterial genes that give rise to specific phenotypic traits [65,71,78,79]. To date, bacterial GWAS studies have principally focused on identifying genes that are associated with clinically relevant phenotypes, such as virulence and antibiotic resistance [62,65,71,78,79] (Table 1).
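A gene-possession association test with multiple-comparison correction can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: it assumes a binary phenotype, uses a two-sided Fisher exact test on each gene's 2×2 table (the source does not specify which test was used), ignores population structure, and runs on made-up strains and gene names.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sgwas(presence, phenotype):
    """Test each gene's presence/absence against a binary phenotype.

    presence: dict gene -> set of strains carrying the gene.
    phenotype: dict strain -> True (case) or False (control).
    """
    cases = {s for s, is_case in phenotype.items() if is_case}
    controls = set(phenotype) - cases
    genes = sorted(presence)
    pvalues = []
    for gene in genes:
        carriers = presence[gene]
        a = len(carriers & cases)      # cases with the gene
        c = len(carriers & controls)   # controls with the gene
        pvalues.append(
            fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c))
    return genes, pvalues


# Made-up data: strains s1-s4 are cases, s5-s8 are controls.
phenotype = {f"s{i}": i <= 4 for i in range(1, 9)}
presence = {
    "geneX": {"s1", "s2", "s3", "s4"},
    "geneY": {"s1", "s5"},
    "geneZ": {"s1", "s2", "s3", "s5"},
}
genes, pvalues = sgwas(presence, phenotype)
for row in zip(genes, pvalues, bonferroni(pvalues), benjamini_hochberg(pvalues)):
    print(row)
```

With thousands of distributed genes tested at once, the Bonferroni adjustment becomes very conservative, which is one reason false-discovery-rate control is often preferred in association screens.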
Recently, we have conducted SGWAS studies using multiple advanced algorithms, including machine-learning approaches. In Lee et al. we conducted an SGWAS using Spearman rank correlation to study 215 B. cenocepacia strains isolated from 16 CF patients and observed recurrent loss-of-function mutations that were associated with decreases in biofilm formation. In two random-forest-based machine-learning SGWAS studies, both involving large numbers of H. influenzae strains (the first with >1600 genomes and the second with >200 genomes) and both performed to associate clinical provenance with gene presence/absence, we identified a preponderance of unannotated genes among the most important classifiers. This finding is of immense theoretical importance as it teaches us two lessons. First, these unbiased methods can point us to specific genes within the enormous background of genomic ‘dark matter’ that are clinically relevant; second, their examination is highly likely to lead to novel biology, as we previously demonstrated. In that earlier study, we applied statistical genetic analysis methods to clinical meta-data on large numbers of H. influenzae strains, from which we identified multiple unannotated genes that were associated with virulence. Characterization of one of these genes, which we named msf1, demonstrated that it was a major virulence factor providing for invasion and survival in human macrophages, and that it also increased trafficking of H. influenzae to the brain in the Chinchilla lanigera model of otitis media, leading to increased morbidity and mortality.
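For readers unfamiliar with the statistic, Spearman rank correlation is simply the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below shows only that calculation; it is not the analysis performed in the B. cenocepacia study, and the two toy variables (a per-strain count of loss-of-function mutations and a biofilm measurement) are invented for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented values, for illustration only.
lof_mutations = [0, 1, 1, 2, 3, 5]
biofilm_signal = [0.9, 0.8, 0.85, 0.6, 0.4, 0.1]
print(spearman_rho(lof_mutations, biofilm_signal))
```

Because it depends only on ranks, the statistic captures monotonic but non-linear relationships and is insensitive to the measurement scale of the phenotype.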
Microbiomes are polykingdom communities, often including bacteria, archaea, fungi, protozoa, and viruses, which colonize particular environments in or on animal bodies (skin, oral cavity, gastrointestinal tract, respiratory tract, urogenital tract, etc.), higher plants, soils and other terrestrial and aquatic environments. The microbiome can also be thought of as the combined genetics and metabolic capabilities of the community of organisms. For the host/holobiont, defined as host-microbiota symbioses [60–62], individual members of the microbiome can display a wide range of symbiotic relationships, from mutualism to commensalism to parasitism and pathogenicity.
Microbes inhabiting the human body (1) play roles in multiple important physiological functions, such as digestion, metabolism and immunity; (2) vary according to body site; and (3) in most cases establish an equilibrium with healthy hosts [80,81]. This homeostasis can be disrupted by an overgrowth of pathogenic microorganisms, by a lack of sufficient numbers of mutualistic or commensal microorganisms at particular body sites, or by expression of accessory genes triggered by some environmental stimulus. The consequences of the resulting microbial dysbiosis are malfunctioning physiological processes and perhaps ultimately disease [82–86]. Thus, understanding host–microbiome interactions provides insights into disease diagnoses, treatment, and prevention. Disturbances in microbiomes at various human body sites have been linked to the development of various traits and diseases, including weight gain, obesity, inflammatory bowel disease, diabetes, liver cirrhosis, cardiovascular disease, rheumatoid arthritis, cancer, depression, autism, asthma [80,84], and even premature birth and stillbirth [86,87]. Treatments involving the human oral and gut microbiomes include probiotics, which have become popular with the public in recent years. More recently, microbiome transplantations have also captured the public imagination, although such transplants have a long history of use in China. They are still largely experimental, but they have produced promising results as a potential therapy for diseases caused, or exacerbated, by microbial dysbiosis. Transplantation of the intestinal and skin microbiomes has been used to treat a range of diseases. Fecal microbial transplants (FMT) have shown promising results in patients with obesity, C. difficile infections, and ulcerative colitis [89,90]. There are ongoing trials to measure the potential therapeutic effects of FMT in a host of other diseases; however, the benefits have yet to be determined.
Though some benefits of FMT have been shown, there have also been adverse effects, such as post-transplant obesity and, in some cases, infection and even death. With a better understanding of strain-specific genetic profiles, donors could be selected, and laboratory-built mock fecal microbiomes constructed, more carefully and strategically to avoid adverse effects owing to pathogenic distributed genes in strains from donor samples. This could be accomplished using a combination of 16S microbiome analyses with metagenomic sequencing [94,95] to provide data on species/strain composition and gene content, which could be used to aid in the creation of a safer fecal transplant.
While the gut microbiome has been investigated extensively, the skin microbiome has been the focus of more recent research. The skin is colonized by a large number of diverse microorganisms, of which most are beneficial or harmless [96,97]. The composition of the abundant species is relatively stable over time, although it varies with anatomical site. However, skin-associated diseases, such as acne vulgaris, eczema, psoriasis, and dandruff, are associated with strong and specific microbiome alterations. Thus, manipulation of the skin microbiome holds promise as a novel therapeutic approach for these diseases (e.g. [98,99]). Paetzold et al. used mixtures of different skin microbiome components to alter the composition of recipient skin microbiomes and showed that, after sequential applications of donor microbiomes, recipient microbiomes became more similar to those of the donors. As the degree of engraftment depended on the recipient and donor microbiome composition, applied bacterial load, and application site, these parameters will need to be explored more fully in future experiments.
Microbiome-sparing therapeutics and vaccines
While HGT drives the evolution of many virulent and drug-resistant bacterial strains that contribute to increasing levels of morbidity and mortality [7,15], studies of pathogen supragenomes/pan-genomes help to identify distributed genes that could serve as biomarkers of virulence and, perhaps more importantly, as targets for precision medicine-based treatments and preventions [15,65]. For species-level diagnostic purposes, the core genome can be utilized as it contains genes possessed by every member of the species. However, the noncore (distributed/accessory) genes are needed to identify strains with specific phenotypes if we wish to target specific populations for intervention, since different genes and gene combinations produce different disease phenotypes and tissue tropisms [48,65]. For global prevention and treatment, core genes can be used to target an entire species, whereas targeting distributed genes allows for selective strain targeting, ensuring that only strains containing the gene of interest are affected (the microbiome-sparing approach) [48,65]. As a technical note, the risk of misclassifying genes as core or distributed/accessory has decreased considerably as the field moves to long-read genomic sequencing methodologies such as PacBio and Oxford Nanopore. These methodologies, which have been employed for the last half-dozen years, routinely provide closed circular genomes directly from the initial sequencing run. Thus, they have eliminated the need to start genome assemblies with alignments to a reference genome, which can be problematic when there is extensive genomic plasticity, and instead provide high-quality sequences for de novo assemblies [25,68,73].
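The core versus distributed distinction described above can be illustrated with a minimal Python sketch. It assumes that a gene presence/absence table has already been produced by some upstream pan-genome pipeline; the strain names, gene names, and the function `partition_pan_genome` are hypothetical and serve only as an illustration, not as the method used in the cited studies.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    gene_content: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core and distributed (accessory) genes.

    gene_content maps each strain name to the set of gene clusters it carries.
    """
    if not gene_content:
        return set(), set()
    gene_sets = list(gene_content.values())
    pan_genome = set().union(*gene_sets)   # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes present in every strain
    distributed = pan_genome - core        # genes absent from at least one strain
    return core, distributed


# Hypothetical toy data: three strains of one species.
strains = {
    "strain_A": {"gyrA", "rpoB", "toxin1"},
    "strain_B": {"gyrA", "rpoB", "adhesin2"},
    "strain_C": {"gyrA", "rpoB", "toxin1", "adhesin2"},
}

core_genes, distributed_genes = partition_pan_genome(strains)
print("core:", sorted(core_genes))
print("distributed:", sorted(distributed_genes))
```

In this framing, a core gene would suit species-level detection, whereas a distributed gene carried by only some strains would be the candidate for strain-selective targeting. A real analysis would also have to deal with ortholog clustering and incomplete assemblies, which this sketch ignores.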
Antibiotics have been considered the standard-of-care for the treatment of bacterial infections caused by drug-susceptible organisms since World War II. However, this situation is changing due to the worldwide spread of antibiotic resistance driven by widespread use (and misuse) of antibiotics [101,102]. This emergence of drug-resistant bacterial pathogens has led to a decline in the efficacy of traditional antimicrobial therapies and has greatly limited the repertoire of antibiotics available to effectively treat patients [101,102]. The rapid rise in multi-drug-resistant bacteria is a direct consequence of natural selection operating on HGT-driven mechanisms of gene exchange whereby multiple antibiotic resistance genes have become clustered together with the genes that promote conjugation. Thus, a single gene transfer event can provide for the survival of the recipient in the face of combination antibiotic therapy.
Much of the rise in antibiotic resistances (both individual resistances and multi-drug resistances) could have been avoided, or at least greatly delayed, by the employment of common-sense guidelines for treatment. Easily the greatest mistake made by antibiotic stewards worldwide for most of the last 75 years was their slavish adherence to the ‘one antibiotic at a time’ doctrine. On the face of it, this was a statistically flawed approach from the start. If resistance to a given antibiotic were to arise spontaneously in one in 10⁷ bacteria (a widely agreed upon rate), then an infection with 10⁹ bacteria (not an unreasonable number) would produce 100 resistant bacteria that would survive and could go on to colonize and infect other individuals, which is precisely what has occurred over and over again. If, on the other hand, the patient had been treated with two (or, even better, three) different antibiotics with non-overlapping mechanisms of action, then producing a doubly resistant organism would require an infection with 10¹⁴ bacteria (10⁷ × 10⁷ = 10¹⁴), a number far greater than that in any infection, even during sepsis; and producing a triply resistant organism would require a starting population of 10²¹ bacteria, which is equivalent to 1000 metric tons of bacteria! Unfortunately, it is now too late to universally adopt such an approach and have a uniform positive patient outcome. This is because our past mistakes, caused largely by an unawareness of the teachings of the DGH, have resulted in the evolution of transferable plasmids and transposons that not only promote HGT of their own core dispersal genetic machinery (selfish genes), but also include multiple genetically encoded resistances to antibiotics. Thus, a single HGT event will result in the formation of a new multiply-drug-resistant bacterial strain [103,104].
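The arithmetic in this argument can be checked with a short Python sketch. It uses the figures quoted in the text (one spontaneous mutant per 10⁷ cells, an infection of 10⁹ cells) and assumes that resistance to each drug arises independently by spontaneous mutation, which is exactly the assumption that plasmid-borne multi-resistance violates. The per-cell mass of roughly 1 picogram is an added assumption used only to reproduce the tonnage estimate, and the function names are hypothetical.

```python
from fractions import Fraction

ONE_IN = 10**7              # one resistant mutant per 10^7 cells (from the text)
INFECTION_SIZE = 10**9      # cells in the infection (from the text)
CELL_MASS_PG = 1            # ASSUMPTION: roughly 1 picogram per bacterial cell
PG_PER_METRIC_TON = 10**18  # 1 metric ton = 10^6 g; 1 g = 10^12 pg


def expected_resistant_cells(population: int, n_drugs: int) -> Fraction:
    """Expected cells resistant to all n_drugs, assuming independent mutations."""
    return Fraction(population, ONE_IN**n_drugs)


def population_for_one_mutant(n_drugs: int) -> int:
    """Population size at which one fully resistant cell is expected."""
    return ONE_IN**n_drugs


def mass_in_metric_tons(cells: int) -> Fraction:
    """Approximate mass of a bacterial population under the 1 pg assumption."""
    return Fraction(cells * CELL_MASS_PG, PG_PER_METRIC_TON)


for n_drugs in (1, 2, 3):
    needed = population_for_one_mutant(n_drugs)
    print(
        f"{n_drugs} drug(s): expected resistant cells in the infection = "
        f"{float(expected_resistant_cells(INFECTION_SIZE, n_drugs)):g}; "
        f"cells needed for one mutant = {needed:.0e}; "
        f"mass = {float(mass_in_metric_tons(needed)):g} t"
    )
```

Exact rational arithmetic is used so that the powers of ten are not blurred by floating-point rounding. The sketch says nothing about resistance acquired in a single HGT event, for which the multiplicative argument no longer holds.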
However, one need only look at HIV HAART (highly active anti-retroviral therapy) to understand the utility of such a multi-target approach to pathogen treatment. The simultaneous targeting of multiple HIV enzymatic functions, in the late 1990s, changed HIV-1-related disease from a near certain death sentence to a treatable chronic disease.
Historically, antibiotics have been identified by screening natural compounds for their ability to kill bacteria grown in vitro. Recently, genome sequencing and supragenomic characterization of microorganisms have enabled the collection of detailed information regarding the physiological repertoire of entire microbial species (vide supra). This has led to a shift in the discovery of novel anti-microbials from an empirical approach to a knowledge-based approach based on specified targets. In these ‘target-based’ approaches, potential drugs are identified or, preferably, designed using in silico modeling algorithms [105,106] that take advantage of structural information on the target molecule, which is predefined via its role in a key metabolic process. Comparative metabolomics combined with comparative genomics can be used to identify genes essential for pathogen survival and pathogenicity, which are then tested as targets of specific compounds derived from large chemical libraries [105,106].
Aside from the antibiotic resistance crisis, research has revealed the harmful effects of broad-spectrum antibiotic therapy on the community structure of beneficial host microbiota, which in turn can have negative effects on long-term host health. To combat these threats, the microbiome-sparing approach aims to modify or replace broad-spectrum antibiotics with precision anti-microbials that selectively target and remove pathogenic strains while leaving the community structure of the surrounding microbiota unchanged. These new therapeutic strategies include the development of anti-virulence compounds that inhibit specific pathogenesis and persistence traits encoded by distributed genes in the targeted strains, and which therefore spare other strains of the same species that do not encode such virulence genes. As such, these strategies are designed to identify compounds that are bactericidal or bacteriostatic to a minimal number of bacterial strains, and thus to provide precision treatments based on druggable small molecules and natural compounds that are bacterial strain specific (Table 2). By preserving patients’ microbiota, this microbiome-sparing strategy of identifying pathogen-specific targets has the potential to improve patient health during and after bacterial infections [48,65,102].
Whole-genome sequencing in bacterial diagnostics
Identification and characterization of infection-causing microorganisms are crucial for successful treatment, recovery, and safety of patients. Culture fails to detect an organism in 80% of cases in which a patient has signs and/or symptoms of infection. This under-detection has multiple causes, including antibiotic treatment; bacteria growing as biofilms; and slow-growing, fastidious organisms that cannot be cultured or take weeks to culture [2,64,108–129]. Conventional clinical diagnostic culture methods are biased toward the 2% of microorganisms able to grow rapidly in standard culture media. Particularly for chronic or biofilm-related infections, these cultured bacteria are usually presumed to be relevant. However, we now know that in many chronic infections the organisms that grow out rapidly are often not representative of the species that are driving the infection [120–122,130–141]. In part, this understanding is based on the reduction in time to closure of chronic wounds when patients are treated based on the results of molecular diagnostics and WGS as opposed to culture [130–133]. A meta-analysis currently ongoing in the authors’ laboratories covers multiple studies of multiple infectious diseases over more than a decade, in which multiple molecular diagnostic methods were used to diagnose over 7000 infections across a wide range of clinical presentations and anatomic sites. It has demonstrated that, under the best conditions, culture detects S. aureus ∼50% of the times it is actually present, and detects other staphylococcal and most streptococcal species ∼20% of the time. For nearly all other pathogens the detection rate was less than 20%, and for anaerobes such as Cutibacterium acnes (formerly Propionibacterium acnes) and multiple Treponema spp. and Prevotella spp. it was far less.
Culture is also inadequate to rapidly and accurately distinguish among multiple strains and sub-strains of even the most common pathogens, in part because they often evolve rapidly through HGT mechanisms during the infectious process [7,16,20,25,28,68,142].
Despite its shortcomings, culture is entrenched as the primary diagnostic technique for the identification and characterization of bacterial pathogens. Nucleic acid amplification techniques (principally PCR) overcame many of these shortcomings, and DNA sequencing methodologies, such as multiple-locus sequence typing (MLST), were a step in the right direction, but they lack the resolution needed for strain-specific diagnostics [63,143–145]. Thus, there was a need for better molecular diagnostic tools to improve the accuracy and efficiency of diagnoses.
WGS promises to provide the ultimate in resolution for strain, and even sub-strain, identification, as it takes into account the genomic plasticity and supragenomes of bacterial species [63,142] and can be used to identify novel strains as they appear. The superior resolution of WGS in the identification and characterization of pathogens has great potential for routine use in diagnostic laboratories (e.g. [143,144]). However, despite this promise, WGS has not completely superseded current diagnostic methods in most clinical microbiology laboratories. There are several obstacles to its routine implementation, including the high cost of WGS, a lack of training in bioinformatics among clinical microbiologists, a lack of the necessary computational infrastructure in most hospitals, and the difficulty of establishing the proper bioinformatic protocols. Much of this can be overcome through the use of standardized or centralized ‘cloud-based’ computational systems. The European Society of Clinical Microbiology and Infectious Diseases (ESCMID) has recently published a review covering the need for such systems and a framework for their implementation [143,145]. As with any system of analysis, appropriate technical requirements need to be instituted to avoid the pitfalls of poor-quality sequencing, which can lead to low coverage and lost sequences. Doyle et al. highlighted the need for high-quality sequence data in clinical diagnostics by showing that short-read, low-coverage data led to missed WGS-based antibiotic resistance calls. This has led to the push to adopt long-read, and even circular consensus, sequencing, which obviate these issues [37,147,148].
Reverse and microbiome-sparing approaches to bacterial vaccinology
For over a century, infectious diseases have been controlled by vaccination and the administration of antibiotics. Nevertheless, pathogenic microorganisms are still the most important threat to health worldwide. Conventional vaccinology approaches were successful in conferring protection against some but not all infectious diseases. The vaccine counterpart to antibiotic resistance, ‘vaccine escape’, is partly responsible for vaccine failure. Vaccine escape implies that the target pathogen has mutated such that it no longer expresses the same form of the antigen as was used to prepare the vaccine. This type of vaccine failure is completely different from those associated with host-based ‘immunological windows’ wherein antigen presentation does not induce a memory immune response to the vaccine antigen.
Until very recently, essentially all vaccines were based on killed or live-attenuated microorganisms, or their chemically deactivated toxins [101,149]. With the genomic era, we and others developed the concept of the supragenome/pan-genome [8–12,14,18], and from this came the recognition that distributed/accessory genes associated with virulence could be used to target specific pathogenic strains in a microbiome-sparing approach that would not result in the elimination of commensal strains [65,71,149]. Conversely, the pan-genome can be used to identify core genes that are universally present among all strains if the wish is to eliminate a bacterial species in its entirety. These potential vaccine antigens, whether core or distributed, are often identified in a reverse manner, starting from an analysis of the supragenome/pan-genome of a species as opposed to the use of a single whole organism, a process called reverse vaccinology [150,151]. Reverse vaccinology uses bioinformatics and, for microbiome-sparing approaches, statistical methods that draw on strain metadata regarding clinical provenance, i.e. commensal or pathogen [65,71], to predict potential vaccine candidates ‘in silico’ from the supragenome/pan-genome of the target bacterial species. The first application of reverse vaccinology was for a vaccine against serotype B Neisseria meningitidis. For a vaccine against the more complex Streptococcus agalactiae, reverse vaccinology using the supragenome/pan-genome was employed for the first time. Somewhat later, reverse vaccinology was used to compare pathogenic and nonpathogenic strains of the same species to find antigens that truly affect pathogenesis. These successes led to the application of reverse vaccinology to other pathogens including N. gonorrhoeae and Mycoplasma pneumoniae [154,155].
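One way to picture the statistical step that links distributed genes to clinical provenance is a per-gene association test. The Python sketch below applies a one-sided Fisher's exact test to ask whether a gene is over-represented among pathogenic strains. This is a generic illustration and not necessarily the method used in the cited studies; the strain labels, gene names, and function names are hypothetical, and a real analysis would also need correction for multiple testing and for population structure.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def enrichment_p_value(path_with: int, path_without: int,
                       comm_with: int, comm_without: int) -> float:
    """One-sided Fisher's exact test for enrichment of a gene in pathogens."""
    n_path = path_with + path_without
    n_carriers = path_with + comm_with
    n_total = n_path + comm_with + comm_without
    # Hypergeometric tail: probability of seeing at least `path_with`
    # carriers among the pathogenic strains by chance alone.
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_path - k)
        for k in range(path_with, min(n_path, n_carriers) + 1)
    )
    return tail / comb(n_total, n_path)


def rank_candidates(gene_content: Dict[str, Set[str]],
                    provenance: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank genes by enrichment in strains labelled 'pathogen'."""
    pathogens = [s for s, label in provenance.items() if label == "pathogen"]
    commensals = [s for s, label in provenance.items() if label == "commensal"]
    all_genes = set().union(*gene_content.values())
    results = []
    for gene in all_genes:
        a = sum(gene in gene_content[s] for s in pathogens)
        c = sum(gene in gene_content[s] for s in commensals)
        p = enrichment_p_value(a, len(pathogens) - a, c, len(commensals) - c)
        results.append((gene, p))
    return sorted(results, key=lambda item: (item[1], item[0]))


# Hypothetical toy data: four pathogenic and four commensal strains.
content = {f"P{i}": {"coreX", "virA"} for i in range(4)}
content.update({f"C{i}": {"coreX"} for i in range(4)})
labels = {name: ("pathogen" if name.startswith("P") else "commensal")
          for name in content}

ranked = rank_candidates(content, labels)
for gene, p in ranked:
    print(gene, round(p, 4))
```

In this toy example the distributed gene found only in pathogenic strains ranks first, while the core gene shows no association. A core gene can never discriminate between provenances, which is why microbiome-sparing candidates are sought among the distributed genes.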
It is important to point out that reverse vaccinology and microbiome-sparing vaccine approaches are still susceptible to the evolution of vaccine escape mutants. Thus, it will always be necessary to constantly surveil the circulating pathogen population for the evolution of mutants.
Bacterial single-cell ‘omics’
The vast majority (>99%) of microbial species in the biosphere cannot be cultured in the laboratory with current culturing methods and thus are contained within what has been referred to as the microbial ‘dark matter’ [156–158]. This inability to culture them has limited our understanding of the biology of these organisms. Microbiologists have traditionally studied populations of bacterial cells, typically using millions to billions of cells for analysis in bulk, rather than individual cells, as it has been assumed that individual cells are representative of the population. However, this assumption neglects any heterogeneity present in the population [159–161]. The behavior of single cells, particularly in spatially and taxonomically complex assemblages, can be substantially different from that of the whole population; thus, conclusions based on average molecular or phenotypic measurements of a population can be biased, as the patterns of subpopulations would not be revealed.
The recent development of single-cell meta-omics has greatly enhanced our understanding of the individuality and heterogeneity of microbes in multiple biological systems.
Single-cell omic technologies (genomics, transcriptomics, metabolomics) help reveal this hidden information from both unculturable organisms and low-abundance organisms in complex microbial communities. These technologies are providing new perspectives on population diversity by bringing the power of meta-omics to the single-cell level for studies of taxonomically and metabolically complex biofilms, microbiomes, and holobionts [60–62]. The ability to comprehensively characterize single cells or small populations of cells within a more complex system is revolutionizing our understanding of bacterial metabolic differentiation and how this contributes to the robustness of the biology of microbiomes and their holobionts [60–62,160]. Single-cell sequencing (SCS) is one of the tools used in single-cell omic studies and complements metagenomic deep-sequencing methods. The three main applications of single-cell omics in relation to bacterial populations are to: (1) investigate the genomes of unculturable microorganisms; (2) delineate cell-to-cell diversity within diverse populations [159–162]; and (3) compare the transcriptional activities of genomically identical cells based on their spatial orientations with respect to nutrient availability and access to the substrate. SCS has been applied in widely diverse biological and environmental contexts, including human microbiomes (e.g. [156,164]), seawater and marine sediments (e.g. [165–167]), and even a hospital sink [168,169].
The advent of the concepts of the supragenome/pan-genome and the DGH has not only revolutionized our understanding of bacterial genomics, evolution, and adaptability, but has also provided the framework for novel approaches to diagnosis, precision medicine, and vaccinology. As the scientific community continues to expand upon the past 20 years’ accomplishments in this field, humanity stands to reap substantial rewards with regard to personalized medicine and public health.
The DGH and its prediction of the bacterial supragenome/pan-genome, together with the biofilm paradigm, have resulted in the formation of a new rubric, bacterial plurality. Bacterial plurality encompasses the concepts that persistent bacterial infections require both genotypic and metabolic heterogeneity, as well as evolution in situ, to explain what had previously been paradoxical findings associated with chronic infections.
Horizontal gene transfer (HGT) mechanisms provide the engine for robust and continuous recombination among bacteria, which drives ongoing strain evolution during polyclonal colonizations and chronic infections as a means to adapt to changing environmental and host immune pressures.
The realization that there are differences in gene content and gene expression between commensal and pathogenic strains of the same species provides for specific targeting of the pathogenic strains in the design of drugs and vaccines, resulting in microbiome-sparing approaches.
The authors declare no competing interests.
J.A.H. together with E.A.G. wrote the first draft; K.M.S. made the figures; J.C.M. provided data necessary for the response to the reviewers and critical reading of the revisions; G.D.E. outlined the manuscript, provided many of the references, edited the entire original submission, and was responsible for all revisions for the subsequent versions. All authors read and approved the final version.
The authors would like to thank Ms. Carol Hope, MBA, for administrative support in the preparation and submission of this paper. We are also grateful to Josh Earl for his principal role in the development and implementation of much of the bioinformatic software that underlies so much of the data discussed herein. Finally, we thank the entire faculty and staff of the Center for Genomic Sciences and the Center for Advanced Microbial Processing for generating much of the data that is summarized in this review. This work was supported by Drexel University College of Medicine; the Oskar Fisher Project, a gift from Dr. James Truchard; the Bill and Marion Cook Foundation; and NIH R01 DC-02148 and NIH U01 DK-082316 to GDE.