Interactions between pathogens and their hosts can induce complex changes in the states of both host and pathogen that favor either pathogen survival or host clearance of the pathogen. To determine the consequences of specific host–pathogen interactions, researchers can draw on a variety of techniques from microbiology, cell biology, and immunology. Systems biology approaches that enable unbiased measurement of transcriptomes, proteomes, and other biomolecules have become increasingly common in the study of host–pathogen interactions. These approaches can be used to generate novel hypotheses or to characterize the effects of particular perturbations across an entire biomolecular network. With proper experimental design and complementary data analysis tools, high-throughput omics techniques can provide novel insights into the mechanisms that underlie processes ranging from phagocytosis to pathogen immune evasion. Here, we provide an overview of the suite of biochemical approaches for high-throughput analyses of host–pathogen interactions, analytical frameworks for understanding the resulting datasets, and a vision for the future of this exciting field.
Infectious diseases remain one of the leading causes of death worldwide. The pathogens that cause these diseases are enormously diverse, ranging from viruses to eukaryotic parasites, and this diversity is matched by the range of host cells that harbor them. This combinatorial complexity has fueled decades of research into generalizable principles of the host response to pathogens, as well as into the unique properties of pathogens encoded by their diverse genomes. Further complicating these evolving relationships, both host and pathogen genomes are rich in defensive, evasive, and resistance mechanisms directed against each other. Despite decades of research into host–pathogen interactions, many questions about the biochemical choreography of pathogenic attack remain unanswered, hindering the development of new treatment regimens. Complementary to focused genetic and biochemical investigations of individual mechanisms, high-throughput technologies, such as next-generation sequencing (NGS) or mass spectrometry, offer a systems approach to exploring the complex dynamics between the host and the pathogen (Figure 1). Recent decades have seen a dramatic increase in the application of systems biology to host–pathogen interactions, driven largely by the improvement and growing availability of technologies for studying both the host and the pathogen. Here, we review examples of these tools in host–pathogen interactions and present a perspective on the future of the field.
Figure 1. Example experimental workflow for the analysis of host–pathogen interactions using omics tools.
NGS and genome-wide association studies identify disease-associated mutations during infection
Genomics offers a platform for studying the structure, function, and evolution of host and pathogen genomes. As the cost of NGS continues to decrease, these technologies collectively serve as a powerful tool for understanding host and pathogen biology, especially through the lens of disease dynamics in human populations. Whole-genome sequencing (WGS) of pathogens has enabled the study of viral transmission chains and of the diverse metabolism of Mycobacterium tuberculosis (Mtb) [1,2]. Furthermore, genome-wide association studies (GWAS) have been deployed to understand the diversity of host responses across an array of infections. In an 11 000-patient malaria GWAS, a resistance locus was identified in African children near a cluster of genes encoding receptors for erythrocyte invasion. While most GWAS have examined host genomes, some studies have begun to apply these techniques to the pathogen. With the rise of drug resistance, an important open question is which pathogen mutations give rise to antibiotic resistance. In a GWAS of Mtb isolates from a Chinese cohort, mutations in the transcription factor PrpR were associated with isoniazid resistance. Moreover, PrpR mutations conferred tolerance to other effective antibiotics by altering propionyl-CoA metabolism.
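At its core, a case–control GWAS tests each variant for a difference in allele frequency between cases and controls. The sketch below is a minimal illustration, not the method of the studies cited above: it computes the allelic odds ratio and a 1-degree-of-freedom chi-square test from a 2 × 2 table of allele counts. Real analyses typically use logistic regression with covariates such as ancestry principal components and judge significance against a genome-wide threshold (commonly 5 × 10⁻⁸); the counts in the example are invented.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2 x 2 test for one variant.

    Arguments are allele counts (two per diploid individual) for the
    alternate and reference alleles in cases and controls. Returns
    (odds_ratio, chi2, p_value). No continuity correction is applied, and
    all cells are assumed non-zero with reasonably large expected counts.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Pearson chi-square statistic comparing observed and expected counts
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Invented counts: 1000 alleles each in cases and controls
    print(allelic_association(300, 700, 200, 800))
```

For the invented counts (300 of 1000 case alleles versus 200 of 1000 control alleles carrying the alternate allele), the odds ratio is about 1.71 and the chi-square statistic about 26.7.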
As a result of the decreasing cost of WGS, the technology is now being deployed in clinical settings as well [5,6]. Clinical genomics uses a patient's genomic data to help inform clinical decisions. This information can be used in myriad ways, from preventing disease in patients with a strong predisposition to adjusting medication for a given disease. Just as oncogenomics uses sequencing to understand genetic mutations in cancer, clinical genomics can be used to study host–pathogen interactions by identifying pathogen antibiotic resistance mutations and thereby defining customized treatment regimens. Ultimately, clinical genomics offers an exciting path to understanding host–pathogen interactions in vivo, which may help identify new strategies for treatment and diagnosis.
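In its simplest form, resistance calling from clinical WGS reduces to looking up called variants in a curated catalogue of mutations with known drug associations (for M. tuberculosis, the WHO publishes such a catalogue). The sketch below assumes variants have already been called and annotated as (gene, amino-acid change) pairs. Its two-entry catalogue, built from the well-characterized katG S315T (isoniazid) and rpoB S450L (rifampicin) mutations, is illustrative only and not a clinical reference, and `geneX` is a hypothetical placeholder.

```python
# Illustrative two-entry catalogue; NOT a clinical reference.
ILLUSTRATIVE_CATALOGUE = {
    ("katG", "S315T"): "isoniazid",
    ("rpoB", "S450L"): "rifampicin",
}


def flag_resistance(variants, catalogue=ILLUSTRATIVE_CATALOGUE):
    """Group called variants by the drug they are catalogued against.

    variants: iterable of (gene, amino_acid_change) tuples from an upstream
    variant-calling and annotation step. Variants absent from the catalogue
    are ignored, so novel resistance mutations are not detected.
    """
    hits = {}
    for gene, change in variants:
        drug = catalogue.get((gene, change))
        if drug is not None:
            hits.setdefault(drug, []).append(f"{gene} {change}")
    return hits


if __name__ == "__main__":
    # geneX is a hypothetical placeholder for a variant with no catalogue entry
    called = [("katG", "S315T"), ("rpoB", "S450L"), ("geneX", "A10V")]
    print(flag_resistance(called))
```

A lookup like this cannot flag novel mutations, which is one reason association analyses of pathogen isolates remain valuable alongside curated catalogues.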
Changes in chromatin architecture underlie the functional transitions in cell state during infection
The study of epigenetics, discussed here as the modulation of gene expression by histone or DNA modifications, has also benefited from advances in NGS. Techniques such as bisulfite sequencing, ATAC-seq, and ChIP-seq allow interrogation of DNA modifications, chromatin accessibility, and protein–DNA binding. Despite these advances, one of the remaining technical challenges in the field is the collection, analysis, and correct annotation of large-scale epigenomic data. Beyond the broader role of the epigenome in inflammation highlighted by several fields, the epigenome can be extensively altered upon infection [7,8]. Epigenetics thus represents another modality by which both the host and the pathogen can modulate their state during infection. Bacterial modulation of host histone acetylation and deacetylation, reviewed elsewhere, can occur either directly or indirectly as a means of virulence and adaptation to the host. In other instances, pathogens use epigenetic modifications to switch between adaptive states. The oncogenic potential of some viruses has motivated many studies of virus-induced epigenetic modifications. Generally, as with any stimulus, both the host and the pathogen modulate their gene expression through epigenetic modifications. The challenge lies in identifying the signaling processes and virulence factors that manipulate these modifications during infection.
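Bisulfite sequencing reads out DNA methylation because sodium bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR), whereas methylated cytosines are protected. The methylation level at a CpG is therefore estimated as the fraction of informative reads showing C rather than T at that cytosine. The sketch below assumes reads have already been aligned with a bisulfite-aware aligner (e.g., Bismark) and the forward-strand base calls at each CpG tabulated; the positions and base calls in the example are invented.

```python
def methylation_levels(base_calls):
    """Estimate per-CpG methylation from bisulfite sequencing base calls.

    base_calls maps a CpG cytosine position to the bases observed there
    across aligned reads (forward strand). A 'C' indicates a methylated
    (protected) cytosine; a 'T' indicates an unmethylated, converted one.
    Other bases (errors or SNPs) are ignored. Returns
    {position: (fraction_methylated, informative_coverage)}; positions with
    no informative reads are omitted.
    """
    levels = {}
    for pos, bases in base_calls.items():
        methylated = bases.count("C")
        unmethylated = bases.count("T")
        coverage = methylated + unmethylated
        if coverage == 0:
            continue
        levels[pos] = (methylated / coverage, coverage)
    return levels


if __name__ == "__main__":
    # Invented positions and base calls
    calls = {1042: list("CCCT"), 1088: list("TTTT"), 1120: list("GA")}
    print(methylation_levels(calls))
```

In the example, position 1042 is estimated at 75% methylation from four informative reads, 1088 at 0%, and 1120 is dropped because none of its reads is informative.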
Insights into the modulation of host innate immunity through epigenetic mechanisms have also revealed new potential for the century-old BCG (Bacillus Calmette–Guérin) vaccine. Recent studies demonstrated that an alternative route of BCG administration in mice resulted in epigenetic reprogramming of bone marrow-derived macrophages with enhanced antimicrobial capacity [11,12]. Another recent study demonstrated that, in a longitudinally followed cohort of HIV+ patients, initiation of antiretroviral therapy coincided with altered epigenetic modification of the TNF-α promoter. More broadly, the field of trained innate immunity continues to propose novel mechanisms by which innate stimuli can trigger epigenome-wide modifications with implications for cytokine release and pathogen control. While further investigation is needed, these studies represent clinically relevant associations between epigenetic changes and immune responses during infection. As epigenomic studies of host–pathogen interactions continue, it will be critically important to functionally link epigenetic changes to specific responses before, during, and after infection.
Transcriptional profiling enables facile study of the reciprocal responses of the host and the pathogen
Transcriptomics offers an accessible method to catalog gene expression during pathogen challenge. An early demonstration of RNA profiling on microarrays outlined a conserved transcriptional response of dendritic cells to Escherichia coli, Candida albicans, and influenza virus. Soon after the explosion of microarray studies of host–pathogen interactions, Jenner and Young compiled a variety of these studies to generate conserved and differential host signatures to pathogens. RNA profiling has also been used extensively to reveal the global state of pathogens. Rohde et al. profiled the Mtb transcriptome over 2 h post-infection, revealing global changes in the bacterium under host stress; comparison with profiles from broth culture showed that pH was responsible for inducing several of these bacterial gene programs. Transcriptomic profiling of mRNAs, miRNAs, and siRNAs in both the pathogen and the host has surveyed the dynamic adjustments each makes to invade and to defend, respectively. While microarray and RNA-seq studies have typically profiled either the host or the pathogen, dual RNA-sequencing of both the host and the pathogen has recently been established [17,18]. Dual RNA-seq approaches have also been successfully implemented in tissue and clinical samples, offering greater in vivo relevance [19,20]. However, dual RNA-seq generally requires substantial input material, and its sensitivity is limited by the low relative abundance of pathogen transcripts. For pathogens whose mRNAs are not polyadenylated, such as bacteria, most methods require additional processing steps to capture pathogen mRNA, or deep sequencing to obtain enough pathogen reads to adequately describe pathogen state. When profiling bulk populations across multiple timepoints or conditions, targeted sequencing of pathogen transcripts may prove more effective.
Nevertheless, capturing host and pathogen gene expression simultaneously offers a more complete snapshot of the transcriptional programs enacted during these interactions.
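The sequencing-depth problem for dual RNA-seq is simple arithmetic: if pathogen reads make up a fraction f of mapped reads, reaching a target number of pathogen reads requires roughly target / f mapped reads in total. The hypothetical helper below makes this explicit, with an assumed overall mapping-rate parameter. At f = 1%, for example, 5 million pathogen reads require about 500 million total reads, which illustrates why enrichment or targeted sequencing of pathogen transcripts can be attractive.

```python
import math


def total_reads_needed(target_pathogen_reads, pathogen_fraction, mapping_rate=1.0):
    """Approximate total reads to sequence for a target number of pathogen reads.

    pathogen_fraction: expected share of mapped reads that come from the pathogen.
    mapping_rate: expected share of all reads that map to either genome
    (an assumed parameter; 1.0 means every read maps).
    """
    if not (0 < pathogen_fraction <= 1 and 0 < mapping_rate <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    return math.ceil(target_pathogen_reads / (pathogen_fraction * mapping_rate))


if __name__ == "__main__":
    for fraction in (0.001, 0.01, 0.1):
        reads = total_reads_needed(5_000_000, fraction)
        print(f"pathogen fraction {fraction:.1%}: ~{reads:,} total reads")
```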
While bulk approaches have been tremendously powerful, many infections propagate through the infection of individual host cells, each of which may respond differently. Single-cell RNA-sequencing has uncovered critical cell states that drive or describe infection status, the impact of pathogen variability on host responses, and comprehensive landscapes of infected cells and their responses in vivo [21–23]. Clinically, these approaches will be particularly vital for analyzing the response to, and effectiveness of, vaccines and host-directed therapies, in an effort to understand how a particular therapy homogenizes responses across a population of cells. Recently, Avital et al. demonstrated a method to capture both host and bacterial transcriptomes at the single-cell level. This report was a promising start but suffered from low capture efficiency and coverage. For these same reasons, microbial single-cell sequencing unfortunately remains out of reach for common use. These technical challenges in capturing pathogen mRNAs at the single-cell level represent an opportunity for the field to obtain a more comprehensive picture of host–pathogen interactions.
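In single-cell data that do capture pathogen transcripts, an early analysis step is often to call each cell as infected or as an uninfected bystander from its pathogen-derived counts. The sketch below assumes per-cell UMI counts have already been split into host-mapped and pathogen-mapped totals. Both thresholds are hypothetical and would need calibration, for instance against uninfected control samples, because ambient pathogen RNA can leak into droplets containing uninfected cells.

```python
def call_infected(cells, min_pathogen_umis=5, min_fraction=0.001):
    """Label each cell barcode as 'infected' or 'bystander'.

    cells maps a barcode to (host_umis, pathogen_umis). A cell is called
    infected only if it passes both a hypothetical absolute threshold and a
    hypothetical fractional threshold on pathogen-derived UMIs.
    """
    labels = {}
    for barcode, (host_umis, pathogen_umis) in cells.items():
        total = host_umis + pathogen_umis
        fraction = pathogen_umis / total if total else 0.0
        infected = pathogen_umis >= min_pathogen_umis and fraction >= min_fraction
        labels[barcode] = "infected" if infected else "bystander"
    return labels


if __name__ == "__main__":
    # Invented barcodes and counts
    cells = {"AAAC": (4000, 40), "AAAG": (5000, 1), "AACT": (3000, 6)}
    print(call_infected(cells))
```

Requiring both an absolute and a fractional threshold guards against calling a deeply sequenced cell infected on the strength of a few stray pathogen reads.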
While these studies have elucidated critical factors and markers of infection progression in various systems, the substantial sequencing data they generate have primarily been analyzed using conventional methods, such as differential expression analysis and clustering. Continued integration of these data into mechanistic models of host–pathogen interactions is needed. As RNA-seq approaches increase in sensitivity and throughput, analyzing and integrating the underlying gene programs could help identify correlated, and potentially interacting, gene regulation and responses in the host and the pathogen.
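One simple way to begin linking host and pathogen gene programs is to correlate their expression across matched timepoints or conditions. The sketch below computes Pearson correlations between every host and pathogen profile and keeps pairs above a hypothetical threshold. Correlation does not establish interaction, and with few timepoints many strong correlations arise by chance, so such pairs are hypotheses for follow-up rather than results; the gene names and profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(host, pathogen, threshold=0.9):
    """Return (host_gene, pathogen_gene, r) for all pairs with |r| >= threshold.

    host and pathogen map gene names to expression values measured at the
    same, identically ordered timepoints or conditions.
    """
    pairs = []
    for h_gene, h_profile in host.items():
        for p_gene, p_profile in pathogen.items():
            r = pearson(h_profile, p_profile)
            # NaN comparisons are False, so constant profiles drop out here
            if abs(r) >= threshold:
                pairs.append((h_gene, p_gene, round(r, 3)))
    return pairs


if __name__ == "__main__":
    host = {"hostA": [1, 2, 3, 4], "hostB": [4, 3, 2, 1]}
    pathogen = {"pathX": [2, 4, 6, 8], "pathY": [1, 3, 2, 3]}
    print(correlated_pairs(host, pathogen))
```

Here only the perfectly correlated and anti-correlated pairs with `pathX` pass the threshold; `pathY` correlates with each host profile at |r| ≈ 0.67.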
Forward genetic screens provide a powerful platform to identify master regulators of host–pathogen interactions
Forward genetic screens have been a valuable tool for elucidating the roles of host and pathogen proteins during infection. With the advent of transposon sequencing, genome-scale linking of genotype to phenotype has highlighted genes at the center of host–pathogen interactions. One of the first demonstrations of this technology with NGS identified a variety of genes required for Haemophilus influenzae infection in vivo. Transposon mutagenesis remains the gold standard for uncovering gene requirements under specific conditions. Deeply interrogating the function of essential genes, however, requires complementary tools, because insertions that disrupt essential genes are not recovered. Generating hypomorphs of essential genes enables further functional analysis and screening opportunities. For example, Donald et al. employed xylose-inducible antisense strains to identify mechanisms of action for 59 compounds. More recently, CRISPR–Cas systems have been applied to map genotype to phenotype in host–pathogen interactions. Some studies have focused on defining gene essentiality, while others have focused on host factors required for various infection phenotypes. Other modalities are also available for forward genetic investigations of host–pathogen interactions, such as RNAi, ORF libraries, or other variants of the CRISPR–Cas systems. Although most applications of these newer technologies have focused on host factors, these tools are also being implemented in pathogens. For example, Sidik et al. identified 200 novel essential genes of Toxoplasma gondii, functionally characterized 16 of these, and further described CLAMP, a gene required for invasion of host cells. These methods face challenges in generating and validating the genetic libraries. Implementing robust controls and analyzing the large sequence- or image-based datasets typically generated by these screens require significant attention and consideration. These challenges are commonly addressed by focused validation of specific targets following the primary screen.
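The logic of transposon-sequencing essentiality calls can be sketched as follows. Mariner-family transposons such as Himar1 insert at TA dinucleotides, so a gene with many TA sites but few observed insertions is a candidate essential gene under the screening condition. The function below assumes insertion positions have already been mapped from sequencing reads; its name, thresholds, and example coordinates are hypothetical, and published analyses use dedicated statistical models (e.g., the Gumbel and HMM methods in the TRANSIT package) rather than a fixed cutoff.

```python
def tnseq_candidates(genes, insertion_sites, min_ta_sites=10, max_density=0.1):
    """Flag candidate essential genes from transposon insertion positions.

    genes: {name: (start, end, n_ta_sites)}, 1-based inclusive coordinates,
    where n_ta_sites counts TA dinucleotides (mariner insertion targets).
    insertion_sites: set of TA positions with at least one mapped insertion.
    Returns {name: (sites_hit, density, is_candidate)}. Thresholds are hypothetical.
    """
    results = {}
    for name, (start, end, n_ta_sites) in genes.items():
        # Simple scan for clarity; a real pipeline would search sorted positions
        sites_hit = sum(1 for pos in insertion_sites if start <= pos <= end)
        density = sites_hit / n_ta_sites if n_ta_sites else 0.0
        # Genes with too few TA sites carry too little information to call
        is_candidate = n_ta_sites >= min_ta_sites and density <= max_density
        results[name] = (sites_hit, round(density, 3), is_candidate)
    return results


if __name__ == "__main__":
    genes = {"geneA": (1, 1000, 20), "geneB": (1001, 2000, 20), "geneC": (2001, 2100, 2)}
    sites = {500, 1010, 1100, 1200, 1300, 1400, 1500, 1600, 1700}
    print(tnseq_candidates(genes, sites))
```

In the invented example, `geneA` (1 of 20 TA sites hit) is flagged, `geneB` (8 of 20) is not, and `geneC` is left uncalled because two TA sites are too few to judge.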
While applications of CRISPR-based perturbations to host–pathogen interactions lag behind those in other fields, their potential has been well demonstrated. We can expect these forward screens to grow in scope as costs drop and ease of use increases. We may also begin to see focused genetic perturbations in both the host and the pathogen used for the combinatorial forward dissection of host–pathogen interactions.
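Pooled CRISPR screens are typically read out by counting each sgRNA before and after selection (for example, after infection or pathogen growth). A minimal scoring sketch, assuming raw read counts per guide and a guide-to-gene map, normalizes each sample to reads per million, computes a log2 fold change per guide with a small pseudocount, and summarizes each gene by the median over its guides. Strongly depleted genes are candidates for fitness requirements, whereas genes enriched in a survival screen are candidate host factors required for infection. Dedicated tools such as MAGeCK model count variance and rank genes more rigorously; the names and counts in the example are invented.

```python
import math
from statistics import median


def gene_scores(counts_before, counts_after, guide_to_gene, pseudocount=0.5):
    """Median per-gene log2 fold change from pooled sgRNA read counts.

    counts_before / counts_after: {guide: raw read count} for each sample.
    guide_to_gene: {guide: gene}. Each sample is normalized to reads per
    million before the pseudocounted log2 ratio is taken.
    """
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        before = counts_before.get(guide, 0) / total_before * 1e6
        after = counts_after.get(guide, 0) / total_after * 1e6
        lfc = math.log2((after + pseudocount) / (before + pseudocount))
        per_gene.setdefault(gene, []).append(lfc)
    # With several guides per gene, the median limits the influence of one outlier guide
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


if __name__ == "__main__":
    before = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
    after = {"g1": 25, "g2": 25, "g3": 175, "g4": 175}
    mapping = {"g1": "geneA", "g2": "geneA", "g3": "geneB", "g4": "geneB"}
    print(gene_scores(before, after, mapping))
```

In the example, `geneA` scores close to -2 (a fourfold depletion) and `geneB` close to log2(1.75) ≈ 0.81.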
Proteomic analyses identify activated signaling networks and biophysical interactions between the host and the pathogen during infection
While transcriptional profiling has been a mainstay of systems-biology analyses of host–pathogen interactions because of the relative ease of sample preparation and the availability of NGS facilities, proteomic approaches directly measure protein expression during infection, capturing the physical units of the cell that carry out and modulate cellular behavior. Understanding protein expression dynamics during infection can reveal points for therapeutic intervention and candidate diagnostics. A variety of techniques have been developed to study these changes both qualitatively and quantitatively, most notably mass spectrometry for the unbiased measurement of protein expression and flow cytometry for single-cell resolution.
While many in vitro models of host–pathogen interactions rely on incubating host cells with soluble pathogen-derived molecules, interactions between the host and the intact pathogen drive dynamic changes in both organisms. Host material typically vastly outweighs that of the invading pathogen, which can limit the ability of proteomics to capture host and pathogen dynamics simultaneously. Because host protein is abundant and pathogen protein relatively scarce, many studies analyze either the host or the pathogen proteome exclusively, although chemical approaches are making headway in deciphering both simultaneously. Changes in protein expression reveal both direct and indirect reprogramming resulting from host–pathogen interactions. More sensitive methods monitor the dynamics of post-translational modifications, enabling the identification of associated signaling networks [31,32].
A successful pathogen may induce virulence programs only when they are essential for its survival, such as when in proximity to its host. This strategy enables the pathogen to conserve resources while evading host defenses for as long as possible. Host adaptation is particularly vital for intracellular pathogens, which must counterbalance host strategies to eliminate them. To deploy proteomic strategies to study pathogen responses in the context of the host, many methods first enrich the pathogen fraction of the complex mixture. Methods to accomplish this include centrifugation, immunomagnetic separation, and fluorescence-activated cell sorting (FACS) of fluorescently labeled pathogens.
Pathogens that reside within host phagosomes must be able to disrupt phagolysosome maturation and fusion in order to survive. Fernández-Arenas et al. characterized the proteome of C. albicans within murine macrophages just 3 h after infection. In this timeframe, the pathogen up-regulated proteins associated with detoxification and protein degradation compared with C. albicans grown in axenic culture. Additionally, regulation of metabolic proteins reflected the change in carbon and lipid availability within the phagosome. Similarly, a comparison of intracellular Salmonella enterica within the HeLa epithelial cell line at 1 and 6 h post-infection revealed its dependence on scavenging iron, a limited intracellular resource. This study found that preventing expression of the iron-associated genes attenuated the pathogen intracellularly, highlighting the potential of those proteins as therapeutic targets. Identifying these expression dynamics reveals the critical responses needed to adjust to intracellular growth and survival, as well as candidate pathways that can be exploited to improve pathogen control.
Host cells experience dynamic changes in their proteome during infection that can benefit either the host or the pathogen. Host cells can up-regulate molecules to recruit other immune cells, or their cellular machinery can be hijacked by the pathogen to promote its own survival. While the host cell may globally alter protein expression in response to infection, more nuanced changes may also occur, such as shifts in the protein composition of organelles or in the post-translational modification of proteins. For example, interferon signaling has been shown to modulate the protein composition of the Legionella-containing vacuole (LCV), even though transcript levels changed little for many of the proteins that differentially associate with the LCV. Meanwhile, pathogens use effectors to disrupt host responses. By comparing the phosphoproteomes of HeLa cells that were uninfected, infected with S. enterica, or infected with S. enterica lacking the effector SopB, Rogers et al. decoupled host responses to pathogen recognition from SopB-mediated disruption of host intracellular signaling networks. Decoupling host proteomic responses from pathogen-mediated hijacking of those responses may identify better intervention points for therapeutic development.
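A three-condition design like the SopB comparison lends itself to a simple decision rule. The sketch below is a simplification, not the statistical analysis used by Rogers et al.: it assumes log2 fold changes versus uninfected cells have already been computed for each phosphosite under wild-type and effector-deletion infection. Sites changed in both infections are attributed to pathogen recognition, and sites changed only with the wild-type pathogen are attributed to the effector. The threshold and site names are hypothetical, and a real analysis would also require replicate-based significance testing.

```python
def classify_sites(lfc_wt, lfc_mutant, threshold=1.0):
    """Partition phosphosites by which infections change them.

    lfc_wt / lfc_mutant: {site: log2 fold change vs uninfected} for
    infection with wild-type bacteria or an effector-deletion mutant.
    Only sites quantified in both conditions are classified.
    """
    classes = {}
    for site in lfc_wt.keys() & lfc_mutant.keys():
        wt_changed = abs(lfc_wt[site]) >= threshold
        mut_changed = abs(lfc_mutant[site]) >= threshold
        if wt_changed and mut_changed:
            classes[site] = "recognition"         # changes with or without the effector
        elif wt_changed:
            classes[site] = "effector-dependent"  # change lost when the effector is deleted
        elif mut_changed:
            classes[site] = "mutant-only"         # change appears only without the effector
        else:
            classes[site] = "unchanged"
    return classes


if __name__ == "__main__":
    wt = {"siteA": 2.0, "siteB": -1.5, "siteC": 0.1, "siteD": 0.2}
    mutant = {"siteA": 1.8, "siteB": -0.2, "siteC": 0.3, "siteD": 1.2}
    print(classify_sites(wt, mutant))
```

A rule this simple cannot separate sites that respond to both recognition and the effector; that distinction needs quantitative comparison of the two fold changes.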
While changes in protein expression may shed light on the overall responses between the two organisms, they do not necessarily explain how these responses are elicited. Many pathogens control host responses through interactions between their own nucleic acids and proteins and those of the host; bacterial pathogens, for example, use secretion systems to deliver effector proteins into host cells. These interactions can be observed and measured through co-immunoprecipitation studies and yeast two-hybrid (Y2H) systems to gain insight into how each organism modulates the other. Using influenza virus proteins as baits, a Y2H screen found that influenza proteins interacted with a large array of host proteins, including those involved in WNT signaling. In an unbiased approach to uncover flavivirus interactions with human and mosquito hosts, Shah et al. used affinity purification–mass spectrometry to compare the conserved and divergent interactions of the Dengue and Zika virus proteomes with their hosts. This study showed that both viruses suppress human interferon-stimulated gene expression through interaction of the viral protein NS5 with a transcription complex. Probing these protein–protein interactions between the host and the pathogen reveals the biological mechanisms behind the phenotypes that arise from their interaction.
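In affinity purification–mass spectrometry, a prey protein is a candidate interactor when it is recovered far more consistently with the bait than in control purifications. The sketch below uses a deliberately simple score, the fold enrichment of spectral counts over the mean of control pull-downs with a pseudocount, as a stand-in for established scoring methods such as SAINT or CompPASS. The function name, thresholds, prey names, and counts are hypothetical.

```python
def enriched_preys(bait_counts, control_counts, min_fold=5.0, min_count=3, pseudocount=1.0):
    """Candidate interactors from AP-MS spectral counts (simplified scoring).

    bait_counts: {prey: spectral count} in the bait pull-down.
    control_counts: list of {prey: spectral count}, one per control pull-down.
    Returns {prey: fold_enrichment} for preys passing both thresholds.
    """
    if not control_counts:
        raise ValueError("at least one control pull-down is required")
    hits = {}
    for prey, count in bait_counts.items():
        if count < min_count:
            continue  # too few spectra to score reliably
        mean_ctrl = sum(c.get(prey, 0) for c in control_counts) / len(control_counts)
        fold = (count + pseudocount) / (mean_ctrl + pseudocount)
        if fold >= min_fold:
            hits[prey] = round(fold, 2)
    return hits


if __name__ == "__main__":
    bait = {"preyA": 20, "preyB": 15, "preyC": 2}
    controls = [{"preyB": 14}, {"preyB": 16}, {}]
    print(enriched_preys(bait, controls))
```

In the example, `preyB` is discarded because it is recovered almost as often in the controls (a typical "sticky" background protein), and `preyC` is too sparse to score.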
While unbiased methods for single-cell transcriptional profiling of the host–pathogen interface are rapidly being developed and improved, there remains considerable room for unbiased, or even more highly multiplexed, single-cell protein studies of host–pathogen interactions. Currently available technologies for protein-level measurements include multiplexed immunofluorescence, flow cytometry, and mass cytometry. These approaches rely on high-quality antibodies for the targeted analysis of specific proteins with single-cell resolution. Sen et al. used single-cell mass cytometry to identify patterns of memory T cell surface marker expression during infection with Varicella zoster virus, revealing a spectrum of responses in these cells, including PI3K signaling and TCR activation. Further improvements to single-cell protein measurements at the host–pathogen interface may identify surface markers of infected cells that could subsequently be used for immunotargeting.
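Before patterns in mass cytometry data can be identified, intensities are usually transformed with the inverse hyperbolic sine, arcsinh(x / c), which is roughly linear near zero and logarithmic at high intensities. A cofactor c of 5 is a common convention for mass cytometry, while larger cofactors (such as 150) are often used for fluorescence flow cytometry. The sketch below applies the transform and computes the fraction of cells above a gate; the gate value and intensities in the example are hypothetical.

```python
import math


def arcsinh_transform(intensities, cofactor=5.0):
    """Variance-stabilizing arcsinh transform commonly applied to cytometry data."""
    return [math.asinh(x / cofactor) for x in intensities]


def fraction_positive(intensities, gate, cofactor=5.0):
    """Fraction of cells whose transformed intensity exceeds a (hypothetical) gate."""
    transformed = arcsinh_transform(intensities, cofactor)
    return sum(1 for value in transformed if value > gate) / len(transformed)


if __name__ == "__main__":
    # Invented per-cell intensities for one marker channel
    marker = [0.0, 2.0, 50.0, 500.0]
    print(arcsinh_transform(marker))
    print(fraction_positive(marker, gate=2.0))
```

Unlike a log transform, arcsinh is defined at zero and for the small negative values that background subtraction can produce, which is why it is preferred for cytometry intensities.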
In addition to proteomics, other mass-spectrometry-based approaches, such as metabolomics, lipidomics, and glycomics, provide further insight into the dynamics of host–pathogen interactions by monitoring changes in the abundance of biomolecules, including the essential nutrients that contribute to pathogen and host viability.
Identification of potential networks in host–pathogen omics data is enhanced by statistical analyses and computational tools
Many, if not all, of the aforementioned techniques can generate large volumes of complex data. Given the high-throughput nature of omics data, it is essential to establish an analytical framework for validating the trends observed in these analyses and for integrating these data into molecular networks. To validate these data, two axes should be considered. First, the data must be curated for quality; for example, the quality of sequencing reads should be assessed for NGS, while mass spectrometry data should be manually validated to ensure accurate peptide identification. Second, appropriate statistical analysis is mandatory. Multiple technical and biological replicates (in general, three or more) should be performed, and the resulting P-values should be corrected for multiple hypothesis testing using methods such as the Benjamini–Hochberg correction. Together, these steps provide a curated set of molecular targets for use in integrative network analyses.
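The Benjamini–Hochberg procedure controls the false discovery rate. P-values are ranked in ascending order, the i-th smallest of m is scaled by m / i, and a running minimum taken from the largest P-value downward keeps the adjusted values monotone and capped at 1. A minimal sketch follows; established implementations, such as `multipletests` with `method='fdr_bh'` in statsmodels or `p.adjust(method = "BH")` in R, should be preferred in practice.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

For the example input, the adjusted values are approximately [0.02, 0.04, 0.04, 0.02]: the P-value of 0.03 is raised to match its larger neighbor because an adjusted value may never exceed that of a less significant test.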
Clustering is a generalizable analytical tool that can be used across many omics techniques to uncover potential relationships between genes, proteins, or other biomolecules. In time-series analyses, clustering can identify biomolecules with shared expression dynamics, which can then be interrogated for shared molecular functions or upstream regulators to generate candidate hypotheses about how these networks are regulated. Tools for identifying shared molecular functions or upstream regulators include the Gene Ontology database and Enrichr [42,43]. For example, Avraham and colleagues clustered single-cell RNA-sequencing data and mined the clusters to identify an enriched type I interferon signature in Salmonella-infected macrophages. Databases of protein–protein interactions can also be used to reconstruct molecular networks in host–pathogen interactions. For example, proteins that are differentially expressed in the presence or absence of pathogen infection can be interrogated using databases such as STRING to define putative molecular interactions that may be present during infection. Finally, when multi-omics data are available for a particular host–pathogen interaction, it may be possible to reconstruct networks that link receptor activation to transcription factor activation and, ultimately, to gene expression. While such approaches have not been broadly applied to host–pathogen interactions, Huang and Fraenkel developed a flexible framework for integrating these data to identify candidate regulators in the yeast pheromone response. A diverse array of additional techniques is available for constructing molecular networks, ranging from Bayesian network reconstruction to metabolic network reconstruction [46,47]. Notably, the amount of input data required for these methods to be robust varies dramatically.
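Enrichment tools ask whether a cluster contains more genes annotated to a given term than expected by chance. For a single term, this is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): with N genes in the measured background, K of them annotated to the term, and a cluster of n genes of which k carry the annotation, the P-value is the probability of drawing k or more annotated genes in a random sample of n. A minimal sketch follows. Because many terms are typically tested at once, the resulting P-values should themselves be corrected for multiple testing, for example with the Benjamini–Hochberg procedure.

```python
from math import comb


def enrichment_p_value(hits_in_term, cluster_size, term_size, background_size):
    """One-sided hypergeometric P-value, P(X >= hits_in_term).

    background_size (N): genes measured; term_size (K): genes annotated to the term;
    cluster_size (n): genes in the cluster; hits_in_term (k): cluster genes in the term.
    """
    upper = min(cluster_size, term_size)
    total = comb(background_size, cluster_size)
    # Sum the probabilities of k, k + 1, ..., min(n, K) annotated genes in the cluster
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, cluster_size - x)
        for x in range(hits_in_term, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # All 5 cluster genes fall in a 5-gene term, from a background of 20 genes
    print(enrichment_p_value(5, 5, 5, 20))
```

In the example, P = 1 / C(20, 5) = 1/15 504 ≈ 6.4 × 10⁻⁵; exact integer arithmetic via `math.comb` avoids overflow for realistic background sizes.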
As with biochemical validation of the trends observed in omics data, validation of predicted networks using small-molecule inhibitors or genetic techniques can further strengthen or extend the molecular predictions produced by these bioinformatic tools.
A broad array of high-throughput techniques is available to interrogate host–pathogen interactions across a variety of biochemical layers (genomic, epigenomic, transcriptomic, proteomic and metabolomic).
Because pathogen biomass is typically low relative to host biomass, methods to enrich for pathogen-derived biomolecules are essential. Dual RNA-seq techniques are a promising start in this direction.
As our appreciation for heterogeneity in cellular systems increases, methods to probe host and pathogen responses with single-cell resolution are needed.
Careful execution of data-processing algorithms is paramount for obtaining interpretable data from high-throughput experiments. Computational methods to reconstruct host–pathogen interaction networks will be needed as methods for the simultaneous analysis of host and pathogen dynamics improve.
J.M.P., S.L.S., C.Y.I., and B.D.B. all co-wrote the manuscript, and B.D.B. guided the preparation.
This work was funded by startup funding provided by the MIT Department of Biological Engineering and the Ragon Institute of MGH, MIT, and Harvard.
The authors thank members of the Bryson Laboratory for their helpful discussions and feedback.
The Authors declare that there are no competing interests associated with the manuscript.
These authors contributed equally to this work.