The transcriptome represents the entire set of RNA transcripts expressed in a cell. Because it reflects both the underlying genetic and epigenetic landscape and environmental influences, it provides a comprehensive view of functional cellular states at any given time. Recent technological advances now enable the study of the transcriptome at the resolution of individual cells, providing exciting opportunities to characterise the cellular and molecular events that underpin immune-mediated diseases. Here, we draw on recent examples from the literature to highlight the application of advanced bioinformatics tools to extract mechanistic insight and disease biology from bulk and single-cell transcriptomic profiles. Key considerations for the use of available analysis techniques are presented throughout.
Often, new discoveries are facilitated through technological advances. Prominent examples include improvements in microscope technology that enabled the first observations of white blood cells and platelets, and the development of monoclonal antibodies, used both for the phenotypic characterisation of cells and as therapeutics. More recently, the advent of next-generation sequencing revolutionised the study of the transcriptome and of B-cell and T-cell receptor repertoires, and is arguably one of the most exciting technological advances in recent history, as reflected in the exponential rise in publications citing RNA-sequencing (RNA-Seq) over the past decade. While the transcriptome represents just one layer of the biology that ultimately determines the form and function of a cell, its dynamic nature reflects both environmental influences and the underlying genetic and epigenetic landscape. The transcriptome is therefore an information-rich middle ground that can reveal important insights into disease biology. New technologies that perform RNA-Seq at single-cell resolution enable the profiling of the transcriptional programs that underpin cellular and phenotypic states with unprecedented clarity. In this regard, transcriptomics lends itself particularly well to immunology, a field dedicated to understanding the behaviour of, and functional interactions between, diverse cell types. While transcriptomics provides exciting opportunities to unlock disease mechanisms, it also brings significant challenges, particularly in analysing and interpreting the vast volume of data generated with an ever-growing array of analysis tools. Here, we present an overview of bioinformatics tools that can yield novel and mechanistic insights into the biology of immune-mediated diseases.
Gene expression signatures reveal immunophenotypes, biomarkers, and avenues for therapeutic intervention
A common application of transcriptomics is to compare gene expression profiles between two or more groups of samples, e.g. between resting and stimulated immune cells, between samples collected from subjects during acute illness and convalescence, or between subjects with and without clinical manifestations of disease. Asthma is a pertinent example of how this approach can inform on disease mechanisms and motivate the development of novel treatment strategies. Asthma is a remarkably heterogeneous disease that arises through complex gene–environment interactions and yields multiple phenotypes. These phenotypes often reflect differential activation of discrete inflammatory pathways, the prototypical example being the identification of Th2-high and Th2-low forms of atopic asthma through assessment of IL-13 response genes in airway brushings. Importantly, Th2-low asthmatics were unresponsive to inhaled corticosteroid therapy, linking molecular immunophenotypes with clinically relevant features. Since the seminal paper by Woodruff et al., transcriptomic profiling has been performed on various tissues including sputum [6,7], blood [8,9], bronchoalveolar lavage, bronchial biopsies, and epithelial brushings [12–14] to reveal molecular immunophenotypes beyond the Th2-high and Th2-low subgroups, particularly in severe asthmatics, a subgroup of patients with poorly controlled disease for whom new and effective therapies are desperately needed. These studies typically employ unsupervised analysis approaches, in which samples are clustered into subgroups based on their gene expression profiles without reference to any sample characteristics (Figure 1A). When performing unsupervised analyses, care must be taken to ensure that the clustering is driven by signatures that capture disease biology and is not confounded by unrelated sample characteristics (e.g. age, gender, ethnicity, medication use, variation in cellular composition). Using unsupervised clustering, Jevnikar et al. recently identified a novel subset of asthma patients characterised by high expression of IL-6 trans-signalling pathway genes within the bronchial epithelium. These patients experienced frequent exacerbations and exhibited reduced expression of epithelial barrier function genes alongside increased airway inflammation. This finding has important implications for clinical trials aiming to target the IL-6 pathway in exacerbation-prone asthmatics. Indeed, the discovery of distinct molecular phenotypes in asthma has motivated a shift in clinical trials toward endotype-based strategies (reviewed ). The PrecISE study (Precision Interventions for Severe and/or Exacerbation-prone Asthma) is one such example: an adaptive Phase 2 clinical trial in which six different treatments are being tested across severe asthma phenotypes defined on the basis of blood eosinophils, plasma IL-6 levels, fractional exhaled nitric oxide, and genotype. These treatments target a variety of inflammatory mechanisms and include a tyrosine kinase inhibitor, anti-IL-6, a Janus kinase 1-selective inhibitor, medium-chain triglycerides, a modulator of the cystic fibrosis transmembrane conductance regulator (CFTR) protein, and Broncho-Vaxom, an endotoxin-low lyophilised extract from eight major respiratory bacterial pathogens. Embedding transcriptomics into adaptive clinical trial designs will enable disease endotypes to be matched with response signatures for each therapeutic agent, accelerating precision medicine approaches.
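The unsupervised workflow described above can be illustrated with a minimal sketch. It assumes a samples-by-genes matrix of already normalised expression values, z-scores each gene, groups samples with plain k-means (a stand-in chosen for brevity; the studies cited here used their own clustering methods), and then cross-tabulates the clusters against a sample covariate as a quick screen for confounding. Function names and toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def zscore_genes(expr):
    """Standardise each gene (column) across samples; assumes no gene is constant."""
    return (expr - expr.mean(axis=0)) / expr.std(axis=0)

def kmeans(x, k, n_iter=100):
    """Lloyd's k-means on the rows of x with deterministic farthest-first initialisation."""
    chosen = [0]
    for _ in range(1, k):
        # squared distance from each sample to its nearest already-chosen seed
        d = ((x[:, None, :] - x[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    centroids = x[chosen].copy()
    for _ in range(n_iter):
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def crosstab(labels, covariate):
    """Count samples per (cluster, covariate level); a cluster dominated by one level hints at confounding."""
    return Counter(zip(labels.tolist(), covariate))

# Toy data: 20 samples x 50 genes with two expression programmes (hypothetical).
rng = np.random.default_rng(1)
expr = np.vstack([rng.normal(0.0, 0.5, (10, 50)), rng.normal(5.0, 0.5, (10, 50))])
labels = kmeans(zscore_genes(expr), k=2)
sex = ["F", "M"] * 10
print(crosstab(labels, sex))
```

If a cluster were made up almost entirely of one sex, medication group or processing batch, that would be a warning sign that the covariate, rather than disease biology, is driving the partition.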
Figure 1. An overview of applications for transcriptomics to better understand disease biology and extract mechanistic insight.
In addition to unveiling distinct immunophenotypes ripe for therapeutic intervention, transcriptomics is a powerful tool for biomarker discovery. In contrast with the unsupervised analyses used to detect novel immunophenotypes, biomarker discovery relies on supervised analysis techniques that perform group-wise comparisons (Figure 1A). Sepsis, a notoriously complex and life-threatening disease, is a clinical setting in which transcriptomic biomarkers have important diagnostic, prognostic and therapeutic utility. In this regard, Sweeney et al. leveraged publicly available gene expression data derived from >660 whole blood/buffy coat samples to generate an 11-gene signature that could distinguish sepsis from non-infectious systemic inflammation, with a mean area under the receiver operating characteristic curve (AUC) of 0.87 (range 0.7–0.98). Accurately distinguishing infectious from sterile systemic inflammation has critical implications for guiding appropriate therapy and avoiding unnecessary use of antibiotics. In addition, a recent study used transcriptomic data from 21 sepsis cohorts to develop four predictive models of 30-day mortality, each using <20 genes, with AUCs between 0.765 and 0.89. Each model was validated in multiple external cohorts spanning both community- and hospital-acquired infections, a critical step in evaluating the performance of any biomarker. Furthermore, when combined with clinical severity scores, each prognostic model significantly outperformed models based on clinical features alone, highlighting the value of gene expression-based features for improving risk stratification of sepsis patients. Collectively, these studies demonstrate the value of assessing gene expression through both supervised and unsupervised approaches. However, evaluating the transcriptome on a gene-by-gene basis is limited, because genes do not exist or function in isolation; they function as components of interconnected systems.
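The supervised evaluation of a biomarker signature can be sketched as follows. The published 11-gene signature and its exact scoring rule are not reproduced here; as an assumption for illustration, a sample's score is the geometric mean of its up-regulated signature genes minus that of its down-regulated genes, and discrimination is summarised by the AUC, computed as the probability that a randomly chosen case scores higher than a randomly chosen control. Gene names and values are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean; assumes strictly positive (non-log) expression values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def signature_score(sample, up_genes, down_genes):
    """Assumed scoring rule: geometric mean of 'up' genes minus geometric mean of 'down' genes."""
    return (geometric_mean([sample[g] for g in up_genes])
            - geometric_mean([sample[g] for g in down_genes]))

def roc_auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical two-up, one-down signature scored in two cases and two controls.
up, down = ["GENE_A", "GENE_B"], ["GENE_C"]
cases = [{"GENE_A": 8, "GENE_B": 6, "GENE_C": 2}, {"GENE_A": 5, "GENE_B": 7, "GENE_C": 3}]
controls = [{"GENE_A": 2, "GENE_B": 3, "GENE_C": 6}, {"GENE_A": 4, "GENE_B": 3, "GENE_C": 4}]
auc = roc_auc([signature_score(s, up, down) for s in cases],
              [signature_score(s, up, down) for s in controls])
print(auc)
```

The pairwise AUC is quadratic in cohort size, which is acceptable for a sketch; it equals the Mann–Whitney U statistic divided by the number of case–control pairs.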
Mapping gene co-expression networks to unveil unique immunobiological insights
Network analysis provides a holistic framework to elucidate the topology and functional organisation of gene expression programs. Notably, changes in network structure can occur in the absence of changes in mean expression levels. Network analysis methods can therefore capture covert disease-associated signatures that remain undetected by other analysis methods [20,21]. Network theory lends itself to transcriptomics: gene products (mRNA transcripts) represent network nodes, and relationships between genes represent edges (or connections) between nodes (Figure 1B). These relationships can be drawn from prior knowledge or derived from gene co-expression patterns across samples. The latter approach underpins one of the most widely used tools in this space, weighted gene co-expression network analysis (WGCNA). Briefly, WGCNA is an unsupervised algorithm that constructs a network based on Pearson correlations between gene pairs across samples. The correlation matrix is raised to a power (the soft-thresholding power) chosen so that the network approximates scale-free topology, i.e. the degree distribution of the network follows a power law, with a small subset of highly connected genes known as 'hubs' dominating the connectivity patterns. Finally, modules of highly co-expressed genes are identified through hierarchical clustering and can then be tested for functional enrichment. By focusing on the correlation structure of the data, WGCNA can unveil mechanisms that are not detected through differential expression analysis. Jones et al. demonstrate this principle in a recent longitudinal study of CD4+ T-cell responses to house dust mite (HDM) during HDM-specific immunotherapy in allergic rhinitis patients. The authors reported that no genes were differentially expressed in response to treatment during immunotherapy.
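The first two WGCNA steps described above (building a soft-thresholded adjacency and checking how closely its connectivity distribution follows a power law) can be sketched in NumPy. This is a simplified illustration, not the WGCNA R package: it assumes an unsigned network, uses a basic histogram-based fit index in the spirit of what the package's pickSoftThreshold function reports, and omits the topological-overlap calculation and hierarchical module detection. The toy data are hypothetical.

```python
import numpy as np

def unsigned_adjacency(expr, beta):
    """Unsigned WGCNA-style adjacency |cor(i, j)|**beta from a samples x genes matrix."""
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)              # no self-connections
    return adj

def scale_free_fit(connectivity, n_bins=10):
    """Signed R^2 from regressing log10 p(k) on log10 k; values near 1 suggest scale-free topology."""
    counts, edges = np.histogram(connectivity, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2
    keep = (counts > 0) & (mids > 0)
    if keep.sum() < 3:
        return float("nan")                 # too few occupied bins to fit a line
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    return float(-np.sign(slope) * r ** 2)

# Toy data: 40 samples x 30 genes; genes 0-4 share a latent factor (a "module").
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 30))
expr[:, :5] += 2.0 * rng.normal(size=(40, 1))
for beta in (1, 6, 12):
    k = unsigned_adjacency(expr, beta).sum(axis=1)   # whole-network connectivity
    print(beta, round(scale_free_fit(k), 2), k[:5].mean() / k[5:].mean())
```

Raising correlations to a higher power suppresses weak, likely noisy correlations much more than strong ones, which is why connectivity concentrates in the correlated module as beta increases.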
Jones et al. nevertheless demonstrated through network analysis that initially topologically isolated type 1 interferon, Th2 and IL-2/Treg networks were rewired over time to form a single, highly interconnected network by the end of immunotherapy, concomitant with symptom stabilisation relative to baseline. Drawing further examples from asthma, Peters et al. recently applied WGCNA to sputum transcriptomic data to better understand the contributions of distinct immune cell types to type-2 inflammation (T2)-high and T2-low asthma phenotypes. The authors identified a network of 92 genes enriched for gene expression signatures derived from eosinophils, mast cells/basophils and monocyte-derived dendritic cells, which correlated with both asthma status and lung function. Hierarchical clustering of study participants based on this 92-gene network separated T2-high from T2-low asthmatics, but also identified a new, third 'ultra-high' T2 phenotype. Furthermore, a network module enriched for cytotoxic CD8+ T-cells was significantly under-expressed in T2-low asthmatics. While WGCNA provided the foundation for the analysis performed by Peters et al., the immunological insights gained relied on delineating the contributions of each cell type to the gene expression networks. The authors generated their own cell type-specific gene signatures through analysis of publicly available data from purified cell populations; however, it is worth noting that tools such as CIBERSORTx, xCell, and EPIC are available as a form of 'digital cytometry' to infer the contributions of leukocyte populations to gene expression profiles, albeit predominantly designed for the analysis of malignant tissues. An important consideration for WGCNA is the sample size required to generate robust results: at least 30 samples are required, and larger sample sizes will yield more informative results.
Furthermore, while WGCNA provides a global view of gene co-expression networks, additional analyses are required to identify the causal pathways that drive gene network patterns.
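The idea behind the 'digital cytometry' tools mentioned above can be illustrated by treating a bulk profile as a non-negative mixture of reference cell-type profiles. The sketch below solves that mixture problem with a generic non-negative least-squares fit (projected gradient descent). It is not the algorithm of any named tool (the original CIBERSORT method underlying CIBERSORTx, for example, uses support vector regression, and xCell is enrichment-based), and the signature matrix is hypothetical.

```python
import numpy as np

def deconvolve(bulk, signature, n_iter=5000):
    """Estimate cell-type proportions w >= 0 minimising ||signature @ w - bulk||^2.

    bulk: expression vector for G genes.
    signature: G x C matrix of reference expression for C cell types.
    """
    # step size 1/L, where L = (largest singular value of the signature)^2
    step = 1.0 / np.linalg.norm(signature, 2) ** 2
    w = np.full(signature.shape[1], 1.0 / signature.shape[1])
    for _ in range(n_iter):
        grad = signature.T @ (signature @ w - bulk)   # gradient of 0.5 * squared error
        w = np.maximum(w - step * grad, 0.0)          # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w              # report as proportions

# Hypothetical signature: 6 marker genes x 3 cell types, and a noise-free mixture.
signature = np.array([[10, 1, 1], [8, 1, 0], [1, 12, 1],
                      [0, 9, 1], [1, 1, 15], [1, 0, 11]], dtype=float)
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props
print(np.round(deconvolve(bulk, signature), 3))
```

Real signature matrices contain hundreds of genes and bulk profiles are noisy, so estimates are approximate and depend heavily on how well the reference profiles match the tissue being studied.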
Identifying molecular drivers of transcriptome regulation
Cellular states are determined by complex, co-ordinated signalling cascades that are tightly regulated under homeostatic conditions and that adapt during perturbations. While differential gene expression and co-expression network analyses can identify the molecular changes that underpin phenotypic states, they cannot predict what is driving these changes. Several tools are available in this regard: some leverage prior knowledge to generate causal networks, while others use data-driven approaches to infer regulatory mechanisms (Figure 1C). Upstream Regulator Analysis, part of the Ingenuity Pathway Analysis (IPA) software, is one of the most comprehensive prior knowledge-based tools for predicting the upstream molecular drivers of downstream expression patterns. Here, a variety of molecule types (e.g. ligands, receptors, kinases, transcription factors (TFs), microRNAs, drug compounds) are considered candidate upstream transcriptional regulators based on experimental evidence. Their known effects on gene expression are compared with a user-defined list of input genes and associated log-fold changes, yielding for each regulator an overlap p-value and an activation Z-score; the latter takes the direction of expression change (up- or down-regulation) into account to predict whether the regulator is activated or inhibited. Zemek et al. applied this approach to a pre-treatment gene signature associated with positive responses to immune checkpoint blockade therapy in murine mesothelioma; the signature was predicted to be driven by activation of STAT1 and inhibition of IL-10. By administering the STAT1 activator IFNγ and blocking IL-10 to mimic this molecular phenotype, the authors sensitised tumours to immune checkpoint blockade, leading to improved cure rates.
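The two statistics that Upstream Regulator Analysis reports can be illustrated with simplified versions. The sketch below assumes a hypothetical regulator whose expected effects on its targets (+1 induces, −1 represses) are already known. The overlap p-value is a one-sided hypergeometric (Fisher's exact) test, and the activation Z-score is the unweighted form, in which each target contributes +1 if its observed change agrees with the predicted direction and −1 otherwise, with the sum divided by the square root of the number of targets. The commercial implementation draws on its own curated knowledge base and may weight or filter relationships differently.

```python
from math import comb, sqrt

def overlap_p_value(n_universe, n_targets, n_input, n_overlap):
    """One-sided hypergeometric P(overlap >= n_overlap) between a regulator's targets and the input list."""
    total = comb(n_universe, n_input)
    upper = min(n_targets, n_input)
    hits = sum(comb(n_targets, i) * comb(n_universe - n_targets, n_input - i)
               for i in range(n_overlap, upper + 1))
    return hits / total

def activation_z(expected_direction, log_fold_changes):
    """Unweighted activation z-score: (#consistent - #inconsistent) / sqrt(N).

    expected_direction: target gene -> +1 (regulator induces it) or -1 (represses it).
    log_fold_changes: input gene -> observed log fold-change; unchanged or absent genes are skipped.
    """
    signs = []
    for gene, direction in expected_direction.items():
        lfc = log_fold_changes.get(gene, 0.0)
        if lfc != 0:
            signs.append(1 if direction * lfc > 0 else -1)
    return sum(signs) / sqrt(len(signs)) if signs else 0.0

# Hypothetical regulator with four known targets.
targets = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": 1}
observed = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.5, "GENE_D": -0.5}
print(activation_z(targets, observed))
print(overlap_p_value(n_universe=20000, n_targets=150, n_input=400, n_overlap=12))
```

Under this formula, |z| grows with both the number of targets and how consistently they move; IPA commonly treats z ≥ 2 as predicted activation and z ≤ −2 as predicted inhibition.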
Other prior knowledge-based tools include ChEA3 (ChIP-X Enrichment Analysis 3), which ranks TFs based on pooled information across six TF–target gene set libraries to identify those responsible for observed changes in gene expression, and CARNIVAL (CAusal Reasoning pipeline for Network identification using Integer VALue programming), which takes an alternative approach by integrating both TF and pathway activity scores to infer the entire signalling cascade. While prior knowledge-based tools can provide valuable mechanistic insight, they rely on data generated from diverse experimental systems; as a result, they lack context specificity and are biased towards the identification of well-characterised pathways. Data-driven approaches, while less accurate, are better suited to identifying novel, context-dependent mechanisms, and in this domain VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) is one of the best-performing tools. VIPER analysis first constructs networks between TFs and target genes using ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks); for each TF, a regulon is then defined comprising the complete set of predicted downstream target genes. The regulons are scored for differential enrichment between experimental conditions, or on a single-sample basis, using gene set analysis methods to infer the biological activity of each TF. A limitation of VIPER is that hundreds (or thousands) of samples are required for accurate network construction with ARACNE; in many applications, however, large public data sets derived from the same tissue type can be used to construct a tissue-specific network suitable for VIPER.
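Once a regulon has been defined, the enrichment step can be sketched as below. This is a simplified, mode-weighted score loosely inspired by VIPER's analytic rank-based enrichment analysis (aREA), not a reimplementation of it: gene-level statistics are converted to normal quantiles of their ranks, each target contributes its quantile multiplied by its mode of regulation (+1 activated, −1 repressed), and the sum is scaled by the square root of the regulon size. It ignores the interaction-confidence weights and pleiotropy correction of the real method; regulon and gene names are hypothetical.

```python
from statistics import NormalDist

def rank_to_normal(signature):
    """Map gene-level statistics (e.g. t-statistics) to standard-normal quantiles of their ranks."""
    ordered = sorted(signature, key=signature.get)
    n = len(ordered)
    nd = NormalDist()
    return {gene: nd.inv_cdf((i + 1) / (n + 1)) for i, gene in enumerate(ordered)}

def regulon_activity(signature, regulon):
    """Mode-weighted enrichment: sum(mode * quantile) / sqrt(number of targets present)."""
    z = rank_to_normal(signature)
    terms = [mode * z[gene] for gene, mode in regulon.items() if gene in z]
    if not terms:
        return 0.0
    return sum(terms) / len(terms) ** 0.5

# Hypothetical differential-expression statistics and a three-target regulon.
signature = {"g1": 3.0, "g2": 2.5, "g3": -2.0, "g4": 0.1, "g5": -0.2, "g6": 0.3}
regulon = {"g1": 1, "g2": 1, "g3": -1}   # induces g1 and g2, represses g3
print(regulon_activity(signature, regulon))
```

A positive score indicates that induced targets sit high and repressed targets sit low in the ranked signature, consistent with an active regulator.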
Towards personalised immunology through single-sample transcriptomics
The analyses considered above focus on group-wise comparisons, and in general these methods are not well suited to elucidating disease mechanisms that operate in small numbers of patients. Whereas inter-sample heterogeneity is viewed as a limitation in group-wise analyses, it becomes the focus of single-subject analyses (Figure 1D). As we continue to transition toward personalised medicine (the understanding of disease and the tailoring of treatments at an individual level), the insights gained through single-subject analyses will be invaluable. In the context of transcriptomics, several tools have emerged for this purpose, including n-of-1-pathways, ssGSEA (single-sample Gene Set Enrichment Analysis), and LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) [36,37]. The n-of-1-pathways algorithm requires a pair of samples from a single subject and generates a personal profile of dysregulated pathways. Jones et al. demonstrated the utility of n-of-1-pathways in paediatric viral bronchiolitis: transcriptomic profiles of PBMC collected during an acute episode and after convalescence were analysed on a single-subject basis, and subsequent unsupervised clustering revealed two clear subgroups of study participants. These groups exhibited striking variation in the expression of innate immune pathways (interferon, inflammation, toll-like receptors), effectively yielding a 'low-response' group which, importantly, was not separable based on available clinical metrics including steroid use, pathogen type or age, highlighting the power of n-of-1-pathways to reveal novel insights. In a similar vein, ssGSEA calculates an enrichment score for each pairing of a sample and a (user-provided) gene set, reflecting the degree of co-ordinated up- or down-regulation of that gene set within the sample. Kobayashi et al. demonstrate how ssGSEA can be used to derive 'immunogram' scores for solid tumours based on 10 gene sets spanning pathways integral to immune function and cancer biology, delivering a personalised view of cancer–immunity status and identifying potential avenues for therapeutic intervention. Finally, in contrast with n-of-1-pathways and ssGSEA, which interrogate predefined biological pathways, the recently developed tool LIONESS [36,37] can infer single-sample networks from experimental data. To achieve this, LIONESS first constructs an aggregate network across the entire dataset and then systematically rebuilds the network n times, leaving out a different sample each time. By comparing each (n − 1)-sample network with the aggregate network, LIONESS estimates the relative influence of an individual sample, yielding sample-specific network edge weights. Using transcriptional profiles from 153 lymphoblastic cell lines, the authors illustrate the power of single-sample networks to find biologically meaningful data structures not otherwise identifiable from aggregate networks or gene expression levels. Specifically, clustering on individual network edge weights separated the data into two distinct groups, which were strikingly different from the clusters formed on gene expression levels. Comparisons of the most highly targeted genes (those with the highest sum of edge weights) in each network-based cluster revealed enrichment for ∼50 biological pathways, whereas no pathways were enriched among genes differentially expressed between the expression-based clusters. A general limitation of personalised analyses is the potential for incidental findings, and naturally there is a higher degree of uncertainty when the statistical universe is a single subject.
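The two single-sample approaches described above can be sketched briefly. The ssGSEA function follows the general description of the method (genes ranked within one sample, a rank-weighted running sum for genes in the set versus a uniform one for the rest, and the difference integrated along the ranked list); the weighting exponent alpha = 0.25 is an assumed default, and the normalisation applied by implementations such as the GSVA R package is omitted. The LIONESS function applies the published interpolation e(q) = N(e(α) − e(α−q)) + e(α−q), where e(α) is the network built from all N samples and e(α−q) the network built without sample q; using Pearson correlation as the network-building step is an assumption, since LIONESS can wrap other reconstruction methods. Names and toy values are hypothetical.

```python
import numpy as np

def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample enrichment of gene_set in a dict of gene -> expression for one sample."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    hits = [gene in gene_set for gene in ranked]
    n_hit = sum(hits)
    if n_hit in (0, n):
        raise ValueError("gene set must overlap some, but not all, genes")
    # rank weight: highest-expressed gene has rank n, lowest has rank 1
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(hits)]
    w_total = sum(weights)
    p_hit = p_miss = es = 0.0
    for hit, w in zip(hits, weights):
        if hit:
            p_hit += w / w_total
        else:
            p_miss += 1.0 / (n - n_hit)
        es += p_hit - p_miss   # integrate the gap between the two running distributions
    return es

def lioness(expr):
    """Sample-specific networks from a samples x genes matrix (Pearson-correlation edges)."""
    n = expr.shape[0]
    aggregate = np.corrcoef(expr, rowvar=False)
    nets = []
    for q in range(n):
        loo = np.corrcoef(np.delete(expr, q, axis=0), rowvar=False)  # network without sample q
        nets.append(n * (aggregate - loo) + loo)                      # linear interpolation
    return np.stack(nets)

# Hypothetical sample: ten genes with expression decreasing from g0 to g9.
sample = {f"g{i}": float(10 - i) for i in range(10)}
print(ssgsea_score(sample, {"g0", "g1"}), ssgsea_score(sample, {"g8", "g9"}))
```

A gene set concentrated among a sample's most highly expressed genes scores positive and one concentrated among its least expressed genes scores negative, so scores can be compared across samples for the same set.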
Single-cell transcriptomics: a revolutionary tool for immunology
Single-cell RNA-sequencing (scRNA-seq) is a major technological advance that overcomes many of the limitations of 'bulk' profiling, providing not only phenotypic but also functionally relevant information on individual cells, and enabling researchers to study the heterogeneity that exists within and between cell populations with unprecedented clarity. The Human Cell Atlas [40,41] is an international collaboration aiming to create maps of all human cells, largely through single-cell transcriptomics. The availability of single-cell atlases across biological and clinical states will enable the identification of novel cell subsets, shed light on ontologies and differentiation trajectories, and elucidate the cell-to-cell interactions that govern tissue and organ function (Figure 1E). In a landmark paper, Villani et al. present an unbiased reclassification of human dendritic cell and monocyte subsets based on the transcriptomic profiles of ∼2400 individual HLA-DR+ cells from healthy adult donors. This work has major implications for the field, establishing a new taxonomy that captures previously unknown heterogeneity: six distinct dendritic cell subsets and four distinct monocyte subsets. In a disease-relevant setting, scRNA-seq of the human bronchial epithelium recently led to the discovery of the pulmonary ionocyte, a rare cell type with high CFTR gene expression. The authors demonstrate that these cells are responsible for the majority of CFTR activity in the airway epithelium, depend on Notch signalling, and express high levels of the transcription factor FOXI1, a master regulator of the proton pump V-ATPase. These findings suggest a role for the ionocyte in regulating luminal pH and highlight it as a potential target for therapeutic intervention in cystic fibrosis.
Furthermore, the authors identified two distinct differentiation trajectories of conducting airway cells in an animal model of airway injury and repair using a method called SPRING, which generates k-nearest neighbour graphs to visualise the continuum of cell states based on their transcriptional profiles. A similar tool in this space is Monocle, an unsupervised algorithm that uses pseudotemporal ordering of single cells to capture transitional cellular states such as differentiation. Kwok et al. recently employed Monocle to delineate the fate of granulocyte–monocyte progenitors and combined this with flow cytometry to identify novel early committed (proNeu1) and intermediate (proNeu2) neutrophil progenitors. In a sepsis model, the authors demonstrate the preferential and specific expansion of proNeu1 cells during early systemic inflammation at the expense of monocyte differentiation, yielding new insights into the pathogenesis of sepsis. Single-cell genomics has also unveiled new insights into the complexity and heterogeneity of autoimmune diseases such as systemic lupus erythematosus (SLE), lupus nephritis and rheumatoid arthritis. Nehar-Belaid et al. profiled ∼276 000 PBMC from children with SLE and matched healthy controls and, through sub-clustering of leukocytes based on their molecular profiles, identified expanded monocyte, dendritic cell (conventional and plasmacytoid), CD4+ and CD8+ T-cell, NK cell, B-cell and plasma cell sub-clusters in SLE patients that specifically expressed high levels of interferon-stimulated genes. While an elevated blood type I interferon gene signature is a common feature of SLE, scRNA-seq allowed Nehar-Belaid et al. to pinpoint the precise cellular compartments involved, revealing novel SLE-associated leukocyte variations that were most pronounced in patients with more severe disease. Similarly, through scRNA-seq of renal biopsies from patients with lupus nephritis, Der et al. observed that high expression of a type I IFN signature alongside fibrotic pathway genes in tubular cells was significantly associated with non-response to treatment. The authors note the potential clinical utility of the fibrotic gene signature as an early prognostic marker: logistic regression using four of these genes could predict response to treatment 6 months post-biopsy (accuracy 92%, AUC 0.96), even though tubulointerstitial damage (linked to poor prognosis) was not detected by histological scoring in some of the treatment-refractory patients at the time of sampling. In the context of rheumatoid arthritis, Zhang and colleagues integrated scRNA-seq data with bulk RNA-seq and mass/flow cytometry to comprehensively map the functional cellular landscape of synovial tissue and delineate the populations driving joint inflammation. They identified the specific expansion within arthritic synovia of pro-inflammatory monocytes producing IL1B, sublining fibroblasts (THY1+, HLA-DRhi) producing IL6, peripheral and follicular helper T-cells (PD-1+), and autoimmune-associated B-cells (ITGAX+TBX21+) that also expressed interferon-stimulated genes. Interestingly, while up-regulation of some inflammatory mediators was restricted to discrete subpopulations (e.g. IL1B and IL6), others, including TNF, were globally increased across synovial cell types, providing a deeper understanding of the relative contributions of different cell populations to joint inflammation.
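The graph-based ordering of cells that underlies trajectory tools such as SPRING and Monocle can be illustrated with a generic sketch: build a symmetric k-nearest-neighbour graph on cells, then take each cell's shortest-path distance from a chosen root cell as its pseudotime. This is not the algorithm used by either tool; it assumes cells are already embedded in a low-dimensional space (e.g. principal components) and that the root, such as a progenitor cell, is known. The coordinates are hypothetical.

```python
import heapq
import math

def knn_graph(cells, k):
    """Symmetric k-nearest-neighbour graph (node -> {neighbour: Euclidean distance})."""
    graph = {i: {} for i in range(len(cells))}
    for i, a in enumerate(cells):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d   # keep the graph undirected
    return graph

def pseudotime(graph, root):
    """Shortest-path (Dijkstra) distance from the root cell; unreachable cells get inf."""
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue          # stale queue entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return [dist.get(i, math.inf) for i in range(len(graph))]

# Hypothetical cells along a curved differentiation path in a 2-D embedding.
cells = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.5), (4.0, 1.2), (5.0, 2.0)]
print(pseudotime(knn_graph(cells, k=2), root=0))
```

Measuring distance along the graph rather than straight-line distance lets the ordering follow curved or branching paths through expression space.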
Finally, the advent of scRNA-seq provides new opportunities to understand cellular interactions through cell-to-cell communication networks mediated by ligand–receptor signalling, which has become a field of study in its own right (reviewed ). Interactions between cells of the adaptive and innate immune systems, and between immune and non-immune cells within diseased organs, malignant tissues and even transplanted tissues, play a critical role in disease progression, resolution, and chronicity, so understanding them is essential. Here, we highlight three methods for studying cell–cell communication: CellPhoneDB [53,54], NATMI (Network Analysis Toolkit for Multicellular Interactions), and NicheNet. All three are built on known ligand–receptor interactions, but each is unique in its approach. CellPhoneDB compares the expression of ligands in one cell type with the expression of receptors in another to generate a ranked list of active ligand–receptor pairs. Uniquely, CellPhoneDB considers the protein architecture of ligand–receptor pairs, recognising that they often function as heteromeric complexes. The application of CellPhoneDB to bronchial biopsies from healthy and asthmatic individuals revealed airway cell–cell communication networks unique to asthma, dominated by interactions between Th2 cells and epithelial or mesenchymal cell populations that are absent in healthy lung tissue, including some novel epithelial–Th2 ligand–receptor pairs (CXCL2-DPP4, CXCL17-GPR35). These findings extend previous observations, from bulk RNA-Seq, of highly interlinked epithelial–Th2 gene networks specific to the airway mucosa of atopic asthmatics.
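The scoring idea described for CellPhoneDB can be sketched as follows: an interaction between a sender and a receiver cluster is scored as the mean of the sender's average ligand expression and the receiver's average receptor expression, and an empirical p-value is obtained by repeatedly shuffling cluster labels across cells. This simplified version handles single-gene ligands and receptors only, so it ignores the heteromeric complexes and the expression-fraction filtering of the actual tool; function names and data are hypothetical.

```python
import numpy as np

def lr_score(expr, labels, ligand, receptor, sender, receiver):
    """Mean of the sender's mean ligand expression and the receiver's mean receptor expression."""
    return 0.5 * (expr[labels == sender, ligand].mean()
                  + expr[labels == receiver, receptor].mean())

def lr_permutation_test(expr, labels, ligand, receptor, sender, receiver, n_perm=1000, seed=0):
    """Observed score and empirical p-value from shuffling cluster labels across cells."""
    rng = np.random.default_rng(seed)
    observed = lr_score(expr, labels, ligand, receptor, sender, receiver)
    null_hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if lr_score(expr, shuffled, ligand, receptor, sender, receiver) >= observed:
            null_hits += 1
    return observed, (null_hits + 1) / (n_perm + 1)   # +1 avoids reporting p = 0

# Hypothetical cells x genes matrix: column 0 = ligand, column 1 = receptor.
labels = np.array(["A"] * 10 + ["B"] * 10 + ["C"] * 10)
expr = np.zeros((30, 2))
expr[:10, 0] = 5.0     # cluster A expresses the ligand
expr[10:20, 1] = 5.0   # cluster B expresses the receptor
print(lr_permutation_test(expr, labels, 0, 1, "A", "B"))
```

Shuffling labels keeps each gene's overall expression distribution fixed, so a small p-value indicates that the pairing is specific to these two clusters rather than driven by ubiquitously expressed genes.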
NATMI also compares the expression of ligands and receptors between cell clusters, but its advantages lie in its use of connectomeDB2020 (at the time of writing, the largest literature-curated database of ligand–receptor pairs, containing 2293 interactions) and its ability to generate and visualise global cell–cell communication networks through systematic assessment of ligand and receptor expression in every cell cluster. NATMI also allows direct comparison of two distinct cell–cell communication networks to identify condition-specific interactions. Whereas NATMI provides a global view of intercellular communication networks, NicheNet leverages a prior network model to link cell-to-cell communication through specific signalling paths (ligand–receptor–TF–target gene) to predict responses in target cells. Across all three methods, accurate cell type annotations are critical for delineating cell–cell interactions and generating valid novel hypotheses. Collectively, these studies demonstrate the power of scRNA-seq to elucidate the cellular and molecular events that underpin inflammation. However, some important limitations of the technology should be considered. First, the tissue dissociation protocols required for sample processing can introduce artefacts into the data (e.g. activation of stress response genes). Second, the data are sparse and zero-inflated, resulting in many false negatives that complicate downstream statistical and computational analyses. Even so, as the cost of scRNA-seq declines and analysis techniques evolve, this technology will cement itself as an integral tool for surveying the functional immune repertoire and the intersection of cellular and molecular mechanisms that govern immunopathologies.
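A global, condition-comparable network of the kind NATMI produces can be sketched with specificity-based edge weights: a gene's specificity in a cluster is that cluster's share of the gene's summed mean expression across clusters, and each sender-to-receiver edge is weighted by ligand specificity times receptor specificity. We understand NATMI to offer a specificity weight of broadly this form, but the functions below are a hypothetical illustration rather than NATMI's code or interface, and the two-condition difference is a simple stand-in for its network comparison.

```python
import numpy as np

def specificity(mean_expr):
    """Each cluster's share of a gene's summed mean expression (input: clusters x genes)."""
    mean_expr = np.asarray(mean_expr, dtype=float)
    totals = mean_expr.sum(axis=0)
    out = np.zeros_like(mean_expr)
    # genes with zero total expression keep a specificity of zero
    return np.divide(mean_expr, totals, out=out, where=totals > 0)

def edge_weights(mean_expr, ligand, receptor):
    """Sender x receiver weights: ligand specificity in sender times receptor specificity in receiver."""
    spec = specificity(mean_expr)
    return np.outer(spec[:, ligand], spec[:, receptor])

def delta_network(mean_expr_a, mean_expr_b, ligand, receptor):
    """Change in every sender -> receiver edge weight from condition A to condition B."""
    return edge_weights(mean_expr_b, ligand, receptor) - edge_weights(mean_expr_a, ligand, receptor)

# Hypothetical cluster means for 3 clusters; column 0 = ligand, column 1 = receptor.
healthy = [[4.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
disease = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
print(edge_weights(healthy, 0, 1))
print(delta_network(healthy, disease, 0, 1))
```

Specificity weights emphasise interactions that are concentrated in particular clusters, which complements weights based on absolute expression level.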
Remarkable advances in genomic profiling technologies and bioinformatics have revolutionised the study of gene expression, which in the context of immunology has led to a deeper understanding of disease biology and disease heterogeneity, and to the identification of therapeutic targets. Furthermore, the adaptation of RNA-Seq to individual cells provides many opportunities for discovery, as the unbiased analysis of cell populations can reveal novel functional subsets and transitional states. A diverse range of tools is available to interrogate gene expression data, each with advantages and limitations that should be carefully considered. In this regard, there are several challenges when applying transcriptomics to the study of immune-mediated diseases. First, projects should be based on well-designed questions, with careful consideration of the specimen type (e.g. blood, biopsy, swabs), the immunological context (e.g. baseline versus challenge; inactive versus active disease), and the collection of comprehensive sample metadata (e.g. demographic and clinical variables, medication use, technical variables). Second, samples should be randomised and processed in large batches to minimise the potential for batch effects to confound analysis. Third, a data analysis plan should be developed before samples are collected, to ensure that the sample size is large enough to address the aims of the study, taking disease heterogeneity and patient subgroups into account. Fourth, mechanistic insights derived from computational analyses are predictions that require experimental validation. Finally, while transcriptomics alone is a powerful tool for uncovering biological insight, technologies for studying additional layers of biology in the same sample, such as single-sample multi-omics, are maturing and together have the potential to transform our understanding of immune-mediated diseases.
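The second recommendation, randomising samples across processing batches, can be sketched as a stratified allocation: samples from each biological group are shuffled and dealt in turn across batches so that no batch is dominated by one group, and the processing order within each batch is then randomised. This is one simple scheme among many; the group labels and function name are hypothetical, and in practice other covariates (e.g. sex or collection site) may also need balancing.

```python
import random
from collections import defaultdict

def allocate_batches(sample_groups, n_batches, seed=0):
    """Deal samples into batches group by group, in random order, so groups are balanced."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample, group in sorted(sample_groups.items()):
        by_group[group].append(sample)
    batches = [[] for _ in range(n_batches)]
    position = 0                       # continue dealing where the previous group stopped
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)           # random order within each group
        for sample in members:
            batches[position % n_batches].append(sample)
            position += 1
    for batch in batches:
        rng.shuffle(batch)             # randomise processing order within each batch
    return batches

# Hypothetical study: 12 cases and 12 controls split across 3 processing batches.
samples = {f"S{i:02d}": ("case" if i < 12 else "control") for i in range(24)}
batches = allocate_batches(samples, n_batches=3)
for b in batches:
    print(len(b), sum(samples[s] == "case" for s in b))
```

Because disease status is spread evenly across batches, any batch effect is no longer aligned with the biological comparison of interest and can be modelled as a separate covariate.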
The transcriptome is a dynamic layer of biological regulation, the study of which has significantly advanced our understanding of immunological processes and immune-mediated diseases. Furthermore, transcriptomics can generate clinically relevant features in the form of disease biomarkers and endotypes that may be used to stratify patients for targeted therapeutics.
Analyses that move beyond differential expression to elucidate gene network patterns and driver genes can yield a unique level of insight, alongside personalised approaches that support single-subject analyses.
scRNA-seq will replace bulk profiling and will challenge current paradigms pertaining to the basic mechanisms and principles that govern the functionality of the immune system in health and disease. Furthermore, as technologies for studying additional layers of biology in the same sample mature (e.g. epigenome, proteome, metabolome), this will enable a much deeper understanding of immune regulation.
The authors declare that there are no competing interests associated with the manuscript.
Conceptualisation and drafting of the manuscript: E.d.J. and A.B.; Critical revision of manuscript: E.d.J. and A.B.
Open access for this article was enabled by the participation of the University of Western Australia in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with CAUL.