Due to genetic heterogeneity across patients, the identification of effective disease signatures and therapeutic targets is challenging. Addressing this challenge, we have previously developed a network-based approach, which integrates heterogeneous sources of biological information to identify disease specific core-regulatory networks. In particular, our workflow uses a multi-objective optimization function to calculate a ranking score for network components (e.g. feedback/feedforward loops) based on network properties, biomedical and high-throughput expression data. High ranked network components are merged to identify the core-regulatory network(s) that is then subjected to dynamical analysis using stimulus–response and in silico perturbation experiments for the identification of disease gene signatures and therapeutic targets. In a case study, we implemented our workflow to identify bladder and breast cancer specific core-regulatory networks underlying epithelial–mesenchymal transition from the E2F1 molecular interaction map.

In this study, we review our workflow and described how it has developed over time to understand the mechanisms underlying disease progression and prediction of signatures for clinical decision making.

Introduction

Technological advances generate a remarkable variety of data to characterize the components of molecular and cellular systems. When we started our work in cancer systems biology 20 years ago, we used these data to describe small networks, in particular signaling pathways. The argument has been that these small networks are involved in cellular functions (e.g. apoptosis) that are in turn relevant to higher level process (e.g. metastasis). Over the years, the variety of data becoming available grow exponentially, making it increasingly difficult to rely only on small models built from ordinary differential equations (ODEs). Twenty years on, the boundary between bioinformatics and systems biology approaches is blurred, and for most projects we use a variety of methodologies combining statistical analyses, graph theory, dynamical systems theory, Boolean modeling, sequence analyses and database searches.

As we progress with our understanding of the molecular mechanisms underlying diseases like cancer, our approaches are becoming more complex. Given the complexity of cancer, we are still in the business of ‘intelligent guessing’ and it would be irresponsible to see another major breakthrough right around the corner. In our work, a noticeable trend is the development of ‘workflows’ that combine various approaches and integrate a variety of data. The iterative cycle of data-driven modeling and model-driven experimentation, which characterized systems biology 20 years ago, is still there but embedded in a workflow, designed and implemented by data scientists that handle tools from machine learning to systems theory.

For the present text, we review our efforts toward mechanistic understanding and the development of workflows that combine systems biology and bioinformatics approaches to process data, including (i) tools to identify expression patterns for normal and diseased states; (ii) graph theoretical analyses of biochemical networks to identify important hub nodes and small recurring regulatory motifs; and (iii) dynamical systems theory to simulate the dynamics of core-regulatory networks. Individually these tools and methods cannot capture the entire biological complexity associated with regulatory processes involved in diseases.

For example, the expression patterns identified by data analysis tools cannot explain the underlying mechanisms and causes, the utility of graph theory is confined only to static analysis of the system, and also each of the modeling formalism offered by dynamical systems theory has its own limitations. Integration of multiple systems biology and bioinformatics tools/methods, here named as ‘integrative workflows’, are a promising tool to analyze large-scale and complex biochemical networks to provide insights into complex diseases. The integrative methods in systems biology are scarce but highly demanded [1,2].

In this context, we describe an integrative workflow that we have previously developed and published [3] by combining heterogeneous sources of biological information to analyze large-scale biochemical networks to reveal mechanisms underlying disease progression (see Figure 1). The workflow combines network structural analysis with high-throughput and biomedical data to identify tumor-type specific ‘core-regulatory networks’ that are amenable for analysis with dynamical systems theory. Using suitable modeling formalism, the dynamics of molecular interactions in core networks can be analyzed to detect disease specific molecular signatures (i.e. sets of network derived diagnostic/prognostic biomarkers) and therapeutic targets. The proposed molecular signatures and drug targets are validated by clinical data that are further subjected to in vitro experiments. The integrative workflow cannot provide an exact representation of cellular events but it can guide the formulation and falsification of hypotheses and suggest validation experiments.

Network-based integrative workflow

Figure 1
Network-based integrative workflow

The blue boxes represent the processes or operations, the red boxes represent the data sources and the green boxes represent the output of the workflow.

Figure 1
Network-based integrative workflow

The blue boxes represent the processes or operations, the red boxes represent the data sources and the green boxes represent the output of the workflow.

Here, we review our own workflow in detail that was applied to a large-scale E2F1 molecular interaction map, created over several years and which was then used to identify core-regulatory networks followed by distinctive molecular signatures for epithelial–mesenchymal transition (EMT) regulation and potential therapeutic targets in invasive bladder and breast cancer [3].

Integrative workflow for the identification of disease signatures and therapeutic targets

In order to exploit the advantages of large-scale biochemical networks, high-throughput data, in combination with mechanistic modeling, we require integrative approaches and computational workflows to identify disease specific small regulatory/function modules that can be subjected to a more detailed analysis, followed by the prediction of molecular signatures and therapeutic targets. The integrative workflow can be realized in the following steps:

Network construction, curation and annotation

Cellular functionality is realized due to a large variety and number of components interacting in various ways. To elucidate the structure and functional roles of interacting components in a given cellular (mal) function, it is essential to initially organize data and knowledge about them in a network (i.e. vertices connected to edges). Biochemical networks are the formalized representation of large number of experimental studies in a diagrammatic format, which are computer readable and can be analyzed using computational tools. These networks serve as a knowledge base and provide the foundation for computational analyses, providing hints about the causal interactions among components that realize certain functionality. Such networks contain physical associations underlying protein–protein, transcription factor-genes or signaling and metabolic pathways.

Biochemical network construction is a painstaking exercise of manual validation and encoding. Various approaches are used to map out molecular interactions underlying certain biological processes. A number of computational techniques have been developed to infer biochemical networks from high-throughput data [4,5]. Despite their importance, they face many challenges, for example false positive and false negative interactions largely influence the results of network inference. More detailed and highly focused network, centered around particular disease or cellular process, can be constructed by expert domain knowledge (functional and structural information), diligent manual search of published literature and publically available databases, e.g. STRING [6], Reactome [7], IntAct [8], and HPRD [9]. To avoid the laborious manual curation for network construction, some methods are developed to automatically reconstruct networks by retrieving interactions or sub-networks from existing maps and models [10,11]. Combining automatic reconstruction with domain knowledge, manual search of literature and databases would provide a reasonable strategy to construct detailed and fully annotated large-scale biochemical networks. Depending on the context, one can also add various regulatory layers, for example, transcriptional regulation (from TRANSFAC [12], TRRUST [13] and HTRIdb [14]) and post-transcriptional regulation by miRNA and lncRNA (from miRTarBase [15], miRWalk [16], TriplexRNA [17], lncRNADisease [18] and lnc2Catlas [19]).

The standard representation of these maps in the form of Systems Biology Graphical Notation (SBGN) allows for an easy re-usability and exchange. A large number of pathway tools provide SBGN compatibility (see http://sbgn.github.io/sbgn/software_support/). The CellDesigner [20], SBGN-ED [21] and VANTED [22] are the widely used tools to draw them in SBGN format.

Network analysis and motif detection

After gathering and managing information regarding biological systems in the form of interactions networks, structural and dynamical analyses provide useful information about the network architecture and dynamics. The structural analysis of networks allows the identification of functional modules, regulatory motifs (including feedback and feedforward loops) and associated node properties (including node degree (ND) and betweenness centrality (BC)). Node degree is the number of edges (or links) connected to a node. For example, in Figure 2 node ‘B’ has six edges, thus its degree is six (i.e. ND (B) = 6).

Network structural properties

Figure 2
Network structural properties

The gray circles represent nodes of a network connected by edges. Node ‘A’ has highest betweenness centrality that means node ‘A’ will occur in maximum number of shortest paths if information flow from one end to another end. Node ‘B’ has highest degree (i.e. maximum number of edges). The green box represents a module, which is a cluster of closely connected nodes. If these nodes are involved in the regulation of a particular cellular function, then it is called a functional module. The loop formed by nodes B, E and F represents feedback loop; and the loop formed by A, C and D indicates feedforward loop.

Figure 2
Network structural properties

The gray circles represent nodes of a network connected by edges. Node ‘A’ has highest betweenness centrality that means node ‘A’ will occur in maximum number of shortest paths if information flow from one end to another end. Node ‘B’ has highest degree (i.e. maximum number of edges). The green box represents a module, which is a cluster of closely connected nodes. If these nodes are involved in the regulation of a particular cellular function, then it is called a functional module. The loop formed by nodes B, E and F represents feedback loop; and the loop formed by A, C and D indicates feedforward loop.

Similarly, BC identify gate keeper nodes in the communication between different parts in a network. More specifically, BC of a node is the number of shortest paths from all nodes to all others that pass through that node. For example, in Figure 2 node ‘A’ connects different parts of a network, so it gets the highest BC value [23]. In a network G, a node υ that occurs in many of the shortest paths has highest betweenness centrality. The betweenness centrality BC() of a node is computed as follow:

 
formula
(1)

where σij(υ) denotes the total number of shortest paths between i and j that pass through node and σij denotes the total number of shortest paths between i and j.

Further, it is established that biological networks can be described in terms of modules. A module is the aggregation of densely interconnected neighboring nodes as shown by the green box in Figure 2. It is observed that functionally akin nodes are located in close proximity and, thus, form functional modules [24,25]. These functionally related genes can be associated with the same biological pathway and can have similar effects on certain disease phenotypes and may be targeted by structurally similar drugs [26,27]. Many methods are developed to identify novel disease molecules based on this ‘guilt by association’ approach from the biological network. For example, gene prioritization (GP) is a method to identify important genes or proteins in a network based on the network structure and a set of known markers. Instead of local neighborhood, it uses a global diffusion method to identify genes based on random walk from known genes. The genes that are most frequently traversed by random walks from known genes, are assumed to be important that may be associated with a disease or a phenotype, for more detail on gene prioritization see [26].

Further, biological networks are enriched in recurring structural patterns called network motifs [28]. The feedback (in Figure 2 loop form by nodes B, E and F) and feedforward (in Figure 2 loop form by nodes B, C and D) loops are important network motifs. In Figure 2 loop form by nodes B, E and F is a feedback loop and feedforward loop is formed by nodes A, C and D. These network motifs induce non-intuitive behavior and play a crucial role in system dynamics [28,29].

Network structure analysis revealed that the functioning and regulation of a network are governed by a certain set of organizing principles [30]. To understand the mechanisms of these organizing principles, mechanistic dynamical models are used to analyze the systems. The dynamical properties of a network characterize the temporal behavior of a network under certain conditions, which can help to explain the nature of regulation of interacting components in response to stimuli/perturbations that affect the functionality of cells, and ultimately their consequences on the cellular phenotype.

Disease specific core-regulatory network prediction

We have developed a flexible and extendable integrated workflow (Figure 3) that combines network structural analysis with high-throughput data and biomedical data to identify smaller ‘module(s)’ for specific disease phenotypes, we hereafter refer this as ‘core-regulatory network’ that is amenable for analysis with dynamical systems theory [31–33].

Integrative workflow for identification of disease specific core-regulatory network

The workflow prioritizes network recurring structures (i.e. motifs including feedback/feedforward loops) using the network structural properties and high-throughput data, more specifically, it includes network structural parameters for each node (Nn) including (i) node degree (ND), (ii) betweenness centrality (BC), (iii) gene prioritization (GP) and disease specific parameters, such as (iv) relatedness to disease pathway (DP) and (v) high-throughput fold-change (FC) expression data.

The relatedness to disease pathway determines the association of motifs (Mk) with a certain disease pathway as follows:

 
formula
(2)

where n is the number of nodes in a motif present in a disease pathway, and its value is (0 ≤ n ≤ length of a motif). k = 1,…m, for motifs M.

Based on these parameters, we created four prioritization sub-functions (fh) (fND+BC; fGP; fDP; fFC) that are used in multi-objective function in eqn (3) to calculate the ranking score () of a motif:

 
formula
(3)

where h = 1…4 number of sub-functions. w is the user-defined weighting factor for l number of scenarios that are used to optimize the function (F). One can iteratively modify the values of the weighting factors to optimize the function (F) for one or multiple sub-functions. In this case, no single solution exists that simultaneously optimizes the function (F). Thus, one can have number of Pareto optimum solutions. A solution is called Pareto optimal, if none of the objective functions can be improved in value without degrading some of the other objective values. For example, to optimize (maximize) F for each objective function f1 and f2:

 
formula
(4)

where the values of w1 and w2 can be iteratively modified to maximize the function (F) for each set of weighting factor values. The Pareto set is obtained by merging all the non-identical solutions.

In the presented case study, in order to approximate the Pareto set of all non-dominated functions (), we used 13 different weighting scenarios (where one or two sub-functions are highly weighted, i.e., assign high importance for ranking). From the Pareto set of motifs P(Mkl), top ranked motifs (e.g. top 10 highest score motifs) are selected. The core-regulatory network is obtained by taking union of the top ranked motifs. The core-regulatory networks are amenable for dynamical analysis to identify disease specific molecular signatures and therapeutic targets, which are subjected to experimental validation.

In one case study, we used the E2F1 molecular interaction map [3] to identify tumor specific core-regulatory networks for EMT regulation in bladder and breast cancer. We identified a large set of feedback loops (n = 444; 213 positive, 228 negative and 3 neutral feedback loops) from the map using Cytoscape plugin NetDS [66] (v3.0). We considered only three-node feedback loops due to the fact that larger sized network loops are typically composed of one or more smaller sized loops [29,46]. For the disease pathway association, we used KEGG’s cancer disease pathway (KEGG: hsa05200) and based on each node association with disease pathway, the DP score for every motif was calculated. We integrated gene expression profiles of two cancer cell lines for E2F1-driven highly aggressive tumors in bladder and breast. We used the ArrayExpress database (ArrayExpress accession number: E-MTAB-2706) [34] to find suitable gene expression data in non-invasive and invasive bladder and breast cancer cell lines. In particular, we used RT-4 as non-invasive and UM-UC-3 as invasive cell lines in bladder cancer, while MCF-7 as non-invasive and MDA-MB231 as invasive cell lines in breast cancer. For the multi-objective optimization function we used 13 different weighting scenarios and selected top 10 ranked motif in each iteration for the identification of core-regulatory network in bladder and breast cancer (Figure 4) (for more detail, please see the original publication [3]).

Core-regulatory networks that drive tumor invasiveness in (A) bladder and (B) breast cancer

Figure 4
Core-regulatory networks that drive tumor invasiveness in (A) bladder and (B) breast cancer

The networks were generated using Cytoscape plugin NetDS. Nodes in gray color are common in both regulatory core networks, whereas those in white color are unique. Edges in blue color are common in both core networks, while those in black color are unique to each core network.

Figure 4
Core-regulatory networks that drive tumor invasiveness in (A) bladder and (B) breast cancer

The networks were generated using Cytoscape plugin NetDS. Nodes in gray color are common in both regulatory core networks, whereas those in white color are unique. Edges in blue color are common in both core networks, while those in black color are unique to each core network.

Dynamical analysis of the core-regulatory network

Mathematical models are a well-established tool to elucidate the counter-intuitive non-linear dynamical features of complex biochemical networks in living cells. Models are an abstract representation of reality, and thereby their validity and usefulness depend strongly on the context and the assumptions being made. Modeling is the art of making appropriate assumptions. A variety of modeling formalisms have been proposed [35,36] and applied depending on the availability of data, the type of research question as well as the size and the structure of the system. Two commonly used modeling formalisms are: (i) detailed quantitative model of (small) functional modules using, for instance, ODEs [32–34]; and (ii) coarse grained qualitative model of (large) sub-cellular processes that can be encoded through logic-based formalisms [37,38]. If a network is large and contains important non-linear network motifs whose proper characterization is crucial for biological problem under investigation, then a hybrid modeling strategy, which combines ODEs and logic-based models, seems appropriate [39].

The case study presented here, we used a logic-based model to analyze the stimulus–response behavior of regulatory networks in tumor invasion in bladder and breast cancers. The simplest logic-based model is the Boolean model, popularized by Kauffmann [40] where the nodes (X1,…n) of a network can be in one of two states: (i) ‘on’ state (also denoted by 1 or ‘true’ state), which represents the ‘active’ or ‘expressed’ state of the component; and (ii) ‘off’ state (also denoted by 0 and also referred to as the ‘false’ state), which represents the ‘inactive’ or ‘not expressed’ state of the component. The state of each node at t + 1 is determined by a Boolean function (BF) that links the incoming effect from its regulators:

 
formula
(5)

Boolean functions determine the states of the node using Boolean gates (NOT, OR and AND). The Boolean gate NOT is assigned based on the network structure while the rules for OR and AND gates can be determined by training/calibrating the model with qualitative data [41], see Figure 5. Numerous simulation tools are available to simulate logic-based models, for example, the CellCollective [42], CellNetAnalyzer [43], CellNOpt [41], GINSim [44], BoolNet [45], BooleanNet [46], SimBoolNet [47], SQUAD [48] and ADAM [49].

Boolean functions trained with expression data

Figure 5
Boolean functions trained with expression data

The Boolean function (BF) of nodes that have multiple regulators is determined by the expression levels of their regulators. Suppose we have differential fold change expression data from normal to cancer cell for species, as shown in the table on right. The expression levels are binarized as 1 for up expressed and 0 for down expressed. In BF3, species X1 and X4 are regulating X3. The expression levels of X1 and X4 are high but the expression level of X3 is down, therefore logical gate AND is used to represent this relationship. In BF4, X4 is regulated by species X1, X2 and X3, where X1 and X2 are up while X3 is down, therefore, this relationship is represented by an OR gate or (BF4): X4(t+1) = (X1(t) AND X2(t)) OR X3(t).

Figure 5
Boolean functions trained with expression data

The Boolean function (BF) of nodes that have multiple regulators is determined by the expression levels of their regulators. Suppose we have differential fold change expression data from normal to cancer cell for species, as shown in the table on right. The expression levels are binarized as 1 for up expressed and 0 for down expressed. In BF3, species X1 and X4 are regulating X3. The expression levels of X1 and X4 are high but the expression level of X3 is down, therefore logical gate AND is used to represent this relationship. In BF4, X4 is regulated by species X1, X2 and X3, where X1 and X2 are up while X3 is down, therefore, this relationship is represented by an OR gate or (BF4): X4(t+1) = (X1(t) AND X2(t)) OR X3(t).

For the case study, we derived Boolean functions for signals originating from the input layer and their propagation through nodes constituting the regulatory layer of the disease specific core-regulatory networks using Boolean gates (for details see Khan et al. [3]).

We determined the steady states of each variable in the model for different initial values of the input nodes using CellNetAnalyzer [50]. We consider two sets of scenarios characterized by (i) high expression and (ii) low expression of all the input nodes (E2F1 and receptor proteins present in the core-regulatory network). We obtained 64 input vectors for bladder cancer and 128 for breast cancer. Next, we simulated the network to determine the impact of the input vectors on the level of EMT (see Table 1). Our simulation results suggest that when E2F1, TGFBR1 and FGFR1 are simultaneously active, bladder cancer cells become highly invasive (EMT = 3). A similar effect was observed in breast cancer when E2F1, TGFBR2 and EGFR are simultaneously active.

Table 1
The effect of E2F1 and receptor molecules in relevant combinations on the EMT phenotype in bladder and breast cancer model
 
 

Active state of the molecule is represented by ‘1’ and the inactive state by ‘0’. The phenotype output (EMT) can take four ordinal levels ranging from ‘0’ (non-invasive) to ‘3’ (highly invasive). Table (a) is the summary of 64 in silico simulations of bladder cancer. Each row represents the results of eight simulations where for the given Boolean state of E2F1, TGFBR1 and FGFR1, all eight combinations of EGFR, CXCR1 and RARA result in the same phenotypical output. Table (b) is the summary of 128 in silico simulations of breast cancer. Each row represents the results of 16 simulations where for the given Boolean state of E2F1, TGFBR2 and EGFR, all 16 combinations of HMMR, THRB, IL1R1 and RARA result in the same phenotypical output.

Furthermore, we carried out in silico perturbation experiments to identify important nodes that can be exploited for therapeutic interventions. Perturbation experiments were performed for highly invasive phenotype (EMT = 3) by changing the Boolean state of each node in the regulatory layer to reduce invasiveness (Table 2). We used single and double perturbation iteratively and observed the greatest reduction of EMT in the latter case. Our perturbation results suggest that in bladder cancer (i) double knockout of ZEB1 in combination with either SNAI1, TWIST1 or NFKB1; (ii) knockout of ZEB1 and activation of CDH1; or (iii) knockout of SMAD2/3/4 in combination with TWIST1 or NFKB1 reduces EMT to 1. In case of breast cancer, double perturbation by silencing SRC, FN1, SNAI1 and SNAI2 or activation of CDH1 in any of the combinations reduces EMT to 1. All the model-based predictions were experimentally validated in non-invasive and invasive tumor cell lines (for more detail, please see the original publication [3]).

Table 2
Double in silico perturbations of highly invasive (EMT = 3) phenotype in bladder and breast cancer model
 
 

The active state of the molecule is represented by ‘1’ and the inactive state by ‘0’, while the highest EMT phenotype is represented by ‘3’ and low phenotype by ‘1’. (a) and (b) are the perturbation simulations of bladder and breast cancer. The first row of each table represents the molecular signature (i.e. in case of table (a) high expression of E2F1-TGFBR1/2-FGFR1) for highest EMT level. The brown boxes represent the perturbed (knockin = 1 or knockout = 0) state of genes, and their effect on EMT phenotype is shown in the last column.

Patient stratification and validations

Based on the molecular signatures identified through model simulations, we successfully stratified bladder and breast cancer patients into early and advance stages of disease. For a bladder cancer patient cohort, we used data from the Lee Bladder study (GEO ID: GSE13507) [51] and for a breast cancer we used data from the TRANSBIG network (GEO id: GSE7390) generated by Desmedt et al. [52]. We grouped the patients into high- and low-expression profiles of model predicted signatures, i.e. E2F1-TGFBR1-FGFR1 in the case of bladder cancer and E2F1-TGFBR2-EGFR in the case of breast cancer cohort. The progression-free survival probability of each subgroup reveals that patients with high expression of these signatures had low survival probability in comparison with patient subgroup with low expression in respective cancer types (Figure 6A,B).

Kaplan–Meier plots of progression-free survival of patients with bladder & breast cancer

Figure 6
Kaplan–Meier plots of progression-free survival of patients with bladder & breast cancer

Plots (A and B) show the survival curves for patients with high and low expression of combined signatures (E2F1-FGFR1-TGFBR1) in bladder cancer patients and (E2F1-EGFR-TGFBR2) breast cancer patients. In both cases, patients with low expression of the molecular signatures have high mean survival times and vice versa. High expression of molecular signature(s) is represented as ‘_H’ (black curve) and low expression as ‘_L’ (red curve). ‘N’ is the number of patients observed with high/low expression of molecular signatures and ‘MSM’ is the mean survival month form the patient group. P-values shown in the figures are for log rank test. (C and D) Validation of molecular signatures in TCGA bladder and breast cancer cohorts. In the bladder cancer cohort, we first filtered patients diagnosed with early (stage II) and advanced (stage IV) stages. Further in both stages, we identified subsets of patients where expression of genes of the molecular signature (i.e. E2F1, TGFBR1 and FGFR1) was above (signature) or below (signature*) the respective mean expression value. In case of the breast cancer cohort, we classified patients into invasive (basal & Her2) versus less-invasive (luminal A & B) molecular subtypes based on PAM50 stages provided. Afterward, we identified those subsets of patients where expression of genes of the molecular signature (i.e. E2F1, TGFBR2 and EGFR) was above (signature) or below (signature*) the respective mean expression value. We calculated distribution difference (%) of patients within pathological stages in both cancer types.

Figure 6
Kaplan–Meier plots of progression-free survival of patients with bladder & breast cancer

Plots (A and B) show the survival curves for patients with high and low expression of combined signatures (E2F1-FGFR1-TGFBR1) in bladder cancer patients and (E2F1-EGFR-TGFBR2) breast cancer patients. In both cases, patients with low expression of the molecular signatures have high mean survival times and vice versa. High expression of molecular signature(s) is represented as ‘_H’ (black curve) and low expression as ‘_L’ (red curve). ‘N’ is the number of patients observed with high/low expression of molecular signatures and ‘MSM’ is the mean survival month form the patient group. P-values shown in the figures are for log rank test. (C and D) Validation of molecular signatures in TCGA bladder and breast cancer cohorts. In the bladder cancer cohort, we first filtered patients diagnosed with early (stage II) and advanced (stage IV) stages. Further in both stages, we identified subsets of patients where expression of genes of the molecular signature (i.e. E2F1, TGFBR1 and FGFR1) was above (signature) or below (signature*) the respective mean expression value. In case of the breast cancer cohort, we classified patients into invasive (basal & Her2) versus less-invasive (luminal A & B) molecular subtypes based on PAM50 stages provided. Afterward, we identified those subsets of patients where expression of genes of the molecular signature (i.e. E2F1, TGFBR2 and EGFR) was above (signature) or below (signature*) the respective mean expression value. We calculated distribution difference (%) of patients within pathological stages in both cancer types.

We also validated molecular signatures in large bladder cancer (BLCA; n=426) and breast cancer (BRCA; n=1218) cohorts available at TCGA data repositories accessible through UCSC Xena http://xena.ucsc.edu. The simulation based-signatures were able to distribute patients into early versus advanced stages in bladder and breast cancer (P-value < 0.005) (Figure 6C,D).

Conclusion

The approach and methodology for the identification of disease gene signature and therapeutic targets have changed drastically over the past decade. On the one hand, high-throughput technological advances resulted in heterogeneous data to understand the mechanisms of complex, multi-factorial diseases. On the other hand, integration, interpretation and understanding of these heterogeneous datasets are becoming more and more challenging. With the advanced network and database interaction tools (e.g. Cytoscape Bisogenet), it is easy to populate disease networks with multiple regulatory layers and large number of interactions. However, methods and workflows to identify key disease gene signatures and therapeutic targets from large disease networks are still in the development stage. In this study, we reviewed our workflow that combines parameters from network structure analysis, disease-related biomedical information and high-throughput expression data to identify disease specific core-regulatory network(s) amenable for dynamical analyses. The major bottleneck to our workflow is the preparation of a directed regulatory network. This requires huge manual efforts as many state-of-art interaction databases provide false positive/negative interactions complied by text-mining algorithms from published literature. We believe that new methods will be available soon for defining regulatory directions to interacting bio-molecules. This may be possible by extracting regulatory information from databases (e.g. BioModels database) and/or by mapping time series expression data of interacting bio-molecules. Once a directed map is prepared, other steps of our workflow can be easily implemented to determine the core-regulatory network driving the disease phenotype. Using a case study on tumor metastasis driven by E2F1, we presented and discussed various steps of our workflow that can be implemented for any complex disease/clinical phenotype under investigation.

Summary

  • With the advancement in high-throughput technology, heterogeneous data are available to construct disease networks composed of large number of interactions.

  • We reviewed our workflow that combines parameters from network structure analysis, disease-related biomedical information and high-throughput expression data to identify disease specific core-regulatory network(s).

  • With a case study, we demonstrated logic-based dynamical analyses (stimulus–response and perturbations) of core-regulatory networks for predicting disease gene signatures and therapeutic targets.

We are very thankful to Prof. Brigitte M. Pützer group for their huge efforts in the E2F1 map construction and the experimental validations of in silico predictions.

Funding

This work was supported by the German Federal Ministry of Education and Research (BMBF) as part of the project eBio:SysMet [0316171]; eBio:MelEVIR [031L0073B]; and the University of Rostock, Rostock Germany.

Competing Interests

The authors declare that there are no competing interests associated with the manuscript.

Abbreviations

     
  • EMT

    epithelial–mesenchymal transition

  •  
  • ODE

    ordinary differential equation

References

References
1
Mitra
K.
,
Carvunis
A.-R.
,
Ramesh
S.K.
and
Ideker
T.
(
2013
)
Integrative approaches for finding modular structure in biological networks
.
Nat. Rev. Genet.
14
,
719
732
[PubMed]
2
Karczewski
K.J.
and
Snyder
M.P.
(
2018
)
Integrative omics for health and disease
.
Nat. Rev. Genet.
19
,
299
310
[PubMed]
3
Khan
F.M.
,
Marquardt
S.
,
Gupta
S.K.
,
Knoll
S.
,
Schmitz
U.
,
Spitschak
A.
et al.  (
2017
)
Unraveling a tumor type-specific regulatory core underlying E2F1-mediated epithelial-mesenchymal transition to predict receptor protein signatures
.
Nat. Commun.
8
,
198
[PubMed]
4
Zahiri
J.
,
Bozorgmehr
J.H.
and
Masoudi-Nejad
A.
(
2013
)
Computational Prediction of Protein-Protein Interaction Networks: Algo-rithms and Resources
.
Curr. Genomics
14
,
397
414
[PubMed]
5
Lee
W.-P.
and
Tzou
W.-S.
(
2009
)
Computational methods for discovering gene networks from expression data
.
Brief. Bioinform.
10
,
408
423
[PubMed]
6
Franceschini
A.
,
Szklarczyk
D.
,
Frankild
S.
,
Kuhn
M.
,
Simonovic
M.
,
Roth
A.
et al.  (
2013
)
STRING v9.1: protein-protein interaction networks, with increased coverage and integration
.
Nucleic Acids Res.
41
,
D808
D815
[PubMed]
7
Joshi-Tope
G.
,
Gillespie
M.
,
Vastrik
I.
,
D’Eustachio
P.
,
Schmidt
E.
,
de Bono
B.
et al.  (
2005
)
Reactome: a knowledgebase of biological pathways
.
Nucleic Acids Res.
33
,
D428
D432
[PubMed]
8
Orchard
S.
,
Ammari
M.
,
Aranda
B.
,
Breuza
L.
,
Briganti
L.
,
Broackes-Carter
F.
et al.  (
2014
)
The MIntAct project-IntAct as a common curation platform for 11 molecular interaction databases
.
Nucleic Acids Res.
42
,
D358
D363
[PubMed]
9
Keshava Prasad
T.S.
,
Goel
R.
,
Kandasamy
K.
,
Keerthikumar
S.
,
Kumar
S.
,
Mathivanan
S.
et al.  (
2009
)
Human Protein Reference Database-2009 update
.
Nucleic Acids Res.
37
,
D767
D772
[PubMed]
10
Ritz
A.
,
Poirel
C.L.
,
Tegge
A.N.
,
Sharp
N.
,
Simmons
K.
,
Powell
A.
et al.  (
2016
)
Pathways on demand: automated reconstruction of human signaling networks
.
NP. Syst. Biol. Appl.
2
,
16002
[PubMed]
11
Supper
J.
,
Spangenberg
L.
,
Planatscher
H.
,
Dräger
A.
,
Schröder
A.
and
Zell
A.
(
2009
)
BowTieBuilder: modeling signal transduction pathways
.
BMC Syst. Biol.
3
,
67
[PubMed]
12
Wingender
E.
,
Chen
X.
,
Hehl
R.
,
Karas
H.
,
Liebich
I.
,
Matys
V.
et al.  (
2000
)
TRANSFAC: an integrated system for gene expression regulation
.
Nucleic Acids Res.
28
,
316
319
[PubMed]
13
Han
H.
,
Cho
J.-W.
,
Lee
S.
,
Yun
A.
,
Kim
H.
,
Bae
D.
et al.  (
2018
)
TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions
.
Nucleic Acids Res.
46
,
D380
D386
[PubMed]
14
Bovolenta
L.A.
,
Acencio
M.L.
and
Lemke
N.
(
2012
)
HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions
.
BMC Genomics
13
,
405
[PubMed]
15
Hsu
S.-D.
,
Lin
F.-M.
,
Wu
W.-Y.
,
Liang
C.
,
Huang
W.-C.
,
Chan
W.-L.
et al.  (
2011
)
miRTarBase: a database curates experimentally validated microRNA-target interactions
.
Nucleic Acids Res.
39
,
D163
D169
[PubMed]
16
miRWalk - Database: Prediction of possible miRNA binding sites by “walking” the genes of three genomes - ScienceDirect [Internet] [cited April 24, 2018], https://www.sciencedirect.com/science/article/pii/S1532046411000785
17
Schmitz
U.
,
Lai
X.
,
Winter
F.
,
Wolkenhauer
O.
,
Vera
J.
and
Gupta
S.K.
(
2014
)
Cooperative gene regulation by microRNA pairs and their identification using a computational workflow
.
Nucleic Acids Res.
42
,
7539
7552
[PubMed]
18
Chen
G.
,
Wang
Z.
,
Wang
D.
,
Qiu
C.
,
Liu
M.
,
Chen
X.
et al.  (
2013
)
LncRNADisease: a database for long-non-coding RNA-associated diseases
.
Nucleic Acids Res.
41
,
D983
D986
[PubMed]
19
Ren
C.
,
An
G.
,
Zhao
C.
,
Ouyang
Z.
,
Bo
X.
and
Shu
W.
(
2018
)
Lnc2Catlas: an atlas of long noncoding RNAs associated with risk of cancers
.
Sci. Rep.
8
,
1909
[PubMed]
20
Funahashi
A.
,
Morohashi
M.
,
Kitano
H.
and
Tanimura
N.
(
2003
)
CellDesigner: a process diagram editor for gene-regulatory and biochemical networks
.
BIOSILICO
1
,
159
162
21
Czauderna
T.
,
Klukas
C.
and
Schreiber
F.
(
2010
)
Editing, validating and translating of SBGN maps
.
Bioinformatics
26
,
2340
2341
[PubMed]
22
Rohn
H.
,
Junker
A.
,
Hartmann
A.
,
Grafahrend-Belau
E.
,
Treutler
H.
,
Klapperstück
M.
et al.  (
2012
)
VANTED v2: a framework for systems biology applications
.
BMC Syst. Biol.
6
,
139
[PubMed]
23
Barabási
A.-L.
and
Oltvai
Z.N.
(
2004
)
Network biology: understanding the cell’s functional organization
.
Nat. Rev. Genet.
5
,
101
113
[PubMed]
24
Kotlyar
M.
,
Fortney
K.
and
Jurisica
I.
(
2012
)
Network-based characterization of drug-regulated genes, drug targets, and toxicity
.
Methods
57
,
499
507
[PubMed]
25
Mitra
K.
,
Carvunis
A.-R.
,
Ramesh
S.K.
and
Ideker
T.
(
2013
)
Integrative approaches for finding modular structure in biological networks
.
Nat. Rev. Genet.
14
,
719
732
[PubMed]
26
Wang
X.
,
Gulbahce
N.
and
Yu
H.
(
2011
)
Network-based methods for human disease gene prediction
.
Brief. Funct. Genomics
10
,
280
293
[PubMed]
27
Oti
M.
and
Brunner
H.
(
2007
)
The modular nature of genetic diseases
.
Clin. Genet.
71
,
1
11
[PubMed]
28
Alon
U.
(
2007
)
Network motifs: theory and experimental approaches
.
Nat. Rev. Genet.
8
,
450
461
[PubMed]
29
Yeger-Lotem
E.
,
Sattath
S.
,
Kashtan
N.
,
Itzkovitz
S.
,
Milo
R.
,
Pinter
R.Y.
et al.  (
2004
)
Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction
.
Proc. Natl. Acad. Sci. U.S.A.
101
,
5934
5939
30
Kitano
H.
(
2007
)
A robustness-based approach to systems-oriented drug design
.
Nat. Rev. Drug Discov.
6
,
202
210
[PubMed]
31
Tyson
J.J.
,
Chen
K.C.
and
Novak
B.
(
2003
)
Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell
.
Curr. Opin. Cell Biol.
15
,
221
231
[PubMed]
32
Raia
V.
,
Schilling
M.
,
Böhm
M.
,
Hahn
B.
,
Kowarsch
A.
,
Raue
A.
et al.  (
2011
)
Dynamic mathematical modeling of IL13-induced signaling in Hodgkin and primary mediastinal B-Cell lymphoma allows prediction of therapeutic targets
.
Cancer Res.
71
,
693
704
[PubMed]
33
Zi
Z.
and
Klipp
E.
(
2007
)
Constraint-based modeling and kinetic analysis of the smad dependent TGF-β signaling pathway
.
PLoS One
2
,
e936
[PubMed]
34
Klijn
C.
,
Durinck
S.
,
Stawiski
E.W.
,
Haverty
P.M.
,
Jiang
Z.
,
Liu
H.
et al.  (
2015
)
A comprehensive transcriptional portrait of human cancer cell lines
.
Nat. Biotechnol.
33
,
306
312
[PubMed]
35
Le Novère
N.
(
2015
)
Quantitative and logic modelling of molecular and gene networks
.
Nat. Rev. Genet.
16
,
146
158
[PubMed]
36
Alfieri
R.
,
Bartocci
E.
,
Merelli
E.
and
Milanesi
L.
(
2011
)
Modeling the cell cycle: from deterministic models to hybrid systems
.
Biosystems
105
,
34
40
[PubMed]
37
Zañudo
J.G.T.
,
Steinway
S.N.
and
Albert
R.
(
2018
)
Discrete dynamic network modeling of oncogenic signaling: mechanistic insights for personalized treatment of cancer
.
Curr. Opin. Syst. Biol.
9
,
1
10
38
Wang
R.-S.
,
Saadatpour
A.
and
Albert
R.
(
2012
)
Boolean modeling in systems biology: an overview of methodology and applications
.
Phys. Biol.
9
,
055001
[PubMed]
39
Khan
F.M.
,
Schmitz
U.
,
Nikolov
S.
,
Engelmann
D.
,
Pützer
B.M.
,
Wolkenhauer
O.
et al.  (
2014
)
Hybrid modeling of the crosstalk between signaling and transcriptional networks using ordinary differential equations and multi-valued logic
.
Biochim. Biophys. Acta
1844
,
289
298
40
Kauffman
S.A.
(
1969
)
Metabolic stability and epigenesis in randomly constructed genetic nets
.
J. Theor. Biol.
22
,
437
467
[PubMed]
41
Terfve
C.
,
Cokelaer
T.
,
Henriques
D.
,
MacNamara
A.
,
Goncalves
E.
,
Morris
M.K.
et al.  (
2012
)
CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms
.
BMC Syst. Biol.
6
,
133
[PubMed]
42
Helikar
T.
,
Kowal
B.
,
McClenathan
S.
,
Bruckner
M.
,
Rowley
T.
,
Madrahimov
A.
et al.  (
2012
)
The Cell Collective: Toward an open and collaborative approach to systems biology
.
BMC Syst. Biol.
6
,
96
[PubMed]
43
Klamt
S.
,
Saez-Rodriguez
J.
and
Gilles
E.D.
(
2007
)
Structural and functional analysis of cellular networks with CellNetAnalyzer
.
BMC Syst. Biol.
1
,
2
[PubMed]
44
Chaouiya
C.
,
Naldi
A.
and
Thieffry
D.
(
2012
)
Logical modelling of gene regulatory networks with GINsim
.
Methods Mol. Biol. Clifton NJ
804
,
463
479
45
Müssel
C.
,
Hopfensitz
M.
and
Kestler
H.A.
(
2010
)
BoolNet-an R package for generation, reconstruction and analysis of Boolean networks
.
Bioinforma. Oxf. Engl.
26
,
1378
1380
46
Albert
I.
,
Thakar
J.
,
Li
S.
,
Zhang
R.
and
Albert
R.
(
2008
)
Boolean network simulations for life scientists
.
Source Code Biol. Med.
3
,
16
[PubMed]
47
Zheng
J.
,
Zhang
D.
,
Przytycki
P.F.
,
Zielinski
R.
,
Capala
J.
and
Przytycka
T.M.
(
2010
)
SimBoolNet-a Cytoscape plugin for dynamic simulation of signaling networks
.
Bioinforma Oxf. Engl.
26
,
141
142
48
Di Cara
A.
,
Garg
A.
,
De Micheli
G.
,
Xenarios
I.
and
Mendoza
L.
(
2007
)
Dynamic simulation of regulatory networks using SQUAD
.
BMC Bioinformatics
8
,
462
[PubMed]
49
Hinkelmann
F.
,
Brandon
M.
,
Guang
B.
,
McNeill
R.
,
Blekherman
G.
,
Veliz-Cuba
A.
et al.  (
2011
)
ADAM: Analysis of Discrete Models of Biological Systems Using Computer Algebra
.
BMC Bioinformatics
12
,
295
[PubMed]
50
Klamt
S.
,
Saez-Rodriguez
J.
,
Lindquist
J.A.
,
Simeoni
L.
and
Gilles
E.D.
(
2006
)
A methodology for the structural and functional analysis of signaling and regulatory networks
.
BMC Bioinformatics
7
,
56
[PubMed]
51
Lee
J.-S.
,
Leem
S.-H.
,
Lee
S.-Y.
,
Kim
S.-C.
,
Park
E.-S.
,
Kim
S.-B.
et al.  (
2010
)
Expression signature of E2F1 and its associated genes predict superficial to invasive progression of bladder tumors
.
J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol.
28
,
2660
2667
52
Desmedt
C.
,
Piette
F.
,
Loi
S.
,
Wang
Y.
,
Lallemand
F.
,
Haibe-Kains
B.
et al.  (
2007
)
Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series
.
Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res.
13
,
3207
3214