A novel prognostic risk score model based on immune-related genes in patients with stage IV colorectal cancer

Abstract Purpose: The aims of the present study were to explore immune-related genes (IRGs) in stage IV colorectal cancer (CRC) and construct a prognostic risk score model to predict patient overall survival (OS), providing a reference for individualized clinical treatment. Methods: High-throughput RNA-sequencing, phenotype, and survival data from patients with stage IV CRC were downloaded from TCGA. Candidate genes were identified by screening for differentially expressed IRGs (DE-IRGs). Univariate Cox regression, LASSO, and multivariate Cox regression analyses were used to determine the final variables for construction of the prognostic risk score model. GSE17536 from the GEO database was used as an external validation dataset to evaluate the predictive power of the model. Results: A total of 770 candidate DE-IRGs were obtained, and a prognostic risk score model was constructed by variable screening using the following 12 genes: FGFR4, LGR6, TRBV12-3, NUDT6, MET, PDIA2, ORM1, IGKV3D-20, THRB, WNT5A, FGF18, and CCR8. In the external validation set, the survival prediction C-index was 0.685, and the AUC values were 0.583, 0.731, and 0.837 for 1-, 2- and 3-year OS, respectively. Univariate and multivariate Cox regression analyses demonstrated that the risk score model was an independent prognostic factor for patients with stage IV CRC. High- and low-risk patient groups had significant differences in the expression of checkpoint coding genes (ICGs). Conclusion: The prognostic risk score model for stage IV CRC developed in the present study based on immune-related genes has acceptable predictive power, and is closely related to the expression of ICGs.


Introduction
Colorectal cancer (CRC) ranks third for both morbidity and mortality among cancers worldwide [1]. The main cause of death in patients with CRC is distant metastasis. Currently, most CRC patients (approximately 50-60%) have stage IV disease at initial diagnosis [2][3][4]. Remarkably, stage IV CRC has a better prognosis, relative to other metastatic gastrointestinal malignancies [1]. At present, both the European Society for Medical Oncology (ESMO) and the National Comprehensive Cancer Network (NCCN) have established specific clinical practice guidelines for patients with stage IV CRC and recommended the stratified management of such patients with different disease characteristics, typifying current connotation of 'precision medicine' . CRC is a heterogeneous disease, and prognosis varies significantly among patients. Thus, development of prognostic risk models can further improve treatment strategies for stratified management; however, most existing prognostic risk models were constructed based on data from patients with CRC overall or those with postoperative stage II CRC [5][6][7][8]. Hence, it is necessary to establish a prognostic prediction model for individual with stage IV CRC. Malignant tumors can induce immune tolerance through a variety of mechanisms, as well as evading immune killing, leading to disease progression and directly affecting patient prognosis. In the process of immune tolerance induction, immune-related genes (IRGs) expressed by tumor cells directly influence the functions of various immune cells [9,10]. In CRC, some IRGs are confirmed to affect the development of tumor cells via immune regulation, such as FAP, high expression of FAP can lead to enrichment of macrophages, monocytes, and Tregs, as well as depletion of Th1 cells and natural killer T cells, generating to an immunosuppressive environment in CRC cells [11]. In another example, high expression of SMAD7 are associated with an inflamed gut, with immune cells infiltration that can elicit antitumor responses, thereby inhibiting CRC cell growth [12]. Hence, construction of a tumor prognostic model based on IRGs is a robust, widely recognized approach; however, no such study has been conducted for stage IV CRC. Therefore, in the present study we mined for IRGs involved in stage IV CRC and constructed a prognostic risk score model based on the identified genes.

Screening of candidate genes, Gene Ontology (GO) term enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, protein-protein interaction (PPI) network analysis, and identification of core genes
The limma package was used to analyze differentially expressed genes (DEGs) between tumor and paracancer tissue samples from TCGA [13]. Genes with |log2 fold changes (FC)| ≥ 1 and adjusted P<0.05 were considered as significantly DEGs. Candidate differentially expressed IRGs (DE-IRGs) were obtained by taking the intersection of DEGs and IRGs. The Ggplot2, enrichplot, and clusterprofiler packages were used to perform GO and KEGG analyses, with candidate DE-IRGs as input [14]. GO analysis included three parts: cellular component (CC), molecular function (MF), and biological process (BP). PPI Network analysis was performed by inputting candidate DE-IRGs into the STRING database (https://string-db.org/) [15]. A value of 0.7 was set as the cutoff for confidence scores. Cytoscape software was used to further analyze the resulting PPI network [16]. Finally, genes with higher node degrees were identified as hub genes.

Establishment and verification of prognostic risk score model
Two datasets, including data from 85 and 38 patients with stage IV CRC from TCGA and GSE17536, were used as the training and verification sets, respectively, for establishment of the prognostic risk score model. In the training set, univariate Cox, LASSO Cox, and multivariate regression analyses were used to select the minimum gene subset [17]. Multivariate regression analysis was performed using the stepwise method. The prognostic score model formula, established after the identification of variables and risk score calculation, was as follows: The model was validated using the validation set. The survminer package was used for survival analysis of patients in the high-and low-risk groups, while the survival ROC package was used to construct time dependent receiver operating characteristic (ROC) curves and calculate the area under the ROC curve (AUC) at 1-, 2-, and 3-year overall survival (OS). Univariate and multivariate Cox regression survival analyses were performed by adding the factor of risk score in the training set.

Immune-related characteristics of high-and low-risk patients based on the prognostic risk score model
Data from 85 patients with stage IV CRC patients from TCGA were divided into high-and low-risk groups by setting the median risk score as the cut-off value. The CIBERSORT algorithm was used to analyze the proportion of 22 immune cells types in tumor tissue samples from the two groups [18]. A Mann-Whitney test was used to evaluate the significance of differences in the composition of the 22 immune cells types and expression levels of immune checkpoint coding genes (ICGs) between the high-and low-risk groups. Pairwise relationships between the expression levels of ICGs were analyzed using Spearman correlation analyses in the ggcorrplot package. P<0.05 was considered statistically significant. Finally, the TISIDB database (http://cis.hku.hk/TISIDB/index.php) was employed to analyze the relationship between clinical outcomes and expression levels of the 12 genes included in the prognostic risk score model [19].

Statistical analysis
R software v3.6.1 (www.r-project.org) was used for statistical analyses in the present study. Numerical variables were described as means + − SD. The Kaplan-Meier method was used for survival analysis, and the log-rank test applied to evaluate differences between groups. Mann-Whitney test was used for nonparametric comparisons. All statistical tests were two-tailed; P<0.05 was considered to indicate a significant difference.

Clinical characteristics of patients with stage IV CRC
Analyzed clinical characteristics of patients with stage IV CRC from TCGA included age, sex, ethnicity, primary site, carcinoembryonic antigen level (CEA), T stage, N stage, overall survival, and survival status. Among the 85 patients with stage IV CRC with complete clinical information in TCGA, mean age was 64.05 + − 12.82 years, the proportion of males was 58.82%, and median OS was 16.8 months. As shown in Figure 1A, there was a significant difference in survival between the 85 patients with stage IV CRC and the remaining 493 patients with non-stage IV disease in TCGA (P<0.0001). The 38 patients with stage IV CRC from GSE17536 had an average age of 64.55 + − 11.08 years, 65.79% were male, and median OS was 16.45 months. Details for each clinical parameter are presented in Table 1.

Identification of candidate IRGs for stage IV CRC
Comparison of gene expression levels between tumor and paracancer tissue samples identified a total of 12,938 DEGs, including 6931 up-regulated and 6007 down-regulated genes. A total of 199 up-regulated and 571 down-regulated genes were obtained on taking intersection of DEGs and 1811 IRGs, which were subjected to further analysis as candidate DE-IRGs for stage IV CRC ( Figure 1B-D).

GO enrichment analysis, KEGG pathway enrichment analysis, PPI network construction, and identification of hub genes
GO enrichment analysis of the 770 candidate DE-IRGs from stage IV CRC indicated that lymphocyte-mediated immunity, immunoglobulin complex, and receptor ligand activity were the most significantly functionally enriched of BP, CC, and MF, respectively. Cytokine-cytokine receptor interaction was the most relevant pathway identified by KEGG enrichment analysis (Figure 2A,B and Table 2). A PPI network containing 203 nodes and 999 edges was constructed based on these candidate DE-IRGs. The top 30 genes with the largest number of adjacent nodes were visualized. After a comprehensive literature review, 10 of these genes were identified as hub genes, including: POMC, FLT3, CCR9, FCGR2B, MPO, CCL28, OPRM1, GNAI1, HTR3A, and FGF10 ( Figure 2C).

Establishment of prognostic risk score model
As shown in Figure 3, univariate Cox regression analysis was performed using the 770 candidate stage IV CRC DE-IRGs to search for genes related to survival, resulting in identification of 63 genes in total. LASSO was used to further narrow down correlated genes, with 42 genes extracted. Finally, 12 genes were determined as variables for use in construction of the prognostic risk score model by multivariate Cox regression analysis. The calculation formula was as follows:

Validation of the prognostic risk score model
Patient data in the training and validation sets were divided into the high-and low-risk groups, based on their respective median risk scores. Significant survival differences were detected between patients in the high-and low-risk sets in both the training and validation sets (P<0.0001, P=0.0096, respectively). In the training set, the Harrell's C-index for survival prediction was 0.701, and the AUC values for 1-, 2-, and 3-year OS were 0.955, 0.940, and 0.957, respectively. In the external validation set, Harrell's C-index for survival prediction was 0.685, and the AUC values for 1-, 2-, and 3-year OS were 0.583, 0.731, and 0.837, respectively. Univariate and multivariate regression analyses were carried out by adding the factor of risk score in the training set. Only age and risk score were independent predictive factors for OS (P=0.003 and P<0.001, respectively) ( Figure 4).

Immune-related characteristics of high-and low-risk patients, based on the prognostic risk score model
Except for CD200R1, PDCD1, TIGIT, TNFRSF9, and VTCN1 (P=0.541, 0.098, 0.095, 0.527, and 0.459, respectively), all of the other 42 ICGs showed significant differences in gene expression between high-and low-risk patients ( Figure 5). Due to the significant relationship between risk score and ICGs, all 12 genes included in the model were individually analyzed in the TISIDB database. A significant relationship was detected between FGFR4, LGR6, PDIA2, ORM1, and WNT5A expression differences and treatment response in immunotherapy clinical trials (Table 3). However, there was no significant difference in the proportion of the 22 immune cell types analyzed between high-and low-risk patients ( Figure 6).

Discussion
Stage IV CRC is commonly referred to as metastatic colorectal cancer (MCRC) and the overall 5-year survival rate for patients with this condition is 40-64% [20,21]. Autopsies demonstrated that liver metastases are the leading cause of death in such patients [22]. In 2016, ESMO proposed the concept of 'no evidence of disease (NED)' to the diagnosis and treatment strategy for MCRC, representing a break from the traditional understanding of stage IV CRC [23]. In recent years, the efficacy of drugs such as trifluridine, tipiracil, bevacizumab, cetuximab, regorafenib, encorafenib, nivolumab, and pembrolizumab has been evaluated and confirmed in different subgroups of patients with stage IV CRC [24][25][26][27][28][29]. OS of these patients is expected to be extended in the future and it is essential to make prognostic predictions, to individualize their treatment plans. Nevertheless, no such method is available in clinical practice, with only stage II and overall CRC previously investigated in this research field [5][6][7][8]. While only a few prognostic factors for stage IV CRC have been reported [30,31]. In the present study, we constructed a prognostic risk score model for patients with stage IV CRC, based on IRGs, as well as screening for immune-related hub genes during the construction process. POMC, CCR9, FCGR2B, CCL28, OPRM1, GNAI1, HTR3A, and FGF10 are all genes rarely   reported in CRC research. POMC has a role in tumor development, by promoting epithelial-mesenchymal transition (EMT) [32][33][34]. It also has been reported as an independent prognostic factor in surgically resected non-small cell lung cancer [35]. CCR9/CCL25 also has an important role in tumor invasion and metastasis by promoting EMT in liver, non-small cell lung cancer, and breast cancer cell lines [36,37]. More interestingly, intratumoral delivery of CCL25 can enhance PD-L1/PD-1 antibody therapy against triple-negative breast cancer by recruiting CCR9+ T cells [38]. Regarding CCL28, recent studies have found that blocking of β-catenin-CCL28 can reduce Treg cell infiltration and inhibit tumor growth. Therefore, the β-catenin-CCL28-Treg cell axis may be involved in immunosuppression [39]. FGF10 promotes tumor invasion and metastasis by binding to FGFR2 in pancreatic cancer and breast cancer cell lines [40,41]. Of the 12 genes included in the prognostic risk score model, only FGFR4, MET, WNT5A, and CCR8 have been studied in relation to tumor immunity. Specifically, FGFR4 can act directly on tumor cells to influence paracrine signaling, angiogenesis, and immune escape, normalizing the tumor microenvironment [42]. MET is a tumor-associated antigen facilitating tumor cell recognition by CD8+ cytotoxic T cells, which triggers immune system activation [43]. WNT5A has dual effects on the tumor microenvironment; it can promote inflammation and chemotaxis of immune cells through activation of the ROR1/Akt/P65 pathway, as well as stimulating the TLR/MyD88/p50 pathway in myelomonocytic cells, to promote synthesis of the anti-inflammatory cytokine, IL-10 [44]. Further, Tregs positive for CCR8 expression are major drivers of immune regulation, and CCL1 can induce CCR8 expression, thus enhancing the suppressive function [45]. The effect of these genes in the development and progression of CRC warrant further study, particularly in the context of tumor induced immune tolerance.
The prognostic risk score model developed in the present study included 12 IRGs. In the training set, Harrell's C-index for survival prediction was 0.701, with AUC values for 1-, 2-, and 3-year OS of 0.955, 0.940, and 0.957, respectively. In the external validation set, Harrell's C-index for survival prediction was 0.685, with AUC values for 1-, 2-, and 3-year OS of 0.583, 0.731, and 0.837, respectively. Compared with the three published prognostic models for overall CRC based on IRGs, this model contains completely different genes and has good predictive power, particularly for 2-and 3-year OS [46][47][48]. In addition, patients classified as high-and low-risk according to this model showed significant differences in ICGs expression. We further explored all 12 genes in this model in the TISIDB database and found that differential expression of LGR6, PDIA2, ORM1, FGFR4, and WNT5A were associated with treatment response to immunotherapy in past clinical trials. These findings imply that our risk score model potentially has the ability to predict immunotherapy responses. Further, the relationship between the genes included in this model and current anti-PD1 and anti-PDL1 treatments warrants further exploration.
Other researchers have constructed CRC prognostic risk model based on DNA methylation markers, autophagy-related genes, and hypoxia-related genes, among others [49][50][51], and most generated acceptable results; however, no perfect prognostic model has been achieved, due to individual differences among patients and unknown disease pathogenesis. For patients with stage IV CRC, it will be necessary to construct additional prognostic risk models in the future, based on different methods and perspectives. In recent years, the important role of the tumor immune microenvironment in the occurrence and development of malignant tumors has attracted wide attention. Since both tumor and host are directly affected by genomic factors, it is feasible to predict immunotherapy responses and prognosis of patients with CRC, based on ICGs. To our knowledge, this is the first prognostic risk score model for patients with stage IV CRC. Furthermore, the predictive power of this model could be further improved by inclusion of relevant clinical parameters.
The limitations of the present study include its reliance on online data analysis and relatively small sample size. More in-depth analyses could not be conducted in the present study, due to the lack of basic experiments. Consequently, the predictive power of this risk score model requires further verification in multi-center and prospective research investigations.

Conclusions
The prognostic risk score model for stage IV CRC developed in the present study based on immune-related genes has acceptable predictive power, and is closely related to the expression of ICGs.