A robust immune-related lncRNA signature for the prognosis of human colorectal cancer

Abstract Background: Colorectal cancer (CRC) is one of the most prevalent malignant cancers worldwide. Immune-related long non-coding RNAs (IRlncRNAs) are proved to be essential in the development and progression of carcinoma. The purpose of the present study was to develop and validate a prognostic IRlncRNA signature for CRC patients. Methods: Gene expression profiles of CRC samples were downloaded from The Cancer Genome Atlas (TCGA) database. Immune-related genes were obtained from the ImmPort database and were used to identify IRlncRNA by correlation analysis. Through LASSO Cox regression analyses, a prognostic signature was constructed. Functional enrichment analysis was performed by gene set enrichment analysis (GSEA). TIMER2.0 web server and tumor immune dysfunction and exclusion (TIDE) algorithm were employed to analyze the association between our model and tumor-infiltrating immune cells and immunotherapy response. The expression levels of IRlncRNAs in cell lines were detected by quantitative real-time PCR (qPCR). Results: A 9-IRlncRNA signature was developed by a LASSO Cox proportional regression model. Based on the signature, CRC patients were divided into high- and low-risk groups with different prognoses. GSEA results indicated that patients in high-risk group were associated with cancer-related pathways. In addition, patients in low-risk group were found to have more infiltration of anti-tumor immune cells and might show a favorable response to immunotherapy. Finally, the result of qPCR revealed that most IRlncRNAs were differently expressed between normal and tumor cell lines. Conclusion: The constructed 9-IRlncRNA signature has potential to predict the prognosis of CRC patients and may be helpful to guide personalized immunotherapy.


Introduction
Colorectal cancer (CRC) is the third most common malignancy worldwide, with almost 1.8 million new cases and approximately 8.6 million deaths in 2018 [1]. Although the diagnosis and treatment of CRC have improved significantly in the past decade, the prognosis is still poor for patients with newly diagnosed CRC presented with distant metastasis, especially liver metastasis [2]. Meanwhile, as a heterogeneous disease, the diversity of phenotypes and prognosis of CRC present a huge challenge in making individualized clinical decisions to improve the survival rate of patients [3]. Therefore, it is urgent to establish an effective risk assessment model to identify patient subgroups with different prognoses, which may also help to find potential therapeutic targets.
In recent years, genomic approaches have been applied to investigate the underlying mechanism of cancer development and explore molecular biomarkers for cancer diagnosis [4,5]. Besides the well-recognized

Construction and validation of the prognostic signature based on IRlncRNAs
The TCGA cohort was used for training the model. To construct the prognostic signature, 873 IRlncRNAs were intersected with the genes from GSE17536 and GSE38832, and only the IRlncRNAs contained in all cohorts were involved in the subsequent study. The expression level of these common IRlncRNAs in three cohorts was normalized using R package "sva". Then, the least absolute shrinkage and selection operator (LASSO) Cox proportional hazard regression with 10-fold cross-validation was applied to establish IRlncRNAs signature model for the prediction of colorectal cancer prognosis. 9-IRlncRNA signature was constructed according to the optimal λ value, and the risk score for each patient was calculated by the following algorithm: risk score = (β1 × expression of lncRNA1) + (β2 × expression of lncRNA2) + . . . + (βn × expression of lncRNAn). All patients were divided into a high-risk group and a low-risk group according to the cut-off value (median risk score). To validate the 9-IRlncRNA signature, the risk score was also calculated in two testing datasets (GSE17536 and GSE38832) and the patients were classified into high-and low-risk groups based on the same cut-off value.

Establishment and evaluation of predictive nomogram
The nomogram containing the 9-IRlncRNA signature and other independent prognostic indicators was plotted using the "rms" package of R software. The total score of each patient could be calculated via the nomogram, and then was used to predict the OS rate of 1-, 3-or 5-year. The accuracy of the nomogram was evaluated using the calibration curves and the time-dependent receiver operating characteristic (ROC) curves. Decision curve analysis (DCA) was used to compare the reliability of the nomogram with that of age, M stage, or risk group.

Calculation of tumor mutation burden (TMB)
The somatic mutation data of CRC samples were detected using VarScan. To calculate the TMB score of each sample, all base substitutions and indels in the coding region of targeted genes were counted, and silent mutations failing to lead to an amino acid change were not counted. Then, the total number of mutations counted was divided by the exome size (approximate 38 megabases) [19].

Cell culture
Human colorectal cancer cell lines (HCT116 and SW480) were purchased from the American Type Culture Collection (Manassas, VA, U.S.A.). Human colon epithelial cell line NCM460 was purchased from the Cell Bank of Type Culture Collection of Chinese Academy of Sciences (Shanghai, China). Dulbecco's modified Eagle medium (DMEM) containing 10% fetal bovine serum (FBS, Gibco, U.S.A.) and 1% penicillin-streptomycin was used for the cultivation of HCT116 and SW480 cells, and Roswell Park Memorial Institute (RPMI-1640) medium containing 10% FBS and 1% penicillin-streptomycin was used for NCM460. All cell lines were allowed to grow in a 37 • C incubator containing 5% CO 2 .

RNA extraction and quantitative real-time PCR
Total RNA was extracted using Cell Total RNA Isolation kit (Foregene, Chengdu, China) following the manufacturer's protocol, and RNA (1 μg) was reverse-transcribed to cDNA using the PrimeScript RT reagent kit (TaKaRa, Osaka, Japan). Quantitative real-time PCR (qPCR) was conducted using the SYBR Green qPCR Supermixes (Bio-Rad) on the CFX 192 Connect Real-Time PCR system (Bio-Rad, U.S.A.). The qPCR analysis was performed in triplicate with the primers shown in Table 2. The relative expression levels were normalized to GAPDH using the 2 − CT method.

Statistical analysis
Statistical analyses were conducted with R software (version 4.0.1) and GraphPad Prism 5.0 software (San Diego, CA, U.S.A.). Kaplan-Meier curve was employed to reflect the survival difference between high-and low-risk groups, which was assessed by log-rank test. Univariate and multivariate Cox proportional hazard regression models were used to analyze the prognostic significance of 9-IRlncRNA signature. The performance of the 9-IRlncRNA prognostic model was evaluated by area under curve (AUC) value of ROC curve and Harrell's concordance index (C-index). Two tailed Student's t test and paired t test were utilized to compare the statistical relevance between two groups. Quantitative data are shown as the mean + − standard deviation (SD). A two-tailed P-value < 0.05 was regarded as statistically significant.

Construction of the prognostic IRlncRNAs signature
To make the procedure of our study clearer, a detailed flowchart is illustrated in Figure 1. TCGA dataset was employed to set as training cohort, and a total of 161 IRlncRNAs were common among all datasets. Then, we conducted univariate Cox regression analysis to explore the prognosis-related IRlncRNAs and then 18 IRlncRNAs were identified for subsequent analysis ( Figure 2A). Next, LASSO penalized Cox regression was used to establish prognostic IRlncRNAs signature, and 9 of the 18 IRlncRNAs (PCED1B-AS1, VPS9D1-AS1, PCAT6, BOLA3-AS1, ZNF503-AS2, ZEB1-AS1, LINC01138, WAC-AS1, and COLCA1) were singled out in training dataset ( Figure 2B,C). Risk score of each patient was calculated according to the expression of 9 IRlncRNAs and their coefficients:

Analysis of the 9-IRlncRNA signature in the training cohort
Based on the median risk score, we classified the patients with CRC in TCGA cohort into 215 high-risk and 216 low-risk groups. As the risk score increased, both the expression of 9 IRlncRNAs and the mortality of CRC patients were elevated ( Figure 3A). Similarly, Kaplan-Meier curve and log-rank test demonstrated that CRC patients with high-risk scores showed a worse OS than those with low-risk scores [hazard ratio (HR) = 3.187, 95% confidence interval (CI): 1.993-5.097, P<0.001] ( Figure 3B). The time-dependent ROC curves showed the AUC values of 1-and 5-year OS prediction were 0.727 and 0.779, respectively, which indicated a favorable predictive value of the 9-IRlncRNA signature for the prognosis of CRC patients in the training cohort ( Figure 3C).

Validation of the 9-IRlncRNA signature in the testing cohort
To further validate the robustness of the 9-IRlncRNA signature in survival prediction in different cohorts, GSE17536 and GSE38832 datasets from the GEO database were set as external testing cohorts. The same cut-off value from the training cohort was used to divide the patients into high-and low-risk groups. As shown in Figure 4A,B, the higher the risk score, the more CRC patients dead. In GSE17536 dataset, we found that the OS and DSS of patients in high-risk group were obviously lower than those in low-risk group (HR

The 9-IRlncRNA signature acts as an independent prognostic factor in CRC patients
To determine that the 9-IRlncRNA signature can be an independent prognostic indicator, we employed univariate and multivariate Cox regression analyses to compare the prognostic value of the 9-IRlncRNA signature with other clinical features.  Figure 5A). Similarly, in testing cohort (GSE17536), 9-IRlncRNA signature risk score (HR = 2.236, 95% CI: 1.362-3.671, P=0.001) was also considered to be an independent prognostic indicator for OS ( Figure 5B).

Stratification analysis of the 9-IRlncRNA signature
We employed stratification analysis to confirm the prognostic value of the 9-IRlncRNA signature. Patients were stratified into different subgroups based on age (≤ 65 versus > 65 years), T stage (T 1-2 versus T 3-4 ), N stage (N 0 versus N 1-2 ), M stage (M 0 versus M 1 ), and AJCC stage (stage I-II versus stage III-IV), and were classified into high-and low-risk groups in each subgroup according to the median risk score. Interestingly, the 9-IRlncRNA signature was effective As several multi-gene signatures have been previously constructed to predict the prognosis of CRC, we evaluated their performance in parallel with our 9-IRlncRNA signature using time-dependent ROC curves and C-indexes.

Construction of predictive nomogram
To provide a clinical tool to predict the probability of 1-, 3-, and 5-year OS in patients with CRC, three independent prognostic factors including age, M stage and the risk score of 9-IRlncRNA signature were employed to construct a nomogram ( Figure 8A). Their respective point which indicated on the top scale was added up to a total point which was corresponding to the 1-, 3-, and 5-year survival rates in the below scale. Calibration plots indicated that the nomogram predicted short-term survival (1-and 3-year) better than long-term survival (5-year) ( Figure 8B). Moreover, DCA curves showed that the nomogram achieved the highest net benefit among the four factors examined (age, M stage, risk model, and nomogram) ( Figure 8C). Besides, the AUC values of the nomogram at 1-, 3-, and 5-year were 0.751, 0.763, and 0.824, respectively, which also displayed the most excellent predictive performance ( Figure 8D).

Analysis of functional enrichment based on 9-IRlncRNA signature
Compared with all genes and all IRlncRNAs, the 9-IRlncRNA signature could completely distinguish high-risk patients from low-risk patients, which indicated good specificity ( Figure 9A-C). To further investigate the biological process and signaling pathways involved in the 9-IRlncRNA signature, we employed GSEA to explore the pathways that were significantly altered between high-and low-risk groups. The results showed that several canonical pathways including the mTOR signaling pathway, WNT/β-catenin signaling pathway, Notch signaling pathway, and the TGF-β downstream pathway were highly enriched in the high-risk group (Figure 9D-F).

Assessment of immune infiltration and immunotherapy-related markers with 9-IRlncRNA signature, and detection of 9 IRlncRNAs expression levels in CRC
We further investigated the differences in immune infiltration between high-and low-risk CRC patients. Based on quanTIseq and CIBERSORT, the results indicated that the infiltration of CD4+ T cells (non-regulatory), CD4+ memory activated T cells and M1 macrophages were higher in the low-risk group, and the content of M2 macrophages was higher in the high-risk group (Figure 10A-D). Moreover, we employed the TIDE algorithm to predict the possibility of response to immunotherapy. Interestingly, we found that low-risk group had a lower tumor immune exclusion score than high-risk group, and the microsatellite instability (MSI) score of the low-risk group was significantly higher   than that of the high-risk group ( Figure 10E,F). Besides, patients in low-risk group also had a higher TMB ( Figure  10G).

Discussion
Recently, lncRNAs have been proved to play critical roles in the development and progression of many cancers [23]. In CRC, emerging evidence has indicated that lncRNAs act mostly as signaling molecules in many significant CRC-related pathways, and are frequently involved in different phases of CRC from precancerous lesions to distant metastasis [24]. A previous study has found that lncRNAs can also regulate cancer immunity, and concluded that these IRlncRNAs are a new but essential part of cancer immunotherapy and prognosis [25]. In the present study, we constructed a 9-IRlncRNAs signature that can successfully divide CRC patients into high-and low-risk groups. Meanwhile, through multivariate Cox regression analysis, the 9-IRlncRNA signature was proved to be an independent OS prognostic factor. In subsequent subgroup analysis, this prognostic signature also showed favorable stability in different subgroups. Besides, we noted that patients in high-risk group were involved in cancer-related signaling pathways such as MAPK, WNT/β-catenin, and Notch pathway, which might result in their short OS.
Through our analysis, all the nine IRlncRNAs (PCED1B-AS1, VPS9D1-AS1, PCAT6, BOLA3-AS1, ZNF503-AS2, ZEB1-AS1, LINC01138, WAC-AS1, and COLCA1) were associated with a dismal prognosis, which was consistent with their biological functions reported by previous studies. LncRNA PCED1B-AS1 was reported to promote the proliferation and inhibit the apoptosis of glioma cells via miR-194-5p/PCED1B axis [26]. However, in the present study, we found that there was no significant difference in the expression level of PCED1B-AS1 between CRC cell lines and normal colon epithelial cell lines. In non-small cell lung cancer (NSCLC), the expression of VPS9D1-AS1 was higher in NSCLC tissues than in paired adjacent tissues, and the high expression of VPS9D1-AS1 suggested an adverse prognosis [27]. Similarly, PCAT6 can bind with EZH2 which can bind to the promoter region of LATS2 and inhibit LATS2 expression in NSCLC. LATS2 overexpression can suppress cell proliferation and promote apoptosis. Therefore, lncRNA PCAT6 exerts an oncogenic function on NSCLC [28]. LncRNA ZEB1-AS1 is a well-recognized tumor-related lncRNAs and is overexpressed in several malignancies. In CRC, ZEB1-AS1 overexpression is significantly related to tumor invasion and distant metastasis, which indicates a poor OS and low recurrence-free survival rate [29]. Moreover, LINC01138 has been reported as a tumor promoter that can exert its biological functions via the tumor-related IGF2BP1/IGF2BP3-LINC01138-PRMT5 axis in hepatocellular carcinoma (HCC), which can serve as a robust biomarker and therapeutic target for HCC [30]. Recent study has reported that lncRNA WAC-AS1 is highly expressed in liver cancer tissues and cell lines, and verified that WAC-AS1 can regulate ARPP19 by sponging miR-320d to promote glycolysis and tumor proliferation [31]. Although the function of other lncRNAs remains unknown in carcinoma, their expression levels are up-regulated in CRC cell lines according to our findings, which may provide the foundation for the further exploration of the association between these IRlncRNAs and tumorigenesis. Moreover, the identification of additional targets of these nine IRlncRNAs is also a critical step to further explore their function. With the development of RNA-centric approaches, such as isolation of chromatin by RNA purification [32] and captured hybridization analysis of RNA targets [33], we will be able to identify the potential interaction targets of lncRNAs in a native context, which may deepen our understanding of lncRNA-mediated regulation of immune pathways and improve our insight in lncRNA functions.
Tumor-infiltrating immune cells are reported to be associated with tumor prognosis and have the ability to guide therapeutics [34]. In our study, patients in low-risk group had more infiltration of CD4+ T cells (non-regulatory), CD4+ memory activated T cells, and M1 macrophages. CD4+CD25+ regulatory T cells are proved to suppress antitumor response and result in tumor immune escape, while non-regulatory CD4+ helper T cells may be beneficial to the host defense against tumor [35]. CD4+ memory T cells were located in the secondary lymphoid node organs and tissues. When re-exposure to tumor antigen, CD4+ memory T cells undergo fast expansion and induce more effective and faster immune response against tumor antigen and may prevent tumor relapse [36,37]. Similarity, M1 macrophages were reported to lead to the promotion of inflammation and tumor suppression [38]. Therefore, the favorable prognosis of patients in low-risk group may be a result of the activation of various anti-tumor immune cells, and the accumulation of these immune cells may allow patients to benefit from immunotherapy. In addition to immune cell infiltration, we also found that patients in low-risk group had higher MSI and TMB scores. In CRC, immune checkpoint therapy received regulatory approval in 2017 to treat heavily mutated tumors that are mismatch-repair-deficient or harbor high levels of MSI [12]. Similarity, tumors with a higher TMB have a higher likelihood of immunotherapy response [39]. Therefore, these results further confirm that patients in low-risk group may show a favorable response to immunotherapy, and the 9-IRlncRNA signature we constructed may serve as a novel biomarker for the immunotherapy of CRC patients.
Previous studies have also constructed various lncRNA signatures to predict the prognosis of patients with CRC, but the 9-IRlncRNA we established shows more excellent performance via higher C-index and AUC values of OS [40][41][42][43]. Nevertheless, several limitations of this study should be addressed. First, the prognostic model is constructed using retrospective data, thus, the results should be further validated using prospective data or clinical trials. Second, due to the sample size was not large enough in the validation datasets, the accuracy of the prognostic model has decreased.

Conclusion
In conclusion, we constructed a 9-IRlncRNA prognostic model to predict the prognosis of CRC patients based on the TCGA dataset. The prognostic value of this model was further validated in two external cohorts from the GEO database. Moreover, the signature was identified as an independent prognostic factor of CRC and was involved in several cancer-related signaling pathways. Furthermore, the signature was associated with tumor-infiltrating immune cells and immunotherapy-related markers. Hence, the 9-IRlncRNA signature can serve as a robust prognostic biomarker for CRC patients and may be helpful to guide personalized immunotherapy.