An epithelial–mesenchymal transition-related long noncoding RNA signature correlates with the prognosis and progression in patients with bladder cancer

Abstract
Bladder cancer is a common malignant tumour worldwide. Epithelial–mesenchymal transition (EMT)-related biomarkers can be used for the early diagnosis and prognosis of cancer patients. Accurate prediction models based on such biomarkers are therefore essential to the diagnosis and treatment of bladder cancer. In the present study, an EMT-related long noncoding RNA (lncRNA) model was developed to predict the prognosis of patients with bladder cancer. First, EMT-related lncRNAs were identified by Pearson correlation analysis, and a prognostic EMT-related lncRNA signature was constructed through univariate and multivariate Cox regression analyses. Then, the diagnostic efficacy and clinical predictive capacity of the signature were assessed. Finally, gene set enrichment analysis (GSEA) and functional enrichment analysis were carried out with bioinformatics tools. An EMT-related lncRNA signature consisting of TTC28-AS1, LINC02446, AL662844.4, AC105942.1, AL049840.3, SNHG26, USP30-AS1, PSMB8-AS1, AL031775.1, AC073534.1, U62317.2, C5orf56, AJ271736.1, and AL139385.1 was constructed. The diagnostic efficacy of the signature was evaluated with time-dependent receiver-operating characteristic (ROC) curves, in which all area under the ROC curve (AUC) values exceeded 0.73. A nomogram integrating clinical variables and the risk score confirmed that the signature had good clinical predictive capacity. GSEA revealed that several cancer-related and EMT-related pathways were enriched in the high-risk group, whereas immune-related pathways were enriched in the low-risk group. Functional enrichment analysis showed that EMT was associated with numerous GO terms and signaling pathways. In short, our research showed that the 14-lncRNA EMT-related signature may predict the prognosis and progression of patients with bladder cancer.


Introduction
Bladder cancer is the most common malignancy of the urinary system, and muscle invasive bladder cancer (MIBC) accounts for approximately 15-25% of new cases. Even nonmuscle invasive bladder cancer (NMIBC) has an approximately 10% probability of progressing to MIBC [1,2]. Radical cystectomy is the standard treatment for MIBC; however, approximately 50% of patients develop local recurrence and metastasis [3,4]. Accurate diagnosis and detailed classification are crucial to the clinical treatment of bladder cancer [5]. Therefore, biomarkers that predict the survival and progression of patients with bladder cancer would aid diagnosis and treatment.
Epithelial-mesenchymal transition (EMT) is the process by which epithelial cells transform into mesenchymal cells, a form of cell dedifferentiation or redifferentiation [6]. EMT enhances the migration and motility of cancer cells, contributing to invasion and metastasis. Epithelial tumours account for more than 95% of the pathological types of bladder cancer [7], so EMT may affect the pathogenesis and progression of bladder cancer. In addition, EMT can mediate the activation of multiple transcription factors, participate in the repair of cellular DNA damage, and promote chemotherapy resistance of tumour cells by enhancing tumour stemness [8]. Hence, it is critical to discover EMT-related molecules that can serve as early diagnostic and prognostic biomarkers for patients with bladder cancer.
Bioinformatics can help identify potential biomarkers of prognosis and progression and thereby predict survival outcomes in patients with cancer [9,10]. Long noncoding RNAs (lncRNAs) are a class of noncoding RNA molecules that regulate cancer cell growth and progression [11], and they are potential biomarkers for predicting cancer risk and survival outcomes [12]. Therefore, we postulated that EMT-related lncRNAs may serve as diagnostic and monitoring biomarkers for patients with bladder cancer. In this research, we constructed a prognostic signature based on 14 EMT-related lncRNAs and evaluated its ability to independently and accurately predict the prognosis and progression of patients with bladder cancer.

Data acquisition
RNA-seq expression data and the corresponding clinical profiles of patients with bladder cancer were obtained from the TCGA database (https://portal.gdc.cancer.gov/). A total of 402 bladder cancer cases were randomly divided into a training set (n=201) and a validation set (n=201) in R using simple random grouping (Supplementary File S1). The prediction signature was constructed with the training set and verified with the validation set and the whole set (n=402). The 200 EMT-related genes (Supplementary File S2) were obtained from the 'Hallmark Epithelial-mesenchymal transition' gene set in the Molecular Signatures Database (broadinstitute.org/gsea/msigdb).
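The study performed the split in R. As a rough illustration of what simple random grouping means here, the Python sketch below shuffles the case IDs and cuts the list into two halves. The case IDs and the seed are hypothetical; the seed used in the study, if any, is not reported.

```python
import random

def split_train_validation(case_ids, train_size, seed=2020):
    """Simple random grouping: shuffle case IDs, then cut the list in two."""
    rng = random.Random(seed)  # hypothetical seed; a fixed seed makes the split reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    training = sorted(shuffled[:train_size])
    validation = sorted(shuffled[train_size:])
    return training, validation

# Hypothetical IDs standing in for the 402 TCGA bladder cancer cases
cases = [f"case_{i:03d}" for i in range(402)]
train, valid = split_train_validation(cases, train_size=201)
print(len(train), len(valid))  # 201 201
```

Fixing the seed matters because every downstream step (signature selection, coefficients, cutoffs) depends on which patients land in the training set.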

Construction of the EMT-related lncRNA prognostic signature
The Pearson correlation between lncRNAs and EMT-related genes was analyzed. A lncRNA with a correlation coefficient |R| > 0.4 and P<0.05 was considered an EMT-related lncRNA (Supplementary File S3). Univariate Cox analysis was used to identify EMT-related lncRNAs whose expression levels were significantly associated (P<0.05) with the overall survival (OS) of patients with bladder cancer. Hazard ratios (HRs) were used to classify lncRNAs as risk-related (HR>1) or protective (HR<1). Subsequently, the 38 candidate EMT-related lncRNAs were subjected to multivariate Cox regression analysis to evaluate their contribution as independent prognostic factors for patient survival. From this analysis, we identified 14 EMT-related lncRNAs for the prognostic signature.
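The two screening steps can be sketched in Python as follows. This is an illustration, not the authors' R code. It makes two assumptions: a lncRNA counts as EMT-related if it passes |R| > 0.4 and P < 0.05 with at least one EMT gene, and the univariate Cox model is fitted by Newton-Raphson with Breslow handling of ties and a Wald P value. The expression and survival data are simulated.

```python
import math
import numpy as np
from scipy.stats import pearsonr

def is_emt_related(lnc_expr, emt_gene_exprs, r_cut=0.4, p_cut=0.05):
    """True if the lncRNA passes |R| > r_cut and P < p_cut with at least one EMT gene."""
    for gene_expr in emt_gene_exprs:
        r, p = pearsonr(lnc_expr, gene_expr)
        if abs(r) > r_cut and p < p_cut:
            return True
    return False

def _cox_score_info(beta, time, event, x):
    """Score and observed information of the Cox partial likelihood (Breslow ties)."""
    w = np.exp(beta * x)
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        s0 = w[at_risk].sum()
        s1 = (w[at_risk] * x[at_risk]).sum()
        s2 = (w[at_risk] * x[at_risk] ** 2).sum()
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def univariate_cox(time, event, x, max_iter=50, tol=1e-9):
    """Newton-Raphson fit of a one-covariate Cox model; returns (HR, Wald P)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    x = np.asarray(x, float)
    x = x - x.mean()  # centring leaves beta unchanged but avoids overflow in exp()
    beta = 0.0
    for _ in range(max_iter):
        score, info = _cox_score_info(beta, time, event, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _cox_score_info(beta, time, event, x)
    z = beta * math.sqrt(info)  # Wald z = beta / SE, where SE = 1 / sqrt(info)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return math.exp(beta), p

# Simulated example: one lncRNA, one co-expressed EMT gene, one unrelated gene
rng = np.random.default_rng(0)
lnc = rng.normal(size=200)
emt_gene = lnc + 0.5 * rng.normal(size=200)
unrelated = rng.normal(size=200)
time = rng.exponential(scale=np.exp(-0.8 * lnc))  # higher expression -> shorter survival
event = np.ones(200, dtype=int)
print(is_emt_related(lnc, [unrelated, emt_gene]))
hr, p = univariate_cox(time, event, lnc)
print(hr, p)  # HR > 1 marks a risk-related lncRNA
```

In practice a dedicated survival library (for example the R `survival` package) would be used. The hand-written fit is here only to make the partial-likelihood step explicit.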

Evaluation of the EMT-related lncRNA prognostic signature
The risk score of each patient was calculated with the following formula:
$$\text{risk score} = \sum_{i=1}^{n} \mathrm{Coef}(i) \times x(i)$$
where n is the number of lncRNAs in the signature (n = 14), and Coef(i) and x(i) represent the estimated regression coefficient and the expression value of the i-th EMT-related lncRNA, respectively. Patients with bladder cancer were divided into high- and low-risk groups according to the median risk score. The OS of patients in the high-risk and low-risk groups was compared using survival curves. The diagnostic efficacy of the 14-lncRNA signature, compared with that of clinicopathological characteristics, was evaluated with receiver-operating characteristic (ROC) curves. Furthermore, the ability of the risk score to predict survival independently was assessed with univariate and multivariate Cox regression analyses.
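The risk score is a weighted sum of lncRNA expression, and patients are split at the median. The Python sketch below uses three hypothetical coefficients and a toy expression matrix; the real signature has 14 lncRNAs, whose coefficients come from the multivariate Cox model. It assumes that patients whose score exactly equals the median go to the low-risk group, since the section does not say how ties are handled.

```python
import numpy as np

def risk_scores(expr, coefs):
    """risk score = sum_i Coef(i) * x(i); expr has one row per patient, one column per lncRNA."""
    return np.asarray(expr, float) @ np.asarray(coefs, float)

def median_split(scores):
    """Label patients above the median 'high' and the rest 'low'."""
    cutoff = np.median(scores)
    groups = np.where(scores > cutoff, "high", "low")
    return groups, cutoff

# Hypothetical coefficients for 3 lncRNAs (the real signature has 14)
coefs = [0.5, -0.3, 0.2]
expr = [[1, 2, 3],
        [2, 0, 1],
        [0, 1, 0],
        [3, 3, 3]]
scores = risk_scores(expr, coefs)
groups, cutoff = median_split(scores)
print(scores, cutoff, groups)
```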

Establishment of the nomogram
We constructed a nomogram integrating traditional clinical variables and the risk score to estimate the probability of 3- and 5-year OS of patients with bladder cancer. Calibration plots and time-dependent ROC curves were used to assess the nomogram.
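A nomogram is a graphical form of a regression model. Each variable's points add up to a total that corresponds to the model's linear predictor, and the total is read off as a survival probability. The Python sketch below shows only that last conversion and assumes an underlying Cox model, S(t|x) = S0(t)^exp(lp). The coefficients, reference values, and baseline 3-year survival are all hypothetical.

```python
import math

def predicted_survival(baseline_surv_t, coefs, patient, reference):
    """Cox-model probability of surviving past t: S(t|x) = S0(t) ** exp(lp).

    lp is the linear predictor relative to the reference covariate values at
    which the baseline survival S0(t) was estimated.
    """
    lp = sum(coefs[k] * (patient[k] - reference[k]) for k in coefs)
    return baseline_surv_t ** math.exp(lp)

# Hypothetical model: two of the nomogram variables with made-up coefficients
coefs = {"age": 0.03, "risk_score": 0.9}
reference = {"age": 68, "risk_score": 1.0}
s0_3yr = 0.60  # hypothetical baseline 3-year OS at the reference values

low = predicted_survival(s0_3yr, coefs, {"age": 68, "risk_score": 1.0}, reference)
high = predicted_survival(s0_3yr, coefs, {"age": 68, "risk_score": 2.0}, reference)
print(round(low, 3), round(high, 3))
```

The 5-year estimate uses the same formula with S0 evaluated at 5 years.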

Gene set enrichment analysis (GSEA) and functional enrichment analysis
Gene set enrichment analysis (GSEA) was used to identify significant functional phenotypes in the high-risk and low-risk groups. KEGG gene sets (c2.cp.kegg.v7.2.symbols.gmt) were obtained from the Molecular Signatures Database. After 1000 permutations, enriched gene sets were identified using the criteria of false discovery rate (FDR) <0.25 and P<0.05. GO and KEGG functional enrichment analyses of the mRNAs associated with the lncRNA signature were performed using R software (https://www.r-project.org/, version 3.6.2).
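At the core of GSEA is a weighted running-sum enrichment score computed over a ranked gene list. The Python sketch below computes that score with weight exponent p = 1. The gene names and ranking metric are hypothetical, and the 1000-permutation null distribution and FDR calculation used in the study are not shown.

```python
import numpy as np

def enrichment_score(ranked_genes, ranked_stats, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style, weight exponent p).

    ranked_genes must be sorted from most positively to most negatively
    associated with the phenotype (for example, high-risk vs low-risk).
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(ranked_stats, float)) ** p
    hit_norm = weights[in_set].sum()
    n_miss = np.count_nonzero(~in_set)
    # Hits step up in proportion to their weight; misses step down equally.
    steps = np.where(in_set, weights / hit_norm, -1.0 / n_miss)
    running = np.cumsum(steps)
    peak = np.argmax(np.abs(running))  # largest deviation from zero, with its sign
    return running[peak], running

genes = list("ABCDEFGHIJ")     # hypothetical gene ranking
stats = np.arange(10, 0, -1)   # hypothetical ranking metric, 10 down to 1
es_top, _ = enrichment_score(genes, stats, {"A", "B", "C"})
es_bottom, _ = enrichment_score(genes, stats, {"H", "I", "J"})
print(es_top, es_bottom)
```

A positive score means the set is concentrated at the top of the ranking (the high-risk side in this framing). A negative score means it is concentrated at the bottom.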

Statistical analysis
Data were processed with the Perl programming language (http://www.perl.org/, version 5.30.2). Statistical analyses were performed with GraphPad Prism 8.0 or R software in a double-blind manner. P<0.05 was regarded as statistically significant.

The prognostic impact of the 14-lncRNA EMT-related signature
In the training set, the validation set, and the whole set, patients with bladder cancer were divided into two groups according to the risk score (Figure 1A). OS was longer in the low-risk group than in the high-risk group (Figure 1B). The diagnostic efficacy of the 14-lncRNA signature was evaluated with time-dependent ROC curves, in which all AUC values exceeded 0.73 (Figure 1C). These results suggest that the 14-lncRNA signature has a good capacity to predict survival in bladder cancer.
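Survival curves of this kind are usually Kaplan-Meier estimates. The section does not name the test used to compare the groups, so the Python sketch below assumes the common two-group log-rank test. The toy times, with every patient experiencing the event, are hypothetical.

```python
import math
import numpy as np

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    surv, curve = 1.0, []
    for t in np.unique(time[event]):
        n_at_risk = np.count_nonzero(time >= t)
        deaths = np.count_nonzero((time == t) & event)
        surv *= 1.0 - deaths / n_at_risk
        curve.append((float(t), surv))
    return curve

def logrank_p(time, event, group):
    """Two-group log-rank test; group is a boolean mask (True = high risk)."""
    time = np.asarray(time, float)
    event, group = np.asarray(event).astype(bool), np.asarray(group).astype(bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = np.count_nonzero(at_risk), np.count_nonzero(at_risk & group)
        d = np.count_nonzero((time == t) & event)
        d1 = np.count_nonzero((time == t) & event & group)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths, high-risk group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df

curve = kaplan_meier([1, 2, 3], [1, 0, 1])  # the patient censored at t=2 leaves the risk set
time = [2, 3, 4, 5, 6, 8, 9, 10, 11, 12]
event = [1] * 10
high = [True] * 5 + [False] * 5
print(curve, logrank_p(time, event, high))
```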

The 14-lncRNA EMT-related signature was an independent prognostic factor
The prognostic ability of the 14-lncRNA EMT-related signature in bladder cancer was assessed with univariate and multivariate Cox regression analyses. The risk score based on our signature was significantly associated with the OS of patients with bladder cancer (P<0.001) (Figure 3A,B). ROC curve analysis indicated that the AUC of the risk score was 0.759, higher than that of the other clinicopathological parameters (Figure 3C). These results suggest that our EMT-related lncRNA signature is an independent prognostic factor for the survival of patients with bladder cancer.
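The section does not state which time-dependent ROC estimator was used. The Python sketch below uses the simple cumulative/dynamic definition at a horizon t. Cases are patients who die by t, controls are patients still alive after t, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. Patients censored before t are dropped and no inverse-probability-of-censoring weights are applied, so treat this as a rough illustration. The data are hypothetical.

```python
import numpy as np

def cumulative_dynamic_auc(time, event, score, t):
    """AUC at horizon t: cases die by t, controls survive past t.

    Patients censored before t are dropped; no inverse-probability-of-censoring
    weights are applied, so this is only a rough estimator.
    """
    time, event, score = (np.asarray(a, float) for a in (time, event, score))
    cases = (time <= t) & (event == 1)
    controls = time > t
    if not cases.any() or not controls.any():
        raise ValueError("need at least one case and one control at this horizon")
    diff = score[cases][:, None] - score[controls][None, :]
    concordant = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return concordant / diff.size

time = [1, 2, 2.5, 3, 4, 5, 6]
event = [1, 1, 0, 1, 1, 1, 1]   # the patient censored at 2.5 is excluded at t = 3
score = [6, 5, 9, 4, 3, 2, 1]
print(cumulative_dynamic_auc(time, event, score, t=3))
```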

Stratification analyses
We performed stratified analyses of patients with bladder cancer based on clinicopathological information. Compared with the low-risk group, patients in the high-risk group had shorter OS in several strata, such as age > 65 years (P<0…). However, OS was similar between the high-risk and low-risk groups for M1 patients (P=0.455; Figure 4), probably because of the small sample size. These results show that our EMT-related lncRNA signature can predict the survival of bladder cancer patients across strata of age, gender, and AJCC stage.

Establishment of the nomogram
A nomogram can be used to diagnose or predict the onset or progression of a disease [13]. We established a nomogram based on the risk score and other clinicopathological parameters, including age, gender, AJCC stage, T stage, and N stage (Figure 5A). In the calibration plots, the predicted 3- and 5-year OS lay close to the ideal reference line (Figure 5B,C). In the ROC curves, the AUCs of the nomogram for 3- and 5-year OS were 0.799 and 0.798, respectively (Figure 5D).
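The section does not describe how the calibration plots were built. One common approach, sketched in Python below, sorts patients by predicted t-year survival, splits them into groups, and compares each group's mean predicted survival with its Kaplan-Meier observed survival at t. Points near the diagonal indicate good calibration. The grouping scheme and the data are hypothetical.

```python
import numpy as np

def km_survival_at(time, event, t):
    """Kaplan-Meier survival probability at time t."""
    surv = 1.0
    for u in np.unique(time[(event == 1) & (time <= t)]):
        n_at_risk = np.count_nonzero(time >= u)
        deaths = np.count_nonzero((time == u) & (event == 1))
        surv *= 1.0 - deaths / n_at_risk
    return surv

def calibration_table(pred_surv, time, event, t, n_groups=3):
    """(mean predicted, Kaplan-Meier observed) survival at t for each predicted-risk group."""
    pred_surv, time, event = (np.asarray(a, float) for a in (pred_surv, time, event))
    order = np.argsort(pred_surv)
    rows = []
    for idx in np.array_split(order, n_groups):
        rows.append((pred_surv[idx].mean(), km_survival_at(time[idx], event[idx], t)))
    return rows

pred = [0.1, 0.2, 0.45, 0.55, 0.85, 0.95]  # hypothetical predicted survival at t = 3
time = [1, 2, 2, 5, 6, 7]
event = [1, 1, 1, 1, 1, 1]
rows = calibration_table(pred, time, event, t=3)
print(rows)
```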

Gene set enrichment analysis
GSEA results indicated that EMT- and cancer-related pathways (renal cell carcinoma, glioma, endometrial cancer, melanoma, focal adhesion, Wnt signaling pathway, and TGF-β signaling pathway) were significantly enriched in the high-risk bladder cancer group (Figure 6A,C). In contrast, immune-related pathways (antigen processing and presentation, primary immunodeficiency, graft versus host disease, and autoimmune thyroid disease) were significantly enriched in the low-risk group (Figure 6B,D). These results suggest that a high risk score is associated with EMT- and cancer-related pathways, whereas a low risk score is associated with enhanced immune function.

Figure 4. Stratification analyses
Survival curve analysis showed the OS rates of the high- and low-risk bladder cancer patients stratified by age, gender, and AJCC stage.

Construction of the lncRNA-mRNA co-expression network and functional enrichment analysis
The potential functions of the 14 EMT-related lncRNAs in bladder cancer (BLCA) were investigated by constructing a lncRNA-mRNA co-expression network in Cytoscape. Based on the threshold parameters (Pearson correlation coefficient |R| > 0.4 and P<0.05), the network contained 82 lncRNA-mRNA pairs (Figure 7A). Among the 82 lncRNA-mRNA pairs, 67 mRNAs were significantly correlated with the 14 lncRNAs in the prognostic signature. The Sankey diagram shows the relationships between the 67 mRNAs and the 14 lncRNAs (Figure 7B). The top three GO biological process terms were extracellular matrix organization, positive regulation of cell adhesion, and extracellular structure organization. The top three GO cellular component terms were collagen-containing extracellular matrix, basement membrane, and cell leading edge. The top three GO molecular function terms were extracellular matrix structural constituent, integrin binding, and cell adhesion molecule binding (Figure 7C,D). KEGG pathway analysis showed that cytokine-cytokine receptor interaction, focal adhesion, and proteoglycans in cancer were the most significantly enriched pathways (Figure 7E,F).
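The section does not name the R package used for the GO and KEGG analyses. Such over-representation analyses are commonly based on a one-sided hypergeometric test, sketched in Python below. The counts in the example (a 20,000-gene background, a 200-gene term, and 8 of the 67 mRNAs in the term) are hypothetical, and the multiple-testing correction that would normally follow is not shown.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for an over-representation test.

    N: background genes, K: background genes annotated to the term,
    n: genes in the query list (e.g. co-expressed mRNAs), k: query genes in the term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 67 query mRNAs, 8 of them in a term covering
# 200 of 20,000 background genes (about 0.67 expected by chance).
print(hypergeom_enrichment_p(k=8, n=67, K=200, N=20000))
```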

Discussion
Bladder cancer responds poorly to both radiotherapy and chemotherapy and readily develops resistance during treatment. Thanks to improvements in surgical methods and medical therapy, great progress has been made in the treatment of bladder cancer. However, postoperative relapse and metastasis remain unsolved, and the 10-year survival rate is still unsatisfactory [14,15]. There are three types of EMT (types 1, 2, and 3), which exert different functions. Type 1 occurs in the embryonic stage and is related to embryonic growth and development [16]. Type 2 participates in wound healing, tissue regeneration, and organ fibrosis [17]. Type 3 plays a role in tumour invasion and metastasis and is often related to poor tumour prognosis [18]. EMT is associated with poor prognosis because it enables cancer cells to acquire mobility, invasiveness, and stem cell-like properties. Therefore, EMT-related biomarkers are potential prognostic and progression biomarkers for patients with cancer. A previous study on the role of specific EMT-related genes in bladder cancer progression found that the EMT signaling pathway was markedly activated during the subtype transition from nonmuscle invasive bladder cancer (NMIBC) to muscle invasive bladder cancer (MIBC) [19]. Moreover, the authors developed a tumour-promoting EMT-related gene signature that served as an independent adverse prognostic factor.
In the human genome, less than 1.5% of the DNA sequence encodes proteins, and the remaining approximately 98.5% does not and is referred to as noncoding sequence [20]. The more complex the organism, the larger the proportion of noncoding sequence in its genome, which suggests that these noncoding sequences may participate in the extremely complex regulation of life processes. Among these noncoding sequences, long noncoding RNAs (lncRNAs) occupy an important position because they can regulate gene expression and protein activity by affecting chromatin modification and RNA splicing [21,22]. The role of lncRNAs in the occurrence, development, prognosis, and chemotherapy resistance of malignant tumours has become a research hotspot. Recent studies reported that an autophagy-related lncRNA signature accurately predicted the prognosis of patients with bladder cancer [23] and that an immune-related lncRNA signature could predict survival in hepatocellular carcinoma [24]. As in-depth studies of the molecular mechanisms of lncRNAs in tumour cells continue, lncRNAs are expected to provide important information for the diagnosis, targeted therapy, and recurrence monitoring of clinical tumours.
In this research, we systematically analyzed the accuracy of EMT-related lncRNAs for prognostic prediction in bladder cancer using bioinformatics and statistical tools. First, univariate Cox regression analysis of EMT-related lncRNA expression in bladder cancer patients from the TCGA database identified 38 EMT-related lncRNAs significantly correlated with OS. Second, 14 EMT-related lncRNAs were selected to construct a prognostic signature based on their performance in multivariate Cox regression analysis. Then, the risk score of each patient with bladder cancer was calculated from the expression of the 14 EMT-related lncRNAs in the signature, and patients were assigned to the high-risk or low-risk group based on the median risk score. OS was shorter in the high-risk group than in the low-risk group. Moreover, ROC curve analysis validated the accuracy of the EMT-related lncRNA prognostic signature in patients with bladder cancer.
The EMT-related lncRNA prognostic signature outperformed traditional clinical indicators in prognostic prediction, as reflected by its higher AUC. The nomogram is an effective and reliable clinical tool for predicting the survival of patients with cancer [25]. We therefore developed a nomogram combining age, gender, AJCC stage, T stage, N stage, and the risk score (calculated from the EMT-related lncRNA prognostic signature) to improve prognostic prediction for patients with bladder cancer. Calibration plots demonstrated that the 3- and 5-year survival rates predicted by the nomogram were consistent with the observed rates. Overall, the 14-lncRNA EMT-related prognostic signature accurately predicted the OS of patients with bladder cancer and shows potential for clinical applications, including individualized prognosis and therapy.
GSEA revealed significant differences in EMT-related signaling pathways between the high-risk and low-risk groups. Immunoregulatory pathways were enriched in the low-risk group, while several cancer- and EMT-related pathways (TGF-β signaling pathway and Wnt signaling pathway) were enriched in the high-risk group. TGF-β is a multifunctional cytokine that participates in numerous cellular processes and acts on tumour cells through autocrine and paracrine signaling. The TGF-β family is closely related to EMT [26]. In tumour progression, TGF-β can activate a series of signaling pathways, including the Notch [27], Wnt [28], and Smad [29] pathways, and cross-talk among these pathways acts synergistically to initiate EMT. The Wnt/β-catenin pathway is known as the canonical Wnt pathway. β-Catenin can bind the intracellular domain of E-cadherin to form a complex that links to the actin cytoskeleton, mediating cell adhesion and regulating the invasion and metastasis of tumour cells [30,31]. There is also a noncanonical Wnt pathway, in which Wnt proteins (such as Wnt5a and Wnt11) signal without nuclear accumulation of β-catenin. In this noncanonical pathway, intracellular Ca2+ increases, and calcium/calmodulin-dependent kinase II, the calmodulin-sensitive phosphatase calcineurin with its target nuclear factor of activated T cells (NF-AT), and protein kinase C are activated to regulate cell adhesion [32]. Moreover, a lncRNA-mRNA co-expression network was established in the present study, and GO and KEGG functional enrichment analyses showed that EMT was associated with numerous GO terms and signaling pathways. These results support the view that EMT is a key regulator of bladder cancer progression.
Our study has several limitations. First, because the data were acquired from a public resource, bias in the analysed profiles cannot be ruled out. Second, we did not analyze EMT in relation to bladder cancer subtypes because the database lacked the relevant subtype data. Finally, further investigations involving experiments such as quantitative real-time PCR and transwell migration/invasion assays, together with clinical data analyses, are required to confirm our findings.
In conclusion, we identified an EMT-related lncRNA prognostic signature that accurately predicted the survival outcomes of patients with bladder cancer. We also confirmed the validity of a prognostic nomogram that combines the signature with other clinicopathological features to predict survival. A high risk score was associated with EMT- and cancer-related pathways, whereas a low risk score was associated with enhanced immune function. These data provide insights for future investigations into individualized treatments for bladder cancer patients in different risk groups.