Explore prognostic biomarker of bladder cancer based on competing endogenous network

Abstract Bladder cancer (BC) is the most common tumor of the urinary tract. Increasing evidence showed that long non-coding RNA (lncRNA) is a critical regulator in cancer development and progression. However, the functions of lncRNAs in the development of BC remain mostly undefined. In the present study, based on RNA sequence profiles from The Cancer Genome Atlas database, we identified 723 lncRNAs, 157 miRNAs, and 1816 mRNAs aberrantly expressed in BC tissues. A competing endogenous RNA network, including 49 lncRNAs, 17 miRNAs, and 36 mRNAs, was then established. The functional enrichment analyses showed that the mRNAs in the ceRNA network mainly participated in ‘regulation of transcription’ and ‘pathways in cancer’. Moreover, the Cox regression analyses demonstrated that three lncRNAs (AC112721.1, TMPRSS11GP, and ADAMTS9-AS1) could serve as independent risk factors. We established a risk prediction model with these lncRNAs. Kaplan–Meier curve analysis showed that high-risk patients’ prognosis was lower than that of low-risk patients (P=0.001). The present study provides novel insights into the lncRNA-mediated ceRNA network and the potential of lncRNAs to be candidate prognostic biomarkers in BC, which could help better understand the pathological changes and pathogenesis of BC and be useful for clinical studies in the future.


Introduction
Bladder cancer (BC) is the most common neoplasm in the urinary system, which has a high recurrence and mortality rate due to occult onset and lack of effective detection methods [1,2]. Even with advanced medical technology, the therapeutic effect and prognosis of patients with BC are still less than desirable [3,4]. Thus, it is necessary to identify novel candidate prognostic biomarkers and therapeutic targets for treating BC effectively.
Long non-coding RNA (lncRNA) is a subtype of ncRNA, which is not involved in protein-coding directly [5,6]. Increasing evidence has confirmed that lncRNAs play significant roles in regulating chromatin organization, transcription, nuclear domains, and messenger RNA (mRNA) stability and translation [7,8]. It has been reported that lncRNAs are eligible to be diagnostic or prognostic biomarkers of multiple types of cancer, including pancreatic cancer [9], renal cancer [10], and gastric cancer [11]. Even now, certain lncRNAs associated with the prognosis of BC have come under observation [12][13][14], valid lncRNAs serving as predictive biomarkers of this disease remain tiny.
In 2011, the competing endogenous RNA (ceRNA) hypothesis was put forward for the first time in the paper published in Cell by Salmena [15]. This hypothesis provided a novel 'RNA language' in which microRNA (miRNA) response elements (MREs) were appointed as the letters to make RNAs 'talk' with each other [15]. LncRNAs containing the same MREs as mRNA may function as ceRNAs and regulate mRNA translation by competing for shared miRNAs [16,17]. For example, a study has demonstrated that lncRNA MATN1-AS1 promotes glioma progression through sponging miR-200b/c/429 to mediate CHD1 expression [18]. Collectively, these studies indicate that abnormal expression of lncRNAs in the ceRNA In the present research, we downloaded significant RNA sequence profiles and relevant clinical data from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). After screening and comparing, we constructed an lncRNA-mediated ceRNA network of BC. The functional enrichment analysis and survival analysis were then carried out to investigate the potential mechanism and prognostic biomarkers of BC. Subsequently, via a comprehensive bioinformatic analysis, we identified a novel three-lncRNA signature that might be useful as a potential independent prognostic factor for BC. A flowchart was provided as a general description of how the study was carried out ( Figure  1).

Materials and methods
Data acquisition from TCGA database RNA sequence profiles and clinical data of BC cases were downloaded from the TCGA database. The Genomic Data Commons Transfer Tool was used to obtain the gene expression profiles. The inclusion criteria were: (i) expression data available, (ii) overall survival time for more than 30 days. No approval is needed from the ethics committee because all the information required from the TCGA database is open access and public.

Identification of differentially expressed RNAs
The acquired RNA data matrices were pre-processed and normalized. Then, the 'edgeR' package in R (version 3.6.1) was used for differential expression analysis between the BC and adjacent-normal bladder tissues. False discovery rate (FDR) < 0.01 and log fold change (log 2 FC) > 2 or < -2 were considered statistically significant. The heat maps and volcano plots of differentially expressed RNAs (DE RNAs) were visualized using the 'ggplot2' and 'pheatmap' packages in R.

Construction of a ceRNA network
Based on the ceRNA hypothesis, a ceRNA network, including lncRNA, miRNA, and mRNA, was constructed. Briefly, the interaction pairs of DE lncRNA with DE miRNA were predicted using the miRcode online software (http://www. mircode.org/). Then, the mRNAs targeted by miRNAs were predicted according to miRTarBase (http://mirtarbase. mbc.nctu.edu.tw//), TargetScan (http://www.targetscan.org//), and miRDB (http://www.mirdb.org/). The obtained mRNAs were used to intersect with the DE mRNAs to identify final targeted mRNAs. Finally, the ceRNA network was constructed relying on the co-expression network of interactions among DE lncRNAs, DE miRNAs and DE mRNAs, and was visualized using Cytoscape (version 3.7.2).

Functional enrichment analysis
The Gene Ontology (GO) is a freely available bioinformatics resource and widely used to offer information about gene product function [19]. Similarly, the Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource that integrates genomic, chemical, and system functional information [20]. In the ceRNA network, lncRNAs regulate the activity of mRNAs indirectly through sponging miRNA. Therefore, we analyze the functional enrichment to explore the function of DE lncRNA in the network using the Database for Annotation, Visualization and Integrated Discovery (https://david.ncifcrf.gov/). GO and KEGG terms with P<0.05 were considered significant. The results were plotted using the 'ggplot2' R package.

Construction of a risk score model and survival analysis
The univariate and multivariate Cox regression analyses were used to determine the independent prognostic biomarkers for BC. Kaplan-Meier curves visualized the relationship between the predictive lncRNAs and the overall survival of patients with BC. Besides, the risk scores were calculated using the equation below.
Risk score = Exp lncRNA1 × β lncRNA1 + Exp lncRNA2 × β lncRNA2 + · · · Exp lncRNAn × β lncRNAn (In equation, 'Exp' represents the expression level of lncRNAs. 'β' is the regression coefficient acquired from the multivariate Cox regression analysis.) The median risk score was used as the threshold for the classification of patients with BC into high-and low-risk groups. The relationship between risk score and overall survival time was visualized by Kaplan-Meier curves and was evaluated with the log-rank test with a significance level of P<0.05. The receiver operating characteristic (ROC) curve was used to evaluate the predictive model's performance by calculating the area under the curve (AUC). In addition, the ROC curve was visualized using the 'survival ROC' package in R.

Characteristics of clinical features in the TCGA database
Clinical information of BC patients from the TCGA database was available in 400 of 433 cases. The clinical characteristics including age, gender, tumor stage and grade, lymph node metastasis, and invasiveness of the BC patients are presented in Table 1. The median age of the patients was 68 years (range: 34-89 years). Significantly, all clinical features other than gender were markedly associated with survival. BC patients with upper age, high-grade, later stage, and lymphatic metastasis were associated with adverse outcomes.

Identification of DE lncRNAs, DE mRNAs, and DE miRNAs
The RNA sequence profiles of BC and adjacent-normal bladder tissues downloaded from the TCGA database were pre-processed to identify the DE lncRNAs, DE mRNAs, and DE miRNAs. A total of 723 DE lncRNAs (471 up-and The heat maps showed the expression profiles of the 20 top-ranking DE RNAs (Figure 2A). Volcano plots presented the distribution of DE RNAs were illustrated according to-log 10 FDR as abscissa and log 2 FC as coordinate ( Figure  2B).

Construction of the lncRNA-mediated ceRNA network
The lncRNA-mediated ceRNA network was constructed to investigate the mechanism by which lncRNAs influenced the expression of mRNAs in BC. According to the ceRNA hypothesis principle, lncRNAs might regulate gene expression by competitively binding microRNA through MREs. Therefore, we first evaluated the relationships between 723 DE lncRNAs and 157 DE miRNAs using the miRcode database. A total of 409 pairs of lncRNAs and miRNAs were then identified, which included 80 lncRNAs and 21 miRNAs. Based on the above 21 miRNAs, 915 target mRNAs were identified using the miRDB, miRTarBase, and TargetScan databases. Finally, there were 36 BC-specific mRNAs in the ceRNA network after intersecting target mRNAs with 1816 DE mRNA. On the basis of the above data, the ceRNA network that consisted of 49 lncRNAs, 17 miRNAs, and 36 mRNAs was established and visualized using Cytoscape. As shown in Figure 3, two networks were constructed based on the expression levels of miRNA in BC.

Functional enrichment analysis of DE mRNAs in the ceRNA network
To better understand the biological functions of DE lncRNAs in the ceRNA network, we analyzed the 36 mRNAs of the ceRNA network using GO and KEGG analyses. The results of the GO analysis indicated that the mRNAs were mainly enriched in the molecular function (MF) and biological process (BP) ( Figure 4A). The mRNAs related to MF were most relevant to protein homodimerization activity, transcription corepressor activity, and chromatin binding. In terms of BP, mRNAs were mainly associated with DNA-templated transcription and negative regulation of transcription from RNA polymerase II promoter (Supplementary Table S1).
The results of KEGG analysis showed that four main enriched pathways were identified, including 'Mi-croRNAs in cancer' , 'Pathways in cancer' , 'mitogen-activated protein kinase (MAPK) signaling pathway' , and 'phosphatidylinositol-3 kinase-protein kinase B (PI3K-Akt) signaling pathway' (Figure 4B and Supplementary Table S2). Dysregulation of miRNAs has been widely observed in different types of cancer via the degradation of target mRNAs [21,22]. Signaling pathways play a significant role in living organisms. Still, multiple signaling pathways have been identified as oncogenesis and cancer progression drivers, such as the 'MAPK signaling pathway' and 'PI3K-Akt signaling pathway' [23][24][25]. These results indicated that these lncRNA-mediated mRNAs were involved in the pathophysiology and development of BC.

Construction of the risk score model and survival analysis
Of 49 lncRNAs in the ceRNA network, ten lncRNAs associated with prognosis and overall survival were identified using univariate Cox regression analysis (Table 2). Subsequently, the ten lncRNAs were evaluated using multivariate Cox regression analysis. As a result, only three lncRNAs (AC112721.1, TMPRSS11GP, and ADAMTS9-AS1) had a significant independent prognostic value in BC with P<0.05 (Table 2 and Figure 5A-C). We further investigated the relationship between the three lncRNAs and clinicopathological characteristics of BC patients. The results showed that the expressions of AC112721.1 and ADAMTS9-AS1 were significantly associated with lymph node metastasis, tumor grade, and pathologic stage of BC (P<0.05), while the expression of TMPRSS11GP was significantly associated with age and tumor grade (P<0.05) (Supplementary Table S3). Further, the patients in the high expression group tended to have higher tumor stages and grades.
In addition, the risk score model was constructed based on the expression level of the three lncRNAs outlined above. A total of 400 patients with BC were screened out and divided into high-and low-risk groups with the median risk score as the threshold. The Kaplan-Meier survival analysis revealed that the high-risk group's overall survival was    worse than that of the low-risk group (P=0.001, Figure 5D). The ROC curves were plotted to determine the optimal cutoff points for the three-lncRNA signature in predicting 3-and 5-year survival. The AUC values were 0.778 and 0.792 for 3-and 5-year survival, respectively, implying the moderate accuracy in prediction ( Figure 5E).

Discussion
BC is one of the most common malignant tumors worldwide with remarkable morbidity and mortality [26]. Although 70-80% of first diagnosed cases are non-muscle-invasive bladder cancer (NMIBC), more than a third of NMIBC will continue to progress and metastasize over time [27,28]. Hideously, muscle invasion and metastasis substantially reduce the 5-year survival rate and contribute to a high mortality rate for patients with BC [29]. Accordingly, it is imperative to find novel biomarkers for predicting and diagnosing BC accurately. Growing experimental evidence indicates that the abnormal expression of ncRNAs, including lncRNAs and miRNAs, is intimately relevant to malignant progression and metastasis of cancer [30][31][32][33].
Since the ceRNA hypothesis has been proposed, researchers are increasingly taking an interest in the ceRNA network in which lncRNAs may affect the transcription and expression of mRNA by interacting with miRNAs. For instance, Miao et al. [34] identified the DE lncRNAs between 13 BC and eight normal adjacent tissue samples via microarray analysis and investigated the biological functions of LINC00612 in vitro and in vivo. The results indicated that LINC00612 served as a ceRNA to up-regulate PHF14 expression and then promoted BC cell proliferation and invasion. Another study showed that lncRNA ZEB1-AS1 functioned as a ceRNA in BC to regulate the expression of protein-coding gene fascin-1 through miR-200b [35]. However, the accuracy of the above results is limited due to the few samples.
In the present study, based on data from the TCGA database, we analyzed expression profiles of three kinds of RNA of more than 400 BC tissues and 19 adjacent-normal tissues and then performed the ceRNA network analysis. As a result, 723 lncRNAs, 157 miRNAs, and 1816 mRNAs were found to be differentially expressed in BC with FDR < 0.01 and |log 2 FC| > 2 as the threshold, of which 49 lncRNAs, 17 miRNAs, and 36 mRNAs were identified to construct the ceRNA network. To explore the functions of lncRNAs in the ceRNA network, the enrichment of functions and signaling pathways of the 36 mRNAs were conducted using GO and KEGG analyses. The results showed that these mRNAs were mainly enriched in 'nuclear domains' , indicating that lncRNAs acting as ceRNAs might play significant roles in regulating transcription, stability, and translation of mRNA.
In addition, we predicted that the lncRNA-mediated 36 mRNAs were related to the miRNAs and signaling pathways, such as the 'MAPK signaling pathway' and 'PI3K-Akt signaling pathway' . It has been confirmed that abnormal expression of miRNA was related to oncogenesis. Cheng et al. [36] reported that the expression of miR-200c was significantly up-regulated in BC compared with the adjacent normal tissues and directly targeted the mRNA RECK to promote the cell migration and invasion of BC. MAPK signaling pathway is a reasonably complicated pathway associated with cell proliferation, invasion, metastasis, and survival processes. Kumar et al. [37] confirmed the facilitating effect of the MAPK pathway on cell proliferation, progression, and survival of BC through treating BC cells with MAPK specific inhibitors. Moreover, the PI3K-AKT signaling pathway is one of the most frequent activated pathways in various cancers, while its aberrant activation will reprogramme cellular metabolism for the sake of supporting survival and proliferation of cancer cells [25]. Lin et al. [38] reported that glaucocalyxin A induced apoptosis and G2/M cell cycle arrest of BC cells by inhibiting the PI3K-AKT signaling pathway, indicating that targeting PI3K-AKT signaling may be a potential therapeutic method for BC. The mechanism analysis related to the ceRNA network is meaningful and necessary to provide new insights into early detection and BC therapies.
To explore BC's prognostic biomarker based on the ceRNA network, we investigated the association of lncRNAs in the ceRNA network and overall survival in BC patients using the univariate and multivariate Cox regression. The analyses showed that three lncRNAs (AC112721.1, TMPRSS11GP, and ADAMTS9-AS1) possessed significant independent prognostic value in BC, which was consistent with the previous result [39]. However, to the best of our knowledge, the present study is the first to report on the relationship between TMPRSS11GP expression and prognosis of BC patients. Previous studies also attempted to explore BC-related biomarkers through the construction of the ceRNA network. Based on the four mRNAs, including ACTC1, FAM129A, OSBPL10, and EPHA2, Jiang et al. constructed a ceRNA network as a biomarker for BC patients [40]. Moreover, another study suggested that six biomolecules (LINC01198, PTPRD-AS1, SEMA3D, has-miR-216a, EPHA5, and DCLK1) from the ceRNA regulation network were significantly related to the excellent survival prognosis of BC, indicating that these characteristic molecules were high-risk factors for BC [41]. These discrepancies between previous studies and the present study probably attribute to different research methods and targets.
ADAMTS9-AS1 is an antisense lncRNA and has been reported in a variety of cancers. Fan et al. [42] reported that ADAMTS9-AS1 could serve as an independent prognostic marker for predicting the survival of patients with breast cancer. Another study by Xing and colleagues also indicated that ADAMTS9-AS1 could play a role in predicting the overall survival of patients with colon adenocarcinoma [43]. In our study, low expression of ADAMTS9-AS1 was associated with poor survival of BC patients, indicating that it had a significant prognostic value in BC.
Finally, we evaluated the predictive value of the three-lncRNA signature in BC. The resulted show that the risk score constructed based on the expression of these three lncRNAs was significantly related to overall survival in BC patients (P<0.05). Patients with high-risk scores faced an unfortunate tendency of worse overall survival, indicating that the three-lncRNA signature was a potential independent prognostic factor of BC.

Conclusions
To sum up, we have analyzed DEGs between BC and adjacent-normal tissues and then constructed a BC-related ceRNA network comprising 49 DE lncRNAs, 17 DE miRNAs, and 36 DE mRNAs based on the TCGA database. Three lncRNAs were identified to function as predictive biomarkers of BC based on the overall survival. Notably, a high-risk score was significantly associated with poor overall survival of BC patients. The present study provides novel insights into the lncRNA-mediated ceRNA network and the potential of lncRNAs to be candidate prognostic biomarkers in BC, which could help better understand the pathological changes and pathogenesis of BC and be useful for clinical studies in the future.

Data Availability
The origin RNA-seq data used in our study were all downloaded from the TCGA (https://portal.gdc.cancer.gov/). All data analyzed during this study are included in this published article and its supplementary information files.