The lncRNAs RP1-261G23.7, RP11-69E11.4 and SATB2-AS1 are a novel clinical signature for predicting recurrent osteosarcoma

Abstract Background: Osteosarcoma is the most common primary bone malignancy in children and adolescents. In order to find factors related to its recurrence, and thus improve recovery prospects, a powerful clinical signature is needed. Long noncoding RNAs (lncRNAs) are essential in osteosarcoma processes and development, and here we report significant lncRNAs to aid in earlier diagnosis of osteosarcoma. Methods: A univariate Cox proportional hazards regression analysis and a multivariate Cox regression analysis were used to analyze osteosarcoma patients’ lncRNA expression data from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET), a public database. Results: A lncRNA signature consisting of three lncRNAs (RP1-261G23.7, RP11-69E11.4 and SATB2-AS1) was selected. The signature was used to sort patients into high-risk and low-risk groups with meaningful recurrence rates (median recurrence time 16.80 vs. >128.22 months, log-rank test, P<0.001) in the training group, and predictive ability was validated in a test dataset (median 16.32 vs. >143.80 months, log-rank test, P=0.006). A multivariate Cox regression analysis showed that the significant lncRNA was an independent prognostic factor for osteosarcoma patients. Functional analysis suggests that these lncRNAs were related to the PI3K-Akt signaling pathway, the Wnt signaling pathway, and the G-protein coupled receptor signaling pathway, all of which have various, important roles in osteosarcoma development. The significant 3-lncRNA set could be a novel prediction biomarker that could aid in treatment and also predict the likelihood of recurrence of osteosarcoma in patients.


Introduction
Osteosarcoma is a common malignant bone tumor, which is predominant in childhood and adolescence [1]. Multidrug chemotherapy and surgical removal of primary tumors have shown ameliorating effects [2]. Although improvements in therapeutic strategies have been achieved, the prognosis remains poor for most patients with recurrent osteosarcoma. It is therefore imperative to identify effective and novel recurrent biomarkers and therapeutic targets for osteosarcoma.
There have also been many lncRNA studies focusing on clinical prognosis in osteosarcoma. Dysregulation of lncRNA MEG3 could act as a potential predictor of the progression and poor prognosis of osteosarcoma [10], and overexpression of lncRNAs UCA1 and TUG1 are essential in the poor prognosis of osteosarcoma [11,12]. In addition, SNHG4 promotes tumor growth and is a predictor of poor survival and recurrence in human osteosarcoma [13]. HNF1A-AS1 is a negative prognostic biomarker that promotes tumorigenesis and progression in osteosarcoma [14]. However, prognostic signatures of recurrence are still rare, meaning that the identification of such signatures is necessary to improve clinical treatment.
In the present study, we identified an lncRNA signature in osteosarcoma from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database, and the signature was an independent prognostic factor for osteosarcoma. The presence of the signature allowed patients to be separated into high-risk and low-risk groups, and this signature may enable the development of novel recurrence biomarkers and therapeutic targets for osteosarcoma.

Clinical cohorts and RNA-Seq data
Clinical information (age, gender) of 92 patients with osteosarcoma was obtained from the TARGET data portal (https://ocg.cancer.gov/), including 47 recurrent samples, 32 non-recurrent samples, and 13 censored samples. The RNA expression data (level 3) of 92 osteosarcoma samples were downloaded from the freely available TARGET data portal. The RNA sequence data from the 92 samples was derived from the IlluminaHiSeq RNASeq sequencing platforms, and the differentially expressed mRNAs and lncRNAs were obtained by using the edgeR software [15] to further analyze the recurrence vs. non-recurrence data. Fold changes (log 2 absolute) ≥2 and P<0.05 indicated a statistically significant difference.

Construction of co-expression network
We examined the correlation between the expression levels of the 42 differentially expressed lncRNAs and each differentially expressed mRNA using two-sided Pearson correlation coefficients and the z-test. The mRNAs positively or negatively correlated with the 42 lncRNAs were considered to be lncRNA-related mRNAs (|Pearson correlation coefficient| > 0.4 [16-20] and P-value <0.05). The lncRNA-mRNA co-expression network was then constructed using Cytoscape V.3.8.5 (https://cytoscape.org/).

Construction of an lncRNA signature in the training dataset
We performed a univariate Cox proportional hazards regression analysis [21] to assess the relationship between patient recurrence and differentially expressed lncRNAs in the training group. In order to screen the most powerful diagnostic and prognostic lncRNAs, we next used multivariate Cox regression analysis and built a model to assess the prognosis risk according to the following: where N is the representative number of prognostic lncRNAs, Exp i is the expression value of the lncRNAs, and Coef i is a single factor Cox regression coefficient. Risk Score (RS) is the multinode weighted sum of risk scores.

Statistical analysis
We adopted the median risk score as a cut-off value in the training dataset [22], using it to divide the osteosarcoma patients into low-and high-risk groups. We used Kaplan-Meier survival analysis to estimate recurrence time and compare recurrence curves between the two groups, and a two-sided log-rank test to assess the statistical significance. We then validated the prognostic ability of the lncRNA signature in the test dataset using the receiver operating characteristic (ROC) values and Kaplan-Meier survival analysis. We carried out data stratification analysis and multivariate Cox regression analysis in order to identify whether the lncRNA were an independent factor, and to assess the signature within the available data. We considered a value of P<0.05 as significant. All analyses were performed with the R statistical program (version 3.5.1), using the survival and pROC packages downloaded from Bioconductor (http://www.bioconductor.org/).

Construction of the lncRNA-function network
The Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) and gene ontology functional enrichment analyses (GO) were carried out to better illustrate the potential functional mechanism of recurrence-related lncRNA modules of osteosarcoma. The DAVID Bioinformatics Tool (https://david.ncifcrf.gov/), which consists of an integrated biological information base and analytical tools aimed at systematically extracting biological meaning from gene/protein lists was used. We first screened the three lncRNAs correlated with differentially expressed mRNAs (|Pearson correlation coefficient| > 0.4 and P-value < 0.05), and then screened the mRNAs involved in the functional analysis. These database inputs and screening parameters allowed for construction of the lncRNA-function network using Cytoscape, version.3.8.5 (https://cytoscape.org/).

Patients' characteristics
All expression data used in the present study were from patients clinically and pathologically diagnosed with osteosarcoma. The median age was 15 years (4-40 years) for the enrolled patients. Other clinical information about the patients is in Table 1. Subsequently, we randomly divided the 92 patients into two groups (training group, n=61; test group, n=31) to seek and validate the prognostic lncRNA found in the osteosarcoma patients. The selection scheme of the recurrent lncRNA signature can be found in Figure 1.

Identification of significantly differentially expressed mRNAs and lncRNAs
A total of 19495 mRNAs and 14589 lncRNAs were identified from the TARGET database. Using |Fold changes| ≥2 and P<0.05 as cutoffs, we identified 184 differentially expressed mRNAs (71 down-regulated and 113 up-regulated) and 42 differentially expressed lncRNAs (22 down-regulated and 20 up-regulated) (Supplementary Table S1).

Construction of the lncRNA-mRNA co-expression network in osteosarcoma
In order to better study the role of differentially expressed lncRNAs in osteosarcoma and to further elucidate the interaction between these differentially expressed mRNAs, we constructed an osteosarcoma lncRNA-mRNA related regulatory network. We used the 36 differentially expressed lncRNAs and 101 differentially expressed mRNAs to construct this network (Figure 2 and Supplementary Table S2), and screening criteria were the |Pearson correlation coefficient| > 0.4 and P-value <0.05. Using these criteria, we found that six up-regulated lncRNAs (RP1-261G23.7, RP11-81A22.5, RP11-69E11.4, RP11-817J15.3, SATB2-AS1 and CTB-4E7.1) functioned in regulating many mRNAs, suggesting an important regulatory role for these lncRNAs in osteosarcoma.

Construction of the recurrent lncRNA signature with the training dataset
The training group (n=61), which included complete clinical information, was used to explore the association of recurrence with 42 differentially expressed lncRNAs. We initially performed a univariate Cox proportional hazards regression analysis of the expression profiling data of differentially expressed lncRNAs, with recurrence time and recurrence status as the dependent variables. We identified four lncRNAs that were significantly correlated with recurrence of patients (P-value <0.05, Supplementary Table S3). In order to screen the most powerful diagnostic and prognostic lncRNAs, we next used a multivariate Cox regression analysis (Supplementary Table S4) and built a 3-lncRNA (RP1-261G23.7, RP11-69E11.4, and SATB2-AS1) model to assess the recurrence risk. This set of 3-lncRNAs also functions in regulating the lncRNA-mRNA co-expression network. The risk score (Supplementary Table S5)  where RS is the risk score and ev is the expression value.

Determining the survival power of the lncRNA signature in the training and test datasets
The selected lncRNA signature returned a risk score for each patient. We used the median risk score to divide the training group patients into either a low-risk group (n=31) or a high-risk group (n=30). Using the Kaplan-Meier survival analysis, we found that the low-risk group patients had a significantly lower recurrence rate than those in the high-risk group (median recurrence time: >128.22 vs. 16.80 months, log-rank test P<0.001; Figure 3A). The 5-year recurrence rate of the patients in the high-risk group was less than 20%, while that of the patients in the low-risk group was greater than 60%.
The recurrent risk score model used to calculate the lncRNA signature-based risk scores of the 31 patients in the test dataset was also used to validate the prediction power of the lncRNA signature. Patients were divided into low-risk and high-risk groups using the median risk score as a demarcation point, in the same fashion as the training dataset. Kaplan-Meier curves were used to show the low-risk and high-risk groups in the test dataset ( Figure 3B), and the results were similar to those from the training dataset. In the test dataset, the low-risk group had a significantly higher recurrence rate than the high-risk group (median recurrence time: >143.80 vs. 16.32, log-rank test P=0.006). The recurrence rate of patients in the high-risk group was less than 20% at 5 years, in contrast with more than 60% in the low-risk group.

The survival prediction power of the lncRNA signature in the training and test groups
We performed a separate ROC analysis to test the prediction power of the lncRNA signature, which considered the larger area under the ROC curve as a better model for predicting recurrence of the osteosarcoma. In the training dataset, the predictive ability of the 3-lncRNA signature was high (AUC Signature = 0.68, Figure 3C), which further demonstrated that this signature was a novel and highly accurate biomarker of recurrence. A similar result was seen in the test group (AUC Signature = 0.75, Figure 3D).

The selected 3-lncRNA signature is an independent prognostic factor
A multivariate Cox regression analysis using the signature-based risk score and other clinical characteristics (e.g. age and gender) demonstrated that the prognostic power of the lncRNA signature risk score for the prediction of

Functional prediction of differentially expressed mRNAs related to 3 lncRNAs
KEGG and GO analyses were used to investigate the potential involvement of the 3 lncRNAs in biological processes associated with osteosarcoma development. We performed functional analysis with co-expressed mRNAs of 3 lncR-NAs in each lncRNA module (|Pearson correlation coefficient| > 0.4 and P-value <0.05, Supplementary Table S6). The 3 lncRNAs-functional network (Figure 4 and Supplementary Table S6) was created using Cytoscape. The results indicated that the 3 lncRNA signature was related to a collagen trimer, endoplasmic reticulum lumen, and protein digestion and absorption. We also found that only RP1-261G23.7 was involved in the Wnt signal pathway, which is in a variety of biological processes, including cell cycle regulation and cancer. RP11-69E11.4 was involved in the G-protein coupled receptor signaling pathway, which is linked to the activation of adenylyl cyclase activity and subsequent cAMP/PKA/CREB signaling [23] as potential drivers of tumorigenesis. Finally, SATB2-AS1 is involved in the PI3K-Akt signaling pathway, which is essential in cell proliferation and the biological characteristics of malignant cells [24].

Discussion
Individuals with recurrent osteosarcoma have less than a 20% chance of long-term survival, despite aggressive therapies [25]. Identification of a clinically relevant signature for prognosis is of critical need for patients. Recent studies suggested that lncRNAs could be good candidates for tumor signatures as they possess high sensitivity and high specificity [26][27][28][29][30]. Increasing data have demonstrated that lncRNAs are associated with osteosarcoma prognosis. For example, up-regulated FOXC2-AS1 was correlated with longer survival time [31], while HULC was associated with distant metastasis and clinical stage, and with shorter overall survival time [32,33]. Moreover, BANCR was associated with large tumor size, distant metastasis, and advanced clinical stage, and was an independent predictor of poor survival rate [34][35][36]. However, lncRNA signatures and their relation to recurrence of osteosarcoma has rarely been investigated. In our study, different statistical methods were used to identify a 3-lncRNA signature associated with the recurrence of osteosarcoma in patients. We found that the lncRNA signature was an independent predictor in patient recurrence. A multivariate Cox regression analysis was used to assess the independence of the selected signature in predicting recurrence of osteosarcoma. With age and gender as covariables, the risk scores of patients based on the signature maintained an independent association with recurrence. Taken together, these results suggested that the prognostic power of the lncRNA signature for predicting recurrence of osteosarcoma in patients was independent of other clinical characteristics.
Moreover, the expression levels of RP1-261G23.7, RP11-69E11.4, and SATB2-AS1 were increased in recurrent osteosarcoma samples vs. non-recurrent samples. These lncRNAs also had important regulatory roles in osteosarcoma based on the lncRNA-mRNA co-expression network findings. Finally, we analyzed the functions of these lncRNAs, and the results showed that RP1-261G23.7 was involved in growth factor activity, single organismal cell-cell adhesion, the G-protein coupled receptor signaling pathway, and cell adhesion. Vascular endothelial growth factor A (VEGF-A) is known to function directly in blood vessel formation and maintenance, and is a promising therapeutic target for activation in cancer. RP1-261G23.7 is antisense VEGF-A lncRNA, and a recent report showed that RP1-261G23.7 functions as a VEGF-A promoter enhancer-like element, possibly by acting as a local scaffold for proteins [37]. RP11-69E11.4 was involved in the Wnt signaling pathway, which is sensitive to alterations at various nodes of the pathway, and these alterations were identified in multiple tumor types after hyperactivation. These alterations converge into increased tumorigenicity, sustained proliferation and enhanced metastatic potential [24]. In addition, SATB2-AS1 is specifically involved in the PI3K-Akt signaling pathway, which can affect the epithelial-mesenchymal transition in numerous ways to influence tumor aggressiveness [38]. SATB2-AS1 was reported as overexpressed in osteosarcoma, and was associated with increased cell proliferation and growth [39]. These results indicated that recurrent lncRNA signatures executed a variety of biological functions in osteosarcoma tumorigenesis and development. There has been relatively little research on the three signature lncRNAs, and our results provide a foundation for further investigation into the function of the three lncRNAs.
There are a few limitations of this research regarding osteosarcoma. Most importantly, the specific prediction mechanism of the 3 lncRNAs in osteosarcoma requires further study. Moreover, the lncRNA signature has not yet been tested prospectively in multiple clinical trials. Despite these drawbacks, the significant and consistent correlation of our lncRNA signature with recurrence in two independent datasets indicated that it was a potentially powerful prognostic signature for osteosarcoma.
In conclusion, our data clearly indicate that the lncRNA signature can predict the recurrence of osteosarcoma in patients. The signature provides accurate prediction, which is clinically very significant.