Bioinformatics analysis of prognostic significance of COL10A1 in breast cancer

Abstract Background: Collagen type X alpha 1 (COL10A1) is overexpressed in diverse tumors and plays vital roles in tumorigenesis. However, the prognostic value of COL10A1 in breast cancer remains unclear. Methods: The expression of COL10A1 was analyzed using the Oncomine and UALCAN cancer databases. The relationship between COL10A1 expression level and clinical indicators, including prognostic data, in breast cancer was analyzed using the Kaplan–Meier Plotter, PrognoScan, and Breast Cancer Gene-Expression Miner (bc-GenExMiner) databases. Results: COL10A1 was up-regulated in different subtypes of breast cancer. Estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor-2 (HER-2) status and nodal status were positively correlated with COL10A1 expression. Conversely, age, the Scarff–Bloom–Richardson (SBR) grade, basal-like status, and triple-negative status were negatively related to COL10A1 level in breast cancer samples compared with normal tissues. Patients with increased COL10A1 expression showed worse overall survival (OS), relapse-free survival (RFS), distant metastasis-free survival (DMFS) and disease-free survival (DFS). COL10A1 was positively correlated with metastatic relapse-free survival. Gene set enrichment analysis (GSEA) revealed enrichment of the TGF-β signaling pathway. 15-leucine-rich repeat containing membrane protein (LRRC15) was identified as a gene co-expressed with COL10A1. Conclusion: Bioinformatics analysis revealed that COL10A1 might be considered a predictive biomarker for the prognosis of breast cancer. Further experiments and clinical trials are essential to elucidate the value of COL10A1 in breast cancer treatment.


Introduction
Breast cancer is the most common cancer among women and a main cause of cancer-related deaths worldwide [1]. Screening and diagnosis of early breast cancer are crucial to reduce morbidity and mortality [2,3]. Although clinical, pathological and molecular indicators are generally applied in predicting prognosis, the underlying pathogenesis of breast cancer aggressiveness remains poorly understood, and minimally invasive biomarkers for the detection of early-stage breast cancer are vital in breast cancer research [4].
Collagen type X alpha 1 (COL10A1), a secreted, short-chain collagen, belongs to the collagen family, which is a major interstitial matrix component [5]. COL10A1 expression is elevated in many solid tumor types, such as colon cancer, esophageal cancer, and breast cancer, and plays vital roles in many critical cellular processes such as cell proliferation, migration, invasion and the formation of tumor vasculature [6][7][8]. COL10A1 protein levels in plasma might be a potential diagnostic predictor for early breast cancer [9]. Although COL10A1 has been reported to be highly expressed in tumors by high-throughput sequencing, the specific role of COL10A1 in breast cancer remains unknown [10][11][12].
Therefore, in the present study, we evaluated the significance of COL10A1 gene expression in breast cancer by using comprehensive bioinformatics analysis of the clinical indicators and survival data in several large online databases.

Oncomine analysis
The COL10A1 mRNA expression level was analyzed in breast cancer and matched normal tissues using the Oncomine Platform (www.oncomine.org). The analysis was conducted using the following filters: gene, COL10A1; differential analysis, cancer vs. normal analysis; cancer type, breast cancer; and data type, mRNA. In the present study, we selected a two-fold change, a P-value of 1E-4 and a top 10% gene rank as thresholds. All statistical methods and statistical values were obtained directly from the corresponding database.
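As an illustration only, the three thresholds can be read as a conjunctive filter over per-dataset comparison results. The sketch below is not Oncomine's implementation or export format: the `Comparison` record, its field names, the dataset labels and all numeric values are hypothetical, and it assumes that each threshold is an inclusive bound.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One cancer-vs-normal comparison for a gene (hypothetical record layout)."""
    dataset: str
    fold_change: float    # tumor / normal expression ratio
    p_value: float
    gene_rank_pct: float  # percentile rank of the gene; smaller means more over-expressed


def passes_thresholds(c, min_fold=2.0, max_p=1e-4, top_pct=10.0):
    """Return True when a comparison meets all three thresholds (assumed inclusive)."""
    return (
        c.fold_change >= min_fold
        and c.p_value <= max_p
        and c.gene_rank_pct <= top_pct
    )


# Made-up rows, used only to exercise the filter.
rows = [
    Comparison("study_A", 4.2, 1e-8, 1.0),
    Comparison("study_B", 1.5, 1e-6, 5.0),   # fails the fold-change threshold
    Comparison("study_C", 3.0, 1e-3, 2.0),   # fails the P-value threshold
    Comparison("study_D", 2.5, 1e-5, 25.0),  # fails the gene-rank threshold
    Comparison("study_E", 2.0, 1e-4, 10.0),  # sits exactly on all three bounds
]

selected = [c.dataset for c in rows if passes_thresholds(c)]
print(selected)
```

A comparison is retained only if it satisfies all three criteria at once, which is why a large fold change alone is not sufficient.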

Breast cancer gene-expression miner
The expression of COL10A1 mRNA in different subtypes of breast cancer and the correlation between genes or identified clusters of correlated co-expressed genes were analyzed using the Breast Cancer Gene-Expression Miner (bcGenExMiner v4.3, http://bcgenex.centregauducheau.fr/BC-GEM). The correlation between COL10A1 and 15-leucine-rich repeat containing membrane protein (LRRC15) was generated using the correlation module.
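For readers who want to reproduce a gene-gene correlation offline, a minimal sketch is given below. It assumes a Pearson correlation coefficient computed on paired expression values from the same samples; the function name `pearson_r` and the expression values are hypothetical and are not taken from bc-GenExMiner.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up paired expression values for five samples (illustration only).
col10a1 = [1.0, 2.0, 3.0, 4.0, 5.0]
lrrc15 = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(col10a1, lrrc15)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 indicates that the two genes rise and fall together across samples, which is the sense in which two genes are described as positively correlated.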

UALCAN cancer database
UALCAN is a comprehensive, user-friendly and interactive web resource for analyzing cancer OMICS data. It is built on PERL-CGI with high-quality graphics using JavaScript and CSS. UALCAN now provides a protein expression analysis option using data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Confirmatory/Discovery dataset. We evaluated the protein expression of COL10A1 in breast cancer by CPTAC analysis.

PrognoScan
The PrognoScan online database (http://www.prognoscan.org/) provides a powerful platform for assessing the biological relationships between gene expression and prognostic information in cancer patients. PrognoScan includes public microarray datasets with clinical annotation of gene expression and prognosis from Gene Expression Omnibus (GEO), ArrayExpress and individual laboratory websites. The correlation between COL10A1 expression and survival in breast cancer was analyzed using the PrognoScan database. Cox P-values and hazard ratios (HRs) with 95% confidence intervals were calculated automatically according to the mRNA level (high or low).

Kaplan-Meier survival curve analysis
The prognostic value of COL10A1 mRNA and protein expression in breast cancer was assessed according to overall survival (OS) and relapse-free survival (RFS) using the Kaplan-Meier plotter (kmplot.com/analysis), an online database that includes gene expression data and clinical data. To assess the prognostic value of a specific gene, the patient samples were divided into two cohorts according to the median expression of the gene (high vs. low expression). Log-rank P-values and HRs with 95% confidence intervals were determined on the webpage.
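The median split and log-rank comparison can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the Kaplan-Meier plotter's own code: the function names and all patient values are hypothetical, samples with expression exactly equal to the median are assigned to the low group, and the P-value is taken from a chi-square distribution with one degree of freedom.

```python
import math
from statistics import median


def split_by_median(expression):
    """Flag each sample as high (True) if its expression is above the cohort median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times, events, high):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    observed = expected = variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                       # at risk, both groups
        n1 = sum(1 for u, h in zip(times, high) if u >= t and h)  # at risk, high group
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, h in zip(times, events, high) if u == t and e and h)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined (zero variance)")
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Made-up cohort: expression, follow-up time in months, event flag (1 = death).
expression = [2.1, 5.3, 3.3, 7.8, 1.2, 6.4, 4.9, 0.8]
times = [60, 20, 55, 12, 70, 18, 30, 65]
events = [0, 1, 1, 1, 0, 1, 1, 0]

high = split_by_median(expression)
chi2, p_value = logrank_test(times, events, high)
print(f"log-rank chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Dichotomizing at the median keeps the two cohorts balanced in size, at the cost of discarding information about how far each sample lies from the cutoff.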

UCSC Xena
The heat map of COL10A1 and LRRC15 in the same patient cohort was constructed by data mining in The Cancer Genome Atlas (TCGA) Breast Cancer dataset using the UCSC Xena browser (http://xena.ucsc.edu/).

Gene Expression Profiling Interactive Analysis dataset
Gene Expression Profiling Interactive Analysis (GEPIA) is an interactive web server for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from TCGA and the Genotype-Tissue Expression (GTEx) projects, using a standard processing pipeline (http://gepia.cancer-pku.cn/). GEPIA provides customizable functions such as tumor or normal differential expression analysis, profiling according to cancer types or pathological stages, patient survival analysis, similar gene detection, correlation analysis and dimensionality reduction analysis. The expression of LRRC15 was analyzed using the GEPIA database.

LinkedOmics dataset
LinkedOmics (http://www.linkedomics.org/login.php) is a tool for disseminating data from large-scale cancer omics projects. It uses preprocessed and normalized data from the Broad TCGA Firehose and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data portal to reduce redundant efforts and focus on the discovery and interpretation of attribute associations, and thus complements existing cancer data portals. GSEA was conducted using LinkedOmics.
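To make the enrichment statistic concrete, the following sketch computes a weighted running-sum enrichment score of the kind used in standard GSEA. It is an illustration under assumptions rather than the LinkedOmics pipeline: the function name, the placeholder gene identifiers, the ranking scores and the gene set are all hypothetical, and the permutation-based significance and normalization steps are omitted.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Running-sum enrichment score for a gene set over a ranked gene list.

    ranked_genes: list of (gene, score) pairs sorted by score, highest first.
    gene_set: set of gene identifiers.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Normalizing constant: total weighted score of the genes in the set.
    norm = sum(abs(score) ** weight
               for (_, score), is_hit in zip(ranked_genes, hits) if is_hit)
    if norm == 0:
        raise ValueError("all genes in the set have a zero score")

    running = 0.0
    best = 0.0
    for (_, score), is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(score) ** weight / norm  # step up at a set member
        else:
            running -= 1.0 / n_miss                 # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder genes ranked by a made-up correlation score with the gene of interest.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
          ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]
toy_pathway = {"G1", "G2"}

es = enrichment_score(ranked, toy_pathway)
print(f"enrichment score = {es:.2f}")
```

A positive score indicates that the members of the set cluster toward the top of the ranking, and a negative score indicates that they cluster toward the bottom.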

The expression of COL10A1 is increased in breast cancer patients
The mRNA expression of COL10A1 in breast cancer was analyzed using the Oncomine database. Higher expression of COL10A1 was observed in male breast carcinoma, intraductal cribriform breast adenocarcinoma, invasive breast carcinoma, invasive lobular breast carcinoma, invasive ductal breast carcinoma, mixed lobular and ductal breast carcinoma, ductal breast carcinoma in situ stroma, invasive ductal breast carcinoma stroma and ductal breast carcinoma, compared with the corresponding normal tissues (Figure 1A-I and Table 1). Higher protein expression of COL10A1 was also detected in breast cancer tissues using the UALCAN cancer database (Figure 1J).

The relationship between COL10A1 expression and clinical indicators in breast cancer patients
Using the bc-GenExMiner online tool, we next compared COL10A1 expression among groups of patients according to different clinical indicators. Regarding age, the expression of COL10A1 was significantly higher in the ≤51-year group than in the >51-year group (Figure 2A and Table 2). The Scarff-Bloom-Richardson (SBR) grade is a histological grade that evaluates tubule formation, nuclear pleomorphism and mitotic index. Breast cancer patients with more advanced SBR grade tended to express lower levels of COL10A1 (Figure 2B and Table 2). Estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER-2) status were positively associated with COL10A1 expression (Figure 2C-E and Table 2). Breast cancer patients with positive nodal status (N) showed higher levels of COL10A1 than those with negative nodal status (Figure 2F and Table 2). In addition, COL10A1 was strongly elevated in the non-basal-like subtype relative to the basal-like subtype; the same pattern was observed in non-triple-negative patients relative to triple-negative breast cancer (TNBC) patients (Figure 2G,H and Table 2).

Increased expression of COL10A1 correlates with poor outcome in breast cancer patients
We then analyzed the prognostic value of the COL10A1 gene. The Kaplan-Meier plotter revealed that a lower level of COL10A1 correlated with better OS (Figure 3A), while breast cancer patients with up-regulated COL10A1 demonstrated worse RFS (Figure 3B). Furthermore, the PrognoScan database showed that overexpression of COL10A1 was significantly associated with inferior OS, relapse-free survival, distant metastasis-free survival and disease-free survival (Table 3). To further investigate the role of COL10A1 in breast cancer prognosis, we verified that COL10A1 was positively correlated with metastatic RFS using the bc-GenExMiner software (Figure 3C). We also found that high COL10A1 protein expression was marginally significantly associated with worse OS in the Kaplan-Meier plotter (Figure 3D). To identify the potential function of COL10A1, GSEA was conducted to search for enriched KEGG pathways. The results revealed enrichment of the TGF-β signaling pathway (Figure 3E).

COL10A1 and LRRC15 are co-expressed in breast cancer patients
To further study the underlying mechanism of COL10A1 in breast cancer, we conducted co-expression data mining of COL10A1 using the Oncomine database. The co-expression profile of COL10A1 was identified within a large cluster of 19139 genes across 66 breast carcinomas, and LRRC15 was among the correlated genes (Figure 4A). Further analysis using bc-GenExMiner revealed the correlation between COL10A1 and LRRC15 (Figure 4B,C). By comparing the COL10A1 and LRRC15 expression heat map derived from the UCSC Xena web-based tool (Figure 4D), COL10A1 expression was shown to be positively correlated with the LRRC15 transcript level, as determined across the PAM50 (50-gene qPCR assay) breast cancer subtypes in the TCGA database (Figure 4C). These data indicate that COL10A1 could be associated with LRRC15 signaling pathways in breast cancer.

The validation and survival analysis of LRRC15 in breast cancer
The expression of LRRC15 was validated in the GEPIA database. LRRC15 was significantly up-regulated in breast cancer tissues (Figure 5A). Survival analysis in the Kaplan-Meier plotter database confirmed that up-regulation of LRRC15 was correlated with shorter OS in breast cancer patients (Figure 5B).

Discussion
Breast cancer has been reported to be associated with the aberrant expression of oncogenes [13]. Despite improvements in the diagnosis, treatment and prognosis prediction of breast cancer, it remains the most prevalent malignant tumor, with a high incidence in women worldwide. The identification of novel biomarkers for breast cancer is crucial to its diagnosis, therapy and prognosis [14]. COL10A1 is a member of the collagen family. COL10A1 encodes the α chain of type X collagen, a short-chain collagen expressed by hypertrophic chondrocytes during endochondral ossification. The expression of COL10A1 was increased in various solid human tumor tissues, where it was associated with tumor vasculature staining [6]. COL10A1 showed an important role in differentiating in situ from invasive breast cancer and in characterizing ductal carcinoma in situ (DCIS) with a high risk of developing into invasive ductal carcinoma (IDC) [11,15,16]. Additionally, the concentration of COL10A1 in the plasma could be a potential biomarker to discriminate breast cancer patients from those with benign disease [9]. Of interest, increased expression of COL10A1 correlates with poor pathologic response in breast tumors [17]. However, the significance of COL10A1 expression in the prognosis of breast cancer remains unclear. This is the first study to identify COL10A1 as a potential predictive biomarker for the prognosis of breast cancer.
In our study, we analyzed the expression profile of COL10A1 using the Oncomine database. COL10A1 was more highly expressed in male breast carcinoma, intraductal cribriform breast adenocarcinoma, invasive breast carcinoma, invasive lobular breast carcinoma, invasive ductal breast carcinoma, mixed lobular and ductal breast carcinoma, ductal breast carcinoma in situ stroma, invasive ductal breast carcinoma stroma and ductal breast carcinoma than in normal tissues. The bc-GenExMiner online tool revealed that ER, PR, HER-2 status and nodal status were positively correlated with COL10A1 expression. Conversely, age, SBR grade, basal-like status and triple-negative status were negatively related to COL10A1 level in breast cancer samples compared with normal tissues. Therefore, these results indicated that the expression of COL10A1 may predict the prognosis of breast cancer.
We further investigated the prognostic value of COL10A1 in breast cancer using the Kaplan-Meier Plotter, PrognoScan and bc-GenExMiner databases. Patients with increased COL10A1 showed worse OS, RFS, distant metastasis-free survival and disease-free survival. These findings collectively suggested that the expression of COL10A1 might be a predictive biomarker for the prognosis of breast cancer. GSEA revealed that genes associated with COL10A1 were enriched in the TGF-β signaling pathway in breast cancer. A previous study elucidated the mechanistic link between COL10A1 and the TGF-β1-SOX9 axis in gastric cancer progression [8]. Our study is the first to reveal a potential link between COL10A1 and the TGF-β signaling pathway in breast cancer.
The co-expression of COL10A1 was analyzed using the Oncomine, bc-GenExMiner and UCSC Xena web-based tools. The expression of LRRC15 was positively correlated with COL10A1 expression. The type I transmembrane protein LRRC15 is a member of the LRR superfamily [18,19]. The LRR is a structural module for protein-protein and protein-matrix interactions used in molecular recognition processes such as cell adhesion, signal transduction, DNA repair and RNA processing [20,21]. LRRC15 was found to be highly expressed on the cell surface of stromal fibroblasts in many solid tumors. Additionally, LRRC15 was considered a new marker of cancer-associated fibroblasts and cancers of mesenchymal origin and might be applied in antibody-drug conjugate therapy targeting the tumor stroma [22].
In conclusion, this analysis revealed that COL10A1 was more highly expressed in breast cancer than in normal tissues and was correlated with worse survival. COL10A1, together with its co-expressed gene LRRC15, could be considered a predictive biomarker for the prognosis of breast cancer. Further experiments and clinical trials are essential to elucidate the value of COL10A1 in breast cancer treatment.