Overexpression of BUB1B, CCNA2, CDC20, and CDK1 in tumor tissues predicts poor survival in pancreatic ductal adenocarcinoma

Overexpressed genes in tumors usually contributed to aggressiveness in pancreatic ductal adenocarcinoma (PDAC). Using Gene Expression Omnibus (GEO) profiles including GSE46234, GSE71989, and GSE107610, we detected overexpressed genes in tumors with R program, which were enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene ontology (GO), and Reactome pathway databases. Then, we performed a survival analysis of enriched genes based on TCGA profile. Our results revealed that high BUB1B, CCNA2, CDC20, and CDK1 expression in tumors was significantly associated with worse overall survival (OS) (Log rank P=0.00338, P=0.0447, P=0.00965, and P=0.00479, respectively), which was validated using a Kaplan–Meier plotter with a median cutoff (Log rank P=0.028, P=0.0035, P=0.039, and P=0.0033, respectively). Moreover, overexpression of BUB1B, CCNA2, CDC20, and CDK1 in tumor tissues was significantly associated with disease-free survival (DFS) in PDAC patients (Log rank P=0.00565, P=0.0357, P=0.00104, and P=0.00121, respectively). BUB1B, CCNA2, CDC20, and CDK1 were significantly overexpressed in deceased PDAC patients (all P<0.01) and in patients with recurrence/disease progression (all P<0.05). In addition, PDAC patients with neoplasms of histologic grade G3-4 had significantly higher BUB1B, CCNA2 and CDC20 levels (all P<0.05). In conclusion, the up-regulation of BUB1B, CCNA2, CDC20, CDK1, and WEE1 in tumor tissues are associated with worse OS and DFS in PDAC and is correlated with advanced tumor stage and tumor development.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) arises from the exocrine pancreas and accounts for 95% of all pancreatic cancers [1]. Despite major improvements in its diagnosis and treatment, PDAC remains an aggressive disease that carries a poor prognosis and a 5-year survival rate of approximately 8% in the United States [2,3]. The genetic framework, early metastasis, a dense stroma, propensity for growth in a nutrient-depleted environment, and immunomodulation, all underlie its aggressive nature and resistance to treatment, which makes therapeutic progress a challenge [4]. Hence, it is essential to find predictive biomarkers and novel therapeutic targets to improve the treatment outcome in PDAC patients.
Currently, few tumor markers have been externally validated to predict the survival of patients with PDAC [5]. Novel biomarkers that predict PDAC prognosis and PDAC targets for treatment are urgently required [6]. Recently, big data bioinformatics of molecular targets and networks have gained increased attention [7,8], which is specifically due to the introduction of large-scale molecular analysis platforms  [9]. This tremendous amount of molecular data provide a rich source for a better understanding of the molecular basis of PDAC and for the identification of novel genomic targets for therapeutic intervention.
Using the Gene Expression Omnibus (GEO) database, we identified up-regulated differentially expressed genes (DEGs) between tumor tissues and nontumor tissues in PDAC patients, enriched potential pathways/biological processes, and evaluated associations between up-regulated DEGs and PDAC outcomes. We hope our results provide useful insights into potential candidate biomarkers and the pathogenesis and progression of PDAC.

Study design and source of data
The flowchart of the procedure is described in Figure 1. GEO (https://www.ncbi.nlm.nih.gov/geo/) profiles with raw data of the CEL file type and platforms of Affymetrix arrays with probe ID, Gene Symbol, and Entrez Gene ID were included in this analysis. The gene expression profiles of GSE46234, GSE71989 and GSE107610 were downloaded from the GEO database. The GSE46234 profile was composed of four healthy tissues and four PDAC tissues. GSE71989 included 8 normal pancreatic tissues and 14 PDAC tissues. Data from GSE71989 were generated from Affymetrix arrays. In GSE107610, mRNA from 39 patient-derived PDAC tumors and 2 normal organs was extracted [10] and hybridized to the GeneChip PrimeView Human Gene Expression Array and were scanned using a GeneChip Scanner 3000 7G.

Identification of up-regulated DEGs in PDAC
To investigate DEGs between tumor tissues and nontumor tissues in PDAC patients, the transcriptome gene expression data using the robust multiarray average (RMA) algorithm were explored. Bioconductor (http://www. bioconductor.org) packages Affy and AffyPLM [11,12] were used for quality assessment of tumor and nontumor samples in each GEO profile. The Limma package [13,14] in Bioconductor was used to identify DEGs (log 2 FC > 1, adjusted P-value <0.05). To identify up-regulated DEGs, log 2 FC > 1 and adjusted P-values <0.05 were set for each GEO profile. To identify shared DEGs amongst GSE46234, GSE71989, and GSE107610, the Venny 2.1 online service (http://bioinfogp.cnb.csic.es/tools/venny/index.html) was used to generate a Venn diagram.

Functional enrichment analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene ontology (GO), and Reactome enrichment analysis of up-regulated DEGs was conducted using Gene Set Enrichment Analysis (GSEA). To investigate gene sets, up-regulated DEGs were uploaded to the Molecular Signatures Database in GSEA. A false discovery rate P-value cutoff of <0.05 was set as the screening condition. The shared up-regulated DEGs in common pathways enriched by KEGG, GO, and Reactome were determined for the Venn diagram using the Venny 2.1 online service (http://bioinfogp.cnb.csic.es/tools/venny/index.html) [15].

Survival analysis
To identify potential candidate biomarkers for overall survival (OS) and disease-free survival (DFS) in PDAC patients, the pancreatic adenocarcinoma (TCGA, Provisional) database in cBioPortal for Cancer Genomics web interface was used [16,17]. A z-score threshold + − 2.0 of mRNA expression was selected in genomic profiles, and 178 cases with sequenced tumors were included in the survival analysis. The mRNA expression levels of potential candidate biomarkers that were calculated by log 2 were compared based on clinical factors in PDAC patients. For validation of prognostic candidates in PDAC survival, a Kaplan-Meier analysis (http://kmplot.com/analysis/) with a median cutoff was used [18,19].

Statistical analysis
Differences of gene expression between the individual groups were analyzed using Mann-Whitney U test and Student's t test based on variables' types. GraphPad Prism version 7.0 (GraphPad Software, San Diego, CA) was used. A two-tailed P<0.05 were considered significant for all tests.

Screening up-regulated expressed genes at the mRNA level
The quality assessments of GSE46234, GSE71989, and GSE107610 were conducted by Affy and AffyPLM packages using relative log expression (RLE) and normalized unscaled standard errors (NUSE). As shown in Supplementary Figures S1-S3, the quality of GSE46234, GSE71989, and GSE107610 was reliable. Amongst GSE46234, GSE71989, and GSE107610, 632 common up-regulated DEGs were identified using a Venn diagram (Figure 2A).

Up-regulated gene functions and pathways
The KEGG pathway, GO biological process, and Reactome gene sets were analyzed for enrichment of up-regulated gene functions and pathways [20,21]. We presented the top ten pathways/biological processes in our study. The cell cycle was the most enriched pathway/biological process in KEGG, GO, and Reactome ( Figure 2B). Additionally, 24 genes were enriched in KEGG pathways, 155 genes were enriched in GO biological processes, and 69 were enriched in Reactome gene sets. Subsequently, we generated a Venn diagram and found that 21 genes including ART, BUB1B, BUB3, CCNA2, CCNB1, CDC20, CDC23, CDC25B, CDC45, CDC7, CDK1, DBF4, MAD2L1, MCM2, MCM3, MCM5, MCM6, MCM7, RAD21, SMC3, and WEE1 in the cell cycle pathway were shared in the three enrichment databases ( Figure 2C).
Considering the results above, we cautiously concluded that up-regulated BUB1B, CCNA2, CDC20, and CDK1 in tumors predict worse survival of PDAC patients. In addition, we performed a protein-protein interaction network analysis of BUB1B, CCNA2, CDC20, and CDK1. As shown in Figure 6, these four genes mostly interact with cell cycle genes and should serve as a panel for the development of malignancies.

Discussion
PDAC is amongst the most important unresolved health problems worldwide and is a lethal disease partly due to a lack of therapeutic treatment targets [22]. To identify prognostic factors that can stratify patients according to biological markers may help in the discovery of novel therapeutic approaches and the selection of adequate treatment strategies [23,24]. Unfortunately, amongst the current prognostic factors, few have been translated into clinical practice [5].
Consistent with previous reports [25,26], we found that when advanced tumor biological behaviors, including the histologic grade of the neoplasm and tumor development, are considered, BUB1B, CCNA2, CDC20, and CDK1 were   enriched in the cell cycle biological process/pathway and were associated with OS and DFS in PDAC patients. This indicates that, based on the data shown in the present study, BUB1B, CCNA2, CDC20, and CDK1 show prognostic value for PDAC patients.
Encoded by BUB1B, BUBR1 expression is sufficient to predict poor prognosis in pancreatobiliary-type tumors [27]. Previous bioinformatics analyses showed that BUB1B was one of the hub genes with high degrees of connectivity to PDAC and might be a potential target for PDAC diagnosis and treatment [25]. However, the role of BUB1B in other types of cancer cells is still controversial. Low expression of BUB1B contributes to poor survival and metastasis in human colon adenocarcinomas [28] and several lung cancer cell lines [29], while overexpression of BUB1B is related to progression and recurrence of gastric cancer [30], bladder cancer [31], liver cancer [32], and many other cancers [33][34][35]. CCNA2 belongs to a highly conserved cyclin family and is up-regulated in dozens of cancer types, which indicates its potential roles in cancer transformation and progression [36]. It has been reported that a high CCNA2 expression promotes cell proliferation in hepatoma [37] and might help monitor chemotherapy efficacy in breast cancer [38]. A bioinformatics analysis by Zhou et al. [26] revealed that CCNA2 overexpression was tightly related to progression of PDAC. Considering the above findings, we believe that BUB1B and CCNA2 should be novel prognostic biomarkers in PDAC.
Overexpression of CDC20 has been reported in various malignancies and high expression of CDC20 has been associated with high tumor grade in bladder, cervical, colon, endometrial, gastric, liver, ovarian, prostatic, and renal carcinomas [39]. CDC20 has been reported to be significantly associated with poor prognosis in pancreatic [40], lung [41], bladder [42], colon [43], oral squamous cell carcinomas [44], and breast cancers [45]. In one study, overexpression of CDC20 enhanced cell proliferation and invasion, while down-regulation of CDC20 promoted anti-tumor activity in pancreatic cancer cells [46]. Hence, CDC20 may represent a promising therapeutic target in cancer patients including those with PDAC [47]. CDK1 has been reported to be correlated with cancer growth and is a key cell cycle regulator [48]. CDK1 expression and activity are elevated in colorectal cancer [49], prostate cancer [50], and lymphomas [51,52]. Up-regulation of CDK1 is associated with poor prognosis in breast cancer [53] and epithelial ovarian cancer [54,55]. In one study, CDK1 inhibitors contributed to a marked reduction in the proportion of cells in S and G 2 /M phases of the cell cycle in PDAC tumor cell models; targetting CDK1 also showed promising anticancer activity in pancreatic cancer cells [56,57].

Conclusion
Our preliminary analysis showed that BUB1B, CCNA2, CDC20, and CDK1 were overexpressed in tumors in PDAC patients. And these four genes were associated with advanced tumor stage and showed prognostic values for PDAC outcomes. Based on our results, we cautiously concluded that BUB1B, CCNA2, CDC20, and CDK1 should comprise a panel for PDAC development and that they might be novel treatment targets. Since the sample size in this preliminary analysis was small, prospective studies with large samples should be considered for the validation of our results.

Author contribution
Q.C. and H.Z. conceived and designed the study. S.D. and F.H. wrote the manuscript. Q.C. and S.D. analyzed and interpreted the data. All authors read and approved the final manuscript.