Identification of novel genes associated with a poor prognosis in pancreatic ductal adenocarcinoma via a bioinformatics analysis

Pancreatic ductal adenocarcinoma (PDAC) is a class of the commonest malignant carcinomas. The present study aimed to elucidate the potential biomarker and prognostic targets in PDAC. The array data of GSE41368, GSE43795, GSE55643, and GSE41369 were downloaded from Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) and differentially expressed microRNAs (DEmiRNAs) in PDAC were obtained by using GEO2R, and overlapped DEGs were acquired with Venn Diagrams. Functional enrichment analysis of overlapped DEGs and DEmiRNAs was conducted with Metascape and FunRich, respectively. The protein–protein interaction (PPI) network of overlapped DEGs was constructed by STRING and visualized with Cytoscape. Overall survival (OS) of DEmiRNAs and hub genes were investigated by Kaplan–Meier (KM) plotter (KM plotter). Transcriptional data and correlation analyses among hub genes were verified through GEPIA and Human Protein Atlas (HPA). Additionally, miRNA targets were searched using miRTarBase, then miRNA–DEG regulatory network was visualized with Cytoscape. A total of 32 DEmiRNAs and 150 overlapped DEGs were identified, and Metascape showed that DEGs were significantly enriched in cellular chemical homeostasis and pathways in cancer, while DEmiRNAs were mainly enriched in signal transduction and Glypican pathway. Moreover, seven hub genes with a high degree, namely, V-myc avian myelocytomatosis viral oncogene homolog (MYC), solute carrier family 2 member 1 (SLC2A1), PKM, plasminogen activator, urokinase (PLAU), peroxisome proliferator activated receptor γ (PPARG), MET proto-oncogene, receptor tyrosine kinase (MET), and integrin subunit α 3 (ITGA3), were identified and found to be up-regulated between PDAC and normal tissues. miR-135b, miR-221, miR-21, miR-27a, miR-199b-5p, miR-143, miR-196a, miR-655, miR-455-3p, miR-744 and hub genes predicted poor OS of PDAC. An integrative bioinformatics analysis identified several hub genes that may serve as potential biomarkers or targets for early diagnosis and precision target treatment of PDAC.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is the most common pancreatic neoplasm, accounting for approximately 90% of all pancreatic cancers (PCs), with a poor overall survival (OS) rate of 5-8% [1,2]. Over 52% of cases are diagnosed in a distal metastasis stage, with only 3% showing a 5-year OS benefit [3]. Currently, surgical extirpation is the primary curative strategy; however, most patients with PDAC do not display any specific early signs of clinical disease, which causes missed opportunities for surgery due to the progression of cancer without detection. The preferred and most promising strategy for PC treatment is surgical resection, and the detection of PC during the surgically resectable stages has vast importance for ameliorating the survival benefit of patients with PC [4]. Therefore, an exact, early diagnosis of PDAC is needed, and the discovery of molecular biomarkers and targets could be another promising strategy to promote further developments in therapeutic modalities and strategies.
Various factors are associated with the growth and development of cancer and PDAC, including mutation, age, sex, ethnicity, cigarette smoking, and viral and bacterial infections [5][6][7][8][9]. The association of infectious agents in the etiology of different types of cancer has piqued the interest of scientists in recent years. Recent data have shown the involvement of Mycoplasma hominis, Chlamydia pneumoniae, and Escherichia coli infection in the etiology of prostate cancer, lung cancer, and colon cancer, respectively [9][10][11][12]. Recently, a wealth of previous studies have reported that microRNAs (miRNAs), a class of noncoding RNAs characterized by approximately 22 nucleotides in length, can inhibit the stability and translation of messenger RNAs (mRNAs) through binding to the specific sequence of genes, which are used as potential biomarkers for different kinds of cancer. Su et al. (2018) [13] employed Weighted Gene Co-Expression Network Analysis (WGCNA) to analyze 88 patients with PDAC and 19 healthy controls and reported that five miRNAs (miR-3201, miR-890, miR-16-2-3p, miR-877, and miR-602) were defined as promising diagnostic and prognostic biomarkers for PDAC. Additionally, miR-661 is up-regulated in PDAC and simultaneously promotes the protein expression level of transcription factor 4, β-catenin, and cyclin D1, which are associated with the Wnt signaling pathway in vitro, and can be used as a prognostic marker in suspicious pancreatic lesions [14]. Wei et al. (2018) [15] demonstrated that the overexpression of miR-23b-3p decreased tumor volume or even prevented the formation of tumors by directly down-regulating annexin A2 (ANXA2). Many biomarkers and therapeutic targets play a vital role in the diagnosis and treatment of PDAC. However, no breakthrough in the conventional treatment strategy has been investigated because of the difficulty in the early diagnosis and ominous prognosis of PDAC, and there is urgent need to identify more genetic information about pivotal genes in the progression of this disease.
Several in silico bioinformatics tools are widely applied to assess gene expression levels, to screen unique genes from RNA-sequencing and next-generation sequencing data, and to explore their possible implication in the growth of different types of cancer [16][17][18]. Currently, microarrays have been widely employed to detect genetic alterations during the initiation and progression of tumorigenesis. Accumulating data on malignancy are available at the authoritative Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) genomics data repository that includes microarray and high-throughput sequencing data [19]. In the present paper, we employed mRNA microarray data of GSE41368 [20], GSE43795 [21], and GSE55643 [22] and miRNA microarray data of GSE41369 [20] to identify the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRNAs) and to explore the potential therapeutic targets as well as the prognostic value of PDAC. Compared with a single expression profile, the bioinformatics analysis of the overlapping DEGs in more chips could be more reliable. In the current study, we performed a comprehensive bioinformatics analysis to identify hub overlapping DEGs, and GO terms and KEGG pathway enrichment analyses of the DEGs and DEmiRNAs were performed. A protein-protein interaction (PPI) network and a DEmiRNA target network were constructed to assess the interactions between the DEmiRNAs and hub genes. Moreover, seven significant hub genes and ten DEmiRNAs were revealed to be associated with OS in PDAC by the Kaplan-Meier (KM) plotter (KM plotter). In addition, the mRNA expression level, correlation analysis and protein expression of the hub genes were validated with the GEPIA and Human Protein Atlas (HPA) databases. The current research aimed to explore potential hub genes that may be highly correlated with the prognostic value of PDAC from a multicultural perspective with systems biology. An integrated analysis of the DEGs in PDAC will provide further insight into the mechanism of PDAC.

PDAC dataset preprocessing
The gene expression profile data (GSE41368, GSE43795, GSE55643, and GSE41369) of PDAC were downloaded from the public GEO database by searching with the following key terms: ('microRNA' OR 'miRNA' OR 'mRNA') AND ('pancreatic ductal adenocarcinoma' OR 'PDAC'). The publication time was not limited. All included chip datasets have the same characteristics, that is, the comparison of human PDAC and normal specimens, and included at least six corresponding samples as well as all the microarray data available to the public.

Data processing
GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an interactive online tool that can compare different groups in a GEO series and can be employed to identify DEGs and DEmiRNAs. Subsequently, by default, the adjusted P-values were selected to decrease the false-positive rate using the Hochberg false discovery rate and the Benjamini method. A P-value <0.05 and an absolute log fold-change (FC) greater than 1 for the DEGs and 2 for the DEmiRNAs were used as the cut-off criteria. We also used the online tool Venny 2.1 (http://bioinfogp.cnb.csic.es/tools/venny/index.html) to identify the overlapping DEGs that were up-/down-regulated in the GSE41368, GSE43795, and GSE55643 datasets.

Functional enrichment analysis of the overlapping DEGs and DEmiRNAs
GO terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analyses play a vital role in identifying characteristic biological attributes for high-throughput transcriptome data. We used Metascape (http: //metascape.org/), a gene annotation, and analysis resource [23], to perform a functional enrichment analysis, which included cellular component (CC), molecular function (MF), and biological process (BP), and a KEGG pathway analysis of the overlapping DEGs. Similarly, we also performed a functional enrichment analysis of the DEmiRNAs using FunRich, which is also primarily used for the interaction network and the functional enrichment analysis of genes and proteins [24]. Bubble diagrams and bar charts, respectively, were used to visualize the results of the GO terms and KEGG pathway enrichment analyses of the overlapping DEGs and DEmiRNAs.

Construction of the PPI network of the overlapping DEGs
STRING (https://string-db.org/) is a user-friendly online system that provides predicted and experimental interactions of proteins [25]. We used the STRING database to analyze functional interactions between the overlapping DEGs with a confidence score of ≥0.4 for significant differences. The PPI network was visualized with Cytoscape (version 3.6.0, www.cytoscape.org), and the properties of the network topology for nodes were calculated to identify hub genes with a degree >5 in the present study. The degree indicates the number of edges connected with a specific node, and nodes with a high degree were identified as hub genes (i.e., may contribute to vital biological behaviors).

Survival analysis of the DEmiRNAs and hub genes
We used the KM plotter (http://kmplot.com/analysis/) [26] to perform an OS analysis of the DEmiRNAs and hub genes in 178 and 177 patients with PDAC, respectively. The plotter endows users with the ability to separate patients divided into high and low expression groups on the basis of the gene transcriptional expression level of a given gene and create KM plots. In addition, the hazard ratio (HR) with the 95% confidence interval and the log-rank P-value were calculated and are shown on the chart, and the number-at-risk is displayed below the curves.

Construction of the miRNA-mRNA network
The genes targeted by the DEmiRNAs were predicted using miRTarBase (http://miRTarBase.mbc.nctu.edu.tw/), the experimentally validated miRNA-target interactions database [29]. Next, the regulatory network of the predicted genes and miRNAs that targeted them were visualized in Cytoscape.

Identification of the DEGs and DEmiRNAs
The characteristics of the available profiles are presented and include the PubMed ID, GEO accession number, experiment type, sample source, number of tumors and controls, platforms, corresponding author, and publication time ( Table 1). As shown in Table 1, 57 PDAC samples and 20 normal samples were applied for RNA-seq analysis, and 9 PDAC samples and 9 normal samples were employed for miRNA-seq analysis. For the GSE41369 dataset, a total of 32 DEmiRNAs (24 up-regulated and 8 down-regulated miRNAs, Figure 1A and Table 2) were extracted, and a total of 1828, 3233, and 1144 genes were extracted from the GSE41368, GSE43795, and GSE55643 datasets, respectively (Figure 1B-D); these miRNAs and genes were recognized as differentially expressed in PDAC samples compared with normal samples. Among them, 150 DEGs overlapped; 76 were up-regulated ( Figure 1E), and 74 were down-regulated ( Figure 1F). Additionally, based on the logFC value, we show the top 25 up-and down-regulated DEGs as a heat map in Figure 1G.

Enrichment analysis of the overlapping DEGs and miRNAs
The BP analysis demonstrated that the overlapping DEGs were dramatically concentrated in cellular chemical homeostasis, carboxylic acid biosynthetic process, and the positive regulation of cell migration ( Figure 2A). Regarding MF, the overlapping DEGs were significantly enriched in cell adhesion molecule binding ( Figure 2B). Regarding CC, the overlapping DEGs were significantly enriched in the extracellular matrix and the external side of the plasma membrane ( Figure 2C). Additionally, the KEGG analysis indicated that the overlapping DEGs were significantly enriched in pathways involved in cancer and the PI3K-Akt signaling pathway ( Figure 2D). To improve our understanding of the biological information of the 32 DEmiRNAs in PDAC, we also performed GO annotation and KEGG pathway analyses. Regarding BP, the DEmiRNAs were significantly enriched in signal transduction and cell communication as well as the regulation of nucleobase, nucleoside, nucleotide, and nucleic acid metabolism ( Figure 3A). Regarding MF, the DEmiRNAs were significantly enriched in transcription factor activity and GTPase activity ( Figure 3B). In addition, the most enriched GO terms in CC were nucleus, cytoplasm, Golgi apparatus, and lysosome ( Figure 3C). As shown in Figure 3D, the most enriched KEGG pathway terms for the DEmiRNAs were related to VEGF and the VEGFR signaling network, the glypican pathway, LKB1 signaling events, the ERBB receptor signaling network and the sphingosine 1-phosphate (S1P) pathway.
To explore the interactions of the overlapping DEGs, a PPI network based on the STRING database was constructed. In total, the PPI network of the overlapping DEGs comprises 83 nodes and 99 edges (Figure 4). The more characteristic properties of node degree, the more significant influence it has on maintaining the stability of the PPI network. Seven nodes were identified as hub genes with a degree > 5, namely, MYC (degree = 15), SLC2A1 (degree = 7), PKM (degree = 7), PLAU (degree = 7), PPARG/PPARγ (degree = 6), MET (degree = 6), and ITGA3 (degree = 6).

Transcriptional expression level of the hub genes and correlation analysis
As shown in Figure 7, the mRNA expression levels of seven genes were remarkably up-regulated in PDAC based on data from the GEPIA database. In addition, we used the 'HPA' to examine the protein expression levels of these hub genes and observed that the protein expression levels of the hub genes were noticeably up-regulated in tumor tissues compared with normal tissues (Figure 8). The increase in SLC2A1, PKM, PLAU, PPARG, MET, and ITGA3 was strongly associated with the increase in MYC in PDAC (Figure 9).

Discussion
MiRNAs can be used as biological markers for PCs [4,30], and PC is becoming a common malignant disease with an incidence that has continued to rise during the last few years [30,31]. Hence, specific biomarkers for accurate diagnostic tests and potential therapeutic targets for individualized therapy are yet to be fully determined in PDAC. In the current study, the mRNA and miRNA expression data obtained from the GEO database yielded an overall total of 150 genes (76 up-regulated and 74 down-regulated genes) and 32 miRNAs (24 up-regulated and 8 down-regulated miRNAs) that were differentially expressed in PDAC samples and normal tissues. The 150 DEGs were significantly enriched in cancer-related pathways. Moreover, 32 DEmiRNAs were significantly enriched in the glypican pathway. The glypicans comprise a family of glycosylphosphatidylinositol (GPI)-anchored heparan sulfate proteoglycans that has a remarkable effect on cell division and growth regulation. At present, six members (GPC1-6) of this family are recognized in vertebrates [32]. The glypican pathway, which was indicated in the KEGG pathway analysis, includes glypican 1 (GPC1, Entry: k08107), which promotes the pathogenesis of many human cancers [33]. Research shows that high levels of serum GPC1 indicate a poor prognosis in PDAC [34], suggesting that intervention of the glypican pathway may be a promising strategy for PDAC treatment. The increased expression of seven DEmiRNAs (miR-135b, miR-221, miR-21, miR-27a, miR-199b-5p, miR-143, and miR-196a) and seven hub genes (MYC, SLC2A1, PKM, PLAU, PPARG, MET, and ITGA3) were associated with worse OS; however, hsa-miR-655, hsa-miR-455-3p, and The upper x-axis represents the P-value (−log10), and the lower x-axis represents the percentage of genes (blue). The y-axis represents the GO term. Yellow represents the P-value equal to 0.05 as reference, and red represents the specific P-value. The longer the rectangular zone, the smaller the P-value is.
hsa-miR-744 had the opposite prognostic value. SLC2A1, PKM, PLAU, PPARG, MET, and ITGA3 were up-regulated in PDAC compared with normal tissues and were closely related to an increase in MYC. MiRNAs can function as competing endogenous RNAs (ceRNAs), causing the degradation and/or inhibition of mRNA translation by binding to its 3 -untranslated region (3 -UTR) [35]. However, this function is not exactly consistent with the aforementioned pattern between the DEmiRNAs and hub genes, which may also play a prominent role in tumor onset and progression in PDAC. Munding et al. (2012) [36] showed that miR-135b was highly misregulated and was preferable to miR-196a and miR-210 as an indicator for identifying PDAC. Similarly,   [37] reported that miR-135b-5p was significantly up-regulated in PC tissue compared with adjacent tissue, and the overexpression of miR-135b-5p advanced proliferation and migration by decreasing the transcriptional expression level of SFRP4 by directly targeting its 3 -UTR. Wang et al. (2016) [38] showed that miR-221 bound to the 3 -UTR of endothelial nitric oxide synthase traffic inducer (NOSTRIN) and inhibited its expression, and up-regulated miR-221 expression correlated with unsatisfactory survival in PDAC, and another study demonstrated that miR-221 is an oncogenic miRNA that contributes to Capan-2 cell proliferation by regulating the PTEN-Akt pathway [39]. In addition, miR-21 promotes epidermal growth factor (EGF)-induced proliferation, inhibits cell apoptosis, and accelerates cell cycle progression by targeting the PI3K/AKT and MAPK/ERK signaling pathways [40]; it is also associated with poor OS and disease-free survival in PDAC [41]. Frampton et al. (2015) [42] identified three miRNAs (miR-21, miR-23a, and miR-27a) with high tumor expression, and the inhibition of miR-23a, miR-21, and miR-27a had promoting effects in decreasing the proliferation of PDAC cells in culture and the growth of xenograft tumors. A fibrotic tumor-associated stroma (TAS), which is believed to aggravate the clinical biological symptoms of PDAC, and miR-199a and miR-199b, which belong to the miR-199 family that is uniquely expressed in TAS cells, support the stromal miRNA feature, suggesting that the tumor microenvironment contributes to miRNA changes, which conversely have mechanical effects on the TAS [43]. Compared with normal tissues, miR-143 was up-regulated in both PDAC and PVAC tumor samples, which may reflect the histological features and biological behavior of different PCs [44]. Accumulating evidence indicates that the serum miR-196a expression level is able to predict the median survival time of PDAC patients [45,46]. The OS analysis of the up-regulated DEmiRNAs was consistent with the results      of the present study. Epithelial-to-mesenchymal transition (EMT) has been reported to promote cancer progression. Harazono et al. (2013) [47] reported that the overexpression of miR-655, which is an EMT-suppressive miRNA that targets TGFBR2 and ZEB1, not only induced the up-regulation of E-cadherin and the downregulation of typical EMT inducers but also inhibited the invasion and migration of mesenchymal-like cancer cells. Furthermore, the upregulation of miR-655 inhibits cell invasion in esophageal squamous cell carcinoma [48], triple-negative breast cancer [49], and hepatocellular carcinoma [50]. MiRNA-455 (miR-455) is recognized as a broadly conserved noncoding RNA that has been implicated in various cancers. The function of miR-455 has been confirmed in prostate cancer [51], gastric cancer [52], and oral squamous cancer cells [53]. However, the potential effect of miR-455-3p has rarely been explored in PDAC. In the current study, the survival analysis demonstrated that the up-regulated expression of miR-455-3p in PDAC patients was associated with prolonged OS. miR-744 was found to be significantly up-regulated in PC and increased tumorigenicity by inhibiting multiple negative regulators of the Wnt/β-catenin pathway [54]. Furthermore, a high level of plasma miR-744 contributed to the poor progression-free survival of nonoperable PC patients [55]. Interestingly, contradictory results regarding the survival analysis in this study demonstrate that further investigation is needed to confirm the effect of miR-744.
MYC acts as a carcinogenic transcription factor that regulates at least 15% of genes involved in various cellular processes, including the differentiation, cell proliferation, apoptosis, and metabolism of PC [56,57]. Additionally, the KRAS/ERK/c-Myc axis is the major driving factor of tumorigenesis in PC. In normal cells, MYC is a transient protein that is present from 20 to 30 min and is uncontrollable in cancers. Hence, promoting the degradation of MYC is a promising treatment strategy for PDAC [57]. Metabolic deregulation is a hallmark of human cancers. SLC2A1, also named GLUT1, transports glucose and its analogs into cells. A previous study indicated that the up-regulation of SLC2A1 accelerated tumor cell proliferation and metastasis [58]. Pyruvate kinase muscle isozymes (PKMs) play an important role in adjusting metabolic changes during carcinogenesis. Cheng et al. (2018) [59] revealed that the increased expression of PKM1 and PKM2 affected cell invasion and migration in PDAC cells. Lu et al. (2018) [60] analyzed the related microarray datasets (GSE15471, GSE62452, GSE102238, GSE62165, and GSE16515) and identified the top 21 genes with high connectivity degrees; among them, PLAU is a plasminogen activator, ITGA3 plays a vital function in EMT, and the high expression of PLAU and ITGA3 is related to poor survival, consistent with the current research. The peroxisome proliferator activated receptors (PPARs), a member of the NR1 (thyroid-like) subfamily of NR, comprise PPARα (NR1C1), PPARβ/δ (NR1C2), and PPARγ (NR1C3). PPARγ, which is expressed in primary PDAC and has proven to be of clinical value when used as an independent prognostic factor, is closely related to a high clinical stage and has a close association with short OS [61]. MET, the receptor of hepatocyte growth factor, was identified as a cancer stem cell marker in PDAC. Tomihara et al. (2017) [62] also revealed that patients with high Met expression experienced a markedly shorter survival time than those with low Met expression, consistent with the current research. Altogether, these data support that the DEmiRNAs and hub genes shed light on the clinical utility of prognostic values in PDAC.
The combination of a multichip joint analysis and a bioinformatics analysis has a notable advantage in recognizing and confirming possible biomarkers and targets for malignant tumors, although there are some limitations to the current study. First, the number of samples in each microarray dataset was relatively small, which may have resulted in a few false-positive results, and second, these results were not validated; therefore, further mechanistic studies of larger samples are still needed in the future.
Taken together, the results of the bioinformatics analysis of four GEO microarray datasets of PC indicated that cancer-related pathways (KEGG Entry: k05200) and the glypican pathway (KEGG Entry: k08107) probably participate in the onset and development of PDAC. The high expression of miR-135b, miR-221, miR-21, miR-27a, miR-199b-5p, miR-143, miR-196a, MYC, SLC2A1, PKM, PLAU, PPARG, and ITGA3, as well as the low expression of miR-655, miR-455-3p, and miR-744, were observably related to unsatisfactory survival effects in patients with PDAC. Bioinformatics analyses revealed that different miRNAs exhibited diverse potential functions that were associated with the DEGs. However, further studies need to be implemented to explore the molecular mechanisms