Identification of crucial aberrantly methylated and differentially expressed genes related to cervical cancer using an integrated bioinformatic analysis.

Abstract Methylation functions in the pathogenesis of cervical cancer. In the present study, we applied an integrated bioinformatics analysis to identify the aberrantly methylated and differentially expressed genes (DEGS), and their related pathways in cervical cancer. Data of gene expression microarrays (GSE9750) and gene methylation microarrays (GSE46306) were gained from Gene Expression Omnibus (GEO) databases. Hub genes were identified by ‘limma’ packages and Venn diagram tool. Functional analysis was conducted by FunRich. Search Tool for the Retrieval of Interacting Genes Database (STRING) was used to analyze protein–protein interaction (PPI) information. Gene Expression Profiling Interactive Analysis (GEPIA), immunohistochemistry staining, and ROC curve analysis were conducted for validation. Gene Set Enrichment Analysis (GSEA) was also performed to identify potential functions.We retrieved two upregulated-hypomethylated oncogenes and eight downregulated-hypermethylated tumor suppressor genes (TSGs) for functional analysis. Hypomethylated and highly expressed genes (Hypo-HGs) were significantly enriched in cell cycle and autophagy, and hypermethylated and lowly expressed genes (Hyper-LGs) in estrogen receptor pathway and Wnt/β-catenin signaling pathway. Estrogen receptor 1 (ESR1), Erythrocyte membrane protein band 4.1 like 3 (EPB41L3), Endothelin receptor B (EDNRB), Inhibitor of DNA binding 4 (ID4) and placenta-specific 8 (PLAC8) were hub genes. Kaplan–Meier method was used to evaluate survival data of each identified gene. Lower expression levels of ESR1 and EPB41L3 were correlated with a shorter survival time. GSEA results showed that ‘cell adhesion molecules’ was the most enriched item. This research inferred the candidate genes and pathways that might be used in the diagnosis, treatment, and prognosis of cervical cancer.


Introduction
Cervical cancer is the most prevalent reproductive malignancy, with an estimated 569847 new cases and 311365 deaths worldwide in 2018. Although its prognosis has been improved by the combination of screening and surgery, the disease still brings the fourth highest morbidity and mortality among females worldwide [4].
Human papillomavirus (HPV) is one player in cervical cancer, however, not all the infected patients develop the cancer ultimately, indicating that HPV infection is a necessary but not sufficient pathogenic condition [42]. Additionally, genetic and epigenetic alternations may also be involved [30,40].
Epigenetics focuses on heritable change in gene expression not mediated by changes within DNA sequence [3]. Altered methylation in DNA bases, either hypomethylation or hypermethylation, is a key event in carcinogenesis [17]. As cervical cancer progresses, epigenetic alteration rises to a level high enough to change gene expression. Epigenetic regulatory mechanisms in cervical cancer include DNA methylation, the hottest topic in this area, and post-translational modifications of histone proteins [10].
Based on high-throughput platforms, microarrays have emerged to screen genetic or epigenetic alternations in cancers in recent years [20]. Some differentially expressed genes (DEGs) in cervical intraepithelial neoplasia have been identified with this tool [46]. In addition, to explore differentially methylated genes (DMGs), aberrant methylation analysis has been applied [43]. However, these studies did not carry out a conjoint analysis that may result in insufficient power to discover key genes and pathways related to multiple cellular process and biological function. Differentially expressed and methylated genes can be detected using gene expression and methylation microarray data.
In our study, data from gene expression microarrays (GSE9750), gene methylation microarrays (GSE46306), as well as the expression profiles of oncogenes and tumor suppressor genes (TSGs) were integrated and analyzed by a series of bioinformatics tools. We used R software to obtain DEGs and DMGs and Venn diagram to overlap three datasets. Analyses also included gene-enrichment analysis (GO) and pathways (KEGG), protein-protein interaction (PPI) network analysis, and the identification and validation of oncogenes and TSGs. In our study, we aimed to screen out the aberrantly methylated and differentially expressed genes and associated pathways in cervical carcinogenesis. Some oncogenes or TSGs might be applied as biomarkers or therapeutic targets for cervical cancer.

PPI network construction
Based on the aforementioned oncogenes and TSGs, we built a PPI network with the Search Tool for the Retrieval of Interacting Genes (STRING) database [35] (https://string-db.org/cgi/input.pl).

Pathway analysis of aberrantly methylated DEGs
Gene ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were conducted for our selected genes with an online tool FunRich [26] (http://www.funrich.org/). P<0.05 was considered as statistically significant.

Validation of the selected genes
To further validate the expression profiles of oncogenes and TSGs, online software Gene Expression Profiling Interactive Analysis (GEPIA) [36] (http://gepia.cancer-pku.cn/) was used. For validation of the aberrantly methylated differentiated expressed genes, Illumina Human Methylation 450 platform in Cancer Genome Atlas (TCGA) database was assessed. To validate the selected oncogenes/TSGs on the translational level, immunohistochemistry staining results of both normal and cancerous cervical samples were assessed using the Human Protein Atlas database (https://www.proteinatlas.org/). ROC curve analysis was conducted to distinguish the normal from the cancerous tissues. We used cBioPortal tool (cBio Cancer Genomics Portal, http://www.cbioportal.org/) to find out genetic alterations in the hub genes and correlation between messenger RNA (mRNA) expression and DNA methylation in cervical cancer.

Gene Set Enrichment Analysis
In TCGA, 306 cervical cancer samples were divided into two groups in the light of the median expression level. In order to identify the five hub genes' underlying functions and the enrichment of biological processes (BPs) in the ranked DEGs between the two groups, Gene Set Enrichment Analysis (GSEA) (http://software.broadinstitute.org/ gsea/index.jsp) was performed [33]. Collection of annotated gene set of c2.cp.kegg.v6.0.symbols.gmt in Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) was selected as the reference gene sets. Additionally, we used 'Clusterprofiler' package in R to handle the datasets, and 'Enrichplot' package to tease out the enriched pathways of the crucial genes. The adj.p<0.05 was set as the cut-off criterion.

DEGs and DMGs in cervical cancer
The expression matrices from GSE9750 contained 33 cervical cancer samples and 24 normal samples. The DEGs were shown with a volcano plot and a clustering heat map ( Figure 1A,B). We obtained 1313 DEGs, including 690 up-regulated genes and 623 down-regulated genes. A total of 1405 DMGs were obtained in GSE46306, as shown in the heat map ( Figure 1C). As a result, 481 hypomethylated and 924 hypermethylated genes were also obtained.

Aberrantly methylated DEGs
We got 32 hypomethylated and highly expressed genes (Hypo-HGs) and 44 Hyper-LGs. To pinpoint the aberrantly methylated DEGs, Hypo-HGs were overlapped with oncogenes, and Hyper-LGs with TSGs. Correspondingly, we identified two up-regulated hypomethylated oncogenes (Figure 2A), which may trigger tumorigenesis potentially by up-regulating gene expression after hypomethylation. Meanwhile, we retrieved eight down-regulated hypermethylated TSG ( Figure 2B), indicating aberrant hypermethylation may trigger tumorigenesis through lowering the expression of these genes.

PPI network construction
STRING online database was employed to construct PPI networks. Based on the Hypo-HGs, a PPI network containing 31 nodes and 60 edges was constructed, with a PPI enrichment P-value of 5.77e-09 (Supplementary Figure S1A). Based on the Hyper-LGs, a PPI network containing 44 nodes and 69 edges was constructed, with a PPI enrichment P-value of 1.79e-07 (Supplementary Figure S1B). Those genes were used for further functional analyses.

Pathway analysis of aberrantly methylated DEGs
There were biological process (BP), cellular component (CC), and molecular function (MF) in enriched GO terms. GO analysis showed that pyroptosis, positive regulation of interleukin-1 and positive regulation of nuclease activity for BP production were significantly enriched in Hypo-HGs ( Figure 3A). For CC analysis, AIM2 inflammasome complex, microtubule plus-end and NLRP3 inflammasome complex were significantly enriched in Hypo-HGs ( Figure  3B). For MF analysis, Hypo-HGs were significantly enriched in solute proton symporter activity, DNA insertion or deletion binding and phosphorylase activity ( Figure 3C). KEGG analysis indicated that up-hypomethylated genes mainly contributed to cell cycle, senescence, and autophagy in cancer ( Figure 3D).
In comparison, Hyper-LGs were mainly associated with (i) vascular smooth muscle contraction, negative regulation of blood vessel diameter, and regulation of gene expression (BP analysis, Figure 3E); (ii) axolemma, juxtaparanode region of axon, and paranode region of axon (CC analysis, Figure 3F); (iii) acly-CoA oxidase activity, BMP receptor activity, and fatty acid transporter activity (MF analysis, Figure 3G). These genes were mainly enriched in estrogen receptor pathway, estrogen signaling pathway and regulation of Wnt/β-catenin signaling by small molecules (KEGG analysis, Figure 3H).

Validation of the selected genes
We employed data from GEPIA to validate the role of the differentially expressed and methylated genes in carcinogenesis. Two upregulated-hypomethylated oncogenes, along with eight downregulated-hypermethylated TSGs, showed differential expression between normal tissues and cancer tissues (Figure 4). In addition, based on TCGA CESC data, we identified five DMGs: ESR1, EPB41L3, EDNRB, ID4, and PLAC8 ( Figure 5). Additionally, the immunohistochemistry staining images from Human Protein Atlas database indicated the up-regulation of PLAC8 gene and the down-regulation of ESR1 and EDNRB genes. Besides, the expression of EPB41L3 showed no significant difference between normal tissues and tumor tissues, and immunohistochemistry staining of ID4 was not obtained in Human Protein Atlas database ( Figure 6). Moreover, ESR1, EPB41L3, and ID4 all showed an AUC of more than 0.7 and the combined diagnostic power of the five genes was 0.987 ( Figure 7A), which were enough to distinguish cervical cancer    tissues from the normal tissues.
Furthermore, Human Protein Atlas database was applied to gain survival time and gene expression levels, to assess the prognostic significance of the five aberrantly methylated and differentially expressed genes. Kaplan-Meier method was used to evaluate survival time data of each identified gene. Analysis demonstrated lower expression levels of ESR1 and EPB41L3 were correlated with shorter survival time ( Figure 7B).

GSEA
To explore the functions of five genes in cervical cancer, we conducted GSEA to search pathways enriched in the TCGA samples. Ten enriched pathways in the gene sets (n=306) included cell adhesion molecules (CAMs), Wnt signaling pathway, axon guidance, calcium signaling pathway, neuroactive ligand receptor interaction, arrhythmogenic right ventricular cardiomyopathy ARVC, hypertrophic cardiomyopathy HCM, vascular smooth muscle contraction, dilated cardiomyopathy, and ECM receptor interaction (adj.P<0.05) ( Figure 7C,D).

Genetic information of the five genes
The genetic alteration in the five genes was analyzed with cBioPortal software. The network constructed by ESR1 and EDNRB and their 50 most associated neighbor genes were exhibited (only two of the five genes had a joint node, while the remaining three genes had no junctions and were not shown). In addition, drugs targeting the five genes were shown, only ESR1 and EDNRB were taken for drug targets currently ( Figure 8A). Figure 8B,C illustrated that the five genes were changed in 82 (27%) of the 308 cases/patients (310 in total); EDNRB, EPB41L3, and ID4 showed most diverse alterations, including amplification, missense mutation etc. The relevance between mRNA and DNA methylation of the five genes in the TCGA CESC patients was demonstrated in Figure 8D. We found that methylation negatively regulated the mRNA expression of the five genes.

Dicussion
The development of cervical cancer was a complicated process. Elucidating the underlying molecular mechanisms of oncogenesis would extremely benefit the diagnosis, treatment and prognosis evaluation of cervical cancer. Database-based bioinformatics analysis is increasingly adopted to screen out targeted biological molecules that might have a guiding role in the diagnosis and treatment of tumors [20].
In the present study, we utilized NCBI-GEO to analyze the data of gene expression (GSE9750) and gene methylation (GSE46306) microarrays for cervical cancer. We used the R software to screen genes which were differentially expressed.
In our research, we identified two up-regulated hypomethylated oncogenes and eight down-regulated hypermethylated TSGs in cervical cancer. To predict the protein interaction among these genes, PPI network was constructed using the online STRING database. Furthermore, based on TCGA CESC data, we obtained five DMGs. Functional and enrichment analyses demonstrated the pathways and hub genes regulated by aberrant methylation. Meanwhile, the Kaplan-Meier analysis illustrated that lower levels of EPB41L3 and ESR1 were relevant to shorter survival time of CESC patients.
As was implied by FunRich analysis, Hypo-HGs were significantly associated with cell cycle and autophagy. Increasing evidence indicates that aberrant cell cycle plays a crucial role in the development of cancer [45], due to the dysregulation of TSGs or oncogenes, such as p53, p16, and cyclin D1, etc. [31]. Cyclin D1 is essential for cell cycle G 1 [2]. Cyclin D1 aberrantly expressed is implicated in the progression of cervical cancer [1]. Cell cycle could participate significantly in the tumorigenesis of cervical cancer due to cell cycle regulators. Studies have explored the dual effects of autophagy in cancer. How autophagy functions in tumor suppression has been explained, like cell death, senescence, and metabolic stress [18]. Autophagy facilitates neoplasm growth by supporting cellular metabolism. Inhibiting autophagy has become a rationale in cancer therapy development [19].
For Hyper-LGs in cervical cancer, estrogen receptor pathway and Wnt/β-catenin signaling pathway were enriched. Multi-parity and oral contraceptives usages, particular long-term usage, are powerful risks of cervical cancer in HPV-infected women, raising the potential that estrogen and HPV infection might exert synergistic effect to cervical cancer development [28]. Previous studies have discovered that loss of exogenous estrogen, cervical disease progressed slowly and pre-existing neoplasia regressed partially in HPV transgenic mice. In addition, treated with estrogen, ERα-deficient HPV transgenic mice did not show faster development of cervical cancer [8]. Therefore, estrogen and ERα might play a key part in the tumorigenesis of cervical cancer. The Wnt/β-catenin signaling pathway, through activating β-catenin to initiate its downstream target genes transcription, could promote tumorigenesis, invasion, and metastasis [44]. Previous study found that deprivation of Wnt2 inhibited SiHa cell migration and invasion, as well as reversed EMT by restraining theWnt2/β-catenin pathway. Moreover, overexpression of Wnt2 led to metastasis in cervical cancer, which was associated with activation of β-catenin and induction of EMT [48]. In conclusion, Wnt/β-catenin signaling pathway aberrantly activated could enhance cervical cancer metastasis and invasion.
GEPIA was employed to validate the role of the ten selected genes in carcinogenesis. Two up-regulated hypomethylated oncogenes and eight down-regulated hypermethylated TSGs showed expression difference between cancer samples and normal samples. We next accessed TCGA CESC data to validate the performance of five DMGs, including ESR1, EPB41L3, EDNRB, ID4, and PLAC8. The results of boxplots based on TCGA database were mostly consistent with the data of GEO analysis. We further used immunohistochemistry staining to verify the deregulation of the gene expression. Last, we performed ROC curve analysis to assess the capacity of five genes to distinguish cervical cancer from the normal tissues.
These five hub genes are involved in tumor progression. PLAC8 is a cysteine-rich protein highly expressed in giant trophoblasts and spongiotrophoblast layer in placenta [11]. In a recent study, PLAC8 could facilitate the EMT in nasopharyngeal carcinoma [15]. In addition, high expression of PLAC8 in clear cell renal carcinoma was correlated to malignant progression and poor prognosis. [32]. In our study, we found that the expression of PLAC8 was increased by hypomethylation, demonstrating that PLAC8 may function as an initiator of cervical cancer proliferation.
EPB41L3 is a TSG which facilitates tumor cell apoptosis and inhibits its proliferation [21]. Overexpressed of EPB41L3 could enhance the migration and invasion of multiple cancers, such as esophageal cancer [47], lung cancer [22], and hepatocellular carcinoma [49]. Methylation of EPB41L3 was lower than or equal to that in CIN1 group, but rose significantly till to the peak in CIN2/3 group [41]. Our study showed that lower expression levels of EPB41L3 was related to shorter survival time. As a tumor suppressor, ID4 could promote the proliferation and inhibit the apoptosis of tumor cells in prostate [23], lung [5], and gastric [6] cancers. Expression of ID4 is lowered by its promoter hypermethylation [25]. In the present study, EPB41L3 and ID4 had a possibility of 7% mutation in cervical cancer, so we speculated that the mutation led to the aberrant methylation or deregulation of both genes. Furthermore, hypermethylation of EPB41L3 and ID4 may suppress the development of cervical cancer.
ESR1, a nuclear hormone receptor and oncoprotein, is expressed in approximately 70% of breast cancers [34]. ESR1 may serve to develop a hormonal therapy for breast cancer, since ESR1 gene mutating is related to acquired endocrine resistance in patients with ER-positive metastatic breast cancer [27]. In the present study, lower expression level of ESR1 was associated with shorter survival time. Previous research reported that EDNRB was methylated in prostate adenocarcinoma [24]. The methylation of EDNRB gene promoter was associated with gastric cancer invasion [37]. In our study, we found that ESR1 and EDNRB expression levels were lower in cervical cancer samples than in the normal samples. Aberrant methylation may contribute to these differences. The rate of ESR1 and EDNRB mutation was 6 and 7% in cervical cancer. Therefore, we speculated that the mutation resulted in the aberrant methylation or down-regulation of ESR1and EDNRB. Our study implied that ESR1 and EDNRB may serve as targets in the creation of anti-tumor drugs.
GSEA was conducted to identify the functions of the five genes involved in cervical cancer. CAMs were enriched in TCGA samples. CAMs participate in various BPs, like differentiation, growth and apoptosis, and facilitate cellular interaction and migration [9,13,14]. In highly aggressive tumors, CAMs might show decreased immunoexpression [16]. Cervical cancers and their precursor lesions demonstrated the absence or alteration in cellular adherence [39]. During the development of cervical lesions, CAMs expression is altered either in location (cytoplasmic or membrane localization) or in quantity. E-cadherin is a number of CAM family that is aberrantly expressed in cervical lesions and leads to the invasion and metastasis of this neoplasm [9]. Taken together, low expression of CAMs could facilitate the metastasis of cervical cancer.
In conclusion, we conducted integrated bioinformatic analysis to identify aberrantly methylated and differentially expressed oncogenes and TSGs in cervical cancer tissues, as well as the related pathways and functions. Ten genes validated with the TCGA database, including five DMGs (ESR1, EPB41L3, EDNRB,ID4, PLAC), might be used as biomarkers and therapeutic targets in the precise diagnosis and treatment of cervical cancer.
The limitations in our study are as follows. First, since our research is a data analysis, biological experiments are in urgent need to validate our results. Second, it is necessary to explore the molecular mechanism in detail. In the future, we will deepen our research with well-designed experiments (including PCR, Western blot, BSP and COBRA).

Data Availability
The datasets used or analyzed during the current study are available from the GEO (https://www.ncbi.nlm.nih.gov) and Cancer Genome Atlas database (https://cancergenome.nih.gov).