An integrative bioinformatics analysis of microarray data for identifying hub genes as diagnostic biomarkers of preeclampsia

Preeclampsia (PE) is a disorder of pregnancy that is characterised by hypertension and a significant amount of proteinuria beginning after 20 weeks of pregnancy. It is closely associated with high maternal morbidity, mortality, maternal organ dysfunction or foetal growth restriction. Therefore, it is necessary to identify early and novel diagnostic biomarkers of PE. In the present study, we performed a multi-step integrative bioinformatics analysis of microarray data for identifying hub genes as diagnostic biomarkers of PE. With the help of gene expression profiles of the Gene Expression Omnibus (GEO) dataset GSE60438, a total of 268 dysregulated genes were identified including 131 up- and 137 down-regulated differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of DEGs suggested that DEGs were significantly enriched in disease-related biological processes (BPs) such as hormone activity, immune response, steroid hormone biosynthesis, metabolic pathways, and other signalling pathways. Using the STRING database, we established a protein–protein interaction (PPI) network based on the above DEGs. Module analysis and identification of hub genes were performed to screen a total of 17 significant hub genes. The support vector machines (SVMs) model was used to predict the potential application of biomarkers in PE diagnosis with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.958 in the training set and 0.834 in the test set, suggesting that this risk classifier has good discrimination between PE patients and control samples. Our results demonstrated that these 17 differentially expressed hub genes can be used as potential biomarkers for diagnosis of PE.


Introduction
Preeclampsia (PE) is a disorder of pregnancy that is characterised by hypertension and a significant amount of proteinuria [1] beginning after 20 weeks of pregnancy [2]. It is closely associated with high maternal morbidity, mortality, maternal organ dysfunction or foetal growth restriction [3,4]. It is estimated that the incidence rate of PE is between 3 and 10% of all pregnancies [5]. There are various risk factors for PE such as obesity, prior hypertension, old age and gestational diabetes. Additionally, oxidative stress, immunologic intolerance and angiogenic imbalance are considered as the main causes [6,7]. Therefore, it is necessary to identify early, novel diagnostic biomarkers of PE.
Recently, with the help of gene microarray and RNA-sequencing technology, as well as public databases including Gene Expression Omnibus (GEO), gene expression studies related to human diseases have been reported. However, there have been no reports describing the expression profiles in PE. For example, using microarray, the gene expression profiles of placental tissue from five controls and five PE samples were analysed [8]. In this study, a total of 224 transcripts were significantly differentially expressed. Functional enrichment analysis suggested that the top three significantly enriched functional groups are immune response, inflammatory response and chemotaxis. Another study identified 38840 CpG sites with significant DNA methylation alterations in early-onset PE (EOPET) [9]. These results may be useful for DNA methylation-based non-invasive prenatal diagnosis. In addition, Meng et al. [10] used DNA microarrays to find 939 differentially expressed genes (DEGs) between PE and normal pregnancies. They found Notch, Wnt, NF-κB and transforming growth factor-β (TGF-β) signalling pathways were aberrantly regulated in PE. Long non-coding RNAs (lncRNAs) were also reported to play important roles in the pathogenesis of PE. He et al. [11] described the lncRNA profiles in six PE placentas and five normal pregnancy placentas using microarray. They found 738 lncRNAs that were differentially expressed.
However, most of these studies were only focussed on determining DEGs between PE and controls samples. The value of early diagnostic biomarkers needs to be explored and to be further investigated.
In the present study, we have performed a multi-step integrative bioinformatics analysis of microarray data for identifying hub genes as diagnostic biomarkers of PE. With the help of gene expression profiles of the GEO dataset GSE60438, a total of 268 dysregulated genes were identified including 131 up-and 137 down-regulated DEGs. Functional and pathway enrichment analyses of DEGs suggested that these DEGs were significantly enriched in disease-related biological processes (BPs) such as hormone activity, immune response, steroid hormone biosynthesis, metabolic pathways and other signalling pathways. Using the STRING database, we established a protein-protein interaction (PPI) network based on the above DEGs. Module analysis and identification of hub genes was performed to screen a total of 17 significant hub genes. The support vector machine (SVM) model was used to predict the potential application of biomarkers in PE diagnosis with area under the receiver operating characteristic (ROC) curve (AUC) of 0.958 in the training set and 0.834 in the test set. Our results demonstrated that these 17 differentially expressed hub genes can be used as potential biomarkers for the diagnosis of PE.

Selection of the GEO dataset and data processing
The gene expression microarray dataset GSE60438 [12] was downloaded from the GEO database (http://www.ncbi. nlm.nih.gov/geo/). The GSE60438 series (GPL10558 platform, Illumina HumanHT-12 V4.0 expression beadchip) contained a total of 77 samples (35 PE and 42 normotensive controls). First, the probe symbols were converted into gene symbols. When multiple probes corresponded to one specific gene, the median expression was considered as its final expression level.

Screening of DEGs between PE and controls
To screen DEGs between PE and controls samples, the limma package [13] in R was used with the cut-off criteria of fold-change > 1.2 and P-value <0.05.

Functional and pathway enrichment analyses of DEGs
To explore the biological characteristics of these DEGs, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. GO enrichment analysis was performed on DAVID (version 6.8) (https://david.ncifcrf.gov/) [14]. Significant results for molecular function (MF), BP and cellular component (CC) were determined with a P-value <0.05. KEGG pathway enrichment analysis was performed by KOBAS (version 3.0) (http://kobas.cbi.pku.edu.cn/) [15], a web server was used for identification of enriched pathways. A P-value <0.05 was considered statistically significant.

PPI network construction and module analysis
To establish the PPI network, the STRING (version 10.5) (http://string-db.org/) [16] online database was used. The parameter was set as medium confidence >0.4. To visualise the PPI network, Cytoscape (version 3.6.1) software (http://www.cytoscape.org/) was used to draw their interactions. The MCODE plug-in of the Cytoscape software was used to explore significant modules (the parameters were set to default).

Identification of hub genes from the PPI network
To select hub genes from the PPI network, the plug-in cytoHubba [17] of the Cytoscape software was used, which was computed by two methods, Maximum Neighbourhood Component (MNC) and Maximal Clique Centrality (MCC).

Prediction and evaluation of diagnostic biomarkers of PE using the SVM model
To explore diagnostic biomarkers of PE, we used the above hub genes as candidates to find their diagnostic value based on SVMs [18]. The kernlab package in R was used for SVM analysis. In brief, half of the samples (PE = 18, controls = 20) were randomly distributed as the training set, which was used to build a model. The remaining data were used as the test set. An ROC curve analysis was applied to evaluate the specificity and sensitivity of the SVM prediction model. The AUC was computed to estimate the diagnostic accuracy of the classifier.

Placental tissue of PE
The placental tissues were obtained from five patients undergoing PE from the Department of Gynaecology and Obstetrics in People's Hospital of Baoan District. The normal placental tissues were obtained from five patients without PE. And the informed consents were obtained from all patients. The study was approved by the Human Ethics Committee of People's Hospital of Baoan District.

RNA extraction and quantitative real-time PCR
The total RNA from PE patients and normal people was extracted by TRIzol (Invitrogen) according to protocol. The concentration and purity of RNA was examined using NanoDrop2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, U.S.A.). One microgram RNA was reverse transcribed using reverse transcription cDNA kit (Thermo Fisher Scientific, Waltham, U.S.A.) for synthesis of cDNA (42 • C for 60 min, 70 • C for 5 min, 4 • C preservation). SYBR Green PCR Master Mix (Roche, Basel, Switzerland) was used to conduct qPCR experiment using Opticon RT-PCR Detection System (ABI 7500, Life Technology, U.S.A.). The PCR cycle was as follows: pretreatment at 95 • C for 10 min; followed by 40 cycles of 94 • C for 15 s, 60 • C for 1 min, finally at 60 • C for 1 min and at 4 • C for preservation. Comparative cycle threshold ( C t ) was employed to analyse the expression of mRNA. β-actin expression was used for normalisation. The primer sequences followed are as shown in Supplementary Table S6.

Identification of DEGs between PE and control samples
In the present study, we performed an integrative bioinformatics analysis of microarray data for identifying hub genes as diagnostic biomarkers of PE ( Figure 1). First, we downloaded the gene expression profiles of GSE60438, which  Figure 2A. After data processing, significant DEGs were screened using the limma package in R with a total of 268 dysregulated genes. Among them, 131 DEGs were up-regulated while 137 were down-regulated. The top ten up-and down-regulated DEGs in PE are shown in Table 1. The volcano plot and expression heatmap of DEGs are shown in Figure 2B,C.

Integrative bioinformatics analysis of DEGs
To explore the functional roles of the above DEGs, GO enrichment analysis of up-and down-regulated DEGs was first performed, respectively ( Figure 3A,B and Supplementary Table S1). From the enrichment results of the CC category, we found that up-regulated DEGs were significantly enriched in the extracellular region, extracellular space and extracellular exosomes. In addition, DEGs associated with female pregnancy, androgen metabolic process, and negative regulation of glucagon secretion were enriched in the BP category. For the MF category, up-regulated DEGs were mainly enriched in hormone activity, TGF-β receptor binding and steroid hydroxylase activity. Moreover, down-regulated DEGs were enriched in MHC class II protein complex, clathrin-coated endocytic vesicle membrane and integral component of the lumenal side of endoplasmic reticulum membrane in the CC category. In the BP category, DEGs were mostly enriched in immune response, interferon-γ-mediated signalling pathway, and antigen processing and presentation of peptide or polysaccharide antigen via MHC class II. For the MF category, DEGs were enriched in MHC class II receptor activity, peptide antigen binding and receptor binding.
In addition, using the KOBAS analysis tool, the up-regulated DEGs were significantly enriched in ovarian steroidogenesis, steroid hormone biosynthesis and metabolic pathways ( Figure 3C). However, the down-regulated DEGs were enriched in rheumatoid arthritis, Staphylococcus aureus infection and leishmaniasis ( Figure 3D). The above results indicated that these DEGs are significantly enriched in disease-related BPs (Supplementary Table S2).

Construction of the PPI network and module analysis
To explore the interactions between the above DEGs of PE, we performed PPI network analysis using the STRING database. The PPI network was constructed with 149 nodes and 358 edges including 67 up-regulated and 82 down-regulated genes ( Figure 4A). Then, by using MCODE, we identified three key modules from the whole network ( Figure 4B,C,D). DEGs in module 1 were significantly enriched in haematopoietic cell lineage, inflammatory bowel disease and Influenza A. However, DEGs of module 2 were not enriched in any pathways. DEGs in module 3 were significantly enriched in legionellosis, pathogenic Escherichia coli infection and pertussis (Supplementary Table S3).
To screen hub genes from the whole PPI network, we used the plug-in cytoHubba of the Cytoscape software, computed by MNC and MCC methods. We selected top 20 hub genes based on above two methods. Then, a total of 17 mutual hub genes were identified (Supplementary Table S4). They were IL7R, IL18, CCL2, HLA-DRA, CD247, ITK, CD2, IRF8, CD48, GZMK, CCR7, HLA-DPA1, LEP, IL1B, CD8A, CD3D and GZMA. Using KEGG pathway enrichment analysis, we found that above hub genes were significantly enriched in immune system, including Th17 cell differentiation, cytokine-cytokine receptor interaction, T-cell receptor signalling pathway and Primary immunodeficiency (Supplementary Figure S1 and Supplementary Table S5).

The value of diagnostic biomarkers using the SVM model for PE
To test the diagnostic value of these 17 hub genes of PE, we first performed the gene expression levels for all samples by heatmap. As shown in Figure 5A, there were some differentially expressed patterns between the PE and control samples. Then, we examined the diagnostic value of these 17 hub genes in PE using the SVM model. A risk classifier was developed based on gene expression levels, and the results suggested that the hub genes risk classifier had good discrimination between the PE and control samples with an AUC of 0.958 with the sensitivity of 95.0% and specificity of 66.7% ( Figure 5B).
Then we validated the predictive value of these biomarkers in the PE test set. The ROC curves revealed that the sensitivity was 77.3%, the specificity was 76.5% and AUC was 0.834 for the test set ( Figure 5C). These results suggested that the 17 differentially expressed hub genes can be used as potential biomarkers for the diagnosis of PE.  Figure S2, compared with normal tissues, IL17R, CD8A, CD3D, CD48 and CCL2 mRNA expression levels were significantly down-regulated in the placenta of patients with PE, and LEP mRNA level was up-regulated, which was consistent with the results of bioinformatics analysis.

Discussion
In the present study, we performed a multi-step integrative bioinformatics analysis of microarray data for identifying hub genes as diagnostic biomarkers of PE. With the help of gene expression profiles of the GSE60438 dataset, a total of 268 dysregulated genes were identified including 131 up-and 137 down-regulated DEGs. GO and KEGG pathway enrichment analyses of DEGs suggested that they were significantly enriched in disease-related BPs. PPI network construction, module analysis and identification of hub genes were performed to screen a total of 17 significant hub genes. The SVM model was used to predict the potential application of biomarkers in PE diagnosis with the AUC of 0.958 in the training set and 0.834 in the test set, suggesting that this risk classifier has good discrimination between PE patients and controls samples. Our results demonstrated that these 17 differentially expressed hub genes can be used as potential biomarkers for the diagnosis of PE.
Recently, there have been several gene expression profile studies about PE. Using the gene expression profiles of placental tissue from five controls and five PE patients, a total of 224 transcripts were found to be significantly differentially expressed [8]. In this study, they identified 91 genes that were up-regulated and 133 that were down-regulated in PE placentas compared with control placentas. Immune response, inflammatory response and chemotaxis were the top three significantly enriched functional groups. Using DNA microarrays, the gene expression profile of normal pregnancies (n=6) and patients with PE (n=6) was assessed [10]. A total of 939 genes including 483 up-and 456 down-regulated were identified that differed significantly in expression. IPA analysis revealed that MFs of these genes are involved in cellular function and maintenance, cellular development, cell signalling and lipid metabolism. Pathway analysis suggested that Notch, Wnt, NF-κB and TGF-β signalling pathways were aberrantly regulated in PE. He et al. [11] used six PE placentas and five normal pregnancy placentas to identify 738 lncRNAs that were differentially expressed (≥1.5-fold-change) among PE placentas compared with controls. They found that misregulation of LOC391533, LOC284100 and CEACAMP8 might contribute to the mechanism underlying PE. In another lncRNA study, Luo and Li [19] indicated that NR 027457 and AF085938 were significantly up-regulated, whereas G36948 and AK002210 were significantly down-regulated in PE using the qRT-PCR method. They also found NR 027457, AF085938, G36948 and AK002210 had potential diagnostic value for the detection of PE. In the present study, they identified potential diagnostic biomarkers in PE. Using the miRNA expression profile of GSE84260, a total of 65 differentially expressed miRNAs DEMIs including 32 up-miRNAs and 33 down-regulated miRNAs were identified [20]. In one review, the regulation and function of lncRNAs IGF2/H19, MEG3, SPRY4-IT1, HOTAIR, MALAT1, FLT1P1 and CEACAMP8 in placental trophoblasts was evaluated [21]. Moreover, the metabolic profiling was proved to have good clinical significance in the diagnosis of PE including PC (14:0/00), proline betaine and proline [22]. Although the above studies have found various DEGs between PE and normal samples, the value of early diagnostic biomarkers needs to explored and further investigated.
In our study, we have performed a multi-step integrative bioinformatics analysis to explore significant DEGs with diagnostic value for PE. Based on GO and KEGG pathway enrichment analyses, we found that these DEGs were significantly enriched in disease-related BPs. For example, up-regulated DEGs were mostly enriched in steroid hormone biosynthesis pathways. A previous study reported that the serum concentrations of steroid hormone, including oestrogen (E2) may be associated with PE [23]. The researchers found increased serum levels of asymmetric dimethylarginine (ADMA) in severe PE (sPE) may result from increased secretion from the placenta, and the increased progesterone/E2 ratio may play a role in the development of sPE by aggravating ADMA. In another study, it was reported that the levels of steroid hormones, particularly progesterone and E2, in the serum and placenta of women with PE are down-regulated, which may be mediated by the regulation of steroidogenic enzyme expression in the PE placenta [24]. We also found that down-regulated DEGs were enriched in various immune response processes. Immunological alterations and endothelial cell dysfunction have been widely accepted as major determinants for PE [25]. Different signalling pathways including chemokine, NF-κB, NOD-like receptor, T cell receptor, Toll-like receptor, TNF, MAPK and Notch signalling pathway were also found. MAPK/ERK phosphorylation is highly induced by inositols, which occurs in endothelial dysfunction in PE [26]. Moreover, dysregulation of Notch signalling components were observed in pregnancy disorders such as PE, which plays fundamental roles in different developmental processes of the placenta [27].
Based on our analysis, we identified a total of 17 hub genes with diagnostic value for PE. The concentrations of interleukin-18 (IL-18) were found to be significantly lower in the preeclamptic samples than in controls. Additionally, another study showed concentrations of IL-18 to be potentially linked to oxidative stress in PE [28]. However, serum IL-18 levels were significantly higher in patients with PE than in normal pregnant women in another study [29]. The CD247 molecule (CD247) was also proven to be a DEG in PE and considered as one of the top five DEGs with highest node degree in the PPI network [30]. Leptin (LEP) was differently expressed between the PE cases and controls. In addition, it might be related to the amount of placental pathology involved [31]. In one study, using the microarray dataset GSE44711, Ma et al. [32] found that granzyme A (GZMA) was a DEG between EOPET and normal controls. Additionally, they found that this gene was relevant to the immune system process, which may play significant roles in the progress of EOPET.
Based on the results of pathway enrichment analysis, hub genes were enriched in immune system. By summarising the clinical and experimental evidence, there was a contribution of the immune system to PE [33]. Immune-system alterations were associated with the origin of PE and the changes in T-cell subsets that may be seen in PE [34]. Other results demonstrated that immune imbalance can promote an inflammatory state during PE and it may be a potential therapeutic possibility for the management of PE [35]. Immune related genes such as HLA-DRA, CD247, HLA-DPA1, IL1B and CD3D were found as hub genes in our study. Above results suggested that these 17 hub genes may be potential therapeutic biomarkers of PE.
There are some limitations of the present study that should be acknowledged. Firstly, a sufficiently large number of clinical samples should be used to identify DEGs in PE. Secondly, further experimental verification such as qRT-PCR analysis should be performed.

Conclusion
Our results demonstrated that these 17 differentially expressed hub genes can be used as potential biomarkers for the diagnosis of PE.