Integrated bioinformatics analysis reveals novel key biomarkers and potential candidate small molecule drugs in gestational diabetes mellitus

Abstract Gestational diabetes mellitus (GDM) is the metabolic disorder that appears during pregnancy. The current investigation aimed to identify central differentially expressed genes (DEGs) in GDM. The transcription profiling by array data (E-MTAB-6418) was obtained from the ArrayExpress database. The DEGs between GDM samples and non-GDM samples were analyzed. Functional enrichment analysis were performed using ToppGene. Then we constructed the protein–protein interaction (PPI) network of DEGs by the Search Tool for the Retrieval of Interacting Genes database (STRING) and module analysis was performed. Subsequently, we constructed the miRNA–hub gene network and TF–hub gene regulatory network. The validation of hub genes was performed through receiver operating characteristic curve (ROC). Finally, the candidate small molecules as potential drugs to treat GDM were predicted by using molecular docking. Through transcription profiling by array data, a total of 869 DEGs were detected including 439 up-regulated and 430 down-regulated genes. Functional enrichment analysis showed these DEGs were mainly enriched in reproduction, cell adhesion, cell surface interactions at the vascular wall and extracellular matrix organization. Ten genes, HSP90AA1, EGFR, RPS13, RBX1, PAK1, FYN, ABL1, SMAD3, STAT3 and PRKCA were associated with GDM, according to ROC analysis. Finally, the most significant small molecules were predicted based on molecular docking. This investigation identified hub genes, signal pathways and therapeutic agents, which might help us, enhance our understanding of the mechanisms of GDM and find some novel therapeutic agents for GDM.


Introduction
Gestational diabetes mellitus (GDM) is the metabolic disorder diagnosed during pregnancy, affecting 2-5% of pregnant women worldwide [1,2]. Risk factors of GDM include obesity, previous occurrence of diabetes, family history of type 2 diabetes, preeclampsia, hypertension, cardiovascular diseases and genetic factors [3]. At third trimester of pregnancy, blood glucose levels are drastically elevated [4]. Moreover, the elevated glucose level in pregnancy is closely linked with detrimental consequences in the newborn babies includes fetal hyperglycemia and cardiovascular disease [5]. Therefore, it is essential to examine the factual molecular targets included in occurrence and advancement of GDM, in order to make an improvement to the diagnosis, prognosis and treatment of GDM.
The molecular mechanisms of GDM initiation and development remain unclear. It is therefore essential to identify new genes and pathways that are linked with GDM progression and patient prognosis, which might not only help to explicate the underlying molecular mechanisms associated, but also to discover sdf. format using Open Babelfree software. The protein structure was processed after introduction of the protein, the co-crystallized ligand and all the water molecules were excluded from the crystal structure; more hydrogen was added and refined the side chain. The present study employed CDOCKER, a grid-based molecular docking approach that utilizes the CHARMm force field. A higher number indicates a stronger bond. The CDOCKER score is expressed as a negative number (-CDOCKER ENERGY). The H-bonds, van der Waals and electrostatic interactions between the target protein and the ligand were used to measure the CDOCKER energy. The modeled protein's binding site was determined using the template protein's crystal data and proteins which did not co-crystallize ligand generated binding site automatically. To make it easier for ligands to interact with amino acids, the binding site sphere center was set at 9Å radius. Furthermore, using smart minimizer algorithm, CHARMm force field was applied followed by energy minimization to define local minima (lowest energy conformation) of the modeled over expressed proteins with an energy gradient of 0.1 kcal mol −1 .Å −1 , respectively. The energy minimized receptor protein and the set of 44 natural molecules which was reported as effective in diabetes mellitus and the well-known commonly used allopathic drugs, Metformin and Glyburide, were used as standard and to compare the binding interactions with natural molecules on overexpressed proteins in gestational diabetes. The binding site sphere radius set at X = 29.50, Y = −31.38 and Z = −38.79 were submitted to the CDOCKER parameter and also calculated binding energy. The X-ray co-crystallized structures were extracted from Protein Data Bank of PDB code of 4UV7, 5NJX, 3Q4Z and 3FNI of overexpressed genes of Epidermal growth factor receptor (EGFR), Heat shock protein 90 α family class A member 1 (HSP90AA1), P21 RAC1 activated kinase 1 (PAK1) and Ring-box 1 (RBX1), respectively, in gestational diabetes were selected for docking studies [26][27][28][29]. The best position was inserted into the molecular area between the protein and the ligand. The 2D and 3D interaction of amino acid molecules was achieved using the free online Discovery Studio Visualizer.

Identification of DEGs
Transcription profiling by array datasets was obtained from the ArrayExpress database containing GDM and non-GDM samples; E-MTAB-6418. Then, the R package named 'limma' was processed for analysis with adjusted P<0.05, |log FC| > 1.158 for up-regulated genes and |log FC| < −0.83 for down-regulated genes. All DEGs were displayed in volcano maps ( Figure 1). A total of 869 genes were finally obtained including 439 up-regulated and 430 down-regulated genes in the GDM samples compared with the non-GDM samples and are listed in Table 2. Top 869 genes in this dataset were displayed in the heatmap (Figure 2).

Gene ontology and pathway enrichment of DEGs analysis
To clarify the major functions of these DEGs, we first explored the associated biological processes and REACTOME pathways. The top highly enriched GO terms were divided into three categories: biological process (BP), cellular component (CC) and molecular function (MF) and are listed in Table 3. The most enriched GO terms in BP was reproduction, macromolecule catabolic process, cell adhesion and localization of cell, that in CC was nuclear outer membrane-endoplasmic reticulum membrane network, Golgi apparatus, supramolecular complex and cell junction, and that in MF had identical protein binding, molecular function regulator, signaling receptor binding and molecular function regulator. In the REACTOME pathway enrichment analysis, the DEGs were mostly enriched in cell surface interactions at the vascular wall, epigenetic regulation of gene expression, extracellular matrix organization and axon guidance and are listed in Table 4.

PPI network establishment and modules selection
By using the STRING database, the PPI network of DEGs was established and consisted of 4687 nodes and 11236 edges (Figure 3). A total of ten hub genes were selected for key biomarker identification and are listed in Table 5. They consisted of five up-regulated genes (HSP90AA1, EGFR, RPS13, RBX1 and PAK1) and five down-regulated genes (FYN, ABL1, SMAD3, STAT3 and PRKCA). Then PEWCC1 was used to find clusters in the network. Four modules were calculated according to k-core = 2. Among them, module 1 contained 16 nodes and 32 edges, with the highest score ( Figure 4A) and module 2 contained 16 nodes and 34 edges ( Figure 4B). We performed the functional analysis for the top 2 modules. In functional enrichment analysis, the DEGs of module 1 were mostly enriched in post-translational protein modification, developmental biology and macromolecule catabolic process; the DEGs of module 2 in supramolecular complex and localization of cell.

MiRNA-hub gene regulatory network construction
miRNet database was applied to screen the targeted miRNAs of the hub genes. Cytoscape software was used to construct the miRNA-hub gene network. As illustrated in Figure 5, the interaction network consists of 307 hub genes and 2280 miRNAs. The hub genes and miRNAs in the network were ranked by their degree of connectivity using Network Analyzer and are listed in Table 6. Based on the expression trend of hub genes in GDM, we found that UBE2D3 was the predicted target of hsa-mir-6127, HSP90AA1 was the predicted target of hsa-let-7d-5p, PAK2 was the predicted target of hsa-mir-8063, DDB1 was the predicted target of hsa-mir-329-3p, DVL3 was the predicted target of hsa-mir-1207-5p, FYN was the predicted target of hsa-mir-4651, ABL1 was the predicted target of hsa-mir-410-5p, SMAD3 was the predicted target of hsa-mir-222-3p, STAT3 was the predicted target of hsa-mir-29c-3p and PRKCA was the predicted target of hsa-mir-663a.

TF-hub gene regulatory network construction
NetworkAnalyst database was applied to screen the targeted TFs of the hub genes. Cytoscape software was used to construct the TF-hub gene network. As illustrated in Figure 6, the interaction network consists of 306 hub genes and 195 TFs. The hub genes and TFs in the network were ranked by their degree of connectivity using Network Analyzer and are listed in Table 6. Based on the expression trend of hub genes in GDM, we found that HSP90AA1 was the predicted target of E2F1, UBE2D3 was the predicted target of HCFC1, EGFR was the predicted target of SRY, PSMC4 was the predicted target of ZFX, DDB1 was the predicted target of RUNX1, STAT3 was the predicted target of SPI1, CCND1 was the predicted target of MYBL2, SMAD3 was the predicted target of SUZ12, FOXO1 was the predicted target of TBX3 and PRKCA was the predicted target of YAP1.

ROC curve analysis
ROC curve analysis was implemented to evaluate the capacity of hub genes to distinguish GDM and non-GDM in E-MTAB-6418, HSP90AA1, EGFR, RPS13, RBX1, PAK1, FYN, ABL1, SMAD3, STAT3 and PRKCA, exhibiting better diagnostic efficiency for GDM and non-GDM, and the combined diagnosis of these ten hub genes was more effective. The AUC index for the ten hub gene scores were 0.906, 0.838, 0.825, 0.897, 0.863, 0.876, 0.855, 0.880, 0.932 and 0.872, and are shown in Figure 7.

RT-PCR analysis
To further verify the expression level of hub genes in GDM, RT-PCR was performed to calculate the mRNA levels of the ten hub genes identified in the present study (HSP90AA1, EGFR, RPS13, RBX1, PAK1, FYN, ABL1, SMAD3, STAT3 and PRKCA) in GDM. As illustrated in Figure 8, the expressions of HSP90AA1, EGFR, RPS13, RBX1, PAK1 were significantly up-regulated in GDM samples compared with normal, while FYN, ABL1, SMAD3, STAT3 and PRKCA were significantly down-regulated in GDM samples compared with normal. The present RT-PCR results were in-line with the aforementioned bioinformatics analysis, suggesting that these hub genes might be linked to the molecular mechanism underlying GDM.

Molecular docking experiments
In the recent findings, the docking study was performed using Biovia Discovery Studio perpetual software to analyze the binding pattern of the natural plant products such as herbs have the ability to lower blood glucose levels   Figure 9. The molecules were constructed based on the natural plant products containing these chemical constituents which play vital role in reducing type 2 diabetes mellitus. The traditional plant products are used in conjunction with allopathic drug to reduce the dose of the allopathic drugs and/or to increase the efficacy of allopathic drugs. Some common and most prominent antidiabetic plants and active principles were selected from their phytochemicals for docking studies in the present research to identify the active natural molecule to avoid the use of allopathic drugs in gestational diabetes and the blood sugar level is controlled by altering the diet. For docking experiments well-known and most commonly used two allopathic drugs such as Glyburide (GLY), Metformin (MET) in gestational diabetes are used as standard and to compare the binding interaction of natural phytoconstituents with allopathic drugs. A total of common 44 in that 42 natural active constituents, few from each of flavonoids, saponins, tannins and glycosides etc., present in plant extracts responsible for antidiabetic function and 2 allopathic drugs were chosen for docking studies on overexpressed proteins and the structures are depicted in Figure  1, respectively. The one protein from each overexpressed gene in gestational type 2 diabetes mellitus such as EGFR, HSP90AA1, PAK1 and RBX1 and their X-ray crystallographic structure and co-crystallized PDB code and their PDB codes 4UV7, 5NJX, 3Q4Z and 3FNI, respectively were constructed for docking. The docking on natural active constituents was conducted to classify the potential molecule and their binding affinity to proteins. A higher number of negative number -CDOCKER energy and binding energy indicates a stronger binding interactions with proteins, few constituents obtained with a greater -CDOCKER energy and binding energy respectively with particular proteins. Docking experiments were carried out on a total of 42 constituents from plant products, few constituents obtained excellent -CDOCKER energy and binding energy. Out of 44 molecules, few of the molecules obtained -CDOCKER interaction energy of more than 40 and majority with more than 30 and less than 40, few molecules obtained optimum -CDOCKER interaction energy of less than 30, respectively. the molecules with -CDOCKER interaction energy of 40 m m m  3  68 8 8   r 55  r  ir r  84 8 8  88 88 8  68 6 68 68 68 3  84-3 84 8  88 88 88 8 84 8  88 8  47 4 CAV1   82 82 2 2 2 2  8  78 78 8 8 8  7 7 7 7 7 7 7 7 9-3  3p 3p 3p 3p  9 9 9  82 82 82 2 2 2 2  8 8 8 8 8  9 3 3 3 3  9 9  -3 3 3 3  9  7 7 7 7 7 7 7 7 7  9  8 8 8 8 82 82-82 2  82-82 2-2-82 8  78 8  78 8  78  478 78 8  47 47 7 7  hsa-mir-411-5p   hsa-mir-616-5p   17 17 1  31 31  r-3 3 3 3176 1 17 17 1 1  r-3 3  r 3  3  hsa-mir- a-a a a a a a a-a-a-m a-m a-m a- a-a a a a a a a a a a a-a-- a a a a a-a-a-a a a m m m  a-a a a a a-a-a a a-a-a-a a-  mir 6 mir 5p p p p p p p p p p p p p p p p p 6-6-6-6---6-6-6 6 6 6 6 6 6 6 6 6 6-6-6 6m -5p -5p -5p -5p -5p -5p -5p -5p -5p -5p 5p 5p 5p 5p 5 5 5p p   sa sa a a a a a a a a a a a 4 4 4 4 4 4 4 4 4 4 4 4 47 7 7 7  4 4 47 7 7 7 7 7 7 7 7 7 7 7 7 Table 7. The two molecules such as ALO and MAL (Figures 10 and 11), their interaction with amino acids of proteins with 3D structures for 3FN1 ( Figure 12) and 3Q4Z (Figure 13), while 2D structures for 3FN1 ( Figure 14) and 3Q4Z ( Figure 15).

Discussion
GDM is a metabolic disorder that can be caused by various factors, including genetics and the endocrine system. It is essential to understand the molecular mechanisms underlying GDM in order to find and advance more valid diagnostic and therapeutic strategies. Gene chip technology is generally used to reveal the expression levels of numerous genes within the human genome and might help in the recognition of target genes of interest for diagnosing or treating GDM.
In our study, a total of 869 DEGs were screened, including 439 up-regulated genes and 430 down-regulated genes. The CGB5 was associated with pregnancy success and might be a possible genetic marker for pregnancy success [30]. Studies have reported that corticotropin releasing hormone (CRH) [31], PSG1 [32] and CYP19A1 [33] are directly related to the development preeclampsia. CD248 has become an attractive target for hypertension [34], but this gene might be novel target for GDM. A previous study showed that COL1A1 [35] was expressed in type 2 diabetes mellitus, but this gene might be novel target for GDM. In a recent study, ABI3BP could facilitate the progression of cardiovascular diseases [36], but this gene might be novel target for GDM. In previous studies, there was a large amount of evidence that MFAP4 [37] directly or indirectly affects the occurrence and development of type 1 diabetes mellitus, but this gene might be novel target for GDM.
We conducted a comprehensive bioinformatics analysis on transcription profiling of GDM. Hub genes and pathways were identified to provide more detailed molecular mechanisms for the process of GDM and shed light on potential therapeutic targets. Nevertheless, further experiments are needed to further validate the identified hub genes and pathways.