Identification of novel biomarkers involved in pulmonary arterial hypertension based on multiple-microarray analysis

Abstract Pulmonary arterial hypertension (PAH) is a life-threatening chronic cardiopulmonary disorder. However, studies providing PAH-related gene expression profiles are scarce. To identify hub genes involved in PAH, we investigate two microarray data sets from gene expression omnibus (GEO). A total of 150 differentially expressed genes (DEGs) were identified by limma package. Enriched Gene Ontology (GO) annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs mostly included mitotic nuclear division, ATPase activity, and Herpes simplex virus one infection. Ten hub genes from three significant modules were ascertained by Cytoscape (CytoHubba). Gene set enrichment analysis (GSEA) plots showed that transcription elongation factor complex was the most significantly enriched gene set positively correlated with the PAH group. At the same time, solute proton symporter activity was the most significantly enriched gene set positively correlated with the control group. Correlation analysis between hub genes suggested that SMC4, TOP2A, SMC2, KIF11, KIF23, ANLN, ARHGAP11A, SMC3, SMC6 and RAD50 may involve in the pathogenesis of PAH. Then, the miRNA-target genes regulation network was performed to unveil the underlying complex association among them. Finally, RNA extracted from monocrotaline (MCT)-induced Rat-PAH model lung artery tissues were to conduct quantitative real-time PCR (qRT-PCR) to validate these hub genes. In conclusion, our study offers new evidence for the underlying molecular mechanisms of PAH as well as attractive targets for diagnosis and treatment of PAH.


Introduction
Pulmonary arterial hypertension (PAH) is a common disorder worldwide characterized by irreversible remodeling of the distal pulmonary arteries, resulting in sustained rise pulmonary vascular resistance and right ventricular failure, eventually, death [1,2]. Over the past decades, tremendous progress has been made in understanding the basic pathobiological of PAH and its essential history, prognostic biomarkers, and treatment options. However, studies providing PAH-related gene expression profiles remain rare. In consequence, it is an urgent mission to identify clinical molecular biomarkers and investigate the underlying mechanisms involved in the PAH, that might help in developing novel scientific-based diagnostic and adopt target-treatment methods in PAH patients.
In current years, bioinformatics analysis has been widely used to analyze microarray data to determine differentially expressed genes (DEGs) and perform various analyses. However, on account of the small sample size and high false-positive rate in single microarray analysis, it may be hard access to reliable data-mining results. In our research, two messenger RNA (mRNA) microarray data sets acquired from gene expression omnibus (GEO) were prepared for further analyses. DEGs between pulmonary arterial

Reverse transcription and quantitative real-time PCR (qRT-PCR)
Total RNA was isolated from MCT-induced Rats-PAH model (n=5) and control rats (n=5) lung artery tissues using TRIzol reagent (Invitrogen, U.S.A.), respectively. According to the manufacturer's instructions, RNA quality and quantity were measured by spectrophotometer. Then RNA was reverse-transcribed into cDNA using PrimeScript™RT reagent kit (No. RR037A, TaKaRa, Japan). qRT-PCR was performed on a LightCycler 480 (Roche). The gene expression levels were analyzed using a TB Green Premix Ex Taq™ kit (No. RR420, TaKaRa, Japan). The relative expression of the gene was calculated using the 2 − Ct method. Tubulin was viewed as an internal control. The thermal cycler conditions were as follows: 30 s at 95.0 • C for cDNA denatured, 40 cycles of 95.0 • C for 5 s, 60 • C for 30 s and 1 min at 60.0 • C. Our experiment was conducted three biological replicates, and all primer sequences are listed in Table 4.

ConnectivityMap (CMap) analysis
To explore potential drugs that may ameliorate pulmonary arterial hypertension, we processed DEGs via CMap analysis (https://portals.broadinstitute.org/cmap), which integrated diseases, drugs, genes based on gene expression profiles. In our study, mean < − 0.4 and P<0.01 were set as the screening criteria.

Statistical method
Data are presented as mean + − standard. Graphs were drawn by using GraphPad Prism software (version 8.0, La Jolla, CA, U.S.A.). PAH-rat versus healthy rat data were analyzed by Student t-test with Welch correction. P-value < 0.05 were considered statistically significant.

Detection of DEGs related to PAH
To identify DEGs linked with PAH, we download two microarray expression profiles from GEO (GSE113439 and GSE53408). About 150 DEGs involved in PAH were screened by limma package after consolidation and normalization of the original microarray data (adjusted P-value < 0.05, |logFC| > 1.5). As was shown in the heatmap (Figure 2A), among them, eight genes were down-regulated, and 142 genes were up-regulated ( Figure 2B).

Gene set enrichment analysis
GSEA was performed to identify gene sets with a statistically significant difference between the PAH groups and healthy controls. Most significant enriched gene sets positively correlated with the PAH group included transcription elongation factor complex, inclusion body, and axon cytoplasm ( Figure 3A-C). Most significant enriched gene sets positively correlated with the control group included solute proton symporter activity, delayed rectifier potassium channel activity, and carbohydrate cation symporter activity ( Figure 3D-F).

PPI network analysis and hub genes recognition
To identify the most significant clusters of DEGs, the PPI network of DEGs was constructed by STRING and visualized by Cytoscape (3.8.0). As was shown in Figure 4A, 101 nodes and 230 edged were contained in the PPI network. Three most significant modules were recognized by MCODE, a plug-in of Cytoscape ( Figure 4B-D). Then, ten hub genes closely linked to pulmonary arterial hypertension were identified by using cytoHubba, a plug-in of Cytoscape ( Figure  4E and Table 3).

GO enrichment analysis
To determine biological features of DEGs, GO annotation was accomplished by R software (4.0.0) as follows: Most significant enrichment in Biological process (BP) of DEGs included mitotic nuclear division, organelle fission, chromosome segregation, and sister chromatid segregation. Primary enrichment in cell component (CC) of DEGs involved mitotic spindle, spindle, chromosomal region, and microtubule. Primary enrichment in molecule function (MF) consisted of ATPase activity, helicase activity, DNA-dependent ATPase activity, and DNA helicase activity ( Figure 5A,B and Tables 1 and 2 ).

KEGG enrichment analysis
To explore enriched pathways of DEGs, KEGG pathway analysis was done using R software (4.0.0). The result of investigation revealed that DEGs were mainly enriched in Herpes simplex virus one infection, Arrhythmogenic right ventricular cardiomyopathy (ARVC), fatty acid metabolism, RNA transport, and platinum drug resistance ( Figure  6A,B).

miRNA-gene inter-regulation network
For further explore the mechanism of the ten core genes, we investigated the potential interaction network of these genes and its response miRNA. An online platform, for instance, miRWalk, miRDB, and TargetScan were then to predict miRNA-mRNA interaction network and visualized through Cytoscape software (Figure 7).

The analysis of qRT-PCR for hub genes
In our study, ten hub genes were identified using the PPI network, which is of the essence to the pathogenesis of PAH. Among these genes, SMC2, SMC4, and RAD50 were well established by reports that they were overexpressed in  PAH patients or animal models compared with healthy cases. In consequence, SMC3, SMC6, KIF11, KIF23, TOP2A, ARHGAP11A, and Anln were interested genes and were validated by qRT-PCR. As shown in Figure 8, the outcomes that KIF23 and ARHGAP11A were lower expressed in PAH rat lung artery tissues than the healthy groups while SMC3, SMC6, KIF11, TOP2A, and Anln were higher expression compared with the healthy group, which was consistence with our predict results ( Figure 8).

ConnectivityMap (CMap) analysis
To predict possible drugs or molecules that may mitigate PAH, CMap analysis was applied to find relevant molecular compounds that can reverse the expression of DEGs in cell lines. As shown in Table 5, obviously, Trioxysalen, Repaglinide, and Fluocinonide are the most significant three compounds.

Discussion
PAH is a chronic refractory disease characterized by arterial lesions in the small-to medium-sized distal pulmonary artery associated with arterial muscularization, concentric endocardial thickening, and formation of plexiform lesions, leading to right ventricular hypertrophy and failure [2,13]. Even though extensive efforts have been made in this field during recent years, the underlying pathological mechanism of PAH remains mostly unknown. The reason may be related to the complexity and variety of human genes, and traditional PAH animal models cannot solve this problem fundamentally. Due to the rapid progress of high-throughput microarray technology and bioinformatic

SMC2
Structural maintenance of chromosomes 2 SMC2 involving in chromosome segregation and stability chromosomal [20].

KIF11
Kinesin family member 11 Overexpression of KIF11during mitosis results in premature separation of sister chromatids and an uneven distribution of chromosomes [28].

KIF23
Kinesin family member 23 KIF23 plays an essential role in the bundling and transport of microtubules to specific intracellular locations in different cells at specific time points [34].

ARHGAP11A
Rho GTPase activating protein 11A ARHGAP11A dynamically regulated colon cancer cell motility and invasion and directly interacted with p53 tetramerization domain to exhibit a Rho-independent role in cancer [27].

SMC3
Structural maintenance of chromosomes 3 Smc3 acetylation stabilizes cohesin association with chromosomes, and its deacetylation by Hos1 in anaphase allows reuse of Smc3 in the next cell cycle [35].

SMC6
Structural maintenance of chromosomes 6 Smc5-Smc6 play an essential role in cellular processes such as genome replication, mitotic and meiotic chromosome segregation, DNA repair [30].  methods, we could make better insight into the critical genes associated with PAH and a deeper understanding of its pathogenesis [14].
Here, we processed a total of 18837 genes. GSEA software could provide valuable information on large-scale genes with a relatively smaller fold change. Then, we found that the PAH group was most positively correlated with enriched gene sets like transcription elongation factor complex, inclusion body, and axon cytoplasm while compared with the control group. Kuanghueih Chen et al. indicated that transcription elongation factor complex might be involved in the  pathogenesis of PAH [15]. Besides, we identified 150 DEGs between the PAH patients and normal control based on two mRNA microarray data sets. Protein-protein co-expression network showed closely linked genes among DEGs.
GO annotation result of DEGs demonstrated that organelle fission, nuclear division, and chromosome segregation were significantly enriched, which is well consistent with current research findings [16]. KEGG enrichment analysis of DEGs showed that many of these genes were mapped to Herpes simplex virus one infection, arrhythmogenic right  ventricular cardiomyopathy (ARVC) [2,17], and fatty acid metabolism [18], which suggested a critical role of immune and inflammatory responses in pulmonary arterial hypertension [19]. A total of ten DEGs were identification as hub genes as follows: SMC4, TOP2A, SMC2, KIF11, KIF23, ANLN, ARHGAP11A, SMC3, SMC6, and RAD50. Verónica Dávalos et al. suggested that high levels of structural maintenance of chromosomes 2 (SMC2) may be required to allow WNT-driven cell proliferation which contribute a lot to the development of PAH and that SMC2 down-regulate could lead to tumor cell apoptosis [20,21]. Li et al. showed the physiological of peroxisome proliferator-activated receptor γ(PPARγ) and DNA damage response (DDR) by using pulmonary arterial hypertension (PAH) as a model that impaired PPARγ signalling pathway related to endothelial cell dysfunction and disrupted PPARγ-UBR5 (MRE11-RAD50-NBS1) interaction, heightened ATM interactor (ATMIN) expression and DNA lesions. Therefore, PPARγ-DDR dysfunction may explain the genomic instability and loss of endothelial homeostasis in PAH [22]. According to a study conducted by Qinlan Wang, Smc4, a core subunit of condensin, to potentially promote an inflammatory innate immune response. They suggested that knockdown of Smc4 inhibited Toll-like receptor-mediated production of proinflammatory cytokines such as IL-6, TNF-α in macrophages [22,23]. HMGB1-TLR4 signaling axis has been shown to stimulate neutrophil NADPH oxidase (NOX2) in both neutrophils and lung microvascular endothelial cells, and NOX2 has played essential roles in the pathogenesis of PH. HMGB1 induces macrophages to secrete proinflammatory cytokines in a TLR4-dependent way [24]. These literature supported the importance of above-stated hub genes.
To date, there is no relevant publication on such hub genes as TOP2A, KIF11, KIF23, ANLN, ARHGAP11A, SMC3 and SMC6. Among them, KIF23 and ARHGAP11A were down-regulated in pulmonary arterial hypertension patients, which might have a protective role in PAH. KIF23 is a nuclear protein that localizes to the interzone of mitotic spindles and acts as a plus-end-directed motor enzyme to control the cellular shape and biological processes such as motility, mitosis, intracellular vesicle transport, organization, and positioning of membranous organelles [25,26]. ARHGAP11A localizes to the plasma membrane in early mitosis and the equatorial membrane in anaphase is known as a regulator of cell cycle-dependent motility and directly interact with p53 tetramerization domain to exhibit a Rho-independent role in cancer [27]. The up-regulation of remaining hub genes might exacerbate the PAH. KIF11 is an evolutionarily conserved microtubule motor protein that functions in centrosome and chromosome dynamics in mitosis, KIF11 silencing induced increases in nuclear areas, micronucleus formation, DNA content and chromosome numbers that may contribute to the pathogenesis of cancer [28,29]. The principal activity of the SMC5/6 complex is the maintenance of nuclear genome stability by resolving complex structures and possibly acting as an antagonist of the cohesin complex TheSMC5/6complex exercise many functions, such as the control of unidirectional rDNA replication, neutralizing toxic DNA intermediates during replication, preventing homologous recombination between nonhomologous sequences [30,31]. Anillin actin-binding protein (ANLN) encodes an actin-binding protein that plays a role in cell growth and migration and regulates actin cytoskeletal dynamics in podocytes [32]. All in all, these predicted genes were experiment supported by qRT-PCR.
In summary, our study aimed to identify key molecules involved in the pathophysiology of pulmonary hypertension. About 150 DEGs and ten hub genes were screened via multiple-microarray analysis, which may become potential targets clinical diagnosis and treatment of PAH in the near term. Our research embraces several merits. First, we applied GSEA to identify gene sets and GO enrichment (biological process, cell component, and molecular function) with a statistically significant between the PAH groups and normal control. As a result, transcription elongation factor complex, inclusion body, and axon cytoplasm were determined most positively linked with PAH. Second, we scrutinized 150 DEGs and finally screened ten essential genes from PPI network, the inter-regulation network between ten target genes, and response miRNA was then performed to explore the potential mechanism of their biological function thoroughly. More importantly, these predicted molecules were well consolidated by experiment. However, owing to the lack of PAH patients' detailed information, it is difficult to draw a clear association between selected genes and the severity of PAH while using the same samples. The mechanisms of PAH are needed to further explore in vitro or in vivo experiments.

Data Availability
All datasets for this study are included in the article, and Supplementary Materials are available from the corresponding author.