Identification of novel biomarkers and candidate small molecule drugs in rheumatoid arthritis and osteoarthritis based on bioinformatics analysis of high‐throughput data

Abstract Background: Rheumatoid arthritis (RA) and osteoarthritis (OA) are two major types of joint diseases. The present study aimed to identify hub genes involved in the pathogenesis and further explore the potential treatment targets of RA and OA. Methods: The gene expression profile of GSE12021 was downloaded from Gene Expression Omnibus (GEO). Total 31 samples (12 RA, 10 OA and 9 NC samples) were used. The differentially expressed genes (DEGs) in RA versus NC, OA versus NC and RA versus OA groups were screened using limma package. We also verified the DEGs in GSE55235 and GSE100786. Functional annotation and protein–protein interaction (PPI) network construction of OA‐ and RA‐specific DEGs were performed. Finally, the candidate small molecules as potential drugs to treat RA and OA were predicted in CMap database. Results: 165 up-regulated and 163 down-regulated DEGs between RA and NC samples, 73 up-regulated and 293 down-regulated DEGs between OA and NC samples, 92 up-regulated and 98 down-regulated DEGs between RA and OA samples were identified. Immune response and TNF signaling pathway were significantly enriched pathways for RA‐ and OA‐specific DEGs, respectively. The hub genes were mainly associated with ‘Primary immunodeficiency’ (RA vs. NC group), ‘Ribosome’ (OA vs. NC group), and ‘Chemokine signaling pathway’ (RA vs. OA group). Arecoline and Cefamandole were the most promising small molecule to reverse the RA and OA gene expression. Conclusion: Our findings suggest new insights into the underlying pathogenesis of RA and OA, which may improve the diagnosis and treatment of these intractable chronic diseases.


Introduction
Osteoarthritis (OA) is characterized by degradation of articular cartilage and subchondral bone resulting in the rigidity deformity and dysfunction of the joints [1]. Rheumatoid arthritis (RA) is a complex, multi-systemic autoimmune disease that mainly has an effect on the flexible joints as well. Although the symptoms of RA are similar to OA, the pathological components for the synovial in RA is quite different with OA [2]. RA manifests as synovial cell hypertrophy and hyperplasia infiltrated with lymphocytes and inflammatory cells, whereas OA has less infiltration of leukocytic [3,4]. As two major types of joint diseases, RA and OA have high morbidity and disability rate especially among the elderly people [5]. In addition, due to the lack of effective treatment, RA and OA are clinically incurable and therefore create huge economic burden for both patients and society [6].
Most recently, many researches have conducted high-throughput methods to screen the genetic factors involved in RA and OA [7][8][9][10]. Therefore, several key genes and novel diagnostic markers have been identified for these diseases. However, prediction of small molecules targeting the gene expression of RA and OA and the underlying mechanisms of the two diseases remain far from being elucidated.
The present study aimed to identify hub genes involved in the pathogenesis of OA and RA and further explore the potential drug targets. First, we screened and verified the differentially expressed genes (DEGs) associated with RA and OA respectively from the downloaded gene expression profiles GSE12021, GSE55235 and GSE100786. Functional annotation and protein-protein interaction (PPI) network construction of RA-and OA-specific DEGs were performed to further explore the molecular mechanisms underlying RA and OA. Most importantly, the Connectivity Map (CMap) database was used to explore candidate small molecule drugs potentially targeting RA and OA.

Data resources
Gene expression profile dataset GSE12021, GSE55235 and GSE100786 was downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) ( Figure 1A). The dataset GSE12021, GSE55235 and GSE100786 were produced by Affymetrix Human Genome U133A Array (Agilent Technologies, Santa Clara, CA). The dataset GSE12021 contained data of synovial tissues from 31 samples, including 9 normal control (NC), 12 RA, and 10 OA samples. The dataset GSE55235 is including data of synovial tissues from 10 RA and 10 OA samples. The dataset GSE100786 included 6 human peripheral blood (PB) and 8 bone marrow (BM) monocytes samples from patients with RA and OA, respectively.

Data reprocessing
We downloaded the raw data files from GEO datasets. R language (version 3.4.3) (http://www.r-project.org/) and Bioconductor (http://bioconductor.org/) were used to perform data analysis. Data normalization was performed using the Affy package in R software and every probe set was normalized.

Differential expressed genes identification
The normalized data were intersected with gene symbol and reference information. And then the data were analyzed with the Limma package in the R in order to examine DEGs. Only the genes with P value ≤ 0.05 and |log2 fold change (FC)| ≥ 1 were identified as DEGs. Differential expressed genes with statistical significance were identified through volcano plot filtering. Hierarchical clustering was performed using Morpheus (https://software.broadinstitute.org/ morpheus/).

GO and pathway enrichment analysis of DEGs
GO and KEGG pathway enrichment analyses were performed to explore the biological characteristics and functional annotation of candidate DEG using the online tool DAVID (https://david.ncifcrf.gov/). The genes in modules were also analyzed in the same way. Additionally, the Networks Gene Oncology tool (BiNGO) plugin in Cytoscape was used to perform and visualize the biological process analysis of the DEGs.

Protein-protein interaction network construction and hub gene identification
PPI network was constructed by the Search Tool for the Retrieval of Interacting Genes database (STRING, https: //string-db.org/). The Molecular Complex Detection (MCODE) in Cytoscape software was performed to screen the functional modules in the PPI network. Moreover, hub genes were determined based on the interaction edges in the network. Also, GO and KEGG enrichment analyses were performed in module genes as mentioned earlier.

Gene set enrichment analysis
Gene set enrichment analysis (GSEA) could detect changes in the gene set rather than the individual genes, so it can uncover these subtle expression changes and is expected to yield more desirable results. In order to study the effect of the DEGs on various biological functions, GSEA (v3.0, http://software.broadinstitute.org/gsea/downloads.jsp) was used to analysis DEGs of biological functional annotation and pathways.

Identification of small molecules
We queried the Connectivity Map (CMap, http://www.broadinstitute.org/cmap/) to screen the candidate small molecule drugs based on the gene signature of RA and OA. CMap is a collection of databases that stores thousands of gene transcription-expression profiles from cultured mammalian cells exposed to active small molecule drugs. The DEGs were divided into up-regulated and down-regulated groups. The enrichment scores ranging from −1 to +1 were calculated, which represented the similarity. A positive connectivity value (closer to +1) revealed that a small molecule is able to induce the gene expression of RA and OA, whereas a negative connectivity value (closer to −1) reveled that a small molecule is able to imitate the status of normal cells.

Identification and verification of DEGs
The DEGs were investigated in RA, OA and NC in GSE12021.A total of 328 genes were identified to be differentially expressed between RA and NC samples with the threshold of P<0.05 and a minimal 1-fold change of expression. Among these DEGs, 165 were up-regulated and 163 down-regulated in RA compared with NC samples. The top 10 up-and down-regulated genes for RA and NC are listed in Table 1. Similarly, 366 DEGs were identified to be differentially expressed between OA and NC samples, including 73 up-regulated and 293 down-regulated genes. In addition, a total of 190 DEGs between RA and OA were identified, including 92 up-regulated and 98 down-regulated genes. The top 10 DEGs for OA versus NC and RA versus OA samples are listed in Tables 2 and 3, respectively. The Venn diagrams showed the 109 overlap DEGs between DEGs of RA versus NC and DEGs of OA versus NC, consisting of 12 up-regulated genes and 97 down-regulated genes ( Figure 1B).
The volcano plot of DEGs in each group was presented in Figure 1C-E. In addition, Using the Morpheus website, we developed a clustering heatmap of the DEGs (Figure 2A,B).  We verified the top 10 up-and down-regulated genes in GSE55235. As shown in Supplementary Table S1, except for SGCG, all the other genes are differentially expressed between synovial tissue from RA and OA. The Venn diagrams also showed the 24 overlap DEGs between synovial tissue and peripheral blood from RA and OA, the 10 overlap DEGs between synovial tissue and bone marrow from RA and OA in GSE12021 and GSE100786 (Supplementary Figure S1).

Enrichment analysis of DEGs
To gain insights into the biological roles of the DEGs, we performed GO categories enrichment analysis.  Tables S2-4). Additionally, we used the DEGs to perform GSEA analysis ( Figure 6). TCRA pathway, FEEDER pathway (RA vs. OA group), EGFR SMRTE pathway, TCAPOPTOSIS pathway (RA vs. NC group), CACAM pathway, RNA pathway (OA vs. NC group) were significantly enriched ( Figure 6).

PPI network analysis of DEGs
To further explore the relationships between DEGs at the protein level, the PPI networks were constructed based on the interactions of DEGs. In total, 154 nodes and 318 interactions (RA vs. NC group), 136 nodes and 118 interactions (OA vs. NC group), and 177 nodes and 275 interactions (RA vs. OA group) were screened to establish the PPI network. In this network, the top 10 key genes with highest degree scores are shown in Table 4.

Module analysis
The top significant modules were selected respectively, and functional annotation of the genes from the modules was analyzed (Figure 7). KEGG enrichment analysis showed that the genes were mainly associated with 'primary immunodeficiency' (RA vs. NC group), 'ribosome' (OA vs. NC group), and 'chemokine signaling pathway' (RA vs. OA group) (Figure 7).

Identification of related active small molecules
DEGs were first divided into up-regulated and down-regulated groups and then enriched with significantly changed genes obtained from treatment of small molecules from the CMap database. Table 5 showed the predicted small molecules that could inhibit RA/OA-associated gene expression. Cefamandole and Arecoline were identified in OA and RA tissue analysis.

Discussion
OA and RA are two most common joint diseases with similar characteristics in synovitis. However, the underlying pathogenesis of RA and OA remain unclear. Clarifying the molecular mechanism differences between the above two  diseases will help to define special targets for therapeutic intervention. In the present study, bioinformatics analysis was performed to analyze the underlying molecular mechanisms differences between the above two diseases. We extracted the data from GSE12021, which including 9 NC, 12 RA, and 10 OA samples. We identified 165 up-regulated and 163 down-regulated DEGs between RA and NC samples, 73 up-regulated and 293 down-regulated DEGs between OA and NC samples, 92 up-regulated and 98 down-regulated DEGs between RA and OA samples using bioinformatics analysis. We also reanalyzed the expression of the DEGs in GSE55235 and verified the DEGs in human PB and 8 BM monocytes samples from dataset GSE100786. The GO and KEGG functional annotation and pathway enrichment analyses suggested that the identified DEGs were mainly involved in 'immune response' , 'cytokine-cytokine receptor interaction' (RA vs. NC group), 'signal transduction' , 'fat cell differentiation' and 'TNF signaling pathway' (OA vs. NC group), and 'signal transduction' , 'immune response' , and 'cytokine-cytokine receptor interaction' (RA vs. OA group).
RA is an autoimmune disease featured with pain, swelling, and destruction of synovial joints, leading to functional disability [11]. Inflammatory and immune response result in excessive secretion of inflammatory cytokines, growth factors, and matrix metalloproteinases (MMPs), resulting in synovitis and joints degradation [12]. Chemokines and chemokine receptors participate in cellular migration, survival, angiogenesis and leukocyte recruitment of RA and other autoimmune diseases [13]. Based on this present study, CD19, CD2, CD27, CD79A, CD79B, CXCL10, and CXCL9 were immune-related RA-specific DEGs, which might play important roles in the pathogenesis of RA. Although the pathogenesis of RA is not fully understood, leukocyte migration, which is regulated part by cytokines and cytokine receptors contribute to the perpetuation of synovium inflammation in RA. Cytokine-cytokine receptor interaction was a significantly enriched pathway for RA-specific DEGs that emphasized the importance of Cytokine-cytokine receptor interaction and its related RA-specific DEGs in the pathogenesis of RA.
Tumor necrosis factor α (TNF-α) is a potent proinflammatory cytokine that plays a crucial role in inflammatory and immune responses as well as in the pathogenesis of OA. In addition, TNF-α can enhance cartilage degradation and induce bone resorption in OA [14][15][16]. Based on this present study, TLR7 and SOCS3 were two TNF-related OA-specific DEGs which might play roles in the pathogenesis of OA. The activation of the TLR7/8 signaling pathway led to the activation of NF-κB which was required for the induction of TNF-α [17]. SOCS3, as a suppressor of cytokine signaling, has been implicated in transcriptional activation of signal transduction and activators of transcription (STAT) signaling pathway [18][19][20]. SOCS3 is up-regulated in response to various cytokines such as TNF-α, IL-6 and growth hormone [21].
The secretion of CXCL9/10 by immune cells is dependent on IFN-γ [22]. The high level of CXCL9/10 in peripheral liquids is a biomarker of immune response, especially involving Th1-cells. Moreover, the CXCL10 concentrations in RA synovial fluid are much higher than those in OA [23]. Many previous studies revealed the significant role of CXCL9 and CXCL10 in the inflammatory process in RA. In this present study, CXCL9 and CXCL10 were found to be up-regulated in RA compared with both OA and normal controls which provided evidence for the previous study and suggested that they might be potential biomarkers for discrimination of RA and OA.
More importantly, based on the DEGs and data from the CMap database, we acquired a series of small molecules. We were surprised to find that among these small molecules, Cefamandole and Arecoline were demonstrated significant similarity in RA and OA tissues (P<0.05) and additional analysis was required to determine their suitability as broad spectrum anti-arthritis drugs.
In summary, we provide bioinformatic evidence demonstrating that CXCL9 and CXCL10 might be potential biomarkers for discrimination of RA and OA. Additionally, Cefamandole and Arecoline were demonstrated as the potential anti-arthritis drugs in RA and OA tissues (P<0.05). As the pathogenic mechanisms of RA and OA are still not clear, our discoveries may have a broad impact in RA and OA biology and therapy. Nevertheless, large sample size and further mechanism experiments are still needed to confirm our conclusion.