Identification of metastasis and prognosis-associated genes for serous ovarian cancer

Abstract Serous ovarian cancer is one of the most fatal gynecological tumors with an extremely low 5-year survival rate. Most patients are diagnosed at an advanced stage with wide metastasis. The dysregulation of genes serves an important role in the metastasis progression of ovarian cancer. Differentially expressed genes (DEGs) between primary tumors and metastases of serous ovarian cancer were screened out in the gene expression profile of GSE73168 from Gene Expression Omnibus (GEO). Cytoscape plugin cytoHubba and weighted gene co-expression network analysis (WGCNA) were utilized to select hub genes. Univariate and multivariate Cox regression analyses were used to screen out prognosis-associated genes. Furthermore, the Oncomine validation, prognostic analysis, methylation mechanism, gene set enrichment analysis (GSEA), TIMER database analysis and administration of candidate molecular drugs were conducted for hub genes. Nine hundred and fifty-seven DEGs were identified in the gene expression profile of GSE73168. After using Cytoscape plugin cytoHubba, 83 genes were verified. In co-expression network, the blue module was most closely related to tumor metastasis. Furthermore, the genes in Cytoscape were analyzed, showing that the blue module and screened 17 genes were closely associated with tumor metastasis. Univariate and multivariate Cox regression revealed that the age, stage and STMN2 were independent prognostic factors. The Cancer Genome Atlas (TCGA) suggested that the up-regulated expression of STMN2 was related to poor prognosis of ovarian cancer. Thus, STMN2 was considered as a new key gene after expression validation, survival analysis and TIMER database validation. GSEA confirmed that STMN2 was probably involved in ECM receptor interaction, focal adhesion, TGF beta signaling pathway and MAPK signaling pathway. Furthermore, three candidate small molecule drugs for tumor metastasis (diprophylline, valinomycin and anisomycin) were screened out. The quantitative reverse transcription-polymerase chain reaction (qRT-PCR) and western blot showed that STMN2 was highly expressed in ovarian cancer tissue and ovarian cancer cell lines. Further studies are needed to investigate these prognosis-associated genes for new therapy target.


Introduction
As one of the most prevalent malignancies in gynecology, ovarian cancer carries the highest mortality [1]. Most patients are diagnosed at an advanced stage with wide metastasis, leading to an extremely low 5-year survival rate. Tumor metastasis is closely related with poor prognosis, and is also the main death reason in patients with ovarian cancer [2,3]. Therefore, it is vital to explore new biomarkers to suppress tumor metastasis.
Because of the limitation of molecular biology research, bioinformatics has been widely utilized in screening and analyzing the genes related to occurrence and progression of tumor [4,5]. A lot of genes with similar expression pattern actually function within a whole network and affect each other [6]. Most of previous studies have only identified the single gene or protein, but do not describe the association between genes and interaction pathways. Weighted gene co-expression network analysis (WGCNA) is a biological network to describe the correlations between differentially expressed genes (DEGs) [7]. This method identifies synergistically altered genomes and specific candidate biomarkers based on the correlations between the genes and phenotype. WGCNA has been widely used to screen genes associated with the phenotype of cancer, such as cervical cancer and pancreatic ductal adenocarcinoma [8,9]. In the present paper, the DEGs were analyzed and biological network was constructed to verify hub genes implied in the metastasis of ovarian cancer.

Microarray data
To explore significant genes in linkage to the metastasis progression of ovarian carcinoma, the raw gene expression profiles of GSE73168 dataset were acquired from Gene Expression Omnibus (GEO) database (https://www. ncbi.nlm.nih.gov/geo/). The mRNA expression profile of GSE73168 dataset was detected by using GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). Twenty-four samples from eight patients were involved in the GSE73168 profile, including primary tumor tissue samples with matching ascites tumor cell samples and metastatic tumor samples.

Identification of differentially expressed genes
The flow diagram for the design of this research is displayed in Supplementary Figure S1. The raw dataset of GS73168 was analyzed with the GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/), a web tool using 'limma' package of R and utilized to screen DEGs between primary and metastasis tumor tissues. The points with |logFC| ≥ 1 and P-value < 0.05 were defined to be statistically meaningful cut-off points. Gene annotation and corresponding data files of the DEGs were extracted through R software.

Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes analysis
To further systematically analyze the functional annotation of DEGs, the clusterProfiler package of R [10] was utilized to perform [11] Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. The P-value was conventionally set at 0.05.

Protein-protein interaction network construction
To explore interactions between DEGs, a protein-protein interaction (PPI) was constructed by using the Search Tool of the Retrieval of Interacting Genes Database (STRING) (https://www.string-db.org/), an authoritative database to assess interactions between proteins. The cut-off value was set at a confidence score >0.7 and individual nodes were filtered out.

Screening of hub module
String interactions were imported into Cytoscape plug-in cytoHubba to select the hub genes of biological network analysis [12]. In our study, the top 20 genes were chosen as hub genes.

Co-expression and module functional analysis
Co-expression analysis is a multidirectional network, in which each node represents a gene. The DEGs of GSE73168 dataset were utilized to construct co-expression network after assessing the expression profile. The R package 'WGCNA' was used to build co-expression network of DEGs. In addition, Pearson's correlation coefficient of genes was calculated to obtain similarity matrix. To make the network conform to scale-free network distribution, an appropriate weight value was selected and calculated. An appropriate soft threshold value was chosen to measure the connectivity between genes. The adjacency matrix was converted into topological overlap matrix (TOM); meanwhile, the WGCNA package was used to perform hierarchical clustering on the matrix. For the generated clustering tree, Dynamic Tree Cut method was adopted to cut the gene clustering tree. Genes with similar expression patterns were allocated to a branch, and each branch represented a co-expression module. In present study, the soft threshold was 7 and the minimum size of module was set as 30. After analyzing the association between modules and clinical traits, the hub module was selected, which is the most relevant to metastasis in ovarian cancer. Gene significance (GS) represents the association between the gene and clinical traits. Module membership (MM) reveals the relation between module eigen genes and gene expression.

Venn diagram of hub genes
The intersection of PPI hub genes and blue module genes were considered as real hub genes, which are probably associated with metastasis. To achieve that intersection, an online web tool (http://bioinfogp.cnb.csic.es/tools/venny/ index.html) was used to plot Venn diagram.

Prognostic signature of hub gene
The RNA sequencing data and the matching clinical characteristic data of patients with serous ovarian cancer were obtained from The Cancer Genome Atlas (TCGA) database (https://Cancergenome.nih.gov/). R package 'survival' was utilized to conduct overall survival analysis. The Univariate Cox regression analysis was performed to investigate the correlation between gene expression and overall survival. Genes with P-value <0.05 were considered to be significant.

Validations and analysis of hub gene
Hub genes were further analyzed by using the data downloaded from Illumina Human Methylation 27 platform in the TCGA database. The ONCOMINE database [13] (www.Oncomine.org), a publicly accessible online cancer database integrating sequencing data from several database, was used to assess different expressions of STMN2 in ovarian cancer tissues and ovarian normal tissues.

Gene set enrichment analysis
The samples of GSE73168 dataset have been placed into two groups according to the expression levels of hub genes. Gene set enrichment analysis (GSEA) (https://software.broadinstitute.org/gsea/index.jsp) was conducted in order to explore biological function. Terms with P < 0.05 were identified to be significant.

TIMER database analysis
Tumor Immune Estimation Resource (TIMER) is an integrated investigation for molecular characterization of immune infiltration (https://cistrome.shinyapps.io/timer/) [14]. TIMER uses six previously published statistical modules to study the correlation between the gene expression and immune cell tumor-infiltration [15]. Gene expression levels were visualized with log 2 RSEM.

Analysis of small molecule drugs
The Connectivity Map (CMap) database was established to explore the small molecule involved in tumor metastasis [16]. The DEGs were inputted into CMap database. The enrichment scores ranging from −1 to 1 were analyzed. The negative linkage score indicates that the drugs can reverse input characteristics. The connectivity score was evaluated by the number of instances (N > 10) and P-value <0.05. The tomography of small molecule drugs was displayed by using the Pubchem database (https://pubchem.ncbi.nlm.nih.gov/).

Patients and tissue samples
Twenty fresh ovarian tissue samples (10 normal and 10 ovarian cancers) were frozen in the liquid nitrogen after washed by ice-cold PBS once removed from patients. In the present study, the usage of patients' tissues and access to patients' information were approved by Ethics Committee of the First Affiliated Huai'an Hospital of Nanjing Medical University.

Western blot experiment
RIPA buffer (Tris 50 mmol/l, NaCl 150 mmol/l, EDTA 0.5 mmol/l, NP40 1%, Triton X-100 0.5%, glycerin 10%) supplemented with protease inhibitor cocktail (MCE, Shanghai, China) and phosphatase inhibitors was used to extract proteins from cells and fresh ovarian tissues. After using Bicinchoninic Acid Kit (Beyotime, Shanghai, China) for protein quantification, the lysate was metallized with added loading buffer at 100 • C for 10 min to denature. The protein was transferred to the PVDF membrane after sodium dodecyl sulfate/polyacrylamide gel electrophoresis. After blocked with 5% defatted milk powder dissolved in TBST (pH 7.4, NaCl 8 g/l, KCl 0.2 g/l, Tris 3 g/l) at room temperature for 1 h, the membrane was incubated with the primary antibody at 4 • C overnight, including STMN2 (1:1000, ab185956, Abcam, U.S.A.) and β-actin (1:1000, A5441, Sigma, U.S.A.). The membrane was incubated with horseradish peroxidase-coupled secondary antibody at room temperature for 1 h after washed with TBST for four times. Target protein bands were developed by Super ECL Plus kit (US Everbright INC, U.S.A.) and quantitatively analyzed by Image J.

Statistical analysis
Statistical analysis was conducted using Graphpad Prism 6. Unpaired t test was utilized for comparing continuous variables between two groups. P-value less than 0.05 was considered to be statistically significant.

Identification and enrichment of DEGs in ovarian cancer
The GEO2R tool was utilized to analyze DEGs in GSE73168. P <0.05 and |logFC| ≥1 were included as the standards. 957 DEGs between primary sites and metastases of serous ovarian cancer were filtered, revealing 417 up-regulated genes and 540 down-regulated genes. DEGs were displayed in the volcano map and heatmap based on the |logFC| values ( Figure 1). DEGs were mostly enriched in cell fate commitment, cell fate specification, neuron fate specification, neuron fate commitment, anion transmembrane transporter activity, chloride transmembrane transporter activity, chloride channel activity, inorganic anion transmembrane transporter activity, small GTPase binding, and anion channel activity in the GO analysis (Suppementary Figure S2A). In the KEGG analysis, DEGs were enriched in neuroactive ligand-receptor interaction, GABAergic synapse, and nicotine addiction (Suppementary Figure S2B).

Hub module selection
Through the STRING analysis, 957 DEGs were inputted into the PPI network, including 514 nodes and 842 sides (Figure 2A). Then cytoHubba, which is a plugin to rank nodes by their network capabilities [12], was utilized to offer 11 analysis methods including Density of Maximum Neighborhood Component, Maximal Clique Centrality, Maximum Neighborhood Component, Degree, Edge Percolated Component, and six centralities (Bottleneck, Ec-Centricity, Closeness, Radiality, Betweenness, and Stress). All of 11 analysis methods in the PPI network were used in the present study to identify top twenty genes ( Figure 2B-L). Eighty-two genes were selected according to 11 analysis methods in CytoHubba. This finding may indicate that 82 genes play an important role in ovarian cancer.

Weighted gene coexpression network construction and analysis
A total of 16 clinical samples of GSE73168 were collected for analysis (Supplementary Figure S1B). The 'WGCNA' package in R was conducted, and the genes with highly related genes were grouped into one module. β = 7 (scale free R 2 = 0.90) was chosen to be appropriate soft-thresholding value in order to ensure a scale-free analysis (Supplementary Figure S3). Seven modules were identified, and it was found that blue module was mostly related to tumor metastasis ( Figure 3A,B). All the genes were covered in the heat map ( Figure 3C). Furthermore, intra-module analysis of GS and MM in seven modules was performed. GS and MM were significantly associated, implying that 112 genes involved in blue module probably have significant association with metastasis among seven modules (Supplementary Figure S4A).   Interestingly, similarities were observed in certain gene modules. To find out the connectivity among the seven gene modules, interactions of eigengenes were further analyzed. Seven clusters were again divided into two clusters, including two branches (Supplementary Figure S4B,C).

Hub genes identification
One hundred twelve genes in the blue module were chosen as hub genes. Eighty-two genes identified by cytoHubba in the PPI network were defined as hub genes. Finally, intersection of PPI network and WGCNA analysis were 17 genes, which were considered to be 'real' hub genes ( Figure 4A).

Prognostic analysis of hub gene
Next, the prognostic signature of 17 hub genes were analyzed. Univariate and multivariate Cox regression analyses were employed to predict factors related with the prognosis ( Table 1). The age, stage, and STMN2 expression were significantly correlated to overall patient survival using TCGA database, except grade ( Figure 4B). Univariate and multivariate Cox regression analyses of the remaining 16 genes implied no correlation with overall patient survival (Supplementary Table S1).

Hub gene validation
STMN2 expression was further verified in ovarian cancer by using FireBrowse and ONCOMINE database. ON-COMINE analysis showed that STMN2 transcripts were 1.116-fold higher in cancer compared with normal samples from Bonome Ovarian Statistics (P = 6.37E−5), indicating that the expression of STMN2 was up-regulated in ovarian cancer ( Figure 4C). The gene was further studied to understand the possible mechanism for abnormal expression. By  using the Illumina Human Methylation 27 platform, the expression of STMN2 was negatively correlated with DNA methylation ( Figure 4E). Survival analysis implied that higher STMN2 expression was correlated with poorer survival rate ( Figure 4D).

Pertinence of STMN2 expression and immune infiltration level in ovarian cancer
The distribution of tumor-infiltrating lymphocytes is an important indicator of patients' lymph node status and prognosis [17,18]. The association of STMN2 expression level with immune infiltration abundance in ovarian tumor was evaluated using TIMER database. STMN2 expression was negatively correlated with infiltration degree of B cells, CD8+ T cells, macrophage, neutrophil, and dendritic cells ( Figure 5).

Related small molecule drug screening
All DEGs were explored in CMap database to analyze the small molecule drugs. Three small molecule drugs with high connectivity scores are displayed in Table 2. Diprophylline, valinomycin, and anisomycin showed a negative association with the tumor metastasis and implied great possibility in clinical application. The tomography of the three molecules was displayed in the PubChem database ( Figure 6).

Positive expression of STMN2 in ovarian cancer
The high expression of STMN2 in ovarian cancer tissue was verified in mRNA and protein level compared with normal ovarian tissue, which is consistent with the results of Oncomine database ( Figure 7A-C). Furthermore, the expression level of STMN2 in ovarian cell lines was also explored through western blot. STMN2 was highly expressed in ovarian cancer cell lines (SK-OV3, A2780, and HO-8910) compared with normal cell lines (IOSE-80) ( Figure 7D).

Discussion
Ovarian cancer is the prevalent malignancy of female reproductive tract. The high rate of recurrence and metastasis leads to an extremely low 5-year survival rate. Hence, it is vital to screen the markers for early diagnosis and regulatory pathways involved in metastasis of ovarian cancer. Gene expression profile of GSE73168 containing eight primary tumors and eight matched metastases of serous ovarian cancer was utilized to seek some biomarkers and figure out the molecule mechanism of metastasis in ovarian cancer in this research. 957 DEGs were found to be related to tumor metastasis, including 417 up-regulated genes and 540 down-regulated genes. The PPI and WGCNA analyses were used to choose the hub genes related to clinical metastasis. In addition, the analysis of functions and pathways involved in ovarian cancer was performed.
In the GO analysis, DEGs were mostly enriched in the cell fate commitment, cell fate specification, neuron fate specification, neuron fate commitment, anion transmembrane transporter activity, chloride transmembrane transporter activity, chloride channel activity, inorganic anion transmembrane transporter activity, small GTPase binding, and anion channel activities related to the cell cycle, which is consistent with characteristics of rapid proliferation and fast growing in ovarian cancer [19]. In the KEGG, DEGs were mostly enriched in neuroactive ligand-receptor interaction, GABAergic synapse, and nicotine addiction. Neuroactive ligand-receptor interaction has been demonstrated to be involved in some cancers, such as renal cell carcinoma [20], breast cancer [21], and lung adenocarcinoma [22].
WGCNA analysis indicated that seven modules possessed related expression patterns. 112 hub genes in the blue module connected with metastasis were selected. Based on the plugin cytoHubba in Cytoscape, 17 genes were screened. Univariate Cox regression of 17 genes was analyzed by the TCGA database, and STMN2 was finally obtained. The expression of STMN2 was up-regulated in cancer group compared with the normal group. STMN2 was highly expressed in ovarian cancer tissue and cell lines. STMN2 is a member of the stathmin gene family, and plays an important part in human hepatoma cells [23,24]. Stathmin gene family members participate in microtubule polymerization [25]. And it was reported that STMN2 is a new target of β-catenin/TCF-mediated transcription in hepatoma cell [23]. STMN2 is involved in TCF binding sites and participates in Wnt/β-catenin signaling [24].
Furthermore, our research showed that STMN2 expression levels were associated with CD4+ T cell, B cell, CD8+ T cell, macrophage, neutrophil and dendritic cell. Immune-related mechanisms involved in some cancer and immunotherapy strategy is a potential direction for the treatment of ovarian cancer [26]. This result indicated that STMN2 participates in the recruitment and regulation of immune-infiltrating cells in ovarian cancer. However, functions and pathways of STMN2 in tumor-infiltrating lymphocytes need further study.
Nowadays, paclitaxel/carboplatin combined with chemotherapy has been widely used for ovarian cancer patients. In this research, three small molecule drugs were identified, which may serve as potential therapeutic targets. Diprophylline, the most important small molecule in our study, has been reported to suppress proliferation and migration of non-small cell lung cancer via down-regulating PI3K signaling pathway [27]. Valinomycin shows a potential therapeutic effect when combined with cisplatin in vitro [28]. Anisomycin has been revealed to suppress proliferation and invasion of ovarian cancer stem cell through regulating LncRNA BACE1-AS [29].

Conclusion
In conclusion, through an integrated bioinformatics analysis, STMN2 was screened as a hub gene, mostly associated with the metastasis of ovarian cancer, and its functions and pathways involved in the ovarian cancer were explored. The gene deserves further exploration and analysis in combination with the clinical conditions of the patients. Further studies to explore functions and mechanism of STMN2 in ovarian cancer metastatic process will be performed in our future work.

Data Availability
The data used in the current study are available from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov) and Cancer Genome Atlas database (https://cancergenome.nih.gov).