Identification and verification of three key genes associated with survival and prognosis of COAD patients via integrated bioinformatics analysis

Abstract Background: Colorectal cancer (CRC) is the third most lethal malignancy in the world, wherein colon adenocarcinoma (COAD) is the most prevalent type of CRC. Exploring biomarkers is important for the diagnosis, treatment, and prevention of COAD. Methods: We used GEO2R and Venn online software for differential gene screening analysis. Hub genes were screened via Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and Cytoscape, following Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Finally, survival analysis and RNA expression validation were performed via UALCAN online software and real-time PCR. Immunohistochemistry (IHC) was performed to verify the protein expression level of hub genes from tissues of COAD patients. Results: In the present study, we screened 323 common differentially expressed genes (DEGs) from four GSE datasets. Furthermore, four hub genes were selected for survival correlation analysis and expression level verification, three of which were shown to be statistically significant. Conclusion: Our study suggests that Serpin Family E Member 1 (SERPINE1), secreted phosphoprotein 1 (SPP1) and tissue inhibitor of metalloproteinase 1 (TIMP1) may be biomarkers closely related to the prognosis of CRC patients.


Introduction
Colorectal cancer (CRC) originates in the colon or rectum [1], and typical symptoms include alternating diarrhea, constipation, mucus, bloody stools, as well as weight loss [2]. According to annual reports on global cancer statistics, nearly 1.4 million newly diagnosed cases and 700000 deaths occur worldwide due to CRC [3]. The 5-year survival rate of patients with CRC at its early stage is 90%, while those at later stages show a rate of no more than 12% [4,5]. Colon adenocarcinoma (COAD) is one of the most prevalent types of CRC, the incidence and mortality of which was 10.2 and 9.2%, respectively, in 2018 [6,7]. Therefore, it is very important to select and identify the specific biomarkers of COAD for its early diagnosis, development of an efficient treatment strategy, and assessment of patient prognosis.

Tissue samples and immunohistochemistry
Sixty paired primary cancer tissues and adjacent tissues from colon cancer patients who underwent surgery between September 2011 and April 2020, were collected at the Tianjin Xiqing Hospital. No chemotherapy and radiotherapy was given to patients before surgery. All the pathology materials were reviewed independently by two pathologists. Informed consent was obtained from each patient, and the collection of tissue specimens was approved by the Institutional Review and Ethics Boards of the Tianjin Xiqing Hospital. The two-step EnVision method was used to perform immune histochemical experiments, along with different primary antibodies against CXCL2 (1:50), SER-PINE1 (1:50), TIMP1 (1:50) and SPP1 (1:50). The slides were analyzed using the Image-Pro Plus software program (Media Cybernetics, Inc. U.S.A.).

Statistical analysis
Statistical analysis was performed using SPSS 22.0 and GraphPad Prism 8.0 software. Results were presented as the mean + − standard deviation. The statistically significant changes were indicated with asterisks and the P-values were calculated via Student's t test, wherein *, **, and *** represented P<0.05, P<0.01 and P<0.001, respectively.

Identification of DEGs in COAD
In the present study, we selected and downloaded four GEO datasets and extracted DEGs on the basis of the cut-off criteria. The present study covered 64 COAD and 73 normal colorectal tissues in total. The results showed that 2729, 957, 1864, and 617 genes were down-regulated while 1917, 770, 1282, and 469 genes were up-regulated in GSE37364, 41328, 81558, and 110250, respectively. By using the Venn diagram software, we detected a total of 323 common DEGs in the COAD samples, composed of 111 up-regulated genes and 212 down-regulated genes ( Table 1 and Figure  1).

GO and signaling pathway enrichment analysis
We analyzed the DEGs via the DAVID software and identified 125 significant enrichment terms, including BP (71), MF (24), and CC (26). With regard to BP, DEGs were seen to be significantly enriched during cell adhesion, negative regulation of cell proliferation, angiogenesis, cytoskeleton organization, steroid hormone-mediated signaling pathway, inflammatory response, negative regulation of protein kinase activity, and so on. Whereas, for MF, DEGs were significantly enriched during chemokine activity, hormone activity, CXCR chemokine receptor binding, receptor binding, calcium ion binding, structural constituent of cytoskeleton, protein kinase inhibitor activity, oxidoreductase activity etc. With regard to CC, DEGs were significantly enriched in extracellular space, apical plasma membrane, extracellular exosome, extracellular region, cell surface, plasma membrane, basal plasma membrane, and so forth. The top 20 GO terms are depicted in Figure 2A-C. Furthermore, KEGG pathway analysis indicated that the DEGs were enriched in 11 pathways including nitrogen metabolism, bile secretion, proximal tubule bicarbonate reclamation, cytokine-cytokine receptor interaction, aldosterone-regulated sodium reabsorption, mineral absorption, phosphoinositide-3-kinase (PI3K)-Akt signaling pathway, PPAR signaling pathway etc. ( Figure 2D).

Construction of PPI network and module analysis
Via the STRING and Cytoscape analysis, we drew the PPI network complex constructed with 268 nodes and 709 edges, consisting of 97 down-regulated and 171 up-regulated genes. Then we conducted further analyses by applying Cytoscape MCODE plus, and found 32 central nodes that included 20 up-regulated genes and 12 down-regulated genes (Figures 3 and 4 and Table 2).

Selection of hub genes and validation of the expression levels
We used UALCAN to identify the survival data of 32 central genes and found that the expression levels of four hub genes, namely CXCL2, SERPINE1, SPP1, and TIMP1, were remarkably related to the survival of COAD patients ( Figure 5). Furthermore, we validated the expression level of these four hub genes in COAD and normal tissues via UALCAN. Results showed that expression levels of the four genes were significantly increased in the COAD samples as compared with the normal samples ( Figure 6). Furthermore, we verified the mRNA levels of the four hub genes in NCM460, HCT116, HCA-7, HT-29, and SW480 cells. The real-time PCR results showed that the expression levels of SPP1, TIMP1, and SERPINE1 mRNAs were significantly increased in the HCT116, HCA-7, HT-29, and SW480 cell lines as compared with the NCM460 cell line (Figure 7). Moreover, CXCL2 expression was higher in the HCA-7, HT-29, and SW480 cells, but not in the HCT116 cells when compared with NCM460. In addition, we also analyzed the protein expressions of the four hub genes by using tissue microarray. The results indicated that SPP1, TIMP1, and SERPINE1 expression appeared to be remarkably higher in CRC tissues than in the adjacent tissues, while the expression of CXCL2 was not statistically different between cancer tissues and adjacent tissues (Figure 8).

Figure 3. PPI network constructed via STRING and Cytoscape
Each node indicates a protein module; the edges represent proteins interaction; blue circles denote up-regulated DEGs while red circles represent down-regulated DEGs.

Discussion
CRC is the third main cause of deaths from malignant cancers worldwide, and the fourth most common cancer in China [15,16]. One of the most common types of CRC is COAD, which originates in the superficial glandular epithelial cells [6]. Previous studies have indicated that the mechanism behind the tumorigenesis and development of COAD is a complicated process, including chromosomal instability, dysregulation of oncogenes and tumor suppressor, epigenetic alterations, metabolic alterations, abnormal immune response etc. [17].   In the present study, we screened DEGs via GEO2R, and selected 111 up-regulated genes and 212 down-regulated genes found in COAD tissues. Then we analyzed their enrichment in GO and signaling pathways and found that the DEGs were enriched in 11 pathways, including the PI3K-Akt signaling pathway. Previous studies showed that several signal pathways could be involved in the progress and development of CRC [18][19][20][21][22][23]. PI3K/AKT is one of the classic pathways responsible for multiple BPs, such as cell proliferation, differentiation, and migration. As a key effector of PI3K, Akt phosphorylation is closely related to CRC cell proliferation and apoptosis inhibition [18]. The specific inhibition of the PI3K/AKT signaling pathway could result in a decrease in proliferation and increase in apoptosis of the cells [24].
Furthermore, we constructed a PPI network and performed survival analysis along with verification of the RNA expression levels via the GEO database, DAVID and STRING online tools, Cytoscape software, and UALCAN online analysis. After the above series of operations, four hub genes, CXCL2, SERPINE1, SPP1, and TIMP1, were selected. Furthermore, we tested the expression of these four genes at the mRNA as well as protein levels and found that SPP1, TIMP1, and SERPINE1 had higher expression levels in COAD cell lines than the colon epithelial cell line as well as in CRC tissue compared with para-cancerous tissue. SPP1, named as osteopontin, is a phosphoprotein secreted by malignant cells. Cheng et al. [25] found that SPP1 has a higher expression level in CRC tissues as compared with normal tissue, wherein its expression is closely associated with lymph node metastasis and the TNM stage. Inhibition of SPP1 in the HCT116 cells consequently resulted in the suppression of cellular growth, migration, and invasion. SPP1 overexpression could thus enhance the proliferation, migration, and invasion of CRC cells as well as the activation of the PI3K pathway, accompanied by the increased expression of Snail and matrix metalloproteinase 9 (MMP9) [26]. As one of the transcription factors of epithelial-mesenchymal transition (EMT), snail can inhibit the expression of E-cadherin, thereby promotes the activation of EMT in tumor cells [27]. MMP9 is one of the members in the MMP family which is a class of zinc-dependent proteases with the main function of participating in the degradation of extracellular matrix. MMP9 is often overexpressed in malignant tumors and is associated with aggressive malignant phenotypes and poor prognosis in cancer patients [28]. Similarly, SPP1 could not only promote CRC cell growth and migration, but it also inhibited autophagy, which may be thus achieved by activating the p38 mitogen-activated protein kinase (MAPK) signaling pathway [29]. In addition, a study by Choe et al. [30] also confirmed higher SPP1 expression along with shorter survival times in CRC patients.
TIMP1 is one of the members of the TIMP family which can bind to MMP substrates and mediate MMP degradation that is necessary for matrix renewal [31]. TIMP1 has been shown to be highly expressed in many types of cancer, consequently correlated to their poor prognosis [32][33][34][35][36]. TIMP1 may also promote liver metastasis in CRC [37]. After chemotherapy, high expression levels of TIMP1 in the serum have been closely related to the decline over survival (OS) [38]. Moreover, knockdown of TIMP1 leads to decreased proliferation of HT29 cells [39]. Huang et al. [40], Zeng et al. [41], and Zuo et al. [42] demonstrated that the expression levels of TIMP1 may be negatively correlated with the prognosis of COAD patients. Song et al. [43] found that TIMP1 knockdown could result in decreased proliferation, migration, and invasion as well as increased apoptosis in vitro, whereas there is weakened tumorigenesis and metastasis in vivo via the FAK-PI3K/Akt and MAPK pathways. The elevated TIMP1 levels in serum were related to lower tumor grade and shorter overall survival times of CRC patients [44]. SERPINE1, also called PAI-1, can regulate plasmin formation from plasminogen [45]. The expression level of SER-PINE1 was seen to increase in CRC tissue and colon cancer cell lines showing active proliferation [46]. Moreover, overexpression of SERPINE1 led to decreased apoptosis and caspase-3 amidolytic activity in vascular smooth muscle cells (VSMCs) compared with control [47]. LC-MS/MS analysis has shown that SERPINE1 may be a peripheral participant in the apoptosis of CRC cells [48]. Shun et al. found that the expression level of SEPINE1 was remarkably up-regulated in tumor stroma and associated with the relapse of CRC stage II-III patients [49]. Several recent studies have suggested that SERPINE1 may be associated with prognosis of CRC patients [41,50,51].
CXCL2 belongs to the chemokine superfamily, and is responsible for the attraction of leukocytes to inflammatory sites apart from the regulation of angiogenesis, tumorigenesis [52]. Dietrich et al. found that there was significantly increased expression of CXCL2 in CRC, in contrast with normal tissue [53]. Further, there was increased expression of mRNA and protein levels of CXCL2 in colon cancer stem cells, whereas CXCL2 knockdown led to the decreased expression of EMT markers, MMPs, and Gai-2, which consequently promoted tumor initiation and development [54]. Down-regulation of CXCL2 destroyed the proliferation and colony formation in CRC cells [55]. Moreover, CXCL2 remarkably promoted the proliferation and migration of human umbilical vein endothelial cells (HUVECs) [56]. But the qPCR results showed no significant difference in CXCL2 expression between HCT116 and NCM460 cell lines. This may be due to the source of the cell lines; more cell lines should thus be selected for verification in future studies.
In summary, in the present study, we screened 323 DEGs from four COAD GEO datasets and identified SER-PINE1, SPP1, and TIMP1 as key genes which may be closely related to the prognosis of COAD in patients. Further investigations need to be performed to explore the mechanisms of these genes. These results would consequently provide candidate targets for the diagnosis and prognosis of CRC in patients, as well as new ideas for treatment.
The visualization of GO analysis, including BP, MF and CC of DEGs, was performed via the online software ImageGP (http://www. ehbio.com/ImageGP/index.php/Home/Index/index).

Author Contribution
Yong Liu conceived the idea behind the present paper. Chao Li and Xuewei Chen collected and analyzed the data. Lijin Dong and Ping Li drafted the paper. Rong Fan revised the final paper.

Ethics Approval
Clinical colon cancer samples were obtained from the Tianjin Xiqing Hospital, with informed written consents from the patients or their guardians and the study protocol was approved by the Institutional Review Board of Tianjin Xiqing Hospital.