Bioinformatics analysis to identify the key genes affecting the progression and prognosis of hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is the most frequent primary liver cancer, which has poor outcome. The present study aimed to investigate the key genes implicated in the progression and prognosis of HCC. The RNA-sequencing data of HCC was extracted from The Cancer Genome Atlas (TCGA) database. Using the R package (DESeq), the differentially expressed genes (DEGs) were analyzed. Based on the Cluepedia plug-in in Cytoscape software, enrichment analysis for the protein-coding genes amongst the DEGs was conducted. Subsequently, protein–protein interaction (PPI) network was built by Cytoscape software. Using survival package, the genes that could distinguish the survival differences of the HCC samples were explored. Moreover, quantitative real-time reverse transcription-PCR (qRT-PCR) experiments were used to detect the expression of key genes. There were 2193 DEGs in HCC samples. For the protein-coding genes amongst the DEGs, multiple functional terms and pathways were enriched. In the PPI network, cyclin-dependent kinase 1 (CDK1), polo-like kinase 1 (PLK1), Fos proto-oncogene, AP-1 transcription factor subunit (FOS), serum amyloid A1 (SAA1), and lysophosphatidic acid receptor 3 (LPAR3) were hub nodes. CDK1 interacting with PLK1 and FOS, and LPAR3 interacting with FOS and SAA1 were found in the PPI network. Amongst the 40 network modules, 4 modules were with scores not less than 10. Survival analysis showed that anterior gradient 2 (AGR2) and RLN3 could differentiate the high- and low-risk groups, which were confirmed by qRT-PCR. CDK1, PLK1, FOS, SAA1, and LPAR3 might be key genes affecting the progression of HCC. Besides, AGR2 and RLN3 might be implicated in the prognosis of HCC.


Background
Hepatocellular carcinoma (HCC) is the most frequent primary liver cancer and causes the most deaths in cirrhosis patients [1]. HCC usually occurs in people with chronic liver inflammation, which is closely related to virus infection or alcohol and aflatoxin exposure [2]. Epidemiological data show that the main risk factors for HCC are as follows: (i) Hepatitis B or hepatitis C virus infection. (ii) Aflatoxin. (iii) Drinking wastewater or pond water containing a large amount of organochlorine compounds and algae toxins. (iv) Other factors such as family aggregation, selenium deficiency, alcoholic and nutritional cirrhosis [2,3]. The therapeutic schedules of HCC depend on disease stage, operative tolerance, and possibility of liver transplant [3,4]. The outcome of HCC patients is usually poor, because 80-90% HCCs cannot be resected completely and leads to death in 3-6 months [5,6]. HCC is amongst the most common tumors and results in more than 670000 deaths per year globally [7]. Therefore, the mechanisms of HCC needed to be explored to improve its therapies. A lot of studies indicated that Forkhead box C1 (FoxC1), αB-crystallin, Zinc finger and BTB domain containing 20 (ZBTB20), Dysregulated B-cell translocation gene 1 (BTG1), Homeobox A13b (HOXA13), DEK proto-oncogene (DEK), Ubiquitin-specific protease 7 (USP7), and Acyl-CoA Ligase 4 (ACSL4) played important roles in the occurrence and progression of HCC, and they may serve as novel prognostic factors and therapeutic targets for HCC [8][9][10][11][12][13][14][15][16][17][18]. FoxC1 may facilitate HCC metastasis via inducing epithelial-mesenchymal transition (EMT) and up-regulating neural precursor cell expressed, developmentally down-regulated 9 (NEDD9) [8,9]. The expression of αB-crystallin has correlations with the invasion and metastasis of HCC cells [10]. ZBTB20 expression was up-regulated in HCC and related to adverse prognosis in HCC patients [11,12]. BTG1 may be involved in hepatocarcinogenesis and may be taken as a biomarker for the carcinogenesis and progression of HCC [13]. HOXA13 may affect angiogenesis, progression, and outcome of HCC, and serum HOXA13 may be utilized for early diagnosis and prognosis prediction of HCC patients [14]. DEK is reported to be implicated in hepatocyte differentiation and acts as a candidate marker for the prognosis and staging of HCC [15,16]. USP7 and ACSL4 are up-regulated in HCC samples, which are related to a poor survival [17,18]. Despite these findings, the key genes having influences on the progression and prognosis of HCC patients have not been comprehensively revealed.
Various in silico tools of bioinformatics are widely applied for assessing gene expression levels and screening outstanding genes from RNA-sequencing data and next-generation sequencing data and their possible implication in growth of different types of cancer [19]. In the present study, differential expression analysis, enrichment analysis, network analysis, and survival analysis successively were conducted to find out the key genes that affected the prognosis of HCC. In addition, quantitative real-time reverse transcription-PCR (qRT-PCR) experiments were conducted to confirm the expression of key genes. The present study might help to predict the outcome of HCC patients and develop novel therapies.

Microarray data
The RNA-sequencing data of HCC was obtained from The Cancer Genome Atlas (TCGA, https://cancergenome.nih. gov/) database, which was based on the platform of llumina HiSeq 2000 RNA Sequencing. The TCGA dataset contained 419 samples, including 369 HCC samples and 50 normal control samples [20]. Meanwhile, the corresponding clinical data of the HCC samples were downloaded from the TCGA database.

Data preprocessing and differential expression analysis
The genes with counts per million (cpm) < 1 in more than 10% samples were taken as low expressed genes and removed. According to the annotation files in GENCODE database (version 22) [21], ensemble gene IDs (EN-SGs) were mapped to gene symbols and gene types (coding genes or non-coding genes). Using the R package DE-Seq (http://www.bioconductor.org/packages/release/bioc/html/DESeq.html) [22], the differentially expressed genes (DEGs) between HCC samples and normal samples were analyzed. The thresholds for differential expression analysis were set as |log fold-change (FC)| ≥ 2 and false discovery rate (FDR) < 0.01.

Functional and pathway enrichment analyses
Gene Ontology (GO) database introduces the biological process (BP), cellular component (CC), and molecular function (MF) for proteins [23]. Kyoto Encyclopedia of Genes and Genomes (KEGG) database included the functions of genes and can be utilized for the functional prediction of gene lists [24]. Using Cluepedia plug-in (http://apps. cytoscape.org/apps/cluepedia) [25] in Cytoscape software, the protein-coding genes amongst the DEGs were performed with GO and KEGG enrichment analyses. The terms with FDR < 0.05 were significant results. Besides, the significant KEGG pathways were presented by the R package pathview (http://r-forge.r-project.org/projects/pathview/) [26].

Protein-protein interaction network analysis
Protein-protein interaction (PPI) pairs were predicted for the protein-coding genes amongst the DEGs using the STRING (http://www.string-db.org/) [27] database, with the combined score set as 700. Afterward, the PPI network was visualized using Cytoscape software (http://www.cytoscape.org) [28]. Moreover, the MCODE plug-in (http:// apps.cytoscape.org/apps/MCODE) [29] in Cytoscape software was used for identifying the significant modules in PPI network.

Patient samples
Clinical samples were collected from 26 HCC patients who underwent surgery at the Central South University Xiangya School of Medicine Affiliated Haikou Hospital from 2013 to 2017. Meanwhile the adjacent non-tumor liver tissues from the HCC patients were obtained as the normal controls. None of the patients accepted radiation therapy and/or chemotherapy before surgery and all of them have signed the informed consent. Ethical approval for the study was provided by the ethics committee of Central South University Xiangya School of Medicine Affiliated Haikou Hospital, and the research has been carried out in accordance with the World Medical Association Declaration of Helsinki.

qRT-PCR analysis
The total RNAs of the samples were extracted using TRIzol (Thermo Fisher Scientific, Waltham, MA, U.S.A.), detected by UV absorbance (A 260/280 ) and agarose gel. qRT-PCR was carried out in the ABI Prism 7500 PCR system (Applied Biosystems, Foster City, CA, U.S.A.) following the standard instructions. GADPH acted as the internal criterion for the targetted genes. 2 − c t method was taken to calculate the relative expressions of the targetted genes. The primers' sequences for GADPH were (F) 5 -GCACCGTCAAGGCTGAGAAC-3 and (R) 5 -GCCTTCTCCATGGTGGTGAA-3 . PCR was performed in three parallel holes.

Survival analysis
Based on univariate COX regression analysis [30], the correlations between the DEG and overall survival (OS) were analyzed. Combined with the R package survival (https://CRAN.R-project.org/view=Survival) [31], the HCC samples were divided into high-and low-risk groups and then the OS differences between the two groups were analyzed using Kaplan-Meier (KM) method [32]. Student's t tests and one-way ANOVAs were used in either two or multiple groups for statistical significance, with P<0.05 considered significant.

Differential expression analysis
Relative to normal samples, a total of 2193 DEGs (including 1964 up-regulated genes and 229 down-regulated genes) in HCC samples were screened. Amongst these DEGs, there were 1800 protein-coding genes and 232 long non-coding RNAs (lncRNAs).

Survival analysis
Based on univariate COX regression analysis, a total of 116 DEGs were found to be correlated with the OS of HCC samples (P-value <0.05). According to the median value of gene expression, the HCC samples were divided into highand low-risk groups. Subsequently, the OS differences between the two groups were analyzed. KM survival curves showed that 12 genes (including anterior gradient 2, AGR2; cysteine-rich secretory protein 2, CRISP2; interleukin 31 receptor A, IL31RA; long intergenic non-protein coding RNA 1477, LINC01477; RLN3; SRY-box 14, SOX14; transmembrane epididymal protein 1, TEDDM1; von Willebrand factor A domain containing 5B2, VWA5B2; zinc finger protein 280A, ZNF280A; potassium voltage-gated channel subfamily C member 2, KCNC2; LOC105372556; and plasmalemma vesicle-associated protein, PLVAP) could divide the HCC samples into two groups that had OS differences (P-value <0.05) (Figure 3).

Expressions of genes differentiated the survival differences in HCC in clinical samples
We, then, examined the expressions of genes differentiating the survival differences of HCC in HCC tissues and adjacent normal tissues by qRT-PCR. As shown in Figure 4, the expressions of AGR2, CRISP2, IL31RA, LINC01477, RLN3, SOX14, TEDDM1, VWA5B2, and ZNF280A were significantly increased in HCC tissues compared with the normal tissues (P-value <0.05) (Figure 4), which were consistent with the results of differential expression analysis.

Discussion
In the present study, a total of 2193 DEGs (including 1800 protein-coding genes and 232 lncRNAs) in HCC samples were identified. For the protein-coding genes amongst the DEGs, functional and pathway enrichment analyses were carried out. In the PPI network, CDK1 (degree = 76), PLK1 (degree = 72), FOS (degree = 63), SAA1 (degree = 59), and LPAR3 (degree = 59) were key nodes. A total of 40 network modules were identified, amongst which 4 modules were with scores no less than 10. Survival analysis suggested that 12 genes (including AGR2 and RLN3) could divide the HCC samples into high-and low-risk groups. Furthermore, the expressions of nine genes (including AGR2 and RLN3) were confirmed by qRT-PCR experiments.
CDK1 interacts with apoptin during HCC tumorigenesis, and their link may play a role in mediating tumor cell apoptosis [33]. Via directly suppressing CDK1 and v-akt murine thymoma viral oncogene homolog 3 (AKT3) expression and indirectly inhibiting cyclinD1 expression; miR-582-5p functions in the development and progression of HCC [34]. PLK1 expression is significantly up-regulated in HCC tissues, therefore, PLK1 may be a potential marker for the prognosis of HCC and targetting PLK1 may be applied for the diagnosis and therapy of the disease [35,36]. PLK1 is overexpressed in HCC samples relative to normal controls, and its knockdown can induce apoptosis of tumor cells via the endonuclease-G pathway [37]. FOS expression is inhibited by miR-139 down-expression in HCC cells with high metastatic potency, which promotes the metastasis in HCC [38]. Through suppressing the expression of the oncogene FOS, dysregulated miR-101 is implicated in the pathogenesis of HCC [39]. CDK1 had interactions with both PLK1 and FOS in the PPI network, suggesting that CDK1 might be involved in the development and progression of HCC through interacting with PLK1 and FOS.
The preoperative serum level of SAA is closely correlated with tumor size and tumor stage, implicating that SAA overexpression can serve as a promising prognostic factor for HCC patients [40,41]. The LPAR1/LPAR3 expression is increased in hepatoma cell line SKHep1, and the LPA-LPAR3 signaling may play an essential role in tumor invasiveness/expansion [42]. Several LPAR subtypes are detected in HCC samples, and the suppression of LPA-LPAR signaling represses the motility and proliferation of HCC cells [43]. LPAR3 could also interact with FOS and SAA1 in the PPI network, indicating that LPAR3 might function in pathogenesis of HCC via interacting with FOS and SAA1.
High AGR2 expression level contributes to the metastasis of HCC cells through acting on mitogen-activated protein kinase (MAPK) and Caspase pathway, which results in the unfavorable prognosis of HCC patients [44]. AGR2 overexpression is found in various tumors including fibrolamellar HCC, and the dysregulated AGR2 is a phenotypic characteristic of cholangiocytes [45,46]. RLN2 expression is found to be up-regulated in HCC tissues, which can be taken as a predictor for tumor progression and adverse prognosis [47]. Therefore, AGR2 and RLN3 might be correlated to the prognosis of HCC patients.

Conclusion
In conclusion, a total of 2193 DEGs in HCC samples were identified. Besides, CDK1, PLK1, FOS, SAA1, LPAR3, AGR2, and RLN3 might play important roles in the progression and prognosis of HCC. Nevertheless, lacking experiments was main limitation of the present study and more experiments should be designed to support our results.