Analysis of CASP12 diagnostic and prognostic values in cervical cancer based on TCGA database

Abstract The present study aims to identify a differentially expressed protein-coding gene, caspase 12 (CASP12), in cervical cancer (CC) based on The Cancer Genome Atlas (TCGA) database and to verify its clinical diagnostic and prognostic value. The transcriptome and clinicopathological data of CC were downloaded from the TCGA database, and screening identified PDE2A and CASP12 as independent prognostic factors for CC patients. According to the median expression, the patients were divided into groups with high and low CASP12 and PDE2A expression. There was no difference in survival between the PDE2A high and low expression groups (P=0.099), whereas there was a significant difference between the CASP12 high and low expression groups (P=0.033). Serum was collected from 68 CC patients (experimental group) and 50 healthy people (control group), the relative expression of CASP12 was measured using qRT-PCR, and the ROC curve was plotted. The relative expression of CASP12 in the experimental group was significantly lower than in the control group (P<0.05). The area under the curve (AUC) of CASP12 was 0.865. There were statistically significant differences between the CASP12 high and low expression groups in terms of differentiation, lymph node metastasis, tumor size, FIGO staging, and clinical outcomes (P<0.05), but not in terms of age, HPV types and pathological types (P>0.05). The 3-year survival in the CASP12 low expression group was significantly worse than in the CASP12 high expression group (P=0.028). In conclusion, the expression level of CASP12 can be used as a diagnostic and prognostic biomarker for patients with CC.


Introduction
Worldwide, cervical cancer (CC) is the fourth deadliest cancer among women [1]. According to a study in 2018 [2], there were more than 570000 new CC patients in the world and over 300000 deaths in the same period. Another study showed that China had 98000 new CC patients and 30500 new deaths in 2015 [3]. This disease has become more and more common in young females [4,5]. CC is inconspicuous and easily overlooked at an early stage, so by the time patients are admitted to a hospital the disease has usually already progressed. Therefore, patients usually miss the best treatment timeframe. Previous studies [6,7] have shown that patients with early CC who undergo surgery have a 5-year survival rate of 97.5% and do not need subsequent adjuvant treatment, which reduces cost and side effects and improves survival time and quality of life. Therefore, early diagnosis of CC is essential. However, there are only a few diagnostic biomarkers with high specificity for early CC.
Major oncology research programs have been launched to improve next-generation sequencing technology [8,9]. Among them, the Cancer Genome Atlas (TCGA) is the most important. It is designed to draw the genome map of human tumors through large-scale high-throughput technologies such as genome sequencing and DNA chips, and to find new cancer treatments and improve diagnosis and prevention by exploring the development, progression and potential molecular mechanisms of tumors [10,11]. The present study aims to screen differentially expressed genes in CC based on the TCGA database, to verify their clinical diagnostic and prognostic value by testing serum from patients with CC and healthy people, to find potential diagnostic biomarkers, and to provide references for clinicians.

Data sources
The gene expression data of CC in the TCGA database and the patients' clinical data were downloaded from the Broad Institute's Genome Data Analysis Center (GDAC) Firehose (http://gdac.Broadinstitute.org/). The samples were collected by the TCGA repository (http://cancergenome.nih.gov/cancergenomics/tissuesamples) and were sequenced and analyzed using standardized protocols. Altogether, data from 306 cancer samples and 3 matched paracancerous samples, generated on the Illumina HiSeq 2000 platform, were obtained. Perl scripts were used to combine the files into mRNA-symbol matrix files, including protein-coding genes, long non-coding RNA and pseudogenes.

Collection of clinical samples
Sixty-eight patients with CC treated in our hospital from January 2015 to February 2016 were enrolled in the experimental group. These patients had an average age of 51.2 ± 8.2 years, and all the clinicopathological characteristics were collected. Fifty healthy women who underwent a physical examination in our hospital were enrolled in the control group. Their average age was 50.1 ± 8.5 years.

Inclusion criteria
Patients who met FIGO staging [12]; patients with pathology diagnosis of CC; patients with complete clinical data and patients who signed an informed consent form.

Exclusion criteria
Patients who received radiotherapy and chemotherapy; patients with other tumors; patients with severe cardio-cerebral dysfunction; patients who did not cooperate with follow-up visits and patients with immune deficiencies.
There was no statistically significant difference in age between the two groups (P>0.05).

Preprocessing of the TCGA database
Patients whose survival time was less than 90 days were excluded. The gene expression values were log2-transformed. Because the expression of a gene in a single sample could lie between 0 and 1, for which a plain log2 transform gives a negative result, the log2(X+1) transform was applied so that the transformed values remained non-negative and comparable across samples [13]. Relevant information was extracted from the clinical files, including patient ID, survival time, survival condition, clinical staging and gene expression of CC patients. The gene expression files were combined with the matrix files and named differentially expressed gene files. Other data were discarded because too many values were missing.
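The effect of the log2(X+1) transform described above can be illustrated with a minimal sketch. The expression values below are hypothetical and are not taken from the TCGA data; the function name `log2_plus_one` is likewise only illustrative.

```python
import math


def log2_plus_one(values):
    """Return log2(x + 1) for each non-negative expression value."""
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    return [math.log2(v + 1.0) for v in values]


# Hypothetical expression values (assumption, for illustration only).
raw = [0.0, 0.25, 1.0, 7.0]

# A plain log2 is undefined at 0 and negative for values between 0 and 1.
plain = [math.log2(v) for v in raw if v > 0]

# Shifting by 1 keeps every transformed value at or above 0.
shifted = log2_plus_one(raw)
```

With the shift, a value of 0 maps to 0 and no transformed value is negative, which is the property the preprocessing step relies on.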

Detection of mRNA expression
Five milliliters of venous blood was collected from fasting patients in the morning. The blood was left to stand for 30 min and then centrifuged at 3000 rpm for 10 min to obtain the supernatant. TRIzol (Invitrogen; Thermo Fisher Scientific, Inc., U.S.A.) was used to extract total RNA from the serum. An ultraviolet spectrophotometer and agarose gel electrophoresis were used to assess its purity, concentration and integrity. The total RNA was reverse transcribed into cDNA using 5× TransScript® All-in-One SuperMix for qPCR and gDNA Remover, following the manufacturer's instructions. The cDNA was stored, and part of it was taken for subsequent experiments. PCR amplification was performed on an ABI 7900 PCR instrument using the TransScript Two-Step RT-PCR SuperMix kit (TransGen Biotech, Beijing, China, AQ201-01). The reaction system was as follows: 1 μl cDNA, 0.5 μl each of upstream and downstream primers, 12.5 μl of 2× TransTaq® HIFI PCR SuperMix II, and nuclease-free water to a final volume of 25 μl. The conditions were as follows: pre-denaturation at 94 °C for 30 s, denaturation at 94 °C for 5 s, and annealing at 60 °C for 30 s. Each sample was run in three replicate wells, and the experiment was conducted three times. GAPDH was used as the internal reference for caspase 12 (CASP12), and the 2^(−ΔΔCt) method was used to analyze the data. CASP12 primers were designed and synthesized by Shanghai GenePharma Co., Ltd. The upstream and downstream primer sequences of CASP12 were 5′-TTCAACAACCGTAACTGCCAGAGTC-3′ and 5′-CTGTCAGTGGTGAACCAAACAATCC-3′. Those of GAPDH were 5′-CACCCACTCCTCCACCTTTG-3′ and 5′-CCACCACCCTGTTGTTGTAG-3′.
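The relative quantification step can be sketched as follows, assuming the analysis used the common 2^(−ΔΔCt) method with GAPDH as the reference gene and a control sample as the calibrator. All Ct values and function names below are hypothetical and serve only to show the arithmetic.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)


def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), per sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical triplicate wells for one patient sample (assumption).
casp12_ct = mean([27.1, 26.9, 27.0])   # target gene
gapdh_ct = mean([18.0, 18.1, 17.9])    # internal reference

# Hypothetical calibrator (for example, a pooled control sample).
fold = relative_expression(casp12_ct, gapdh_ct, 26.0, 18.0)
```

In this toy case the sample's ΔCt is one cycle larger than the calibrator's, so the relative expression is about half that of the calibrator.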

Statistical analysis
The edgeR package in R was used to analyze the differential expression of genes in the TCGA samples, with thresholds of P<0.001 and a fold change of 4. We performed univariate and multivariate Cox regression analyses on the differentially expressed gene files. SPSS 20.0 was used for statistical analysis and GraphPad 7 to plot figures. The K-S test was used to assess data distribution, and normally distributed data were compared between groups using the independent-samples t test. The Pearson χ² test was used to assess the association between CASP12 expression and clinicopathological parameters. Kaplan-Meier analysis and the log-rank test were used to analyze survival, and the ROC curve was used to evaluate the diagnostic value of CASP12 in CC. CASP12 was considered to have diagnostic value in CC when the area under the curve (AUC) was >0.5. P<0.05 was considered statistically significant.

Differentially expressed genes in the TCGA database
In the present study, screening identified 590 differentially expressed genes, of which 180 were highly expressed and 410 were lowly expressed. The five highly and lowly expressed genes with the most significant differences are shown in Table 1 and Figure 1.
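The screening into highly and lowly expressed genes can be sketched as a simple threshold filter. This is an illustration only: it assumes that the fold-change criterion means at least a fourfold change in either direction (|log2 fold change| ≥ 2) together with P<0.001, and the gene names and statistics are hypothetical placeholders rather than output from edgeR.

```python
import math


def classify_genes(results, p_cutoff=0.001, fold_cutoff=4.0):
    """Split (gene, log2_fold_change, p_value) rows into up/down lists."""
    min_log_fc = math.log2(fold_cutoff)
    up, down = [], []
    for gene, log_fc, p_value in results:
        # Skip genes that fail either threshold.
        if p_value >= p_cutoff or abs(log_fc) < min_log_fc:
            continue
        if log_fc > 0:
            up.append(gene)
        else:
            down.append(gene)
    return up, down


# Hypothetical results table (assumption, not real data).
results = [
    ("GENE_A", 3.1, 1e-5),    # passes, highly expressed
    ("GENE_B", -2.4, 5e-4),   # passes, lowly expressed
    ("GENE_C", 2.5, 0.01),    # fails the P threshold
    ("GENE_D", -1.2, 1e-6),   # fails the fold-change threshold
    ("GENE_E", -2.0, 1e-4),   # exactly fourfold lower, passes
]
up_genes, down_genes = classify_genes(results)
```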

Cox analysis of survival
The univariate Cox regression analysis identified 36 factors with significant differences. Of those, the ten with the most significant differences were CENPM, NTRK3, CD300LG, PTTG1, KIAA0101, PCP4, CASP12, CLEC3B, HAND2 and TP73. According to the multivariate analysis of the 36 factors, PDE2A and CASP12 were independent prognostic factors for patients with CC. The patients were divided into high and low expression groups according to the median expression of PDE2A and CASP12, and the survival curves were plotted. There was no difference in survival between the PDE2A groups with high and low expression (P=0.099), whereas there was a significant difference between the CASP12 groups with high and low expression (P=0.033). More details are shown in Tables 2 and 3, and Figure 2.

Expression and diagnostic value of CASP12 mRNA
According to the qRT-PCR results, the relative expression of CASP12 in the experimental group (patients with CC) was significantly lower than in the control group (healthy people) (P<0.05). More details are shown in Figure 3. According to the ROC curve, the AUC of CASP12 was 0.865 (95% CI: 0.799-0.932), the specificity was 67.64%, and the sensitivity was 98.00%, with a Youden index of 65.65% and a cut-off value of >0.959. More details are shown in Figure 4.
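The relationship between the reported sensitivity, specificity and Youden index can be checked with a short sketch, which also shows one rank-based way to compute an AUC when lower expression indicates disease. The expression values in the toy lists are hypothetical, and the function names are illustrative rather than part of any package used in the study.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def auc_lower_in_cases(case_scores, control_scores):
    """Probability that a random case scores below a random control.

    Ties count as one half. This equals the area under the ROC curve
    when lower values are taken to indicate disease.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case < control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Values reported in the text, expressed as fractions.
j = youden_index(0.98, 0.6764)

# Hypothetical relative expression values (assumption).
toy_cases = [0.4, 0.6, 0.9, 1.1]
toy_controls = [0.8, 1.0, 1.2, 1.5]
toy_auc = auc_lower_in_cases(toy_cases, toy_controls)
```

Using the rounded percentages gives J of about 0.656, in line with the reported Youden index once rounding of the inputs is taken into account.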

Comparison of clinical data
According to the median expression of CASP12, the patients were divided into high (n=34) and low (n=34) expression groups. Then the correlation between the levels of CASP12 and clinicopathological parameters was analyzed to determine the clinical relevance of CASP12 expression in CC. The results showed that CASP12 expression was significantly associated with differentiation, lymph node metastasis, tumor size, FIGO staging and clinical outcomes (P<0.05), but not with age, HPV types and pathological types (P>0.05). More details are shown in Table 4.
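The Pearson χ² test of association between expression group and a binary clinicopathological parameter can be sketched for a 2×2 table as follows. The counts are hypothetical (only the group sizes of 34 and 34 follow the text), and the P value uses the closed form for one degree of freedom without continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (assumption): rows are high and low CASP12 groups,
# columns are lymph node metastasis present and absent.
table = [[8, 26],
         [20, 14]]
chi2, p_value = chi_square_2x2(table)
```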

Patient survival
All CC patients (n=68) were followed for 3 years, and the 3-year survival rate was 73.5%. Then the patients were divided into two groups: a high CASP12 expression group (above the median CASP12 expression, n=34) and a low CASP12 expression group (below the median CASP12 expression, n=34). The results of the Kaplan-Meier analysis and log-rank test indicated that the 3-year survival in the CASP12 low expression group was significantly worse than in the CASP12 high expression group (P=0.028). More details are shown in Figure 5.
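The median split, Kaplan-Meier estimate and log-rank comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the eight patients, their expression values, follow-up times (in months, censored at 36) and outcomes are all hypothetical, and the study itself used SPSS rather than this code.

```python
import math
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, P value), 1 df."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def pick(values, indices):
    return [values[i] for i in indices]


# Hypothetical patients (assumption): expression, months followed, death flag.
expression = [0.2, 0.3, 0.4, 0.5, 0.9, 1.0, 1.1, 1.3]
months = [10, 14, 20, 36, 30, 36, 36, 36]
died = [1, 1, 1, 0, 1, 0, 0, 0]

cutoff = median(expression)
low = [i for i, x in enumerate(expression) if x <= cutoff]
high = [i for i, x in enumerate(expression) if x > cutoff]

km_low = kaplan_meier(pick(months, low), pick(died, low))
km_high = kaplan_meier(pick(months, high), pick(died, high))
chi2, p_value = log_rank(pick(months, low), pick(died, low),
                         pick(months, high), pick(died, high))
```

With only eight toy patients the difference between the curves does not reach significance, which illustrates why the group size matters when interpreting a log-rank P value.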

Figure 2. Survival in high and low expression groups
There was no difference in survival between the PDE2A groups with high and low expression (P=0.099), whereas there was a significant difference between the CASP12 groups with high and low expression (P=0.033).

Discussion
The incidence of CC, a common malignant tumor of the female reproductive system, is second only to breast cancer among female malignant tumors [14]. The disease is becoming more and more common in young females, which poses a severe threat to females' health and their quality of life [15,16]. Early diagnosis improves treatment outcomes. For example, previous studies have shown that patients with CC have a high postoperative survival rate if diagnosed early, but they are usually diagnosed in the middle and advanced stages, thus missing the best treatment timeframe [17,18]. Therefore, early diagnosis and screening are extremely important. More and more gene sequencing programs have been launched to improve next-generation sequencing technology. The TCGA database, which is the most widely used, comprehensively profiles the key genes of 33 cancers and draws multidimensional maps [19,20]. In this study, 306 CC samples and 3 matched paracancerous samples from the TCGA database were used to screen out potential differentially expressed protein-coding genes and observe their prognostic and diagnostic values. A total of 590 differentially expressed genes were found, of which 180 were highly expressed and 410 were lowly expressed. Differential gene expression matrix files were established according to the screened differentially expressed genes and the clinical data of the samples, and then subjected to univariate and multivariate Cox regression analyses. The univariate analysis showed 36 genes with differences. Only protein-coding genes were retained; pseudogenes and long non-coding RNAs were eliminated. According to the multivariate analysis, PDE2A and CASP12 were independent prognostic factors for CC. Additionally, there was no difference in survival between the PDE2A groups with high and low expression, whereas survival in the CASP12 low expression group was significantly worse than in the CASP12 high expression group.
In summary, analysis of the TCGA database showed that CASP12 was markedly dysregulated in CC tissues, and the univariate (P=0.002) and multivariate (P=0.004) Cox regression analyses further indicated that it is a potential prognostic factor. The survival curves revealed a significant difference between the CASP12 groups with high and low expression (P=0.033). Therefore, we chose CASP12 for follow-up experiments to verify its potential prognostic value in CC. CASP12, a cysteine protease [21], is widely expressed in 14 tissues such as the ovaries and endometrium and is closely related to members of the ICE subfamily [IL-1 invertase: IL-1β-converting enzyme (ICE)], which process inflammatory cytokines [22,23]. In a study by Marshall et al. [24], the expression of CASP12 was significantly down-regulated during infection, and its polymorphism appeared to be a protective factor in rheumatoid arthritis. The role of CASP12 in inflammation is therefore widely known [25]; however, there are currently few studies on CASP12 in cancers. According to Long et al. [26], high expression of CASP12 promotes apoptosis of intestinal epithelial cells, indicating that rapid proliferation of tumor cells could be related to the inhibition of CASP12 activity. In a study by Chu et al. [27], inhibition of CASP12 expression increased the invasiveness of nasopharyngeal carcinoma cells, and in a study by Cheng et al. [28], overexpression of CASP12 inhibited the proliferation of hepatoma HEP-J5 cells. Currently, there is no study on the correlation of CASP12 with CC. Therefore, the expression of serum CASP12 mRNA in patients with CC and in healthy people was assessed in the present study, using a method that is easy to perform. Furthermore, serum CASP12 mRNA provides a potential diagnostic and prognostic indicator for CC.
In the present study, the relative expression of CASP12 mRNA in the control group (healthy people) was significantly higher than in the experimental group (patients with CC), and the AUC of CASP12 was 0.865. This indicates that CASP12 has high diagnostic value and high sensitivity for CC and has the potential to become a diagnostic indicator for the disease.
In the present study, there were statistically significant differences between the CASP12 mRNA high and low expression groups in terms of differentiation, lymph node metastasis, tumor size, FIGO staging and clinical outcomes. The 3-year survival rate in the CASP12 mRNA high expression group was significantly higher than in the CASP12 mRNA low expression group. These findings suggest that CASP12 can be used as a prognostic indicator in patients with CC. The expression and clinical value of CASP12 in CC were verified through the TCGA database and clinical experiments. However, the correlation of CASP12 with the development and progression of CC remains unclear and is a direction for future research. Future basic experiments are expected to explore the mechanism of action of CASP12 in CC and to confirm the findings of this study. In summary, data mining and clinical experiments revealed and verified the differential expression of CASP12 in CC, and the expression level of CASP12 can be used as a diagnostic and prognostic biomarker in patients with CC.

Acknowledgments
We would like to thank Miss Zhang Beilei for her contribution to the bioinformatics section of this article and Miss Chen Caizhi for her commitment to following up on the patients. Lastly, we thank Professor Zou Wen for his revision of the article.