An interactive network of alternative splicing events with prognostic value in geriatric lung adenocarcinoma via the regulation of splicing factors

Abstract Alternative splicing (AS), an essential process for the maturation of mRNAs, is involved in tumorigenesis and tumor progression, including angiogenesis, apoptosis, and metastasis. AS changes can be frequently observed in different tumors, especially in geriatric lung adenocarcinoma (GLAD). Previous studies have reported an association between AS events and tumorigenesis but have lacked a systematic analysis of its underlying mechanisms. In the present study, we obtained splicing event information from SpliceSeq and clinical information regarding GLAD from The Cancer Genome Atlas. Survival-associated AS events were selected to construct eight prognostic index (PI) models. We also constructed a correlation network between splicing factors (SFs) and survival-related AS events to identify a potential molecular mechanism involved in regulating AS-related events in GLAD. Our study findings confirm that AS has a strong prognostic value for GLAD and sheds light on the clinical significance of targeting SFs in the treatment of GLAD.


Introduction
Cancer morbidity and mortality rates are rapidly increasing worldwide. Lung cancer is the most common cause of cancer-related mortality [1]. In 2018, the numbers of new cases and deaths globally for lung carcinomas were 2.1 million (11.6% of the total cancer cases) and 1.8 million (18.4% of total cancer deaths), respectively [2]. In the new World Health Organization (WHO) classification, non-small cell lung carcinomas (NSCLCs) are classified into squamous cell carcinomas, adenocarcinomas, large cell carcinomas, and mixed cell carcinomas. Among them, adenocarcinoma is the most common type, accounting for approximately 60% of NSCLCs [3]. According to the latest statistics from the American Cancer Society, lung cancer mainly occurs in the elderly. Most people diagnosed with lung cancer are 65 or older; very few people are diagnosed under 45 years of age. The average age at diagnosis is approximately 70 [1]. Considering that elderly patients always have more complications and poor prognosis, many clinical trial inclusion criteria exclude elderly patients. Thus, more attention should be paid to geriatric patients. A study analyzing the genetic characteristics of 184 patients with lung adenocarcinoma showed that distinctive genetic profile including the Kristen rat sarcoma viral oncogene, serine/threonine kinase 11 (STK11), and epidermal growth factor receptor (EGFR) exon 20 mutation were common in the older patient group. However, EGFR/tumor protein 53 (TP53) mutations, anaplastic lymphoma kinase (ALK), and human epidermal growth factor receptor 2 (HER2) genetic alterations were more prevalent in younger patients [4]. Currently, tumor biomarkers, tumor stages, and molecular markers are common indicators that predict the prognosis of patients with lung adenocarcinoma (LUAD) [5][6][7]. However, the number of biomarkers that can be used clinically is limited and no prognostic model was built exclusively for elderly patients. Therefore, novel and effective prediction methods are needed to predict the prognosis of GLAD.

Evaluation of the prognostic value of the prognostic index models
Based on the median value of the risk score calculated by the PI model, cases were divided into high-risk and low-risk groups. Kaplan-Meier curve (K-M) analysis was used to describe survival probabilities. To certify the reliability of the model in predicting prognosis, the survival receiver-operator characteristic (ROC) package in R was used to calculate the area under the curve (AUC) of the ROC curve for each model. Models with AUC > 0.7 were more effective models. We then substituted data from non-elderly lung adenocarcinoma patients (age 33-64) into the PI models to demonstrate that AS events differed between elderly and non-elderly lung adenocarcinoma patients. Differences that may arise from the differential expression of genes in elderly and non-elderly patients.

Correlation network between alternative splicing and splicing factors
SFs played a significant role in regulating splicing events. We obtained the information regarding SFs from the database SpliceAid2 (http://www.introni.it/splicing.html). The mRNA expression of SFs in geriatric lung adenocarcinoma was downloaded from the TCGA database. Survival-related SFs were screened by univariate Cox regression analysis. Pearson correlation analysis was used to analyze the correlation between survival-related SFs and AS events with independent predictive significance (|correlation coefficient| > 0.6, P<0.001). We explored the correlation network of survival-associated SFs and prognosis-associated AS events to further understand the underlying molecular mechanism of AS events in GLAD. Cytoscape v3.7.1 software was used to visualize network data such as genetic, protein-DNA, and protein-protein interactions [31] and generated a potential regulatory network between AS and SFs. Then, based on the R package, the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were used to assess the functions associated with the most significant prognosis-related AS events.

Clinical characteristics
In the present study, cancer-related AS events were selected in 59 normal controls and 513 tumor tissues from the TCGA SpliceSeq database. A total of 251 patients were included in our analysis. We processed the clinical information and TCGA SpliceSeq files of geriatric patients with LUAD. Univariate analyses and multivariate analyses were

Survival-associated alternative splicing events
The intersections of genes and AS events in GLAD were described using the following UpSet plot ( Figure 1A,B), which indicated that a single gene may be involved in different AS events. Of seven types of AS events, ES was the most common type. A total of 16,793 ES events were observed in 6475 genes, of which only 1744 genes were involved in the ES event. We also identified 3559 AA events in 2306 genes, 3057 AD events in 1710 genes, 8992 AP events in 3398 genes, 8546 AT events in 3522 genes, 220 ME events in 65 genes, and 2781 RI events in 1681 genes ( Figure 1A). After combining AS data and survival data, a total of 2381 survival-related AS events in 1633 genes were reported through the univariate Cox regression analysis (P<0.05), including 158 AA events in 151 genes, 145 AD events in 137 genes, 580 AP events in 369 genes, 527 AT events in 325 genes, 825 ES events in 672 genes, 12 ME events in 13 genes, and 124 RI events in 124 genes ( Figure 1B). The distribution of AS events correlated with patient survival are displayed in the Volcano plot (P<0.05) shown in Figure 2. Furthermore, the bubble charts in Figure 3 revealed the 20 most significant prognostic-related AS events; however, only 12 prognostic related ME events were included. As shown, each AS event is defined by a unique code. For example, for the code ATAD3A-176-AA, ATAD3A is the name of gene, 176 is the sequence number of the splicing event in the TCGA database, and AA is the splicing type.

Prognostic value of prognostic indexes in survival analysis
Univariate Cox regression analysis (P<0.001) was used to select significant survival associated with AS events for the construction of PI models. Then LASSO Cox analysis was conducted to eliminate interacting genes after cross-validation of minimum error ( Figure 4A-H), and to screen significant survival-associated genes ( Figure 5A-H). Genes with high survival correlations selected by LASSO regression were used to construct the PI models. The ROC curve analyses showed that eight models had predictive significance for prognosis. The PI model of AA events was the most effective at estimating the prognosis of geriatric patients with LUADs, with an AUC value of 0.87 ( Figure  6B), followed by PI of all types and the PI of AD events, with an AUC value of 0.857 and 0.821, respectively ( Figures  13B and 7 B). Cases were then divided into high-risk and low-risk groups according to the median PI value. The results showed that all the PI models could achieve good stratification of the prognosis of the low-and high-risk groups (Figures 6-13). Kaplan-Meier curve analysis showed that the survival time of the low-risk group was significantly longer than the high-risk group (Figures 6A-13A). Patients with lower risk experienced a longer survival (P<0.001).
According to the univariate and multivariate Cox regression analysis, PI models of AD splicing events showed the most significant survival time, with a value of 1.479E-13 ( Figure 7A), followed by the PI of AA events and PI of all types of splicing events, with values of 2.053E-12 and 1.377E-11, respectively ( Figures 6A and 13A). In all PI models, as the risk score increased, geriatric patients with LUAD were more inclined to experience a poorer prognosis ( Figures 6C-13C). Heat maps were used to illustrate the relationship between AS events and the risk score. If the risk value positively correlated with the PSI value of the AS events, the AS event was considered as high-risk event ( Figures 6E-13E). After substituting the data of non-elderly lung adenocarcinoma patients into the prognosis model, we found that there was no significant difference in survival between the high-risk group and low-risk group ( Figure  14A-H).

Survival-associated splicing factors-alternative splicing networks
Previous studies have suggested that the dysregulation of SF plays a crucial role in tumorigenesis or progression [8]. Data relative to the mRNA expression of SFs in geriatric lung adenocarcinoma were downloaded from the TCGA database. Pearson's correlation analysis was used to analyze the correlation between survival-related SFs and AS events with independent predictive significance (screening criteria: |correlation coefficient| > 0.6, P<0.001). A total of 19 SF and 54 AS events were selected by Pearson's analysis. A correlation network was established using Cytoscape to examine the potential regulatory association between the SF and the survival-associated AS events ( Figure 15). A shown in Figure 15, a single SF can be involved in regulating multiple AS events, and an AS event can be regulated by multiple SFs. Generally, AS events with high-risk were mainly negatively correlated with SFs, whereas AS events with low-risk were mainly positively correlated. However, some SFs are negatively correlated with low-risk AS events. For example, SNRPF showed a negative correlation with PSMB7-87531-AT but a positive correlation with PSMB7-87532-AT.
To investigate the potential biological function of the 19 SFs, we analyzed the biological pathways involved and identified enriched pathways using the R package. In GO terms, genes were mostly enriched in terms involving RNA-dependent ATPase activity, RNA helicase activity, snRNA binding, and mRNA binding ( Figure 16A). Further, three KEGG pathways were enriched in the AS-SFs network, including the spliceosome, the mRNA surveillance pathway, and RNA transport ( Figure 16B). We determined that the gene DDX39B was involved in seven biological functions ( Table 2).

Discussion
Splicing of pre-mRNA is essential for the maturation of mRNAs, and it is an important step in regulating the expression of protein and genes. Abnormal AS in pre-mRNA splicing may lead to the development of tumors. Previous studies have revealed that SFs can regulate aberrant AS events that affect the occurrence and development of lung cancer. For example, Liu et al. showed that abnormal splicing of BIN1 was controlled by serine and arginine-rich factor 1 (SRSF1) in NSCLC [32]. Lin et al. showed that SRSF1 and RBM4 had differential impact on HIF-1α in a CU element-dependent manner [33]. However, these studies were limited to a single splicing factor or splicing event and did not group the types of lung cancer evaluated. Our study focused on analyzing prognosis related AS events and SFs in elderly lung adenocarcinoma patients. We screened the SFs and AS events related to the prognosis of geriatric patients with LUAD and established eight PI models to analyze the risk score of different AS events. Then, we    screened the SFs and AS events that were highly correlated with each other using Pearson correlation analysis. A total of 54 AS events and 19 SFs were selected to construct a correlation network that could help elucidate the potential mechanisms involved in the splicing pathway in GLAD. The ROC curve analyses verified that all the models had a certain guiding significance for prognosis, and the PI model of AA events was the most significant. In the AA model, we analyzed 12 AS events closely related to the prognosis of GLAD, related genes mainly include LIMD2, USP28, IL32, and ZC3H13. LIM domain containing 2 (LIMD2) is a small LIM-only protein that has been revealed to promote tumor progression. A recent study showed that LIMD2 serves as an oncogenic in NSCLC and was regulated by miR-34a and miR-124 [34,35]. Ubiquitin-specific proteases (USP) regulate physiological homeostasis of the ubiquitination process, a high level of USP28 is related to poor overall survival and prognosis in NSCLC patients [35]. IL32 and ZC3H13 also play an important role in tumorigenesis and metastasis of lung adenocarcinoma [36,37]. These are consistent with our findings. AS is a complex and sophisticated process and SFs can regulate the aberrant AS events that affect the occurrence and development of lung cancer. Therefore, the correlation between SFs and AS events with independent prognostic value is worth studying.
In our study, a total of 19 SFs were included in the correlation network. We analyzed the regulation relationship between SFs and AS events, our findings may provide new insight for precise treatment and explain the potential molecular mechanism of GLAD. Different SFs often have a synergistic effect when regulating the same AS event. For example, ATAD3A is positively regulated by several SF such as RBM5, DDX17, CDK-10, CLK1, CCDC130, LUC7L, SNRNP70, and LUC7L3. Besides, one SF may positively or negatively affect different AS events. For example, SRRM2 positively regulates RANBP1-61138-RI, whereas it negatively regulates SEC23A-27347-AT. In our network about GLAD, positive correlations between SFs and AS events were more common than negative ones. Generally, AS events with high-risk are mainly negatively correlated with SFs, whereas AS events with low-risk are mainly positively correlated. In other words, most of the SF in our study may delay the progression of GLAD except SNRPF, ALYREF, and TNPO1.
Several SFs in key nodes were frequently related to splicing events in GLAD, mainly including DDX39B, DDX17, SRRM2, CIRBP, and RBM5. Prior studies have shown that these SF are closely related to tumor formation. Overexpression of DDX39B enhances cell proliferation and global translation to promote tumor formulation [38]. DDX17 promotes the formation of hepatocellular carcinoma by inhibiting Klf4 transcriptional activity [39]. SRRM2 is a main component of the spliceosome, and mutation in SRRM2 is associated with the predisposition of papillary thyroid carcinoma [40]. Abnormal expression of CIRBP is involved in the progression and migration of nasopharyngeal carcinoma and bladder cancer [41,42]. Besides, RBM5 can be acted as a tumor suppressor [43], and it inhibits the formation of lung adenocarcinoma through several apoptotic signaling pathways [44]. It was consistent with our study findings that RBM5 could up-regulate several low-risk AS events and down-regulate high-risk AS events. However, previous studies did not discuss the role of DDX39B, DDX17, SRRM2, and CIRBP in GLAD. In our analyses, these SF were found to affect tumor prognosis by regulating several AS events. For example, DDX39B positively regulates the ATAD3A-176-AA, whereas negatively regulates the RAB11B-47226-AT, and GO function terms and KEGG pathway analysis showed that DDX39B was involved in seven biological functions. DDX17 positively regulates the AIG1-77971-AT, whereas negatively regulates the RAB11B-47227-AT. Furthermore, we found that DDX39B, DDX17, and RBM5 were simultaneously positively regulated the ATAD3A-176-AA. ATAD3A is a kind of mitochondrial enzyme, and it is a low-risk gene associated with AA event [45]. The deregulation of ATAD3A is crucial in the tumor microenvironment because it promotes tumor metastasis [46]. As such, ATAD3A is a promising drug target in the treatment of GLAD. However, there are few studies on ATAD3A in lung adenocarcinoma, the clinical significance of targeting ATAD3A in the treatment of GLAD deserves further exploration and demonstration. Therefore, our study may highlights a possible mechanism in GLAD tumorigenesis.
The results of the GO analysis indicated that the genes were mainly involved in RNA-dependent ATPase activity, RNA helicase activity, snRNA binding, and mRNA binding. Furthermore, three KEGG pathways were enriched in the AS-SFs network, including the spliceosome, the mRNA surveillance pathway, and RNA transport. It is worth noting that DDX39B was involved in seven biological functions. The AS events generated from these genes may affect the development of GLAD by interfering with the above biological processes and pathways.
In conclusion, we assessed the prognostic value of survival-related AS events in GLAD and established eight PI models with high prognostic values. The regulatory network and enrichment analyses revealed the distinct relationship between AS and SFs. The process of these specific regulatory mechanisms in spliceosomes may serve as crucial starting points for further exploration of splicing events in GLAD.

Data Availability
The datasets used and analyzed during the current study are available from the online tools.

Ethics Approval
There were no cell, tissue, or animal studies. No ethical requirements are involved.

Consent for Publication
All authors agree to publish the paper.