A ferroptosis-related gene signature for overall survival prediction and immune infiltration in lung squamous cell carcinoma

Abstract Background: Ferroptosis is associated with cancer initiation and progression. However, the molecular mechanism and prognostic value of ferroptosis-related genes in lung squamous cell carcinoma (LUSC) are poorly understood. Methods: The mRNA expression profiles, methylation data, and clinical information of patients with LUSC were downloaded from TCGA and GEO database. Ferroptosis-related differentially expressed genes (DEGs) were identified between cancerous and non-cancerous tissues, and their prognostic value was systemically investigated by bioinformatic analyses. Results: A ferroptosis-related gene signature (ALOX5, TFRC, PHKG2, FADS2, NOX1) was constructed using multivariate Cox regression analysis and represented as a risk score. Overall survival (OS) probability was significantly lower in the high-risk group than in the low-risk group (P<0.001), and receiver operating characteristic curve showed a good predictive capacity (AUC = 0.739). The risk score was an independent prognostic factor for LUSC. A nomogram was constructed to predict the OS probabilities at 1, 3, and 5 years. High-risk score was associated with increased immune infiltration, lower methylation levels, higher immune checkpoint genes expression levels, and better chemotherapy response. Cell adhesion molecules, focal adhesion, and extracellular matrix receptor interaction were the main pathways in the high-risk group. The signature was validated using the TCGA test cohort, entire TCGA cohort, GSE30219, GSE157010, GSE73403, and GSE4573 datasets. The gene disorders in patients with LUSC were validated using real-time PCR and single-cell RNA sequencing analysis. Conclusions: A ferroptosis-related gene signature was constructed to predict OS probability in LUSC. This could facilitate novel therapeutic methods and guide individualized therapy.


Introduction
Lung cancer is a common malignancy and the leading cause of cancer-associated deaths worldwide [1]. Non-small cell lung carcinoma (NSCLC) is the most common type of lung cancer and represents approximately 80-85% of all lung cancers [2,3]. Lung squamous cell carcinoma (LUSC) is a subtype of NSCLC and accounts for 30% of lung cancer diagnoses [4,5]. Various genetic and epigenetic changes have occurred in different subtypes of lung cancer [6]. Moreover, in patients with lung adenocarcinoma (LUAD), targeted therapies (against EGFR, ALK, ROS1, or BRAF) have significantly improved the clinical outcome [7]. However, these effective therapeutic methods may not be suitable for patients with LUSC [6]. Meanwhile, surgical intervention and chemotherapy and/or radiation have been proven to be suitable against a part of LUSC [8]. The 5-year survival rate of lung cancer is approximately 18% [9]. Therefore, considering the

Nomogram and calibration plots of the nomogram
A nomogram could better predict the disease prognosis due to its multidimensional parameters [21,22]. A nomogram was constructed using independent prognostic factors (age, UICC stage, and risk score) to predict the OS probability at 1-, 3-, and 5-years using the 'rms package' in R language. Calibration plots of the nomogram were applied to check the conformity of the nomogram-predicted and actual OS probabilities.

Gene set enrichment analysis
Gene set enrichment analysis is a computational method that determines whether a priori defined set of genes shows statistically significant and concordant differences between two biological states [23].

Analysis of immune infiltration and immunotherapy
The mRNA expression matrix of LUSC was converted to the tumor micro-environment score matrix using R language. The tumor micro-environment score including stromal score, immune score, and estimate score were compared between the high and low-risk groups using R language. The correlations of risk score and five genes with different immune cells were determined using R language. Spearman's rank correlation test was used to analyze the correlations, and the significance level was set at P<0.05. Moreover, levels of immune checkpoints genes (PD1, PDL1, CTLA4) relative expression were compared between the two risk groups using GraphPad Prism (version 7.00).

Analysis of sensitivity of chemotherapy drugs
Six chemotherapy drugs including bexarotene [24,25], dasatinib [26,27], embelin [28], midostaurin [29], pazopanib [30][31][32], and pyrimethamine [33,34] were screened from previous literature, which have been shown to have effects on lung cancer. The half inhibitory concentration (IC50) of these chemotherapy drugs was compared between both risk groups using 'pRRophetic packages' in R language. Wilcoxon signed-rank test was used to compare the differences between the two risk groups. P<0.05 was considered statistically significant.

Real-time PCR validation
Ten paired lung tissue samples were obtained from patients with LUSC who underwent lobectomy in West China Hospital of Sichuan University. Histologically normal tissues were considered as controls. Total RNA was extracted using the E.

Single-cell RNA sequencing analysis
The relative expression levels of five genes (ALOX5, TFRC, PHKG2, FADS2, NOX1) were compared between the cancer cells and immune cells.

Statistical analysis
Statistical analysis was performed using GraphPad Prism or R language. Shapiro-Wilk test was applied to test the data distribution type. Levels of relative gene expression were expressed as median (interquartile range). Mann-Whitney test was used in comparing the differences between two groups. P<0.05 was considered statistically significant difference.

Identification of ferroptosis-related DEGs
The study flowchart is shown in Figure 1. A total of 51 ferroptosis-related DEGs (38 up-regulated and 13 down-regulated genes) were identified when LUSC was compared with the control (Figure 2A).

Identification of prognostic ferroptosis-related DEGs
In univariate Cox regression analysis, a total of six ferroptosis-related genes were associated with survival ( Figure 2B). There were six prognostic ferroptosis-related DEGs between ferroptosis-related DEGs and prognostic ferroptosis-related genes ( Figure 2C).

Construction of the prognostic signature
Multivariate Cox regression analysis was applied to construct the prognostic signature. Subsequently, five genes (ALOX5, TFRC, PHKG2, FADS2, NOX1) were screened using R language ( Figure 2D LUSC were stratified into high-or low-risk groups according to the median risk score ( Figure 3C). Kaplan-Meier survival curve demonstrated that the high-risk group had a lower OS probability than the low-risk group (P<0.001; Figure 3A), and the area under the ROC curve (AUC) value was 0.739 ( Figure 3B). Patients in the high-risk group had a higher mortality ( Figure 3D). Relative expression levels of the five genes between the two risk groups are shown in a heatmap ( Figure 3E). PCA and t-SNE analysis indicated that the patients in the high-or low-risk groups were distributed in different positions ( Figure 4A,B).

Validation of the prognostic signature
Kaplan-Meier survival curve showed that OS probability in the high-risk group was significantly lower than that in the low-risk group (P<0.001, Figure 5A), and AUC was 0.710 ( Figure 5B) in the TCGA test cohorts. In addition, PCA and t-SNE analysis showed that patients in the two risk groups were distributed in different positions ( Figure 4C,D). The high-risk group had a lower OS probability compared with the low-risk group (P<0.001, Figure 5C), and AUC was 0.722 ( Figure 5D) in the entire TCGA cohorts. In addition, the high-risk group also had a lower OS probability than the low-risk group in GSE30219 (P<0.001, Figure 5E), GSE157010 (P<0.001, Figure 5F), GSE73403 (P<0.001, Figure 5G), and GSE4573 (P<0.001, Figure 5H).  Figure 6B). In contrast, only risk score was associated with prognosis in univariate Cox regression analysis (HR: 2.076, P=0.006, Figure 6C) and multivariate Cox regression analysis (HR: 2.085, P=0.005, Figure 6D) in the test cohorts. Thus, the risk score was an independent prognostic factor for LUSC.

Nomogram and calibration plots of the nomogram
A nomogram was successfully constructed to predict the OS probabilities at 1, 3, and 5 years in patients with LUSC, which was calculated by plotting a vertical line between the total point axis and each prognostic axis ( Figure 6E). In addition, calibration plots of the nomogram demonstrated high conformity of the nomogram-predicted and actual OS probabilities at 1, 3, and 5 years in patients with LUSC ( Figure 6F).

Gene set enrichment analysis
The results showed that the high-risk group was mainly enriched in cell adhesion molecules, focal adhesion, and extracellular matrix (ECM) receptor interaction ( Figure 7A), whereas metabolism of xenobiotics by cytochrome P450, and oxidative phosphorylation were the main pathways in the low-risk group ( Figure 7B).

DNA methylation levels of five genes
DNA methylation levels of TFRC (P=0.002) and FADS2 (P=0.020) in the high-risk group were significantly lower than that in the low-risk group ( Figure 8A,B), whereas DNA methylation levels of ALOX5, PHKG2, and NOX1 were slightly reduced in the high-risk group compared with the low-risk group without a statistical difference ( Figure  8C-E).

Immune infiltration and immunotherapy
The tumor microenvironment score including stromal score (P<0.001), immune score (P<0.001), and estimate score (P<0.001) were all higher in the high-risk group compared with the low-risk group ( Figure 9A). Thus, the tumor micro-environment was significantly different between the two risk groups. Moreover, the risk score was positively associated with macrophages M 2 (P=0.013, rho = 0.  Figure  9I. In addition, to explore the immunotherapy response on different risk groups, relative expression levels of immune checkpoint genes (PD1, PDL1, CTLA4) were compared, and the results showed that the relative expression levels of PD1 (P<0.001) and CTLA4 (P<0.001) were higher in the high-risk group ( Figure 10A-C), which showed that the high-risk group may have a better immunotherapy response.

Sensitivity of chemotherapy drugs
The results indicated that the high-risk group exhibited a lower IC50 for bexarotene, dasatinib, embelin, midostaurin, pazopanib, and pyrimethamine (P<0.001, Figure 10D-I), suggesting that the prognostic signature may be a reference option for chemotherapy drugs.

Single-cell RNA sequencing analysis
Many types of immune cells infiltrate tumors, thus clarifying that the resource of five genes is of great importance. Levels of TFRC (P<0.001), PHKG2 (P=0.005), FADS2 (P<0.001), and NOX1 (P=0.011) mRNA relative expression were higher in cancer cells than in immune cells ( Figure 11F-I), whereas levels of ALOX5 (P<0.001) mRNA relative expression were lower in cancer cells than in immune cells ( Figure 11J).

Discussion
In the present study, a ferroptosis-related gene signature was constructed to predict prognosis in patients with LUSC using multivariate Cox regression analysis, and the risk score was determined to be an independent prognostic factor for LUSC. A nomogram was successfully constructed to predict the OS probabilities at 1, 3, and 5 years in patients with LUSC. The high-risk score was associated with increased immune infiltration, lower methylation levels, higher levels of immune checkpoint genes, and better chemotherapy drugs sensitivity. Finally, the prognostic signature was validated using the TCGA test cohorts, entire TCGA cohorts, and multiple GEO datasets. TFRC, also known as CD71, is an essential membrane protein-regulating intracellular iron transporter [35,36]. Activating TFRC increases the iron content, mediates the release of ROS, and induces lipid peroxidation, which further promotes the ferroptosis of cells [37]; however, knocking down TFRC significantly inhibits cancer cell proliferation and metastasis via up-regulation of AXIN2 expression or sponge of microRNA-107 [38,39]. Levels of TFRC mRNA expression were significantly increased in LUSC samples [40]. PhK is a heterotetramer composed of four copies each of α, β, γ, and subunits [41]. Subunit γ is encoded by PHKG2 [42]. Silencing PHKG2 prevents accumulation of lipid peroxides and decreases cellular iron level [43]. PHKG2 is a useful diagnostic biomarker for multiple cancers, including breast cancer [44] and endometrial cancer [45]. NOX1 plays an important role in ROS generation and lung cancer [46,47]. NOX1-dependent ROS generation for toll-like receptor 4 (TLR4) signaling is found to enhance the metastasis of NSCLC [48]. In addition, NOX1 up-regulation is shown to activate sirtuin 1 (SIRT1) and inhibit P53 [49]. FADS2 is overexpressed in cancer and functions as a potential oncogene that facilitates cancer cell proliferation [50]. Inhibiting FADS2 could reduce ferroptosis by increasing levels of Fe and lipid ROS in lung cancer cells [51]. ALOX5 gene encodes lipoxygenase, which could catalyze the conversion of arachidonic acid to leukotriene [52]. Knockdown of ALXO5 mitigates lipid peroxidation, mitochondrial damage, DNA impairment, and cell death in ARPE-19 cells [53]. Genetic variations in the promoter region of ALOX5 may induce a reduced drug response to montelukast sodium in patients with asthma, leading to pharmacogene [54]. ALOX5 polymorphisms in non-smokers may increase risk of lung cancer [55]. The increased TFRC, PHKG2, FADS2, NOX1, and reduced ALOX5 levels in   LUSC tissues were reported in the present study, and scRNA-seq analysis showed that the levels of the five dysregulated genes were mainly influenced by cancer cells rather than immune cells.
The ferroptosis-related gene signature has been successfully constructed to predict the prognosis in patients with hepatocellular carcinoma [56], breast cancer [57], pancreatic adenocarcinoma [58], and cholangiocarcinoma [59]. Additionally, in LUAD, a ferroptosis-related gene signature including five genes was constructed to predict prognosis [60]. However, the ferroptosis-related gene signature for LUAD was not suitable for LUSC due to disease heterogenicity [6], different responses to clinical treatment [61,62], different prognoses [63], and lack of an experimental validation. Therefore, in the present study, a ferroptosis-related gene signature was successfully constructed using five genes validated by real-time PCR to predict the prognosis in patients with LUSC. Kaplan-Meier survival curve showed that patients with LUSC in the high-risk group were associated with a lower OS probability with moderate sensitivity and specificity than patients in the low-risk group. Moreover, the risk score was an independent prognostic factor for LUSC through multivariate Cox regression analysis. The signature was successfully validated using the TCGA test cohort, entire TCGA cohort, GSE30219, GSE157010, GSE73403, and GSE4573. Thus, the prognostic signature could be used to discriminate different risk groups and may contribute to guide therapy.
GSEA was performed to explore the underlying molecular mechanism between the two risk groups. The abnormal expression of cell adhesion molecules resulting in a loss of cell-cell and cell-matrix interactions can promote cancer cell invasion and migration [64]. For example, CEACAM6 overexpression promotes the migration of NSCLC by enhancing integrin expression [65][66][67]. In addition, up-regulation of adhesion molecules may contribute to lung metastasis enhanced by local infection/inflammation [68]. Dysregulated focal adhesion and ECM receptor could result in tumor progression [69][70][71]. In the present study, the cell adhesion molecules, focal adhesion, and ECM receptor interaction were the main pathways in the high-risk groups. Thus, the lower OS probability in the high-risk group may correlate with activated adhesion molecules, focal adhesion, and ECM. DNA methylation is an important type of epigenetic modification [72]. Hypermethylation or hypomethylation could cause the down-regulation or overexpression of target genes, which further regulate NSCLC tumorigenesis and progression [73]. The overall methylation levels of the five genes were lower in the high-risk group in the present study, which may also correlate with the lower OS probability in the high-risk group. Thus, interfering with the above-mentioned pathways and targets may facilitate novel therapeutic methods and thus improve prognosis. Immune cells are an important part of the tumor microenvironment and play a critical role in tumor development [74]. Tumor-associated macrophages and intra-tumoral CD8 + T cells are significantly associated with a poor prognosis in lung cancer [75][76][77][78]. Treg cells could promote lung cancer progression and metastasis [79][80][81][82] and are significantly associated with worse OS [83]. The neutrophil count in peripheral blood is an effective diagnostic biomarker for lung cancer [84]. Our research showed that high risk score was associated with increased macrophages M 2 , memory B cells, memory CD4 + T cells, neutrophils, and Treg cells. Thus, the high-risk group had a lower OS probability due to increased immune infiltration and may have a better treatment response to immunotherapy. Currently, immune checkpoint inhibitors (ICI) are becoming the standard first-line treatment for advanced NSCLC [85][86][87], and PD1, PDL1, and CTLA4 are mainly targets for ICI [88][89][90][91][92]. The high-risk score was correlated with increased relative expression levels of immune checkpoint genes (PD1, CLTA4). Thus, the high-risk group may be more sensitive to immune checkpoint inhibitors against PD1 and CLTA4. Most patients with lung cancer are diagnosed in an advanced stage [93]; thus, chemotherapy still serves as an important therapeutic method for them [94]. The high-risk group has a lower IC50 for six common chemotherapy drugs for lung cancer, which may be a reference option for chemotherapy drugs.
Highlights of the present study include the prognostic signature, which was constructed in the training cohort and validated in the test cohort, entire TCGA cohort and GSE30219, GSE157010, GSE73403, and GSE4573 datasets; and immunotherapy and chemotherapy response, which were identified to guide individualized treatment. In addition, the results of bioinformatic analysis were validated using real-time PCR in another cohort. Limitations of the present study are that our results were based on TCGA database and, thus, the prognostic signature needs to be validated in a clinical patient cohort. Moreover, the molecular mechanisms and specific role of ferroptosis-related genes, such as on lipid reactive oxygen species and ferrous ion accumulation, in LUSC need to be explored in further study.

Conclusions
A ferroptosis-related gene signature (ALOX5, TFRC, PHKG2, FADS2, NOX1) was constructed in the present study. The OS probability was significantly lower in the high-risk group than in the low-risk group (P<0.001), and AUC value was 0.739. The high-risk score was associated with increased immune infiltration, lower methylation levels, higher immune checkpoint genes expression level, and better chemotherapy sensitivity. Therefore, a ferroptosis-related gene signature was successfully constructed to predict prognosis for LUSC, and it may facilitate novel therapeutic methods and guide individualized therapy including immunotherapy and chemotherapy.

Data Availability
The data used to support the findings of the present study are available from TCGA database (https://portal.gdc.cancer.gov/) and GEO datasets (https://www.ncbi.nlm.nih.gov/gds/).

Ethics Statement
The studies involving human participants were approved by the Clinical Trial and Biomedical Ethics Committee of West China Hospital of Sichuan University (No. 2016-120). All patients with LUSC volunteered to attend the study and signed an informed consent to allow analyses to be performed on their tissue samples.