An autophagy-related prognostic signature associated with immune microenvironment features of uveal melanoma

Abstract Autophagy is involved in cancer initiation and progression but its role in uveal melanoma (UM) was rarely investigated. Herein, we built an autophagy-related gene (ARG) risk model of UM patients by univariate Cox regression and least absolute shrinkage and selection operator (Lasso) regression model and filtrated out nine prognostic ARGs in The Cancer Genome Atlas (TCGA) cohort. Survival and Receiver Operating Characteristic (ROC) Curve analysis in the TCGA and other four independent UM cohorts (GSE22138, GSE27831, GSE44295 and GSE84976) proved that the ARG-signature possessed robust and steady prognosis predictive ability. We calculated risk scores for patients included in our study and patients with higher risk scores showed worse clinical outcomes. We found the expressions of the nine ARGs were significantly associated with clinical and molecular features (including risk score) and overall survival (OS) of UM patients. Furthermore, we utilized univariate and multivariate Cox regression analyses to determine the independent prognostic ability of the ARG-signature. Functional enrichment analysis showed the ARG-signature was correlated with several immune-related processes and pathways like T-cell activation and T-cell receptor signaling pathway. Gene set enrichment analysis (GSEA) found tumor hallmarks including angiogenesis, IL6-JAK-STAT3-signaling, reactive oxygen species pathway and oxidative phosphorylation were enriched in high-risk UM patients. Finally, infiltrations of several immune cells and immune-related scores were found significantly associated with the ARG-signature. In conclusion, the ARG-signature might be a strong predictor for evaluating the prognosis and immune infiltration of UM patients.


Introduction
Uveal melanoma (UM) is a type of malignant intraocular cancer derived from melanocytes with high metastasis rate and poor prognosis [1]. Fifty percent patients with UM were found with liver metastases within 10 years from first diagnosis, and the median survival time of patients with metastatic lesions is approx. 5-7 months [2,3]. These data indicated UM is an aggressive tumor with fetal malignancy. However, the methods of cancer biomedical diagnosis and targeted therapy have been developed rapidly in past decades, but the metastasis and death rate of UM did not decline [3]. Nowadays, with the rapid development of biomedical and bioinformatics technologies and methods, universal researches based on cancer expression profiles helped to search cancer diagnostic and therapeutic biomarkers, but similar studies in UM were rare. Therefore, identification of novel and effective prognostic and therapeutic biomarkers remains a priority for UM-tailored therapy.
Autophagy, a cellular programmed-digestion process, has been reported related to several pathological process including malignancy progression. It is a special mechanism for cancer cells to obtain enough energy and nutriment in the hostile conditions of hypoxia, oxidative stress and nutrient starvation [4,5]. In recent years, there emerged numerous autophagy-related cancer researches, and sufficient evidences have revealed that the autophagy process plays a crucial role in the occurrence and progression of various cancers [6][7][8][9]. Several autophagy-related genes (ARGs) were revealed associated with the prognosis of cancer patients and were potential biomarkers for predicting the survival time and therapeutic effect for cancer patients [9,10]. However, studies on the role of autophagy in UM are still rare and relative biomarkers or prognostic signatures were hanging for further excavating and developing.
In the present study, we collected five independent UM cohorts (one training cohort and four validation cohorts) and established an autophagy-related prognostic signature by using univariate Cox regression analysis and least absolute shrinkage and selection operator (Lasso) Cox regression in the TCGA training cohort. A risk score was calculated for each patient and by the median value of risk scores we dichotomized patients into low-and high-risk subgroups which showed different overall survival (OS) time and disease-free survival (DFS) time. Besides, functional enrichment analysis between low-and high-risk UM patients showed malignancy-related processes, pathways and signatures are associated with the high-risk patients. Immune microenvironment features, including multiple immune cells, machine-learning based scores, were also investigated for their associations with the ARG-signature in UMs.

Data acquisition
RNA-seq profile and related clinical information of The Cancer Genome Atlas (TCGA) cohort were obtained from the TCGA website (https://portal.gdc.cancer.gov/), and the related RNA expression and clinical information files of the other four cohorts (including GSE22138, GSE27831, GSE44295 and GSE84976) were downloaded from the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) and previous published researches [11][12][13]. The baseline information of patients including in current study are summarized in Table 1. The ARGs included in the present study were acquired from the Human Autophagy Database (HADb, http://autophagy.lu/ clustering/index.html) [14] and the GO AUTOPHAGY gene set in the Molecular Signatures Database v6.2 (MSigDB, http://software.broadinstitute.org/gsea/msigdb) [15]. We intersected genes in the five databases and obtained a list of 423 ARGs for further analysis.

Data processing
For the TCGA RNA-seq cohort, The Fragments Per Kilobase of transcript per Million (FPKM) values were transformed to Transcripts Per Kilobase Million (TPM) values according to an algorithm acquired in published researches [16,17]. For the microarray data in the GSE22138, GSE27831, GSE44295 and GSE84976 cohort, background adjustments and quantile normalization were performed on the microarray raw data using a robust multiarray averaging method (RAM) with the R packages 'affy' [18] and 'simpleaffy' [19]. The TPM and RAM expression values in each cohort were utilized in following data analysis.

Construction of the ARG-signature
By taking the TCGA cohort as training cohort, we performed univariate Cox regression analysis to the 423 ARGs based on the survival information in the TCGA cohorts and we screened out 133 OS-related ARGs (P<0.05) including 39 protective ARGs (HR < 1) and 94 risky ARGs (HR > 1). These OS-related ARGs were used to further construct the prognostic signature by Lasso Cox regression analysis. Finally, a 9-ARG signature was established and a risk score calculating formula, based on the related expression value and coefficients of the nine ARGs, was developed. The formula developed was as following: in which the Coe f i is the coefficient of each ARGs, and x i is the TPM value or RMA value of each ARG in each cohort.
Based on the formula above, risk scores were calculated for each patients and divided UM patients into low-and high-risk subgroups by the median risk score in each cohort.

Functional enrichment analysis
Genes whose expressions were in significant differential levels between low-and high-risk UM patients with |log2 (fold change)| > 1 and P-value <0.05 were defined as differential expression genes (DEGs) in the present study. We used the R package 'limma' [20] to perform differential expression analysis in the TCGA cohort and we obtained 2365

Immune cells infiltration and immune-related scores calculation
Based on the TPM expression values of UM patients in the TCGA cohort, the 22 immune cells infiltration proportions of each UM patients were calculated using 'CIBERSORT' R package. The ESTIMATE score, immune score, stromal score and tumor purity values of UM patients in the TCGA dataset were obtained from the website of PanCanAtlas Publications (https://gdc.cancer.gov/about-data/publications/panimmune).

Statistical methods
Two-sided log-rank test was applied to contrast the OS/DFS between different risk subgroups or expression levels of ARGs of UM patients in survival curves, visualized by R package 'survival' . To evaluate the OS/DFS predictive power of the ARG-signature, receiver operating characteristic (ROC) curves were plotted by R package 'timeROC' [22] and area under curve (AUC) was calculated as the main quantized parameter. Based on the clinical and survival information, univariate and multivariate Cox regression were the vital analytical methods to decide the independent prognostic role of the ARG-signature. The analysis method of correlations between risk scores and immune-related scores was Pearson correlation analysis. Statistical analysis contained in our study were implemented based on the R programming language (version 3.6.3, https://www.r-project.org/).

Development of the ARG-signature in UM patients
The workflow of the development of the risk signature is shown as Figure 1. To develop an ARG-signature which could be applied to all the patients in these five cohorts, we first examined the detection statuses of the 531 ARGs in each cohort. By taking the intersection of the detected ARGs, we finally obtained a list of shared 423 ARGs in all the cohorts. Then univariate Cox regression analysis showed that 133 of the 423 ARGs were associated with the OS of UM patients in the TCGA cohort (P<0.05), of which 39 ARGs were protective genes (HR < 1) and 94 ARGs were risky genes (HR > 1). Lasso Cox regression model was subsequently applied to the 133 OS-related ARGs to establish a prognostic model for the UM patients in the TCGA cohort. We successfully developed a prognostic signature based on nine ARGs (Figure 2A,B). The coefficients of the nine ARGs were visualized from high to low in the Figure 2C, and univariate Cox regression analysis of the nine ARGs were summarized in Table 2.
To confirm the prognostic predictive value of the ARG-signature, 80 UM patients were separate into 40 low-and 40 high-risk patients by the median risk score in the TCGA cohort. The Kaplan-Meier curves (survival analysis) revealed that low-risk UM subgroup have higher survival rate and longer survival time compared with the high-risk subgroup ( Figure 2D). And the patients' distributions plot depicted that high-risk UM patients showed shorter OS time compared with low-risk patients ( Figure 2E). Besides, the ROC analysis indicated that the risk score possessed a very high AUC value for predicting the 1/3/5-year OS of UM patients ( Figure 2F  Furthermore, associations between the 9-gene expression levels and clinical and molecular features (including risk score) were investigated. UM patients were in order with the risk score increasing and patients with higher risk were associated with lower DLC1, GABARAPL1, LMCD1, PRKCD, TUSC1 expressions, higher BNIP1, ITGA6 IKBKE and FKBP1A expressions, more Chr3/Chr8q loss, less Chr6p loss and higher tumor thickness, diagnostic age and clinical stage ( Figure 2G).

Validation of the ARG-signature in external cohorts
Generally, a well-performed prognostic signature needs more validations in other retrievable cohorts. Prognostic validation analysis was used to verify the prognostic predictive stability of the ARG-signature. In the GSE44295 cohort, 57 UM patients were divided into low-and high-risk UM groups by the median risk score and survival analysis showed that high-risk UMs were associated with lower OS rate and shorter OS time ( Figure 3A), patients' distributions of OS time and status also proved this result ( Figure 3B). And ROC curves revealed the risk score remained a high level of prognostic predictive ability in the GSE44295 cohort ( Figure 3C, 1/3/5 AUC = 0.833/0.745/0.731). Similar results were also found in the GSE84976 (n=28) and GSE22138 (n=63) UM cohorts ( Figure 3D-I), it showed that the ARG-signature could commendably distinguish between low-and high-risk UMs with distinct OS time.   Furthermore, the GSE27831 cohort (n=29) was used to examined whether the ARG-signature could predict the DFS of UM patients. Like the results above, DFS analysis and patients' distributions of DFS time indicated that the high-risk UM patients have a lower DFS rate and shorter DFS time ( Figure 3J,K). And ROC curves demonstrated that the ARG-signature could well forecast the 3/5-year DFS of UM patients ( Figure 3L).

The prognostic value of the nine ARGs
Survival analysis was used to test the relationship between the OS and ARGs' expression in the TCGA cohort, UM patients were divided into low-and high-expression of each genes to compare survival difference between two subgroups. UM patients with high expression levels of IKBKE, BNIP1, ITGA6 and FKBP1A showed worse clinical outcomes which means they might be risky ARGs in UM ( Figure 4A-D). And high expression levels of DLC1, PRKCD, GABARAPL1, LMCD1 and TUSC1 were associated with better clinical outcomes of UM patients, the five ARGs could be potential protective genes in UM ( Figure 4E-I). The results of survival analysis were consistent with the univariate Cox regression results (Table 2).

Independent prognostic ability of the ARG-signature stratification
Univariate and multivariate Cox regression were implemented to examine the independent prognostic role of the ARG-signature in the TCGA cohort. We concluded clinically common factors like gender, age, stage, T and M statuses (N status was excluded due to there exists 76 N0 UMs and 4 Nx UMs) in the univariate Cox regression analysis.  .651-3.652, P=0.325) was not a prognostic predictor ( Figure 5A).
We noticed that half of UM patients in the TCGA received radiotherapy and the other half of UM patients were under chemotherapy. To excluded the effects of different therapy methods, we applied survival analysis in UM patients with different therapy and results indicated that UM patients with higher risk scores have worse clinical outcomes no matter what treatment they received ( Figure 5C,D).
Principle component analysis (PCA) showed UM patients were in distinct directions based on all detected genes or 531 ARGs profiles in the TCGA cohort (top three principle components were analyzed), which indicated that UM patients with different risk stratification harbored different gene expression status and autophagy status ( Figure 5E,F).

Functional enrichment analysis
To investigate the inner molecular distinctions between low-and high-risk UM patients, DEGs between low-and high-risk patients were analyzed in the TCGA cohort. Differential expression analysis was first performed between low-and high-risk subgroups of UM patients in the TCGA cohort and 2369 DEGs were identified.
In order to understand the main functional processes and pathways of the 2369 DEGs, we implemented GO and KEGG pathway to identify the dominating functions of them. Results of GO analysis revealed that functional enrichments of these DEGs included T-cell activation, lymphocyte activation, leukocyte activation and differentiation (biological process, BP), plasma membrane protein complex, extracellular matrix, collagen-containing extracellular matrix (cellular component, CC), receptor regulator activity, channel activity, passive transmembrane transporter activity (molecular function, MF) and so on ( Figure 6A). KEGG pathway analysis indicated that cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), antigen processing and presentation, Th1 and Th2 cell differentiation, natural killer cell-mediated cytotoxicity, Th17 cell differentiation, T cell receptor signaling pathway and NF-κB signaling pathway were significantly differentially enriched between low-and high-risk UM patients ( Figure  6B).
Tumor hallmarks were also investigated between low-and high-risk UM patients in the TCGA cohort by using gene set enrichment analysis (GSEA) method. We identified four tumor hallmarks including Angiogenesis, IL6-JAK-STAT3 signaling, Reactive oxygen species pathway and Oxidative phosphorylation were prominently enriched in high-risk UM patients ( Figure 6C).

ARG-signature related immune microenvironment features
In light of the results of functional enrichment analysis, we speculated that the ARG-signature was associated with the immune characterization of UM. The landscape of 22 immune cell infiltration proportions were shown in Figure  7A. The violin plot was used to visualize the differences of immune cell infiltration between low-and high-risk UM samples ( Figure 7B). Results showed that B cell naive, T cells CD4 memory resting, Monocytes, Mast cells resting were significantly infiltrating in low-risk UM samples and T cells CD8, T cells CD4 memory activated and T cells follicular helper were more infiltrating in high-risk UM patients.
Immune microenvironment-related scores, including ESTIMATE score, Tumor Purity, Immune score and Stromal score, were calculated for each patients using related algorithm. These immune microenvironment related scores were used to performed Pearson correlation analysis with the risk score to confirm the correlation between the ARG-signature and immune microenvironment status. Results showed that the risk score was positively associated with the ESTIMATE score (R = 0.43, P=6. [1][2][3][4][5], Immune score (R = 0.46, P=2-5) and Stromal score (R = 0.31, P=0.0048) and negatively associated with the Tumor Purity (R = −0.43, P=8. [1][2][3][4][5]. These data demonstrated that UM patients with higher risk scores were infiltrated with more immune and stromal cells.

Discussion
Along with booming evolution of biotechnological and bioinformatics science, genomic analysis had been widely applied to search cancer biomarkers or develop diagnostic and prognostic models [23]. However, few associated studies focused on prognostic prediction model development of UM patients. In current research, we combined univariate Cox regression model and Lasso Cox regression model to screen OS-related ARGs and developed a prognostic ARG-signature to distinguish patients with distinct clinical outcomes. Furthermore, unlike previous prognostic signature development researches [24], the ARG-signature was successfully validated in other four external independent UM cohorts, it is the greatest strength of our study and it proved the robustness and stableness of our ARG-signature. Nevertheless, the main limitation of our study was the number of patients included might not be enough compared with counterpart researches in other cancer types, although we have tried to search UM cohorts for validation.
ROC curve analysis was the main method to judge the prognostic prediction accuracy of the ARG-signature in our study, and we noticed that the AUC value of the 5-year OS prediction in the TCGA cohort and the 5-year DFS prediction in the GSE27831 cohort was 1.000, which might raise some concerns about the accuracy of our risk model. These results might be contributed by the reason that only three patients and one patient lived longer than 5 year in the TCGA and GSE27831 cohorts, respectively. On the contrary, there were 18, 9 and 17 UM patient lived longer than 5 years in GSE22138, GSE44295 and GSE84976 cohorts, respectively.
In current study, we totally identified nine prognostic ARGs and found high expressions of IKBKE, BNIP1, ITGA6, FKBP1A and low expressions of DLC1, PRKCD, GABARAPL1, LMCD1 and TUSC1 were associated with worse prognosis of UM patients. Inhibitor of nuclear factor κ B kinase subunit (IKBKE) had been discovered as an oncogene and is overexpressed in over 30% of breast carcinomas samples and cell lines [25]. It was reported that BNIP1 could restrain cervical cancer cell proliferation, migration and invasion by inhibit mTOR signaling pathway, although it seemed to be a risky factor in UM patients [26]. Integrin subunit α 6 (ITGA6), a member of the integrin α chain family of proteins, is an efficient early-detection biomarker and prognostic factor for colorectal cancer patients [27]. FKBP1A belongs to immunophilin protein family and mediate the immunosuppressive and antitumor effects of rapamycin [28]. DLC-1 is a Rho GTPase-activating protein (RhoGAP), could facilitate melanoma cell invasion and metastasis by cooperating transcription factor FOXK1 to promote MMP9 expression [29] and enhancing colorectal cancer metastasis by the epithelial-to-mesenchymal transition [30]. The protein kinase C δ (PRKCD) is a member of the protein kinase C family of serine-and threonine-specific protein kinases and it is involving in manipulating tumor repopulation while receiving radiotherapy [31]. It had been reported that lower GABARAPL1 expression is correlated with poor prognosis of hepatocellular carcinoma and lymph node-positive breast cancer patients [32,33], which might indicate that GABARAPL1 is a negative regulator of cancer progression. LIM and cysteine-rich domains 1 (LMCD1) had been found acting as an activator E2F1 transcription factor in human cells [34] but its role in cancers were rarely investigated. TUSC1 tumor suppressor candidate 1 (TUSC1) is a putative tumor suppressor gene and could restrain lung cancer and glioblastoma cells growth in vitro [35,36]. Few researches focused on the roles of the nine ARGs in UM and we thought our work could help to throw light on the roles of them.
In conclusion, totally 257 UM patients in five independent cohorts were included and used to develop and validate a prognostic predictive signature for UM patients. The ARG-signature showed a powerful predictive ability to the clinical outcomes in both training cohort and four external validation cohorts. Meanwhile, nine novel prognostic ARGs were identified associated with the clinical outcomes of UM patients and might provide some clues for further researches. Besides, UM patients with higher risk scores showed higher immune cell infiltration levels and tumor hallmarks enrichments.