Identification of co-expressed genes associated with MLL rearrangement in pediatric acute lymphoblastic leukemia

Abstract Rearrangements involving the mixed lineage leukemia (MLL) gene are common adverse prognostic factors of pediatric acute lymphoblastic leukemia (ALL). Even allogeneic hematopoietic stem cell transplantation does not improve the outcome of ALL cases with some types of MLL rearrangements. The aim of the present study was to identify the co-expressed genes that related to MLL rearrangement (MLL-r) and elucidate the potential mechanisms of how MLL-r and their partner genes lead to leukemogenesis. Gene co-expression networks were constructed using the gene expression data and sample traits of 204 pretreated pediatric ALL patients, and co-expression modules significantly related to the MLL-r were screened out. Gene ontology annotation and Kyoto Encyclopedia of Genes and Genomes pathway analysis of the module genes were performed. Hub genes were identified and their expression levels were analyzed in samples with or without MLL-r and the results were validated by an independent investigation. Furthermore, the relationships between the hub genes and sample traits were analyzed. In total, 21 co-expression modules were identified. The green module was positively correlated with MLL-r. PROM1, LGALS1, CD44, FUT4 and HOXA10 were identified as hub genes, which were involved in focal adhesion, calcium-dependent phospholipid binding, connective tissue development and transcriptional misregulation in cancer. The expression levels of the five hub genes were significantly increased in MLL-r samples, and the results were further validated. PROM1, LGALS1, CD44 and HOXA10 were positively related to the leukocyte count. These findings might provide novel insight regarding the mechanisms and potential therapeutic targets for pediatric ALL with MLL-r.


Introduction
Acute lymphoblastic leukemia (ALL) is the most common cancer diagnosed in children and represents approximately 25% of cancer diagnoses among children [1]. Approximately 98% of patients with pediatric ALL achieve remission, and 85% of patients aged 1-18 years with newly diagnosed ALL treated on current regimens are expected to be long-term event-free survivors, with over 90% surviving at 5 years [2,3]. Unfortunately, despite advances in treatments according to risk stratification, ALL remains the most common cause of death due to malignancy in children [4,5]. Several high-risk cytogenetic and molecular subtypes of ALL are associated with unfavorable outcomes. Rearrangements involving the mixed lineage leukemia (MLL) gene (also known as the KMT2A gene), which occur in approximately 5-10% of overall childhood ALL cases, are common adverse prognostic factors of ALL [6,7]. MLL gene rearrangement (MLL-r) is significantly associated with high risks of treatment failure, relapse and CNS involvement [8]. The 5-year event-free survival (EFS) and overall survival (OS) rates of these patients are worse than those of patients without MLL rearrangement (non-MLL-r). MLL-AF4 or MLL-AF9 is associated with poorer outcome (5-year EFS<60%) than other types of MLL-r [9]. Even worse, allogeneic hematopoietic stem cell transplantation (allo-HSCT) with HLA-matched related or unrelated donors has failed to improve the outcome of ALL cases with MLL-AF4 [7,9,10]. The MLL gene codes for large nuclear protein histone lysine methyltransferase 2A (KMT2A) functions as a transcriptional regulator that catalyzes the methylation of histone H3 lysine 4 (H3K4). It has been confirmed that MLL participates in the regulation of hematopoietic differentiation, while MLL-r leads to hematological abnormalities, further contributing to leukemogenesis [11,12]. However, MLL fusion proteins are unable to promote leukemogenesis on their own [13]; some additional events such as high levels of MEIS1 and HOX gene expression [7,14]; and FLT3 activation [15], K-Ras mutations [16] and epigenetic abnormalities [17] are required. Not only that, these cooperative events have only been found in a portion of leukemia cases with MLL-r, indicating that these abnormities might not be essential events for the MLL-dependent leukemogenic process. Thus far, the exact mechanisms of how MLL-r causes leukemogenesis remain unclear.
Because traditional approaches, including chemotherapy and allo-HSCT, are curative for few patients with pediatric ALL and MLL-r. Several novel targeted treatment options are emerging. FLT-3 inhibitors, including lestaurtinib [18] and quizartinib [19], have been used in clinical trials of patients with leukemia and MLL-r. The results failed to demonstrate that FLT-3 inhibitors are a beneficial therapeutic approach. The proteasome inhibitor bortezomib has been used alone in five leukemia patients with MLL-r, while only three cases showed temporary hematologic responses and the other two cases showed no response. Pinometostat, which is an inhibitor of the H3K79 methyltransferase DOT1L, has shown promising efficacy in a preclinical study of MLL-r leukemia [20,21]. However, the result was disappointing when it was used in patients with pediatric ALL and MLL-r [22]. Another epigenetic agent, histone deacetylase (HDAC) inhibitor, has been used in a clinical trial for the treatment of MLL-r acute leukemias in pediatric patients, but its efficacy has yet to be demonstrated. Immunotherapy, particularly chimeric antigen receptor (CAR) T-cell technology, is a promising therapeutic regimen for high-risk B-ALL. However, the application is limited due to the low/negative expression levels of targeted antigens, such as CD19 and CD22, in ALL cases with MLL-r [23][24][25]. In a word, only a minority of pediatric ALL patients with MLL-r may benefit from these directed strategies. Thus, it is urgently necessary to identify innovative treatment approaches to improve the unfavorable prognosis of these patients.
In the present study, a co-expression network was constructed by weighted gene co-expression network analysis (WGCNA) to identify co-expressed gene modules. The aim of the present study is to investigate the genes significantly related to MLL-r ALL and elucidate the mechanisms of how MLL fusions and their partner genes lead to leukemogenesis, which may provide novel therapeutic targets for ALL with MLL-r.

Gene expression data and sample trait collection
The dataset GSE68735, containing microarray data and clinical phenotype information from 207 pretreated patients with pediatric ALL, was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The gene expression data were generated by an Affymetrix Human Genome U133 Plus 2.0 Array. The sample traits, including the age, sex, leukocyte count, MLL-r, TEL-AML, E2A-PBX1, combined trisomy 4 and 10 and central nervous system (CNS) statuses, of 207 ALL cases were extracted.

Co-expression network construction and module identification
Scale-free gene co-expression networks were constructed using the WGCNA package in R [26]. Outlier samples were identified and removed before co-expression analysis. A similarity matrix of all pairwise genes was constructed based on Pearson's correlation analysis. The soft-thresholding power β was used when the scale-free topology fit index was approximately 0.90, which caused the matrix to achieve a scale-free network. The similarity matrix was converted into an adjacency matrix, and the adjacency matrix was then transformed into a topological overlap matrix (TOM). The dynamic tree cut method was used to identify co-expressed gene modules with a minimum gene group size of 30. The module eigengene (ME) was used to represent the expression profiles of the module genes [27]. Similar modules with correlations > 0.75 were merged.

Module-trait relationship analysis
Relationships between the modules and sample traits were identified by calculating the correlation coefficient. In the present study, the MLL-r status was selected as a target sample trait. Co-expression modules significantly related to the MLL-r status were screened out for the subsequent analysis. The correlation between a specific gene and the eigengene of a module (module membership, MM) and the correlation between a specific gene and sample traits (gene significance, GS) were calculated in the candidate modules.

Gene functional annotation and pathway enrichment analysis
Gene ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the module genes mentioned above were performed with the clusterProfiler package in R [28] with an adjusted threshold of P < 0.05.

Hub gene identification and validation
Genes with high connectivity in the network were identified by their kME (eigengene connectivity) values. Genes with |kME| ≥ 0.8 were identified as candidate hub genes. A protein-protein interaction (PPI) network of the module genes was constructed using the STRING online database (version 10.5, http://www.string-db.org/) [29], with a minimum required interaction score > 0.4 (median confidence). Hub genes were screened out by the CytoHubba plug-in (version 2.1.6, http://apps.cytoscape.org/apps/cytohubba) in Cytoscape [30]. The same hub genes identified by the two methods were selected and defined as hub genes. The hub gene mRNA expression in samples with or without MLL-r was analyzed, and the results were validated by the Coustan-Smith leukemia study [31] in the Oncomine database (https://www.oncomine.org/).

The relationship between hub genes and sample traits
To reveal the relationship between the hub genes and sample traits, correlation analysis was performed using Spearman's correlation method.

Figure 2. Gene clustering dendrogram based on dissimilarity measure
Co-expression modules were identified by the dynamic tree cut method and similar modules were merged when the correlation higher than 0.75. Each co-expression module was assigned a distinctive color, and each vertical line represents a gene.

Gene expression data and sample traits
The dataset GSE68735 contained expression values for 207 patients diagnosed with pediatric ALL. Three cases, which included GSM1679955, GSM1680005 and GSM1680015, were excluded because the TEL-AML and trisomy 4 and 10 statuses were unknown. A total of 204 cases with a median age of 13 years were selected for further analysis. MLL-r was identified in approximately 10% of the patients. Three of the 207 cases had TEL-AML, 23 cases had E2A-PBX1, and 5 cases had combined trisomy 4 and 10. The clinical and laboratory characteristics of patients according to the MLL rearrangement status are shown in Table 1.

Co-expression network construction and module identification
Two outlier samples were removed based on the sample clustering result. The top 25% (13669) of the most variant genes were selected to construct the co-expression network. The scale-free topology fit index reached 0.85 when β was set to 7 ( Figure 1). As a result, 21 different co-expression modules clustered from 37 to 3654 genes were identified ( Figure 2).

Module-trait relationship analysis
The green module was positively related to the MLL-r status (r = 0.8, P = 5e-47) and leukocyte count (r = 0.31, P = 7e-06) ( Figure 3). Moreover, the GS for the MLL-r status and the MM of genes in the green module were calculated, and a scatter plot of the correlation was generated. The result showed that there was a significantly positive correlation between the GS for the MLL-r status and the MM of genes in the green module ( Figure 4).

GO annotation and pathway enrichment analysis of the genes in the green module
All 213 genes in the green module were analyzed by the clusterProfiler package in R for GO annotation and KEGG analysis. The results revealed that most significant terms for cellular component (CC), molecular function (MF) and biological processes (BPs) were focal adhesion (GO:0005925), calcium-dependent phospholipid binding (GO:0005544) and connective tissue development (GO:0061448), respectively ( Figure 5A-C). The KEGG analysis indicated that 11 genes in the green module were significantly enriched in transcriptional misregulation in cancer (hsa05202) ( Figure 5D). with kME values > 0.8 were simultaneously identified as hub genes by the CytoHubba plug-in in Cytoscape. The expression levels of all five hub genes were significantly increased in the samples with MLL-r compared with the samples without MLL-r ( Figure 6). Only one study that compared the expression levels of the five hub genes in patients with pediatric ALL with or without MLL-r was identified in the Oncomine database. Similarly, all five hub genes showed significantly higher expression levels in patients with pediatric ALL and MLL-r (Figure 7).

The relationship between hub genes and sample traits
As mentioned above, the green module was positively correlated with the MLL-r status and leukocyte count. The results of the correlation analysis revealed that PROM1, LGALS1, CD44 and HOXA10 were positively correlated with the leukocyte count with P-values less than 0.05 ( Figure 8).

Discussion
MLL-r remains a major unfavorable prognostic factor in ALL. The EFS and OS of pediatric ALL with MLL-r are inferior to those of patients without MLL-r due to chemotherapeutic refractoriness and relapse. It remains challenging to elucidate the molecular mechanisms of MLL-r leukemia and identify new therapeutic targets. Therefore, WGCNA was carried out to identify co-expressed genes associated with MLL-r in pediatric ALL. The green module was screened for further analysis because it had the highest correlation coefficient with the MLL-r status of pediatric ALL. To understand the biological functions of the co-expressed gene module, GO annotation and pathway enrichment analysis were performed. The results showed that the most significant GO-CC term enriched by the co-expressed genes was focal adhesion, which is a crucial step in cell migration [32]. The calcium-dependent phospholipid binding (MF term) and connective tissue development (BP term) were mainly involved in the green module genes. The most significantly enriched KEGG pathway in the green module was transcriptional misregulation in cancer, which is involved in ALL with MLL-r.
To distinguish important genes in the green module, a PPI network of the module genes was constructed. Then, hub genes, which included PROM1, LGALS1, CD44, FUT4 and HOXA10, were identified by CytoHubba based on the PPI network, and all hub genes had kME values ≥ 0.8. The expression levels of all five hub genes were significantly increased in the samples with MLL-r compared with the samples without MLL-r, and the results were validated by an independent investigation in the Oncomine database. Furthermore, the results of the correlation analysis revealed that PROM1, LGALS1, CD44 and HOXA10 were positively correlated with the leukocyte count, which is an adverse prognostic factor of pediatric ALL. The HOXA10 gene, which is a member of the homeobox gene family, is typically expressed in decidualizing stromal cells and hematopoietic stem cells. It codes a transcription factor that participates in hematopoietic cell differentiation and normal implantation [33][34][35]. The abnormal expression of homeoboxgenes, which contributed to oncogenesis and progression of cancers, were found in breast, colon and ovarian cancers [36]. HOXA10, HOXA9 and HOXC6 were consistently up-regulated in cell lines and primary cells from ALL patients with MLL-r [14,37]. Importantly, HOX genes, including HOXA10, HOXA9 and HOXC6, are significantly up-regulated in both T-and B-lineage ALLs with MLL translocations. Conversely, homeobox genes knockdown decreased the proliferation of leukemia cells of acute leukemia with MLL-r [38]. These results strongly indicated that these genes are involved in MLL-r ALL as central factors. The possible mechanism could be that the dysregulation of HOX genes directly caused by MLL fusion proteins promotes the excessive self-renewal of hematopoietic cells, which is associated with the origin and maintenance of leukemogenic events [39].
CD44, which encodes a cell adhesion glycoprotein, is crucial to various physiological processes, such as cell migration, limb development, extracellular binding, lymphocyte homing and hematopoiesis. Additionally, CD44 plays crucial roles in pathological processes, particularly in tumor progression, metastasis and chemoresistance, suggesting  that it is a potential marker of stem cells in solid tumors and leukemia-initiating cells. At present, CD44 was identified as a stemness-related gene of several cancers including glioblastoma, breast cancer, hepatocellular carcinoma, colorectal cancer and acute myeloid leukemia [40]. Significantly higher expression levels were observed in patients with ALL compared with umbilical cord blood precursor cells [41] and contributed to chemoresistance by increasing drug efflux [42]. Interestingly, the gene expression level of CD44 in ALL cases with MLL-r was much greater than that in ALL cases without MLL-r [43,44], which was consistent with the results of our study. These findings demonstrated that MLL-r may participate in the maturation arrest of early lymphoid progenitors because CD44 expression is involved in early steps of lymphoid development [45]. Previous studies have shown that anti-CD44 monoclonal antibodies inhibit the proliferation and reverse the differentiation arrest of acute myeloid leukemia cell lines and primary acute myeloid leukemia cells by increasing the expression level and stability of the p27 protein [46][47][48]. Moreover, anti-CD44 monoclonal antibodies have also shown the promising potential of eradicating acute myeloid leukemia stem cells by inhibiting acute myeloid leukemia stem cell homing [49]. Whether CD44-targeted therapy is an effective strategy for the treatment of ALL with MLL-r and acute myeloid leukemia requires further investigation. The PROM1 gene encodes the surface antigen CD133, which acts as a marker of hematopoietic stem cells and progenitor cells [50]. In addition, CD133 was used for identifying many types of cancer stem cells (CSCs) [40]. And CD133+ cells were characterized by self-renewal, proliferation, invasion and multi-drug resistance, the mechanism was associated with activation of the CSCs-related signaling such as PI3K/Akt, epidermal growth factor receptor and src-focal adhesion kinase [51]. Overexpression of the PROM1 gene has been identified in ALL with MLL-AF4 [37]. PROM1 transcription, which is required for the growth of ALL cells and is regulated by AF4, is downregulated due to the knockdown of AF4 [52]. However, few studies have focused on the expression level and effect of PROM1 in ALL with other MLL translocations. Recently, a bispecific chimeric antigen receptor (CAR) targeting CD19 and CD133 was designed, and tandem CAR (TanCAR) T cells were then applied to cell lines and a mouse model with MLL-AF4 B-cell ALL, which is frequently accompanied by the loss of CD19 expression. The results showed that Tan-CAR T cells selectively kill CD19+CD133+ and CD19-CD133+ leukemia cells with slight cytotoxicity against normal hematopoietic stem cells and progenitor cells [53]. But other investigators have highlighted concerns regarding CAR targeting CD133 as a suitable approach for treating MLL-r leukemia because their data indicated that the expression level of PROM1/CD133 in early normal hematopoietic stem cells is equal to or greater than that in B-cell ALL with MLL-AF4 or MLL-AF9 [54]. Thus, the potential of PROM1/CD133 as a target for immunotherapy requires further confirmation. The LGALS1 gene encodes galectin-1 (Gal-1), which is acarbohydrate-binding protein with the ability to bind β-D-galactopyranoside glycans and is involved in glycosylation [55,56]. The ligation of galectin and glycan contributes to cell proliferation, apoptosis, migration and tumor immunity [56]. However, the dysregulation of glycosylation and galectin expression have been shown to be significant events in oncogenesis, metastasis and resistance to chemotherapy [55,57,58]. The gene and protein expression levels of LGALS1 have been shown to be significantly higher in MLL-r B-ALL due to MLL-dependent epigenetic modifications [59]. Several inhibitors designed to target galectin-1 have shown therapeutic potential in different tumors and inflammatory disorders [60]. PTX008, which is an inhibitor of Gal-1, have been shown to reduce the proliferation and migration of B-cell precursor cells in ALL [61]. However, the challenge is that cancer cells produce different galectin isoforms by alternative splicing, which may result in resistance to galectin inhibitors [62]. Coincidentally, the fucosyltransferase-4 (FUT4) gene encode sanα-1,3-fucosyl transferase named Fut-IV and participates in fucosylation, which is an important type of glycosylation. Aberrant fucosylation has been shown in a number of pathological events, including the proliferation, invasion and metastasis of cancer cells and chemoresistance [63,64]. FucT-IV is distributed in leukocytes and myeloid cells and contributes to the final step of Lewis (Le) antigen biosynthesis [65]. The high expression levels of FucTs accompanied by the overexpression of certain glycan structures have been identified in several cancers. Similarly, increased expression levels of Le x have been observed in various solid tumors [66]. To date, few studies have examined the relationship between the FUT4 gene and leukemia, and most of these studies focused on myeloid leukemia cell lines [67,68]. Recently, a study revealed that FUT4 is overexpressed in patients with ALL [69]. However, the relationship between the FUT4 gene and MLL-r status remains unknown and requires further exploration.
In conclusion, a scale-free co-expression network was constructed by WGCNA. The green module was identified as a meaningful module that is significantly related to MLL-r, and PROM1, LGALS1, CD44, FUT4 and HOXA10 in this module were identified as hub genes with higher mRNA expression levels in MLL-r cases than in non-MLL-r cases. Additionally, the results were validated in other independent database. Interestingly, among these hub genes, HOXA10, CD44 and PROM1 were all recognized as important markers of hematopoietic stem cells and progenitor cells. Both LGALS1 and FUT4 are involved in the dysregulation of glycosylation. Their overexpression may play dominant roles in the leukemogenesis, development, chemoresistance and relapse of MLL-r ALL by promoting the characteristic self-renewal of leukemia stem cells. Overall, these findings might provide novel insights regarding the mechanisms and potential therapeutic targets for patients with pediatric ALL and MLL-r. lysine 4; HDAC, histone deacetylase; KEGG, Kyoto Encyclopedia of Genes and Genomes; KMT2A, histone lysine methyltransferase 2A; MLL, mixed lineage leukemia; MLL-r, MLL rearrangement; OS, overall survival; PPI, protein-protein interaction; WGCNA, weighted gene co-expression network analysis.