Abstract
Genome sequencing can be used to detect DNA repair failures in tumors and learn about underlying mechanisms. Here, we synthesize findings from genomic studies that examined deficiencies of the DNA mismatch repair (MMR) pathway. The impairment of MMR results in genome-wide hypermutation and in the ‘microsatellite instability’ (MSI) phenotype—occurrence of indel mutations at short tandem repeat (microsatellite) loci. The MSI status of tumors was traditionally assessed by molecular testing of a selected set of MS loci or by measuring MMR protein expression levels. Today, genomic data can provide a more complete picture of the consequences on genomic instability. Multiple computational studies examined somatic mutation distributions that result from failed DNA repair pathways in tumors. These include analyzing the commonly studied trinucleotide mutational spectra of single-nucleotide variants (SNVs), as well as of other features such as indels, structural variants, mutation clusters and regional mutation rate redistribution. The identified mutation patterns can be used to rigorously measure prevalence of MMR failures across cancer types, and potentially to subcategorize the MMR deficiencies. Diverse data sources, genomic and pre-genomic, from human and from experimental models, suggest there are different ways in which MMR can fail, and/or that the cell-type or genetic background may result in different types of MMR mutational patterns. The spectrum of MMR failures may direct cancer evolution, generating particular sets of driver mutations. Moreover, MMR affects outcomes of therapy by DNA damaging drugs, antimetabolites, nonsense-mediated mRNA decay (NMD) inhibitors, and immunotherapy by promoting either resistance or sensitivity, depending on the type of therapy.
Cancer sequencing studies have yielded a repository of genomic data of an extraordinary scale, which is growing by the day. Major catalogs of tumor genomes include The Cancer Genome Atlas (TCGA), containing approximately 10000 tumor whole-exome sequences (WESs) [1,2], the Pan-cancer Analysis of Whole Genomes (PCAWG) consortium [3], with approximately 2700 whole-genome sequences (WGSs), and the Hartwig Medical Foundation (HMF) dataset of approximately 2500 WGS metastatic tumors as published [4], and growing. A major motivation behind sequencing a large number of cancer exomes or genomes was to discover new cancer driver genes. This was largely a success: approximately 300–800 cancer genes, depending on methodology [1,5], were identified as recurrently mutated, indicating positive selection in tumors [6,7] and suggesting a role in driving cancer evolution. In addition to driver gene discovery, the tumor genomic data were used to study mutational processes by analyzing the passenger (non-selected) mutations, which are the majority of mutations observed in a typical cancer genome. These analyses helped elucidate mechanisms of mutagenesis and DNA repair that operate in tumor cells [8–12]. The mutation spectra are often used to derive ‘mutational signatures’, an output of a computer algorithm that captures the DNA sequence propensity towards distinct mutational processes that are active across many tumors; see below for more detail. These recent advances in genomic data generation and in the methodologies for the extraction of mutational signatures have revolutionized the study of DNA repair deficiencies in tumors. These are a common occurrence and an important mechanism that promotes cancer evolution.
Components of the mismatch repair pathway and their causative role in cancer risk
DNA mismatch repair (MMR) is a highly conserved system with a major role in maintaining genomic stability. A proficient MMR mechanism greatly increases DNA replication fidelity by detecting and initiating the replacement of DNA base mismatches and small loops (reviewed in [13–15]). MMR can also terminate erroneous recombination attempts (reviewed by [16]) and initiate a DNA damage response signal [17,18]. This article will not be dissecting the MMR pathway and will instead focus on the effects of a deficient MMR machinery on the genomic landscape, and the relevance thereof to cancer therapy. Nonetheless, we will describe in brief the most relevant MMR factors, further referring the reader to other excellent reviews on the molecular biology of the MMR system [15,19–22].
MMR has two crucial components, MutS, which detects the mismatches, and MutL, necessary for mismatch processing and excision. In human cells, mismatches and short loops of one to two nucleotides are detected by MutSα (a heterodimer of MSH2 and MSH6) and loops of varying sizes are detected by MutSβ (MSH2 associated with MSH3) [23–25]; unrepaired mismatches result in single-nucleotide substitution mutations, while unrepaired loops in DNA result in indel mutations. MutL is also a heterodimer with three different types in human cells: MutLα (MLH1 and PMS2), MutLβ (MLH1 and PMS1) and MutLγ (MLH1 and MLH3) [26–28]. Currently, only MutLα is considered to have a notable role in the general process of post-replicative MMR. The EXO1 nuclease is another important protein that participates in the MMR process during mismatch excision [29].
The most severe phenotypes, both in human cancers [30] and in mouse studies, have been observed when the cells lack MSH2 or MLH1, with either knockout behaving similarly [31]. This indicates that MMR cannot repair mismatches when either the MutS or the MutL complex is missing. Mutations in other MutS and MutL genes, such as MSH6 and PMS2, have also been associated with cancer development, at lower rates [15,21,32]. In mouse knockout studies, Msh6-deficient mice have a similar phenotype to Msh2-null mice, but can still repair larger insertion-deletion loops and develop tumors with longer latencies [33], mirroring human tumors lacking MSH6 [20]. Msh3 (MSH3 in human) deficiencies have much milder phenotypes [34,35]. As for the MutL heterodimer, Pms2−/− mice develop cancers less frequently than Mlh1−/− and Msh2−/− mice, suggesting that Pms1 and/or Mlh3 might be partially substituting for Pms2 [36,37].
Detecting microsatellite instability resulting from MMR failures in tumors
The MMR pathway sometimes malfunctions in cancer. Dysfunctional MMR is often diagnosed in the clinic by measuring the levels of MLH1, MSH2, MSH6 and PMS2 proteins by immunohistochemistry. However, some cancers with defective MMR have normal expression levels of these proteins and so the activity of the MMR pathway can be more directly observed via a ‘microsatellite instability’ (MSI) mutational phenotype. In MSI, the lengths of microsatellite (MS) loci in DNA become unstable (these loci are also called ‘short tandem repeats’ or ‘simple sequence repeats’, here referring to those consisting of 1-mer to 4-mer repeat units, which are used to infer MMR activity; of note, a common definition of MS includes repeats with up to 6-mer units). Effectively, such DNA repeats contract and expand via indel mutations at very high rates, such that a population of cells may contain a mixture of alleles of various lengths at a single locus. This property allows MMR failures to be measured via a PCR and capillary electrophoresis-based assay, which provides a facile readout of MS lengths at chosen loci. Typically, five loci are examined that are a part of the Bethesda panel [38]: two mononucleotide repeats (Big Adenine Tract [BAT]-25 and BAT-26) and three dinucleotide repeats (D2S123, D5S346 and D17S250). Instability at ≥2 Bethesda loci means the tumor is MSI-H (unstable, high), at 0 loci is MSS (stable), while the intermediate case with only 1 unstable locus is labeled MSI-L (low). By this commonly applied criterion, >10% of uterus, colorectal, and stomach cancers are MSI-H, as well as 1–10% in some other cancer types (esophagus, stomach, ovary, prostate, leukemia, brain); additionally, there exist rare examples of MSI-H in other cancers such as breast or skin cancers.
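The Bethesda panel rule described above is simple enough to state as a short function. The following is a minimal sketch, assuming the per-locus instability calls have already been made from the electrophoresis traces; the function name `classify_bethesda` and the input format are illustrative choices, not part of any published tool.

```python
# The five Bethesda panel loci named in the text.
BETHESDA_LOCI = ("BAT-25", "BAT-26", "D2S123", "D5S346", "D17S250")


def classify_bethesda(unstable_loci):
    """Return 'MSI-H', 'MSI-L' or 'MSS' from the unstable Bethesda loci.

    unstable_loci: iterable of locus names judged unstable in the tumor
    (hypothetical input format; how instability is called is not modeled).
    """
    unstable = set(unstable_loci)
    unknown = unstable - set(BETHESDA_LOCI)
    if unknown:
        raise ValueError(f"not Bethesda loci: {sorted(unknown)}")
    if len(unstable) >= 2:   # instability at two or more loci
        return "MSI-H"
    if len(unstable) == 1:   # exactly one unstable locus
        return "MSI-L"
    return "MSS"             # no unstable locus
```

For example, a tumor unstable at BAT-25 and BAT-26 would be labeled MSI-H, whereas instability at D2S123 alone would give MSI-L.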
Alternative systems also exist, e.g. a pentaplex panel was derived by screening 266 repeats to focus on five loci that were highly discriminative for MSI samples [39]; they are all mononucleotide repeats and are almost monomorphic (BAT-25, BAT-26, NR-21, NR-24 and NR-27). These loci may provide a more accurate readout than the Bethesda panel [40,41], for which it was noted that MMR-deficient cancers resulting from loss-of-function (LOF) in the MSH6 gene are challenging to classify correctly, since MSH6 loss does not cause strong instability at dinucleotide repeats. Simpler systems based on experimentally assaying only a single repeat locus (e.g. CAT-25 or HT-17) were also proposed, as well as more complex panels consisting of more than ten MS loci (see [42] for a recent review, and the references therein).
The advent of genomics has brought advances to the methods for determining MSI status. Instead of experimentally assaying only selected loci, a WES or a WGS can be analyzed, searching for indels at many MSs at once. Next, statistical analysis integrates the instability signal across various MS loci, and is used to predict global MSI or MSS status for each tumor sample (Figure 1A). These predictions often have high concordance with the standard PCR-based MSI assays, suggesting genomics can largely supplant the standard PCR+capillary electrophoresis methodology. In particular, in a head-to-head comparison of three genomic software tools against the PCR MSI calls as a gold standard in colorectal, uterine and stomach cancers, the concordance ranged from 83 to 97% for different tools [43]. It should be noted that the genomic algorithms may also reproduce the errors of these PCR-based MSI assays, whose output they were explicitly designed to mimic. Another limitation in using genomics to determine MSI status is the cost of the sequencing, and also that some bioinformatics expertise is required to analyze the data. However, the tumor WES or WGS that have been generated for other diagnostic purposes (e.g. identifying actionable driver mutations) can also be used in the detection of MSI, thus reducing the global cost and complexity of the analysis.
Figure 1. Genomic signatures of MMR failures, and possible subtypes thereof
(A) Methods for detecting MMR deficiencies (left) are based on assaying the instability of MS loci across the genome (middle). This can be done either by a PCR+electrophoresis experimental assay of various MS panels such as the commonly employed Bethesda panel (middle, above), or more recently by a statistical analysis of WGS/WES data across many MS loci simultaneously by various bioinformatics tools (middle, below). In both cases, the output is a binary label, MSS (stable) or MSI (unstable) (right). The Bethesda panel can distinguish MSI-H (high) from MSI-L (low); however, the significance of this distinction is unclear. (B) Different molecular mechanisms are known to cause MMR failures, either through pathogenic germline variation, somatic mutations and/or copy-number alterations, or epigenetic silencing of different MMR genes or MMR-associated genes (left). All converge on a set of broadly similar genomic patterns involving a high burden of SNV and indel mutations. However, recent data suggest that these mechanisms may be separable based on classifying the SNVs and indels by type, as defined in mutational signatures (middle) observed in genomes of cancers and of experimental models where MMR deficiencies were induced (right). For example, MSH6 LOF may generate a similar burden of SNVs as MLH1 or MSH2 loss, but with fewer indels. Another example is that PMS2 loss might generate a different SNV trinucleotide spectrum than MSH2 or MLH1 loss. Abbreviation: SNV, single-nucleotide variant.
Many software tools can predict MSI status from WES or WGS data, including MSISensor [44], mSINGs [45], Mantis [46] or MSISeq [47]. While the implementations differ, the general principle uniting these tools is that they analyze the distributions of MS lengths at many loci in short-read sequencing data. However, repeat-containing loci are particularly challenging for WGS/WES analyses, because of difficulties in aligning the short-reads to the reference genome and other types of artefacts (e.g. during library preparation, if using PCR-based protocols). These concerns are usually addressed by comparing the lengths of each locus between a tumor and a normal sample, which would, presumably, both be similarly affected by various biases, and a significant difference would reflect a genuine biological process. This is followed by a step where data are integrated across all loci to classify the sample as MSI, usually requiring that a certain percentage of the loci genome-wide are unstable (see a comparison in [43]). However, instead of simply counting stable versus unstable loci, more complex machine-learning approaches can also be used to classify MSI versus MSS, performing implicit selection of loci that maximize ability to discriminate known MSI and MSS examples on a training dataset [47,48].
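The general principle shared by these tools (per-locus tumor versus normal comparison, followed by integration across loci) can be illustrated with a small sketch. This is not the algorithm of any of the cited tools: the total variation distance is used here as a simple stand-in for their statistical tests, and the function names, the input format and all three cutoffs (`locus_cutoff`, `sample_cutoff`, `min_reads`) are hypothetical values chosen only for illustration.

```python
from collections import Counter


def length_distribution(read_lengths):
    """Relative frequency of each observed repeat length at one locus."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_sample(tumor, normal, locus_cutoff=0.2, sample_cutoff=0.1,
                    min_reads=10):
    """Classify a tumor as 'MSI' or 'MSS' from paired tumor/normal data.

    tumor, normal: dict mapping locus name -> list of repeat lengths,
    one entry per read spanning the locus (assumed input format).
    All cutoffs are illustrative assumptions, not published defaults.
    Returns (label, fraction of tested loci called unstable).
    """
    tested = unstable = 0
    for locus, tumor_reads in tumor.items():
        normal_reads = normal.get(locus, [])
        # Skip loci with too little coverage in either sample.
        if len(tumor_reads) < min_reads or len(normal_reads) < min_reads:
            continue
        tested += 1
        distance = total_variation(length_distribution(tumor_reads),
                                   length_distribution(normal_reads))
        if distance > locus_cutoff:
            unstable += 1
    if tested == 0:
        raise ValueError("no locus had enough coverage in both samples")
    fraction = unstable / tested
    return ("MSI" if fraction >= sample_cutoff else "MSS"), fraction
```

Comparing against the matched normal, rather than against the reference genome, is what lets shared alignment and library preparation biases cancel out, as noted in the text.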
Mutation signature analysis from cancer genomes can identify DNA repair phenotypes
Mutational processes that were operating in a tumor genome are often quantified by estimating risk of each possible trinucleotide to accrue different types of single-nucleotide variants (SNVs) [8]. The process to obtain this ‘mutational spectrum’ consists in the classification of the somatic mutations according to the sequence context features thought to be relevant for mutation rates. The mutation spectra of many tumor samples are then jointly input to a Non-negative Matrix Factorization (NMF) algorithm, which aims to generate independent patterns (‘signatures’) corresponding to mutational mechanisms, as well as the weights of each pattern in each tumor sample (often referred to as ‘exposures’); of note, in addition to NMF other factorization algorithms were also applied, such as topic models [49]. Commonly, the process is performed using the 5′ and 3′ adjacent nucleotides for the six possible mutation types (considered strand-symmetrically, e.g. a T > C mutation is tallied together with an A > G mutation), for a total of 96 categories. This concept was also extended to pentanucleotides [10,50,51], indels [8,51,52], double-nucleotide substitutions [51] and copy number alterations (CNAs) [53–56]. Due to the independence of each signature, it is often assumed that each pattern represents a single molecular mechanism, thus describing how mutations in that tumor were generated. For some mutational signatures, underlying mechanisms were identified via associations with carcinogenic exposures, for instance C > A mutations correlated with tobacco smoking exposure, YC > YT substitutions (where Y = C or T) correlated with UV exposure, A > T changes correlated with exposure to the aristolochic acid carcinogen, and TC > TG and TC > TT substitutions correlated with expression levels of the APOBEC3A gene encoding a mutagenic enzyme. 
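The 96-category tally described above can be sketched as follows. This is a minimal illustration that assumes each SNV is supplied as its reference trinucleotide (mutated base in the middle) together with the alternate base; the function names and the channel label format such as `A[C>T]G` are choices made for this example. Mutations at a purine are folded onto the opposite strand, so that an A > G change is counted together with T > C.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The six strand-symmetric substitution types, written from the pyrimidine.
SUBSTITUTIONS = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# 6 substitution types x 4 possible 5' bases x 4 possible 3' bases = 96.
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def channel(trinucleotide, alt):
    """Map one SNV to its pyrimidine-centred channel, e.g. 'A[C>T]G'."""
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or len(alt) != 1 or not set(tri + alt) <= set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alternate")
    if alt == tri[1]:
        raise ValueError("alternate base equals the reference base")
    if tri[1] in "AG":  # purine reference: report on the opposite strand
        tri, alt = revcomp(tri), revcomp(alt)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def spectrum(snvs):
    """Count SNVs per channel; returns 96 counts in CHANNELS order."""
    counts = Counter(channel(tri, alt) for tri, alt in snvs)
    return [counts.get(ch, 0) for ch in CHANNELS]
```

The resulting 96-element vectors, one per tumor, form the columns of the matrix that is passed to the factorization step.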
Experimental work using controlled exposures of human cell lines to mutagenic agents usually recapitulated the signatures suspected to be generated by that agent in tumors [57–59], suggesting that mutational signatures are a useful tool to ascertain the mutagenic exposures of a tumor cell (or its non-cancerous progenitor). This is not limited to chemicals or radiation: DNA repair failures engineered into human cultured cells or into model organisms [52,60,61] also result in mutation patterns resembling those found in tumor genomes. This suggests that genomic data analysis approaches can distinguish different DNA repair failure mechanisms. It also bears mentioning that results of mutational signature inference do vary depending on the computational method employed and on the parameters used in running the algorithm, and might also be affected by tissue-specificity to some extent [62,63]; additional variables that affect results are the method used to call the mutations, and the quality of sequencing data. Because of this variation, it is not straightforward to establish the mechanism underlying an observed mutational pattern by comparing to mutation signatures generated from model systems. There may not always be a one-to-one mapping, at least using current methods, between a specific DNA repair mechanism and an individual mutation signature. Extracted mutational signatures are often difficult to resolve, and some signatures may represent mixtures of mechanisms in various proportions. Additionally, most of the SNV trinucleotide signatures currently observed in tumors have not been linked to a specific mechanism at all; elucidating their mechanistic underpinnings is a matter of much interest. Some of the mutational signatures observed are thought to capture artefacts, either from sequencing or from idiosyncrasies of the computational analysis. Statistical methods for investigating mutational signatures are still under active development.
Recent and upcoming work [49,64–67] may address many of the standing issues with the methodologies.
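To make the factorization step, and its dependence on user-chosen parameters, more concrete, the sketch below implements a basic NMF with multiplicative updates that minimize squared error. It is an illustration under stated assumptions rather than the procedure of any cited study: the matrix layout (mutation categories in rows, tumors in columns), the function names, and the defaults for `iterations` and `seed` are choices made here, and published pipelines typically add steps such as resampling and model selection that are omitted. Pure Python is used only to keep the example self-contained; real analyses would rely on an optimized numerical library.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=500, seed=0, eps=1e-12):
    """Approximate V (categories x tumors) by W (categories x k) times
    H (k x tumors), all non-negative, using multiplicative updates.

    Columns of W play the role of signatures, rows of H of exposures.
    The number of signatures k must be chosen by the user.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    # Rescale so each signature sums to 1; exposures absorb the scale,
    # which leaves the product W x H unchanged.
    for j in range(k):
        scale = sum(w[i][j] for i in range(n)) or 1.0
        for i in range(n):
            w[i][j] /= scale
        h[j] = [x * scale for x in h[j]]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between V and W x H."""
    approx = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_a in zip(v, approx)
               for a, b in zip(row_v, row_a))
```

Rerunning such a procedure with a different `k` or a different random `seed` can return different signatures for the same data, which is one simple way to see why inferred signatures depend on the method and its settings.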
In addition to trinucleotide signatures and derivatives thereof such as clustered mutation signatures [68,69], other types of mutation patterns may provide deeper insight into mutagenic mechanisms. For example, signatures can be derived from spectra of short indels [51], or from CNAs or other types of large-scale structural variation (also referred to as rearrangements) [54,70,71]. Even though mechanisms remain unclear for most of the proposed indel and rearrangement signatures, these methods did demonstrate practical utility. Applications to cancers deficient in homologous recombination repair (HRR) [55,72] have revealed distinct rearrangement patterns between subtypes of HRR deficiencies. The two genomic HRR deficiency signature subtypes associate with pathogenic variants in the BRCA1 gene, versus BRCA2 or PALB2 genes; it is conceivable that this dichotomy might have implications for disease prognosis and treatment response. More generally, mutational signatures derived from SNVs and from CNA spectra were suggested to be predictive of survival in various tumors, for example ovarian and esophageal cancers [56,73].
In addition to SNVs, indels and rearrangements, other types of patterns in cancer genomes can inform mutational signature analyses. This includes, e.g. strand-biases of mutations, which can quantitate transcription-coupled and DNA replication-coupled mechanisms [74,75]. Another example is the local density of mutations, which correlates with activity of MMR and/or nucleotide excision repair pathways in that tumor [76–78]. Clustered mutagenesis describes mutations due to activity of the TLS enzymes (error-prone DNA polymerases) or to APOBEC3 mutagenic cytosine deaminases [68,69]. These various types of mutation patterns can be incorporated into statistical frameworks to jointly inform mutational signature analysis and help subclassify processes that generate or prevent mutations, including DNA repair pathways [67,79]. In particular, faults in the DNA MMR pathway are reflected in several different types of ‘mutational scars’ observed in cancer genomes. Joint analysis of different mutation pattern types has the potential to pinpoint the mechanism of how this important DNA repair pathway failed in each particular tumor.
Mutational patterns associated with MMR failures observed in tumor genomes
Genome-wide statistical analyses of MS indels across many different cancer types [48,80–82] have identified cancer driver mutations emerging at MS loci. Additionally, and more pertinent to the topic of this text, these studies have also characterized the genome-wide landscape of somatic indel mutations at all MS loci accessible to short-read WGS or WES (i.e. the majority of the human genome, except the highly repetitive DNA such as pericentromeric regions). One suggestive finding of these global analyses of thousands of tumors is that MSI may be present in more cancer types than was previously appreciated. For instance, adrenocortical carcinoma, cervical cancer and mesothelioma present instances of genomically MSI-like tumors, which (as supporting evidence for MMR failures) also have higher overall mutation burdens [46], suggesting they are bona fide cases of failed MMR. Interestingly, these analyses hint at a tissue-specificity of MSI phenotypes: different MS loci appear unstable in different cancer types [81]. This is consistent with regional patterns observed in somatic SNVs, which also vary across cancer types, likely due to heterogeneity in the activity of the MMR pathway coupled to replication timing changes across tissues [76,83].
In addition to the MS-based genomics studies described above, the analysis of the SNV mutational signatures [8,51] has shown that several trinucleotide signatures associate with the MSI-H phenotype (COSMIC Signature 6, 14, 15, 20, 21, 26 and 44) in tumors from different organs. Why there exist so many different MMR-associated SNV signatures is enigmatic. Some of them—in particular Signature 14 and Signature 20—associate with concurrent mutations in replicative DNA polymerases (ϵ and δ, respectively) and so might plausibly reflect different combinations of mechanisms, however for the remainder of the MMR signatures differences in etiology are less clear. One possible explanation is that there are tissue-specific factors which can modulate a mutational process (e.g. differences in free nucleotide pools), thus resulting in somewhat different signatures resulting from that same process in each tissue [63].
The SNV trinucleotide mutational signatures can, similarly to an MS length analysis, be used to construct an MSI classification model based on genomics data (see e.g. MMRDetect, [60]). The more recently introduced indel mutational signatures also associate with MSI-H status in tumor genomes [51] (COSMIC signatures ID1, ID2 and ID7, containing an abundance of indels at MSs) and thus would be predictive; however, this type of indel signature would be largely redundant with the MS length analysis. In addition to genome sequences from human tumors, MMR failures have been induced in human cell lines, followed by WGS. This yielded increased mutational burdens and distinct trinucleotide SNV mutational patterns in cell lines, which were broadly comparable with those observed in cancer genomes for MMR genes commonly inactivated in cancer: MLH1, MSH6, MSH2 and PMS2 [60,84,85]. However, it should be noted that there were also differences in mutation patterns between these individual genotypes, as detailed below.
In summary, recent genomic comparative analyses of the MSI versus the MSS tumors [48,80–82] have revealed indel-associated cancer driver genes, diverse indel mutation patterns at MS loci and also diverse trinucleotide SNV mutation signatures resulting from failed MMR (Figure 1B). This genomics work hints that the MMR failures might occur in multiple forms and that the classical MSI-H versus MSS dichotomy might be an oversimplification. Moreover, some data presented in these studies suggest that MMR failures might be more common across tumors than anticipated by the classical, PCR and electrophoresis-based MSI tests based on the Bethesda panel (see below).
Evidence for different subtypes of MMR failures observed in human cancer
As detailed above, genomics analyses of diverse types of mutation patterns can comprehensively identify MMR failures, providing an estimate of their prevalence. We posit that future statistical analyses of genome-wide mutational patterns may discover subtypes of MMR failures, as well as the relevance of such subtypes for cancer evolution and for the clinical course of the disease, including response to chemotherapy and immunotherapy. This reasoning is supported by a considerable number of pre-genomic studies suggesting that there may exist different subtypes of MMR failures and/or different ways in which MMR failures may become manifest, perhaps depending on the cell type or on the genetic background of each tumor.
The EMAST phenotype
In addition to the commonly recognized MSI-H phenotype, there exists a distinct repeat instability phenomenon termed EMAST, for ‘elevated MS alterations at selected tetranucleotide repeats’. This generates instability mostly at the AAAG or ATAG tetranucleotide repeats; in contrast, MSI-H is detected at mononucleotide and/or dinucleotide repeats. There is evidence to suggest that the two phenotypes affect cancer types differentially. While MSI-H is usually seen in uterus, colorectal, and gastric cancers, EMAST is also common in other cancer types: one-third or more of lung cancers, skin cancers and bladder cancers were reported to show EMAST [86,87]. This suggests that EMAST is a mechanistically distinct phenomenon from MSI. In support of this, ∼40% of the MSS colon cancers show some degree of EMAST associated with the loss of the MSH3 expression [88], while MSI tends to be associated with loss of activity (via mutation, damaging germline variant or epigenetic silencing) of other MMR genes, in particular MLH1, MSH6 and MSH2. This dichotomy is consistent with the known molecular function of the MSH3/MSH2 dimer (MutSβ), which preferentially recognizes DNA loops of 2–13 nucleotides [89], while the competing MSH6/MSH2 dimer (MutSα) recognizes single base mispairs and two-nucleotide bulges. Thus, it is expected that MSH6 efficiently prevents 1–2 nt long but not longer indels, even in the absence of MSH3. This is consistent with the absence of classical MSI (largely observed as indels, often 1-nt deletions, at mononucleotide repeats) in many EMAST tumors (which exhibit indels at tetranucleotide repeats, possibly due to loss of MSH3 activity). In a WGS study of genetically engineered human cell lines, the MSH6−/− cells indeed generated a large excess of SNVs and 1-nt indels, while the increase in 2+ nt indels was somewhat milder relative to e.g. MSH2−/− and MLH1−/− cells (the latter are expected to indirectly abrogate also the MSH3 activity) [60]; note that MSH3 LOF was not explicitly assayed therein, nor the instability of these mutants at particular loci.
The MSI-L phenotype
Further, the MSI-L (low) phenotype readout is defined as 1 (of 5) Bethesda MS locus affected by instability. This was originally intended to signify an intermediate degree of MMR failure between MSS (0 loci) and MSI-H (≥2 loci), or alternatively it may represent those MSS cases that randomly accrued an indel in one of the five marker loci. However, it was also proposed that MSI-L may be a distinct type of MMR failure, rather than simply a lower-intensity MSI-H phenotype [90]. In particular MSI-L seems to be more commonly associated with instability at dinucleotide rather than mononucleotide repeat loci, suggesting a different underlying mechanism. MSI-L is observed in melanoma and pancreatic tumors; these tumor types are infrequently MSI-H. MSI-L is associated with gene expression patterns that are distinct from MSI-H, supporting that it is generated by a distinct biological process [91]. Evidence from mouse models implicates loss of activity of the PMS2 gene homolog in MSI-L [92]. In the Mlh1/Pms2 MMR complex in mouse, the loss of Pms2 functionality can be partially replaced by the Mlh3 subunit [37]. Thus, some activity of the MMR pathway is retained upon Pms2 loss, which appears compatible with the observation of a reduced-intensity MSI-like phenotype. Consistently, a human PMS2−/− cell line genome sequence exhibits a different spectrum of indel mutations at MS repeats, with an enrichment of insertions compared with MLH1−/− and MSH2−/− cell lines (which were similar to each other) [60], although it is not clear whether specific MS loci were differentially affected. Apart from MMR, there is evidence to implicate base excision repair pathway and/or translesion DNA polymerases as contributing to MSI-L [90], and moreover an overlap was reported between the MSI-L designation of colorectal cancers and the MSH3-deficient EMAST phenotype [88]. 
Additionally, it was suggested that deficiencies in the POLE protein (replicative DNA polymerase ε) can also result in MSI-L [80]. These examples further illustrate how different DNA repair pathway disturbances might converge onto apparently similar mutational phenotypes. Genome-wide studies using statistical methods designed to distinguish MSI-H from MSS tumors usually group MSI-L together with MSS, and not with MSI-H [93]; however, such analyses do not necessarily focus on stratifying mutagenic processes by repeat unit type and other characteristics. Future genomics analyses may shed light on the question of whether MSI-L tumors result from distinct mechanisms of DNA repair failures, rather than simply representing a milder version of the MSI-H phenotype.
Multiple SNV signatures associated with MMR failures
Analysis of trinucleotide mutational signatures suggests that up to seven different signatures are associated with the MSI-H label across cancer types (as mentioned earlier) [51], pointing to multiple underlying mechanisms of MMR failures. Two of the seven signatures are due to concurrent mutations in replicative DNA polymerases [94,95], but the mechanisms underlying the other five are unaccounted for. Clearly one possibility is that these result from deficiencies in different core MMR factors: for example the PMS2−/− cultured cells exhibited a different mutational spectrum, considerably richer in A>G/T>C mutations, compared with generally C>T/G>A-rich spectra of MSH2 or MSH6 mutant cells [60,96]. Furthermore, alterations in trans-acting factors can impact MMR efficiency, for example the overexpression of the microRNA, miR-155, in colon cancers [97], deletions in genes MTOR (FRAP1), HERC1, PRKCZ and PIK3C2B in leukemia [98] and loss of the SETD2 chromatin modifier [99], which occurs in kidney, lung and colon cancers; it is conceivable that some of these might result in particular mutation signatures. LOF in the ARID1A chromatin remodeler was suggested to result in an MMR-like mutational phenotype [100,101]. A somewhat related explanation underlying this diversity of MMR signatures is that there may exist tissue-specific variation in the ‘expression’ of the same mutational process such as MMR failing [63]. This could conceivably happen due to different activity of accessory or backup DNA repair proteins or relevant metabolites in different tissues, which change the signature of MMR failure sufficiently such that it is detected as discrete mutational signatures. While this tissue-specificity may or may not be pathologic in origin (i.e. it may not represent a failure of DNA repair per se), arguably the overall mutagenic mechanism is different and may result in varied implications for cancer evolution and for therapy response.
Additionally, some of the MMR signatures might result from interactions of failed MMR with exposures to external DNA-damaging agents [52,102,103], in which case they would not, strictly speaking, be considered subtypes of MMR failures. Finally, it cannot be excluded that the multiplicity of MMR signatures observed could, in part, be due to limitations of the signature inference algorithms that have overestimated their numbers.
The data on tissue specificity, possible underlying genes/mechanisms, and genomic instability signatures of known and anticipated subtypes of MMR failures are summarized in Supplementary Table S1.
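Statements that one mutational spectrum 'resembles' another are commonly quantified with cosine similarity between the two spectra. The following is a minimal sketch of that measure, not the procedure of any cited study. The function name `cosine_similarity` and the toy six-channel spectra (substitution classes only, rather than the full trinucleotide spectrum) are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-negative mutation spectra."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a spectrum with no mutations has no direction")
    return dot / (norm_u * norm_v)

# Hypothetical spectra over the classes C>A, C>G, C>T, T>A, T>C, T>G.
# The numbers are invented for illustration only.
ct_rich = [0.10, 0.05, 0.55, 0.05, 0.20, 0.05]  # C>T-dominated toy spectrum
tc_rich = [0.05, 0.05, 0.20, 0.05, 0.60, 0.05]  # T>C-dominated toy spectrum

print(round(cosine_similarity(ct_rich, ct_rich), 3))
print(round(cosine_similarity(ct_rich, tc_rich), 3))
```

Because cosine similarity ignores the overall mutation count, it compares the shape of two spectra rather than their burden, which is why a hypermutated and a lowly mutated sample can still be matched to the same signature.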
Evidence for a higher prevalence for MMR failures across cancer types
Furthermore, recent analyses of genomic data suggest that MMR failures are not only of diverse types, but might also be more prevalent than currently appreciated. Several lines of circumstantial evidence indicate that some degree of MMR dysfunction might be more common in some tumor types than reported by the MSI-H phenotype of the Bethesda MS panel, although currently this remains speculative and should be followed up in future work. In particular:
- Frequencies of MSI in ovarian cancer previously reported from assays of individual MS loci, rather than from genome-wide searches, were approximately 10–12% [104,105]. However, a classifier that reproduces the Bethesda MSI-H label (using genomics) classed only 3.2% of ovarian tumors as MSI-H [48]. This suggests that the commonly employed Bethesda panel-based readout might underestimate MSI prevalence in some cancer types such as ovarian, head-and-neck and cervical cancer [48]. However, MMR failures would also be expected to generate a high burden of SNV mutations in those tumors, which was not always reported.
- Studies that used WES/WGS to examine instability of MS loci showed that some tumors from tissues such as breast [80], kidney and liver [81], or adrenocortical and cervical cancer and mesothelioma [46], which are rarely classed as MSI-H, do sometimes have substantial numbers of indel events at MSs. This suggests that a milder MSI-like phenotype might be widespread (or, alternatively, that there exist other factors that result in unstable MSs but are unrelated to MMR). For adrenocortical and cervical cancer, the genomics-based observation of MSI was also supported by an increased mutation burden of SNVs [46].
- Because the distribution of the number of MS indels in tumor genomes is not strictly bimodal, it was suggested that MSI is a continuous phenotype (which also associates with survival outcomes, meaning this continuum is physiologically relevant) [48,80,81], rather than a discrete subdivision into MSI-H versus MSS groups. Some tumors may thus have sufficiently functional MMR to get an MSS readout on the Bethesda panel (perhaps because only a small number [n=5] of loci is examined) but might still deviate from a baseline MSS state, suggesting a possibility of a subtle MMR dysfunction.
- Trinucleotide SNV signature analysis suggests that the signature resulting from a PMS2 gene knockout in cell lines resembles SNV Signature 12 (also abbreviated as SBS12, for ‘single base substitution’) [60]. SBS12 is common in liver hepatocellular tumors and is sometimes seen in other cancer types [51] and, importantly, it was not reported to be associated with MSI-H. Thus, some cancers (hypothetically, those resulting from PMS2 gene silencing or mutation) may have MMR failures that are not seen as MSI by the Bethesda panel. Another study has reported a mutational signature called RefSig MMR2 [63], with a trinucleotide spectrum resembling SBS12 (and also SBS26, a known MSI-associated signature), that commonly occurs across multiple tissues, particularly ovary, liver, lymphoid, bone and soft tissue, central nervous system and breast.
- A related observation is that of the indel mutational signature ‘ID4’ [51], which consists of short deletions only but does not appear to be associated with the canonical MSI-H phenotype. This signature was observed in approximately 10–30% of kidney, lymphoma/CLL, bladder and breast tumors (i.e. not the classically MSI-prone tissues). ID4 also occurs in the sometimes MSI-prone tissues of the esophagus, ovary and uterus, but is not associated with MSI status therein. An important caveat about interpreting SNV and indel mutational signatures is that the statistical methodologies to infer the signatures are still maturing, and the signature catalogs might change substantially with newer methods and the arrival of more data. Thus, results from mutational signature analyses should currently not be taken as definitive evidence of the prevalence of MMR failures.
- It was proposed that differential activity of MMR is a cause of tumor mutation rate heterogeneity at the megabase scale; thus MSI-H cancers lose such heterogeneity and their genome-wide mutation landscapes become ‘flat’ [76]. However, some cancers that are not MSI-H also have ‘flat’ mutation rate landscapes, seen in e.g. kidney, breast or pancreas [unpublished data]. Some of these cases can be explained by the presence of mutational signatures that do not obey the normal mutation rate variability patterns, such as APOBEC mutagenesis [69,106], but many of the remaining cases of the ‘redistribution’ in mutation rates [107] remain unexplained. For example, this occurs in kidney cancer, where APOBEC mutation patterns are rare but the MLH1 gene is sometimes deleted (since it is located near VHL, a major tumor suppressor gene). This suggests MMR activity loss as a possible explanation for the unusual distribution of mutation rates across these kidney genomes.
Overall, genomics has generated various reports hinting that MMR failures might have diverse causes, which are not fully captured by the unidimensional Bethesda panel MSI-H versus MSS dichotomy. These studies moreover suggest that MMR might fail in more tumors than anticipated, albeit at more moderate levels than the extreme MSI-H, although further supporting evidence is needed to ascertain whether these genomic patterns correspond to bona fide MMR deficiencies.
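The contrast drawn above between a categorical five-marker readout and a continuous, genome-wide MSI phenotype can be illustrated with a short sketch. It rests on two assumptions that are not stated in the text: the panel thresholds follow the commonly cited convention (two or more unstable markers give MSI-H, one gives MSI-L, none gives MSS), and the continuous score is simply the fraction of sequenced MS loci carrying a somatic indel. The function names and the example counts are hypothetical.

```python
def bethesda_call(unstable_flags):
    """Categorical call from a five-marker panel (assumed thresholds)."""
    if len(unstable_flags) != 5:
        raise ValueError("expected exactly five markers")
    n_unstable = sum(bool(flag) for flag in unstable_flags)
    if n_unstable >= 2:
        return "MSI-H"
    if n_unstable == 1:
        return "MSI-L"
    return "MSS"

def ms_instability_fraction(n_loci_with_indel, n_loci_covered):
    """Continuous score: fraction of covered MS loci with a somatic indel."""
    if n_loci_covered <= 0:
        raise ValueError("need at least one covered locus")
    return n_loci_with_indel / n_loci_covered

# Hypothetical tumor: stable at all five panel markers, yet with indels
# at 1200 of 200000 sequenced MS loci (invented numbers).
panel = [False, False, False, False, False]
print(bethesda_call(panel))
print(ms_instability_fraction(1200, 200000))
```

A tumor like the hypothetical one above would be reported as MSS by the panel, while its genome-wide score could still be compared against a baseline distribution, which is the kind of subtle deviation the continuous view is meant to capture.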
Impact of MMR failures on cancer evolution
MMR status as a determinant of mutation rates across model organisms
The MSI-H phenotype has strong potential to affect gene function as the increased accumulation of indels and SNVs can lead to disruption of gene coding sequences and regulatory sequences. Studies in model organisms have measured these increased mutation rates by comparing strains deficient in core MMR genes to background mutation rates in the wild-type controls [108]. Whole-genome mutation rate estimates were measured in yeast strains deficient in MSH2 [109,110]. These studies yielded a mutation rate of tens (∼32–71) of mutations Gd−1 (per Gbp and per division), a ∼150× increased mutation rate compared with their MMR-proficient counterparts.
Earlier reports [111,112] in an animal model (the worm Caenorhabditis elegans) estimated a similar increase in mutation rates (∼100-fold) in strains deficient in MutSα (worm genes msh-2 and msh-6, whose human orthologs are MSH2 and MSH6, respectively). However, these reports used mutational gene reporter assays, which can give biased estimates. More recently, WGS of worm strains defective in MutLα (mlh-1 and pms-2) [52,113] has yielded similar (∼71.5 Gd−1) mutation rate estimates [52].
In humans, cultured cells were sequenced after knocking out 43 DNA repair genes and letting the cells undergo tens of divisions. Of those, nine genetic backgrounds reported a significant increase in mutation rates, six of which were MMR genes [60]. This result highlights the importance of MMR in controlling the stability of genomes of unexposed dividing cells. The average mutation rate for the clones deficient in the core MMR genes (MLH1, MSH2, MSH6, PMS2 and EXO1) yielded a value of ∼93 SNV per division (31 Gd−1) and ∼69 indel mutations per division (23 Gd−1) [60], which is roughly an order of magnitude increase over the baseline rate in the same cell line with a control gene k.o. Other human cell lines/organoids were examined for k.o. of the MLH1 [85] and MSH6 [84] genes, similarly reporting mutation rate increases, and additionally confirming results from genomic analysis suggesting altered trinucleotide spectra [8,114] and genome-wide distributions of mutations [76,115] in MMR-deficient tumors.
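The paired figures quoted above (per division and per Gbp per division) are related by a simple normalization by genome size. The sketch below reproduces that arithmetic under the assumption of a genome size of roughly 3 Gbp, which is the value implied by the paired numbers rather than one stated explicitly; the function name is illustrative.

```python
def per_gbp_per_division(mutations_per_division, genome_size_gbp=3.0):
    """Convert a per-genome, per-division count into a rate per Gbp per division."""
    if genome_size_gbp <= 0:
        raise ValueError("genome size must be positive")
    return mutations_per_division / genome_size_gbp

snv_rate = per_gbp_per_division(93)    # SNVs per division in MMR-deficient clones
indel_rate = per_gbp_per_division(69)  # indels per division in MMR-deficient clones
print(snv_rate, indel_rate)  # 31.0 23.0
```

Expressing rates per Gbp is what makes the human cell line estimates comparable with the yeast and worm estimates, whose genomes are far smaller.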
Increased cancer incidence due to inheritable MMR deficiencies
Overall, the major increase in mutation rate due to MMR deficiency in the different experimental models is remarkably consistent, highlighting an important role of MMR in maintaining genome integrity, across the tree of life.
Thus, it is not surprising that the germline inactivation of MMR genes causes a higher predisposition to, and earlier onset of, several types of cancer, a consequence of increased mutation rates. Lynch syndrome (also called the hereditary non-polyposis colorectal cancer syndrome, HNPCC) is the most studied case of an MMR deficiency condition, responsible for many hereditary colorectal cancer cases [116,117], as well as for multiple other cancer types, including but not limited to uterus (endometrial), stomach, ovarian and bladder tumors. Although most of the causal variants affect the core MMR genes MSH2, MLH1, MSH6 and PMS2, pathogenic variants in these genes can be detected in only a fraction of familial colon cancer cases [117,118]. A potential explanation is that another condition apart from Lynch syndrome underlies the familial colorectal cancer risk; this includes genes not involved in MMR such as APC, the base excision repair genes MUTYH and NTHL1, or the replicative DNA polymerases POLE or POLD1 [119]. Remarkably, in approximately half of the MMR-deficient colorectal cancers a genetic cause cannot be determined. This ‘Lynch-like syndrome’ also runs in families and has a broadly similar tissue-type spectrum as the standard Lynch syndrome [120–122]. It might perhaps be ascribed to variants that affect accessory genes that modulate MMR activity (this was demonstrated for somatic deletions in several genes in leukemia [98], as well as somatic mutations in some chromatin modifiers; see above). There is a clear need for more studies to identify the causes of the as yet unexplained part of familial tumor risk, which is likely to be in part due to MMR failures [118].
Although Lynch syndrome has a dominant pattern of inheritance, the single mutated allele is not sufficient to cause the disease. A second-hit somatic alteration is needed to develop cancer, which can occur via loss of heterozygosity (LOH) [123], somatic mutations [124] or by methylation of the promoter for MLH1 [117,125]. Interestingly, the tissue spectrum of Lynch patients may differ depending on the gene affected, where for instance MSH6 and PMS2 carriers were reported to have increased breast cancer risk, unlike MLH1 and MSH2 carriers [126,127]. A condition closely related with Lynch syndrome is the Muir–Torre syndrome, where sebaceous skin tumors co-occur with gastrointestinal tumors, and this results from germline variants in MSH2 or MLH1 genes [128].
A rare type of a germline MMR syndrome is the constitutional MMR deficiency (CMMRD), which drastically increases the rates of early-onset cancers in the brain and blood [11]. In these cases, the germline inactivating variants are homozygous, thus generating an increased mutation rate much earlier in life. As in Lynch syndrome, the core MMR genes are affected [11,129], however the most commonly associated genes are PMS2 and MSH6, while in the standard Lynch syndrome the MSH2 and MLH1 genes predominate [130]. Somatic mutation patterns in CMMRD tend to be different from sporadic MMR-deficient cancers and from Lynch cancers [131]. Some patients develop a characteristic ultramutator phenotype, which results from the combination of the MMR loss and mutations in either POLE or POLD1 replicative DNA polymerases [11]. A closely related condition is Turcot syndrome, where colon polyps occur together with tumors of the central nervous system. A part of the Turcot syndrome cases result from germline variants in MLH1, MSH2 or PMS2 genes which are homozygous or compound heterozygous [132].
Consequences of MMR failures to the integrity of gene sequences
The most characteristic mutation type in MSI tumor samples is the lengthening or shrinkage of MS repeats. This is caused by the slippage of DNA strands, resulting in a loop, which is, during replication, converted into an indel at the MS locus—a type of mutation that is effectively prevented by active MMR. The usual functional consequence of indels in the protein-coding regions is a frameshift mutation, which commonly results in downstream termination codons and thus causes the silencing of the gene via nonsense-mediated mRNA decay (NMD) surveillance pathway [133].
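The link between a repeat-length change and a frameshift reduces to a length check: an indel in a coding sequence shifts the reading frame unless its net length change is a multiple of three. The sketch below is a minimal illustration of that rule only. The function name and example alleles are hypothetical, and it does not model splicing, the position of the resulting stop codon, or NMD.

```python
def indel_effect(ref_allele, alt_allele):
    """Classify a coding indel by its net length change."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        return "no length change"
    # Python's % returns a non-negative remainder for a positive modulus,
    # so deletions (negative delta) are handled by the same test.
    return "in-frame" if delta % 3 == 0 else "frameshift"

# Hypothetical coding homopolymer of eight adenines.
reference = "A" * 8
print(indel_effect(reference, "A" * 7))   # one-unit contraction
print(indel_effect(reference, "A" * 9))   # one-unit expansion
print(indel_effect(reference, "A" * 5))   # three-unit contraction
```

For mononucleotide and dinucleotide repeats, a single-unit slippage event is therefore always frameshifting, whereas trinucleotide repeats can expand or contract without disrupting the frame.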
Although the majority of MS repeats are located outside coding sequences, there are nonetheless ∼380000 candidate repeats in exons in human [48,80]. By analogy to estimating positive selection on driver genes from recurrence of SNVs in exons [6,7], methods exist that estimate the positive selection via recurrence of MS events in gene exons [80]. The genes that appear selected due to the accumulation of MS indel events are in general tumor-suppressor genes, consistent with the tendency of MS mutations to generate LOF events. The most recurrently mutated gene is ACVR2A which was previously identified [81,134] as a cancer gene in colorectal and gastric cancer. Other examples of tumor suppressor genes with recurrent MS events are MSH3 (curiously, an MMR gene itself), RNF43 (also previously identified in [135]), JAK1 and PRDM2 [48,80,81]; see also the cases of MRE11 and RAD50 described below.
The MS mutational events can also target non-coding regions, which may be important for gene regulation. One example is the accumulation of MS mutations in UTRs of certain genes such as FOXP1 and STC1 [48,136]. A significant accumulation of MS mutations in the 3′ UTR was also systematically detected [48], suggesting that a potential mechanism underlying this could be selection for altered regulation of the mRNA stability of oncogenes [48,137]. Data from MMR-deficient cell lines [136], however, seem to suggest a depletion of expression. Thus, the causal role of somatic MS events within the non-coding portions of the genome such as UTRs is still unclear, as is more generally the case for other kinds of somatic mutations in UTRs, although it seems likely that individual examples of cancer genes will be affected [138–140].
One of the important considerations of MSI that became evident only from WGS is that, in addition to overall increased mutation rates, there is a re-distribution of mutation density genome-wide [107]. Normally, somatic mutations accumulate preferentially in the late-replicating, heterochromatic parts of the genome [141,142], which contain fewer genes and fewer other functional elements. In MMR-deficient samples however, mutation densities lose their normal correlation with replication time, wherein the protection for domains with a high density of active genes is removed [76,85,115,143]. One consequence of this is that the genes located in very early-replicating regions will suffer a disproportionally large increase in mutation rates in MSI tumors, while genes in later-replicating regions are expected to have a smaller increase in mutation rates in MSI. This redistribution is expected to affect the repertoire of cancer driver genes mutated in MMR-proficient versus MMR-deficient cancers, although this to our knowledge still remains to be studied.
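One simple way to make the loss of correlation with replication time concrete is to correlate regional mutation density with replication timing across genomic bins. The sketch below uses a plain Pearson correlation on invented toy values; it is an illustration under stated assumptions, not the method of the cited studies. The bin values, variable names, and the choice of Pearson correlation are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Toy bins ordered from early (0) to late (1) replication; values are invented.
timing = [0.1, 0.3, 0.5, 0.7, 0.9]
# Hypothetical MMR-proficient tumor: density rises towards late-replicating bins.
proficient_density = [1.0, 1.6, 2.1, 2.9, 3.4]
# Hypothetical MMR-deficient tumor: density is elevated everywhere and nearly flat.
deficient_density = [20.1, 19.8, 20.3, 19.9, 20.2]

print(round(pearson(timing, proficient_density), 2))
print(round(pearson(timing, deficient_density), 2))
```

In this toy setting the early-replicating bins show the largest fold increase between the two tumors, which mirrors the disproportionate increase for genes in early-replicating regions described above.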
Mutagenic DNA mismatch repair in cancer
Recent genomic studies have raised interesting possibilities that MMR activity (rather than lack thereof) may actually promote accumulation of certain mutation types under certain circumstances. One example of this scenario is the non-pathogenic process of somatic hypermutation (SHM), which diversifies antibody genes during maturation of B lymphocytes. The MMR pathway is recruited to the SHM target regions in its non-canonical form, which recruits the error-prone DNA polymerase η (POLH protein) that increases mutation rates during repair [144]. Genomic evidence suggests that POLH mutational signatures appear abundant in cancer, are associated with activity of normal MMR, and also with exposures to certain carcinogenic agents like UV, oxidative stress, tobacco smoking or alcohol consumption [68,145]. Hypothetically, under these exposures, the association of MMR activity with the early-replicating, gene-dense part of the genome is detrimental, as errors from the error-prone synthesis of POLH accumulate therein [68].
Another example of how MMR changes the distribution of mutations can be found in the activity of the APOBEC enzymes, important mutators in many types of human cancers [146–148]. APOBEC activity is confined to single-stranded DNA, which can commonly occur as an intermediate during repair of double-stranded breaks [147–149]; APOBEC activity thereon results in a characteristic pattern of ‘mutation showers’ (kataegis). Moreover, experimental evidence suggested that MMR could be a source of this substrate, in particular via excision by the EXO1 nuclease (possibly by hijacking a repair intermediate during BER) [150,151]. In our recent genomics study [69], we analyzed mutation clustering patterns in human tumors, detecting a prevalent ‘mutation fog’ pattern (omikli), which can be associated with MMR activity. This mechanism may explain around two-thirds of all the APOBEC mutations in cancer, with increased accumulation in early-replicating, gene-rich regions, thus increasing the functional impact of APOBEC mutagenesis [69].
MMR genes as pro-apoptotic tumor suppressors
MMR genes are tumor suppressor genes not only because they control mutation rates, but also because of the role of MMR as a pro-apoptotic pathway under certain DNA damaging conditions. A clear example where this role is relevant is treatment by the DNA alkylating agent and anti-cancer drug TMZ. TMZ is metabolized and reacts with DNA to form O6- and N7-methylguanine (MeG) or N3-methyladenine (MeA) [152,153]. The O6-MeG can be reversed by MGMT, a direct DNA repair enzyme [154], while N7-MeG and N3-MeA are repaired by base-excision repair (reviewed in [153,155]). Importantly, the O6-MeG lesion does not block the replicative DNA polymerase during copying, but results in mismatches (the methylated G base-pairs with a T) that can be recognized and processed by the MMR machinery [156]. TMZ is mostly employed in brain tumors with a silenced MGMT gene [157,158], but these patients can develop resistance to the treatment. These relapsing brain tumors often carry a hypermutator phenotype and are associated with inactivating mutations in the MSH6 gene [102,103,159,160]. The mutational signature SBS11 can be extracted from genomes of these hypermutators [8,113], suggesting its association with the TMZ treatment. Further reports in agt-1 (an MGMT homolog) deficient C. elegans worms [113] provided an experimental validation for this association.
Overall, the above suggests a genetic interaction between an initial MGMT depletion event and a subsequent MMR deficiency that promotes the survival of relapsing clones. This phenotype is possibly due to the avoidance of the ‘futile cycling’ of MMR [157], which occurs when MMR repeatedly tries but fails to correct a mismatch (here, involving a methylated nucleotide), signaling apoptosis.
MMR failures generate genetic events that sensitize to therapeutics
The mutations resulting from MMR failures also can provide markers of resistance or sensitivity to chemotherapeutic drugs (Figure 2). This can occur in two ways: either the process itself can be targeted, or its downstream consequences can be targeted. Firstly, the failed MMR pathway itself may present a vulnerability as it increases reliance on auxiliary pathways for DNA replication or repair of certain lesions. Secondly, the mutations that accumulate at increased rates after failed MMR can disable specific genes—most notably those genes bearing MS repeats, which are hotspots for mutagenesis—therefore rendering the tumor cells vulnerable to interventions.
MMR failures in cancers can cause resistance or sensitization to certain therapies
(A) MMR dysfunction may sensitize tumors to various therapies via different mechanisms; some of these are well established and applied in the clinic (immunotherapy), while others are proposed based on known molecular mechanisms and/or studies on experimental models and still await clinical validation. A common consequence of MMR failures is indel mutations at MS repeats, which can be in coding regions and so generate frameshifted, and usually truncated, mutant proteins (note that some indels at repeats in introns can also induce missplicing, again resulting in frameshifting and truncation of coding regions). Expression of such truncated proteins is often silenced by the NMD surveillance pathway that degrades mutant mRNA. However, some frameshifted mRNAs can escape NMD detection and generate neoantigens that sensitize to immunotherapy, or generate toxic proteins (e.g. HSP110 mutant), suggesting that beneficial effects of therapy might be potentiated by NMD inhibition. Frameshifting indels can also cause LOF in DNA repair proteins, e.g. MRE11 and RAD50, sensitizing to irinotecan and potentially to DNA damage-signaling drugs, e.g. ATR inhibitors. Finally, expansions at repeats due to MMR failure increase reliance on the WRN DNA helicase. (B) MMR inactivation causes resistance to some commonly used cancer drugs, which can trigger MMR activity, either by methylation of nucleobases [TMZ, dacarbazine] or by incorporation of chemically modified nucleotides [6-TG, 5-FU] that can register as mismatches, potentially resulting in apoptosis via DNA damage signalling. A dysfunctional MMR results in reduced clinical utility of such drugs, as well as in increased mutation rates in the MMR-deficient tumor upon treatment, which can generate further mutations in cancer driver genes or in drug resistance genes. Abbreviations: 5-FU, 5-fluorouracil; 6-TG, 6-thioguanine.
Several studies have found an association between the MSI background of a tumor and sensitivity towards therapies using irinotecan [161–164]. The proposed mechanism is that the failure of MMR is not in itself sensitizing the cancer cells to irinotecan, but that MSI generates mutations in other DNA repair genes that would otherwise be able to fend off the damage caused by the drug. Irinotecan binds to topoisomerase I, inhibiting the religation of DNA; this can result in dsDNA breaks upon conflicts with the replication machinery. Once the breaks occur, MRE11 together with RAD50 and NBS1 can repair them, avoiding apoptosis. Both MRE11 and RAD50 genes are known to be commonly mutated in MSI cancers [164–166] due to the MSs within their coding sequence (for RAD50) [165] or intronic sequence (for MRE11; despite being intronic the indels were reported to be deleterious to gene function) [166,167]. Further studies using cancer cell lines with mutated MRE11 and RAD50 have also reported increased sensitivity to irinotecan [164]. Considered together with experiments using isogenic cell lines differing only in MRE11 status, these indicate that MMR activity itself is not directly causal to the sensitivity to irinotecan [163,168]. Notwithstanding, with approximately three-quarters of MSI cancers bearing mutations in MRE11 [166,169], MSI tumor status could serve as a proxy biomarker for the effectiveness of an irinotecan treatment.
In a similar fashion, PARP-1 inhibitors have been tested against MSI cell lines, and the success of the treatment was negatively correlated with the expression levels of MRE11 or the activity of the interacting protein NBS1 [170–172], further demonstrating roles in resolving double-strand damage and suggesting MSI as a possible marker for PARPi usage. However, the PARPi olaparib as a single-agent treatment did not benefit colorectal cancer patients in either the MSI or the MSS group [173]. This outcome seems consistent with an application of the PARPi niraparib to a panel of colorectal cancer cell lines, where only a modest, non-significant MSI-associated trend was observed [174].
In addition to MRE11 and RAD50 genes, also the MBD4 gene of the base excision repair pathway has been proposed to be affected by coding region indels in some MSI cancers, resulting in a truncated form of MBD4 protein that may sensitize to 5-fluorouracil (5-FU) and to trifluridine [175,176].
Failed MMR was additionally proposed to result in (potentially exploitable) vulnerabilities to cancer cells not only by LOF mutations, as described above, but also by gain-of-function mutations. In particular, indel mutations can generate frameshifted and/or truncated, potentially toxic polypeptides. One known example of this is the HSP110 chaperone protein. Mutations in an intronic microsatellite repeat of the HSP110 gene, commonly occurring in MSI cells, cause skipping of exon 9 and generate a dominant-negative acting truncated protein [177]. Colorectal cancers bearing the mutations may have better survival and better response to chemotherapy [178]. Pharmacological inhibition of the NMD pathway, which can silence nonsense-mutation bearing mRNAs, further promoted expression of these HSP110 mutants and possibly other similarly affected proteins, selectively poisoning MSI cells and tumors [179].
Reduced MMR function confers resistance to some chemotherapies
In contrast with the above mechanism, where mutations occurring after MMR failure inactivate various genes and thereby sensitize to interventions, MMR activity itself can result in sensitivity or resistance to therapeutics (Figure 2). The TMZ example was already covered above; other drugs can also have reduced benefits when given to patients suffering from an MSI cancer. One example is 5-FU, an antimetabolite commonly administered to breast and colon cancer patients, among others [180]. The introduction of this drug into regimens to treat MSS colorectal cancers drastically improves the outcome, but not so for MMR-deficient tumors, where 5-FU does not provide a benefit [181–184]. 5-FU depletes thymidine triphosphate by inhibiting the thymidylate synthase enzyme; due to the nucleotide pool imbalance, 5-FU (in the form of FdUTP) and uracil may be incorporated during DNA synthesis instead of thymine [180,185]. This nucleotide misincorporation, upon repair by MMR, can lead to DNA breaks accumulated during S phase that can activate the apoptotic program [185]. Additionally, the MMR machinery can recognize and bind to the incorporated 5-FU and then trigger cell death [186,187]. Due to this evidence, administration of 5-FU is currently not advised in MSI colorectal cancers [188].
A similar mechanism as for 5-FU also underlies treatments using 6-thioguanine (6-TG), a therapeutic for various types of leukemia, which is also used as an immunosuppressant. MMR recognizes the mismatch when the DNA antimetabolite is incorporated into the DNA strands [189,190] and fails to repair the damage, potentially leading to apoptosis [189]. Thus, cells that are MMR-deficient are more resistant to the cytotoxic effects of 6-TG [191,192]. Additionally, the tumor likely obtains an evolutionary benefit from the increased mutation rates caused by the unrepaired 6-TG mismatches, which can produce further cancer driver mutations [192]. One example is a TP53 gene disruption that makes the cancer resistant to many treatments; other mutagenic hotspots related with 6-TG treatment were reported in drug resistance genes [192]. In addition to 6-TG, cytotoxic activity of other thiopurine drugs (6-mercaptopurine, azathioprine) is likely at least in part mediated via MMR acting on mismatched bases resulting from the treatment [193].
Implications of MMR failures in tumors for immunotherapy
Finally, occurrence of a high burden of somatic mutations (that can stem from MMR failures) predicts response to immunotherapy. Thus, ascertaining the extent and characteristics of MMR failures in tumors has important implications for stratifying patients for immunotherapy.
The introduction of immune checkpoint inhibitors (also called checkpoint blockade or simply referred to by the more general term immunotherapy) has had a dramatic impact on overall patient survival in many types of cancers such as melanoma or lung adenocarcinoma. However, the response to these therapies tends to be quite variable across patients, which prompted a search for predictive biomarkers of efficacy. Currently, the tumor mutational burden (TMB) is considered to be a useful predictor of efficacy for these immune checkpoint inhibitors [194–197]. This is consistent with known mechanisms by which the immune system can recognize mutated proteins and, in response, activate the aberrant cell’s programmed death [198–201]; thus, cancerous cells holding a greater mutational load should be more readily recognized and cleared. Consistently, cancer types with commonly higher TMBs, such as melanomas [202,203] and smokers’ lung cancers [204], where most of the mutations result from exposure to external mutagens, respond better to immune checkpoint inhibitors. Another important cause of a cancer’s somatic mutational burden can be MMR failure, which can be estimated by exome or genome sequencing, or indirectly via an MSI assay. An MSI cancer will typically acquire many times more mutations than its MSS counterparts [205,206], and will thus likely express more neoantigens [206] recognizable by the immune cells. The predictive power of MSI status for patient response to immune checkpoint inhibitors has made MSI the first FDA-approved genetic biomarker valid across different cancer types [207].
It is likely that MMR failures sensitize to immunotherapy mainly through generating a much increased mutation burden. Consistently, a statistical analysis of multiple determinants of immunotherapy response did not find an independent association with MSI status after controlling for mutation burden [208]. Likewise, other causes of high TMB unrelated to MMR predict a favourable immunotherapy response, including UV exposure in skin cancer and tobacco smoke exposure in lung cancer. Additionally, kidney cancers often respond even though they do not commonly exhibit MSI (at least not in the classical sense, see above), which was ascribed to their relatively high burden of indel mutations [209]. Recent studies by us and others [133,210] have highlighted specifically the frameshifting indel mutations, which occur abundantly upon MMR dysfunction, rather than single-nucleotide substitutions, as more robust predictors of immunotherapy response (Figure 2). These genomic analyses further highlighted the role of the NMD pathway in silencing mRNAs containing frameshifting indel mutations in human cancers [133,210], in line with prior reports showing that inhibition of NMD boosts antitumor immunity in mouse models [211].
MMR failures result in synthetic lethalities with other DNA repair mechanisms
As noted above, MMR-deficient tumors can respond to drugs differently than MMR-proficient ones. They resist specific chemotherapeutics (such as TMZ, 5-FU and 6-TG), but can also be sensitized to other therapeutics by acquiring mutations in genes participating in various DNA repair pathways, or by giving rise to neoantigens targetable by immunotherapy. Additionally, MMR genes may, like many other tumor suppressor genes, participate in synthetic lethal interactions, which offer another window of opportunity to target MMR-deficient cells. A recently discovered synthetic lethal interaction with the Werner syndrome helicase (WRN) [212–215] has revealed that a mutational process caused by MMR deficiency can, over time, also lead to vulnerabilities that can be exploited [216]. In this case, MSI cells develop a dependency on WRN due to large expansions at A:T repeats, and thus require the activity of WRN for normal DNA replication; without it, the cells suffer large-scale chromosome breakage [216]. Currently, there is no approved WRN inhibitor drug, but the proposed mechanism suggests that ATR inhibitors might show some selectivity towards MSI cells, presumably by preventing the recruitment of WRN by ATR [216].
Concluding remarks and outlook
DNA MMR is a key pathway for ensuring genome stability in all eukaryotic cells. Loss of MMR activity causes vastly increased mutation rates and, consequently, an increase in cancer risk. MMR-deficient tumors are commonly detected by immunohistochemistry, which measures MMR protein levels, or by PCR and electrophoresis-based assays that detect instability at microsatellite repeats (MSI), which become hypermutable when MMR function is compromised. Importantly, loss of MMR activity is clinically relevant because it contributes to resistance or sensitivity to various cancer therapies, such as certain nucleobase analogs, DNA alkylating agents, topoisomerase inhibitors and immune checkpoint inhibitors. Other experimental therapeutic strategies, such as NMD inhibition to express toxic or immunogenic mutated proteins, might be used to target MMR-deficient cancer cells. Recent developments in genome sequencing and bioinformatics tools facilitate routine detection of MSI status; however, genomes can also provide information beyond the binary MSI-H versus MSS classification. Interestingly, bioinformatics analyses of mutational signatures suggest that MMR deficiencies may be multifaceted, consistent with LOF of individual MMR genes resulting in partial failures of the pathway. Genomics also provides evidence that a certain level of MMR malfunction may be more widespread among cancer types than previously anticipated. We envision that studies of genomes of MMR-deficient tumors, as well as of experimental models (e.g. engineered cell lines), will further elucidate subtypes of MMR failures and their prevalence in tumors through the analysis of various mutational patterns. We anticipate that such mutational patterns will provide refined predictive markers for applying chemo-, radio- and immunotherapies to MMR-deficient tumors.
Data Availability
The statement is not applicable: there are no primary data in this review article.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
Work in F.S.’s lab was supported by the ERC Starting Grant ‘HYPER-INSIGHT’ [grant number 757700]; the Horizon2020 RIA project ‘DECIDER’ [grant number 965193]; the Spanish Ministry of Science, Education and Universities project ‘REPAIRSCAPE’ [grant number PID2020-118795GB-I00]; the State Agency for Research of the Ministry of Science and Innovation - Severo Ochoa Centre of Excellence Award [grant number CEX2019-000913-S]; and the SO-FPI Fellowship [grant number BES-2017-079820 (to D.M.-P.)].
Acknowledgements
We apologize to colleagues whose work was not referenced due to article length constraints.
Abbreviations
- CMMRD: constitutional MMR deficiency
- CNA: copy number alteration
- EMAST: elevated microsatellite alterations at selected tetranucleotide repeats
- HRR: homologous recombination repair
- LOF: loss-of-function
- MeA: N3-methyladenine
- MeG: O6- and N7-methylguanine
- MMR: mismatch repair
- MS: microsatellite
- MSI: microsatellite instability
- NMD: nonsense-mediated mRNA decay
- SHM: somatic hypermutation
- SNV: single-nucleotide variant
- TMB: tumor mutational burden
- UTR: untranslated region
- WES: whole-exome sequence
- WGS: whole-genome sequence
- 5-FU: 5-fluorouracil
- 6-TG: 6-thioguanine