Abstract
DNA mismatch repair (MMR) is a highly conserved genome stabilizing pathway that corrects DNA replication errors, limits chromosomal rearrangements, and mediates the cellular response to many types of DNA damage. Counterintuitively, MMR is also involved in the generation of mutations, as evidenced by its role in causing somatic triplet repeat expansion in Huntington’s disease (HD) and other neurodegenerative disorders. In this review, we discuss the current state of mechanistic knowledge of MMR and review the roles of key enzymes in this pathway. We also present the evidence for mutagenic function of MMR in CAG repeat expansion and consider mechanistic hypotheses that have been proposed. Understanding the role of MMR in CAG expansion may shed light on potential avenues for therapeutic intervention in HD.
INTRODUCTION
DNA repair processes are found across all domains of life [1]. These mechanisms have evolved to maintain genomic stability, especially in light of the high levels of DNA damage generated by exogenous and endogenous agents such as ultraviolet light, ionizing radiation, and reactive chemical species. Errors made by DNA synthetic enzymes (either during the normal course of cell division or perhaps while carrying out DNA repair itself) further imperil the integrity of the genetic information [1, 2]. Replicative DNA polymerases make base insertion errors infrequently (at a rate of ∼10⁻⁴–10⁻⁵), and their fidelity is further enhanced ∼100-fold by the exonucleolytic proofreading machinery associated with many such enzymes [2]. Errors that escape these mechanisms are rectified by the highly conserved DNA mismatch repair (MMR) pathway, which confers an additional ∼100–1000-fold enhancement in fidelity [3–7]. In addition to its role in error correction and replication fidelity, MMR also plays important roles in preventing chromosomal rearrangements and mediating the cellular response to several types of DNA damage [8–10].
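As a rough worked estimate, the fidelity figures quoted above can be combined into an overall error rate. The calculation below assumes, purely for illustration, that the three steps act as independent multiplicative factors; the symbols are introduced here and do not come from the cited studies.

```latex
% Illustrative estimate only: assumes independent, multiplicative fidelity steps.
% e_pol   : base insertion error rate of the polymerase (~10^{-4} to 10^{-5})
% f_proof : fraction of errors escaping proofreading     (~10^{-2})
% f_MMR   : fraction of errors escaping MMR              (~10^{-2} to 10^{-3})
\[
  e_{\text{overall}} \approx e_{\text{pol}} \times f_{\text{proof}} \times f_{\text{MMR}}
\]
\[
  10^{-4} \times 10^{-2} \times 10^{-2} = 10^{-8}
  \qquad\text{to}\qquad
  10^{-5} \times 10^{-2} \times 10^{-3} = 10^{-10}
\]
```

Under these assumptions, the combined error rate falls in the range of roughly one error per 10⁸ to 10¹⁰ bases synthesized, with MMR contributing the final two to three orders of magnitude.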
The importance of MMR is exemplified by the profound consequences of its loss. MMR defects result in a 100–1000-fold elevation in mutation rate in most organisms [3–7], and modulation of MMR has been suggested to be a powerful evolutionary survival mechanism [11, 12]. Loss of MMR function in humans is the cause of Lynch syndrome, a hereditary cancer predisposition condition that is characterized by an increased risk of gastrointestinal, uterine, and ovarian tumors [13, 14]. More recently, biallelic inactivation of MMR has been linked to constitutional mismatch repair deficiency syndrome (CMMR-D), a rare childhood/young adult condition associated with a higher propensity for developing colorectal, brain, and blood cancers [15–17]. In addition, MMR defects underlie a significant subset of sporadic tumors across various tissues [18]. Current evidence suggests that the high mutation load of MMR-deficient tumors results in neoantigen production, which renders such cancers highly sensitive to immune checkpoint inhibitors such as pembrolizumab and nivolumab and has led to dramatic improvements in patient survival rates [19].
Counterintuitively, this mutation prevention mechanism can in some instances be subverted to produce mutations. This phenomenon is exemplified by the requirement of MMR proteins for somatic hypermutation, a mutagenic process in B lymphocytes that generates immunoglobulin diversity [20–22]. This type of MMR-dependent mutagenesis has been attributed to a non-canonical MMR (ncMMR) mechanism which is activated in a variety of cell types, and is independent of DNA replication [23–25]. A number of MMR proteins have also been implicated in triplet repeat expansions that underlie several neurodegenerative diseases, including Huntington’s disease (HD) [26–28]. Herein, we review what is known about the molecular mechanisms of MMR, and consider its role in somatic CAG triplet repeat expansion in HD.
MECHANISM OF MISMATCH REPAIR
Mismatches, strand slippage, and extrahelical extrusions
The best understood function of MMR is its ability to efficiently correct mistakes made by DNA polymerases during DNA synthesis [3, 29]. A number of factors involved in the error-correction reaction have been identified and are listed in Table 1. Replication errors may be base substitution errors that result in a base-base mismatch, or strand slippage errors that generate an extrahelical extrusion or loop. Inability to rectify these errors results in transitions, transversions, and small insertion/deletion mutations. Although the bulk of base insertion errors occur during DNA replication, they may also occur during repair DNA synthesis in non-replicating cells, and such mistakes, if left unrectified, could be a significant source of mispairs.
Table 1. MMR proteins and their functions
Strand-slippage errors occur with high frequency within microsatellite sequences (e.g., mono-, di-, and trinucleotide repeats), the repetitive nature of which renders them particularly prone to the formation of extrahelical extrusions by strand misalignment [30–33]. In principle, two DNA strands can misalign not only during DNA synthesis but also whenever the duplex strands separate and reanneal during DNA metabolic processes such as transcription. Helix opening can also be driven by the energy of negative supercoiling. While superhelical tension in eukaryotic cells is normally kept restrained by nucleosomes, release of negative superhelicity upon nucleosome disassembly can cause transient helix destabilization [34]. Likewise, helix opening due to accumulation of negative superhelical tension within the underwound DNA in the wake of a translocating RNA polymerase has been documented [35, 36]. Thus, duplex melting within repetitive DNA tracts can occur under a variety of circumstances, resulting in the formation of extrahelically extruded slipped-strand structures. It is noteworthy in this regard that long CTG repeat tracts are among the strongest nucleosome binding sequences known due to their high intrinsic negative superhelical writhe [37, 38]. These observations may explain why CTG repeats may be particularly prone to supercoil-induced spontaneous helix opening and formation of slipped-strand structures. In fact, slipped-strand structures readily form both in vitro and in vivo, are structurally heterogeneous, and thermodynamically stable [33, 40]. Thus, extrahelically extruded slipped strand structures could form within resting DNA repeat tracts in non-dividing cells such as neurons during DNA repair, transcription, or chromatin remodeling.
Mismatch recognition
To deal with the diversity of base-base mismatches and extrahelical loops that are generated during DNA metabolic processes [2], eukaryotic MMR has evolved a modular approach to mispair recognition, employing one of two heterodimeric MutS homologs, MutSα (MSH2-MSH6) and MutSβ (MSH2-MSH3) (Fig. 1). MutSα recognizes base-base mismatches and small DNA extrusions (1–4 extrahelical residues), and MutSβ exclusively recognizes DNA extrusions of 2 to about 10 extrahelical residues [3]. The structural basis for the mismatch recognition has been clarified by crystallographic studies of MutSα, MutSβ, and bacterial MutS homologs in complex with mispair- or lesion-containing DNA substrates [41–44]. These studies have shown that mismatch recognition is asymmetric: whereas MSH2 makes limited contact with the DNA in both MutSα and MutSβ, extensive interactions occur between residues in MSH3 and MSH6 with the heteroduplex [41, 42].
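The substrate ranges quoted above can be summarized as a small lookup. This is a minimal sketch for illustration: the function name `recognizers`, the ASCII labels `MutSalpha`/`MutSbeta`, and the hard cutoff at 10 residues are assumptions made here (the text gives the upper MutSβ bound only as "about 10").

```python
from typing import Optional, Set

# Size windows taken from the ranges quoted in the text. The upper
# MutSbeta bound ("about 10") is approximate; treating it as a hard
# cutoff is an assumption made for this sketch.
MUTS_ALPHA_LOOP = range(1, 5)    # 1-4 extrahelical residues
MUTS_BETA_LOOP = range(2, 11)    # 2 to ~10 extrahelical residues


def recognizers(loop_size: Optional[int] = None) -> Set[str]:
    """Return the MutS heterodimers expected to recognize a lesion.

    loop_size=None denotes a base-base mismatch; an integer denotes an
    extrahelical extrusion of that many residues.
    """
    if loop_size is None:
        # Base-base mismatches are recognized by MutSalpha only.
        return {"MutSalpha"}
    found = set()
    if loop_size in MUTS_ALPHA_LOOP:
        found.add("MutSalpha")
    if loop_size in MUTS_BETA_LOOP:
        found.add("MutSbeta")
    return found
```

In this sketch, extrusions of 2 to 4 residues fall in both windows, which is the region where the two heterodimers overlap in substrate specificity.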
Because of their overlapping substrate recognition specificities, MutSα can largely compensate for the absence of MutSβ, and the overwhelming majority of mispairs are recognized and rectified by a MutSα-initiated pathway. Hence, MutSα is critical for mutation avoidance, and its inactivation (by mutations in MSH2 or MSH6) is a risk factor for carcinogenesis [13, 14]. By contrast, loss of MutSβ function (by inactivation of MSH3) has not been conclusively linked to an elevated cancer risk in humans [45]. Consistent with these observations, mice lacking MSH2 or MSH6 display increased susceptibility to spontaneous tumorigenesis and an attenuated lifespan, whereas MSH3 knockout mice are indistinguishable from wild-type mice both in terms of tumorigenicity and life expectancy [46–49]. These striking differences are particularly relevant in light of the observation (considered below) that CAG somatic expansion is attenuated in HD mice lacking Msh2 or Msh3 (but not Msh6), thus suggesting that pharmacological modulation of MSH3 may pose a lower risk than targeting MSH2 or MSH6.
In addition to their DNA binding functions, MutSα and MutSβ possess ATP hydrolytic activities and belong to the ABC (ATP Binding Cassette) ATPase superfamily [41, 42]. The MutSα and MutSβ ATPases are stimulated by heteroduplex DNA, an effect that is indicative of long-range conformational changes elicited by DNA cofactor binding at a site distal to the nucleotide pocket [50–55]. Although the functional significance of ATP binding/hydrolysis of MutS homologs has been the subject of intense debate for over three decades, there is general agreement that inactivation of the ATPase function is deleterious for overall mismatch repair [3, 4]. Consequently, such mutations result in elevation in mutation rate in bacteria [56] and a higher cancer predisposition risk in humans [42].
Formation of mismatch repair protein assemblies and mismatch removal
Mismatch recognition by MutSα/β is followed by the recruitment of the heterodimeric MutLα (MLH1-PMS2). MutL homologs belong to the GHKL ATPase superfamily that is characterized by the unique Bergerat ATP-binding fold [57]. Although detailed structure-function studies of the MutL ATPases are lacking, structures of both MLH1 and PMS2 ATPase domains are available [58–60], and biochemical and genetic studies have shown that the MutL ATPase function is required for MMR [61–63]. Interaction between MutSα/β and MutLα results in the formation of a dynamic ATP-dependent DNA-MutSα/β-MutLα ternary complex that is required for MMR [64–72]. These observations have led to the notion that ATP binding/hydrolysis drives conformational changes in MutS homologs (and presumably in MutL homologs as well), thereby facilitating ternary complex assembly.
The PMS2 subunit of MutLα harbors a latent zinc-dependent endonuclease that is activated by DNA-loaded replication sliding clamp PCNA (proliferating cell nuclear antigen) in the presence of MutSα/β, a mispair, and ATP-Mg²⁺ [73–75]; this activity depends on the integrity of a conserved metal-binding DQHA(X)₂E(X)₄E motif and requires a physical interaction between PMS2 and PCNA [74–76]. The MutLα-catalyzed strand breaks are restricted to the strand harboring a pre-existing break (and hence assumed to harbor the incorrect base) and bracket the mismatch on both 3′ and 5′ sides (Fig. 1); these breaks provide initiation sites for excision of the strand harboring the incorrect base as discussed below. The importance of MutLα in the MMR reaction is highlighted by the observation that loss of MLH1 or PMS2 not only results in elevation of mutation rate, but also increases the lifetime risk of tumorigenesis in both mice and humans [77]. Interestingly, both MLH1 and PMS2 have been identified as onset modifiers of HD, and loss of Mlh1 or Pms2 substantially attenuates somatic triplet repeat expansion in animal and cellular models of HD, myotonic dystrophy type 1 (DM1), and Fragile X-related disorders (FXDs) [78–80].

Fig. 1. Mechanisms of 5′ and 3′ human mismatch repair. Distinct molecular mechanisms mediate mismatch repair, depending on strand-break polarity. Left, DNA mismatches or extrahelical extrusions are recognized by MutSα or MutSβ. When the strand-break is located 5′ to the mismatch, MutSα/β activates the processive 5′–3′ exonuclease activity of EXO1 in an ATP-dependent manner. The ensuing gap is protected by the single-stranded DNA binding protein complex RPA, followed by DNA resynthesis across the gap by DNA polymerase δ, aided by the replication sliding clamp PCNA and the clamp loader RFC. Right, if the strand-break is located 3′ to the mispair, error correction relies on oriented loading of PCNA by RFC at the strand break. Thus, MutSα/β recruits MutLα in an ATP-dependent manner, resulting in the activation of a latent endonuclease function in MutLα in the presence of DNA-loaded PCNA. The additional strand-breaks catalyzed by MutLα bracket the mismatch, and facilitate processive 5′–3′ hydrolysis of the nicked strand by MutSα-activated EXO1. Gap protection and filling occur as in the 5′ nick-directed reaction.
Mismatch excision is carried out by the 5′ to 3′ hydrolytic activity of EXO1, the only exonuclease that has been identified in eukaryotic MMR. EXO1, an otherwise distributive enzyme (i.e., it excises only a few nucleotides at a time from the DNA before it dissociates), is rendered highly processive (i.e., capable of removing hundreds of nucleotides without having to dissociate) when it associates with MutSα in the presence of a mispair and ATP. Loading of MutSα-activated EXO1 at 5′ DNA termini results in processive strand excision that proceeds until the mismatch is removed [81–83]. This MutSα-stimulated EXO1 reaction relies on pre-existing 5′ ends (but not 3′ ends) such as those that exist in Okazaki fragments on the lagging strand of DNA replication [3–7], and is notable in its lack of an absolute requirement for MutLα [84] (Fig. 1).
In the presence of MutLα, however, the MMR system can utilize either pre-existing 5′ or 3′ DNA termini. In fact, the 3′ nick-directed reaction displays an absolute requirement for MutLα [74, 86] (Fig. 1). The additional strand-breaks introduced by the activated MutLα endonuclease serve as initiation sites for processive 5′ to 3′ excision by MutSα/β-activated EXO1 [74, 87]. The ensuing EXO1-catalyzed gap is protected by the single-stranded DNA binding protein RPA, followed by resynthesis with high fidelity by DNA polymerase δ (Polδ), and ligation by DNA ligase I, thereby maintaining the integrity of the genetic information [83, 88]. Interestingly, EXO1-independent mismatch repair has also been documented wherein Polδ conducts strand displacement synthesis initiated from 3′-OH DNA termini, thereby replacing the strand that contains the incorrect information [87].
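The dependence of EXO1-directed excision on nick polarity and MutLα, as described above for the in vitro reaction, can be written as a small decision rule. This is a simplified sketch: the function name `excision_possible` and its string arguments are hypothetical, MutSα/β, EXO1, RPA, PCNA, and RFC are assumed to be present, and the EXO1-independent strand displacement route is deliberately not modeled.

```python
def excision_possible(nick_side: str, mutl_alpha: bool) -> bool:
    """Can EXO1-dependent excision start, per the in vitro rules in the text?

    nick_side: "5prime" or "3prime", the position of the pre-existing
    strand break relative to the mismatch.
    mutl_alpha: whether functional MutLalpha is available.
    """
    if nick_side == "5prime":
        # MutS-activated EXO1 can load directly at the 5' terminus;
        # MutLalpha is not absolutely required for this reaction.
        return True
    if nick_side == "3prime":
        # MutLalpha must incise the nicked strand to create a 5' entry
        # point for EXO1, so this reaction depends on MutLalpha.
        return mutl_alpha
    raise ValueError("nick_side must be '5prime' or '3prime'")
```

For example, `excision_possible("3prime", False)` evaluates to `False`, reflecting the absolute requirement for MutLα in the 3′ nick-directed reaction.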
Mammalian cells also express two other MutL complexes: MutLγ (MLH1-MLH3) and MutLβ (MLH1-PMS1). Both MLH3 and PMS1 are GHKL ATPases, but only MLH3 possesses the metal-binding endonuclease motif found in PMS2 [74]. However, although the MutLγ endonuclease is activated by DNA-bound MutSβ, unlike MutLα this effect does not require DNA-loaded PCNA [89–91]. The primary function of MutLγ is in meiotic recombination, wherein it plays an essential role and its endonuclease is activated by MutSγ (MSH4-MSH5), and to a lesser extent by RFC, PCNA, and EXO1 [92–95]. MutLγ also plays a modest role in MMR since loss of MLH3 function results in a moderate increase in mutation rates, and in vitro complementation of MLH1-deficient cells with MutLγ-enriched fractions results in partial restoration of MMR function [96–100]. Consistent with a possible role for MLH3 in cancer avoidance [101], Mlh3−/− mice display elevated mutation rates, increased tumor susceptibility, and reduced lifespans relative to wild-type mice [99, 102]. Interestingly, whereas human genome-wide association (GWA) studies have not identified MLH3 as an HD onset modifier, Mlh3 is required for CAG and CGG somatic expansions in HD and FXD mouse models, respectively [80, 104].
As noted above, PMS1 is the only MutL homolog that does not contain the DQHA endonuclease motif. PMS1 interacts with MLH1 to form MutLβ, a heterodimer that associates with MutSα or MutSβ to form an ATP-dependent DNA-MutSα/β-MutLβ ternary complex [105–109]. However, MutLβ appears to have no role in canonical human MMR as judged by the inability of recombinant MutLβ to restore MMR activity to MLH1-deficient cell extracts [105]. Consistent with these findings, Pms1 knockout mice show no observable defects in either MMR or tumor susceptibility [77]. Nevertheless, studies with the yeast homolog have indicated a possible accessory role for MutLβ in the repair of a subset of heteroduplexes [92, 109]. Thus, although PMS1 modifies HD onset age [103], given the limited understanding of the molecular function of PMS1 or MutLβ, the possible mechanisms underlying its role in HD remain to be investigated.
Strand directionality of mismatch repair
Upon encountering a mismatch, the MMR system must act with exquisite strand bias in order to specifically rectify the strand containing the incorrect information. This is achieved by using “strand signals” that mark one strand relative to the other, and thus enable the MMR system to distinguish between the template strand and the newly synthesized strand (Fig. 1). Whereas in E. coli, DNA hemi-methylation has been established as the mechanism that directs the system to the appropriate strand, the identity of eukaryotic strand signals has been intensely debated for over three decades. In in vitro biochemical experiments, strand-breaks serve effectively as strand signals, and mismatch repair of circular DNA substrates is restricted to the strand that contains a pre-existing break. When such strand-breaks are located 5′ to the mismatch, MutSα/β-activated EXO1 processively and exclusively excises the nicked strand with 5′-3′ polarity thereby conferring strand-directionality to the mismatch repair reaction. However, when the strand-break is located 3′ to the mismatch, strand directionality is conferred by the introduction of additional strand-breaks by MutLα in a reaction that also requires MutSα/β, a mispair, DNA-loaded PCNA, and ATP. These new strand-breaks are restricted to the nick-containing strand. The strand-directional activation of MutLα can also occur when the original strand-break is located 5′ to the mismatch. The strand bias of the MutLα endonuclease is determined by the unique spatial orientation (relative to 3′-OH termini) with which the donut-shaped PCNA (with its two non-equivalent faces; Fig. 2) is loaded at DNA strand discontinuities by the clamp loader replication factor C (RFC). This effect requires a physical association of oriented PCNA with the PMS2 subunit of MutLα [76]. 
Thus, it is relatively straightforward to envision PCNA loaded at DNA termini as strand signals for MMR in the context of active DNA replication in dividing cells (e.g., DNA ends of Okazaki fragments). In fact, recent studies have suggested that loaded PCNA continues to direct strand-specific MMR even after the removal of strand-discontinuities, and that a physical interaction between MutSα and PCNA enhances the temporal window for effective strand-directed MMR [110, 111]. Thus, persistence of DNA-loaded PCNA after completion of DNA synthesis can continue to provide strand-directionality “memory” to the MMR system [110].

Fig. 2. Models for involvement of mismatch repair in CAG/CTG repeat expansion. Strand slippage within long repetitive CAG/CTG tracts results in the formation of extrahelical extrusions that are not only recognized by MutSβ, but also serve as loading sites for PCNA even in the absence of strand-breaks. PCNA, a ring-shaped homotrimeric protein with two distinct faces (inset, indicated in green and brown), preferentially associates with its partner proteins via residues on one face. Although the two faces of PCNA are functionally non-equivalent, due to the symmetry of the extrahelical extrusions, PCNA loading at such structures occurs in both possible spatial orientations. Since the strand directionality of the MutSβ-dependent activity of the MutLα endonuclease is determined by the orientation of DNA-loaded PCNA, the disoriented loading of PCNA misdirects MutLα-catalyzed incisions to either DNA strand. Left, when incision occurs on the extrusion-containing strand (shown in blue), strand excision results in removal of the extrusion, and faithful repair by Polδ results in a contraction (not shown). However, error-prone gap resynthesis by Polη or Polβ as illustrated in the diagram may provide additional opportunities for strand slippage and formation of new extrahelical extrusions, which either result in a net increase in CAG repeat length (i.e., expansion) or trigger additional rounds of MutSβ-initiated incision/excision. Middle, when MutLα-mediated strand-breaks are formed on the complementary (red) strand, error-free resynthesis by Polδ results in inclusion of the extrusion, leading to a net increase in CAG repeat length (expansion). Gap resynthesis may also be driven by Polβ as on left, resulting in additional strand slippage (not shown). Right, extrusion-bound MutSβ can also activate MutLγ in a PCNA-independent manner. The incisions catalyzed by MutLγ are restricted to the complementary (red) strand opposite to the extrusion. DNA resynthesis results in inclusion of the extrusion, leading to a net increase in CAG repeat length (expansion).
Sequences that adopt unusual DNA secondary structures are found throughout the human genome [112–115]. A substantial body of literature supports the idea that non B-DNA conformations have profound effects on DNA metabolic processes (reviewed in [26]). The DNA conformational transitions that govern the formation of these non B-DNA structures are energetically driven by negative superhelical tension, which in normal eukaryotic cells remains constrained within chromatin but can be unleashed during nucleosome disassembly. Interestingly, such structures may also affect the strand-directionality of MMR. DNA molecules that contain “open bubbles” (due to unpairing of a segment of the duplex) could provide sites for loading of PCNA even in the absence of strand-breaks. Such bubble structures contain single-strand/double-strand junctions with mirror symmetry and conformationally resemble extrahelical extrusions that form by strand slippage within repetitive DNA tracts (e.g., long CAG/CTG repeats). PCNA loading at such structures occurs in either spatial orientation (disoriented loading) and, consequently, MutSα/β-dependent activation of MutLα endonuclease on these DNAs also occurs without substantial strand bias [75, 117]. The implications of these findings are three-fold. First, DNA-loaded, spatially oriented PCNA may direct the strand-specificity of MMR in post-replicative non-dividing cells. Second, PCNA loading at extrahelical extrusions such as those formed within long repetitive DNA tracts may misdirect the MMR system and cause it to act aberrantly on both DNA strands (see below). Third, strand-independent activation of ncMMR could involve PCNA ubiquitination and recruitment of error-prone DNA polymerases, resulting in repeat instability [23, 118].
Mismatch repair in the context of chromatin
The mechanistic dissection of the MMR reaction has relied largely on in vitro biochemical studies using a combination of purified proteins, cell extracts, and naked heteroduplex DNA substrates [6]. However, there has been growing interest in understanding how mismatch repair functions in the cellular context, especially within chromatin (for a comprehensive review, see [119]). Studies have established that trimethylation of histone H3 at lysine 36 (H3K36me3) is required for recruitment of MutSα to replicating chromatin [120]. This effect is mediated by a physical interaction between the modified histone and the PWWP domain of the MSH6 subunit of MutSα. The importance of this histone modification for MMR is established by the observation that knockdown of SETD2 (a histone methyltransferase that trimethylates H3K36) causes a mutator phenotype characterized by high microsatellite instability and an elevated mutation rate [120]. Mutations in histone H3 (G34V/R/D) that block H3K36 trimethylation by SETD2 also inhibit the interaction between H3K36 and MutSα; these mutations not only cause a mutator phenotype in cells, but also have been identified as drivers of pediatric glioblastoma [121]. MMR modulation by histone modification helps maintain genomic stability of actively transcribed genes [122]. Not only are H3K36me3 and MutSα co-enriched in exons (relative to introns and non-transcribed regions), but disruption of the H3K36me3-MutSα interaction also elevates the spontaneous mutation rate in actively transcribed genes, with little influence on non-transcribed regions. These findings have implications for the role of MMR in maintaining genomic stability in non-dividing, but transcriptionally active, cells.
In addition to histone modification, interplay of the MMR proteins with nucleosome assembly and remodeling factors has also been documented [123–125]. MutSα and PCNA interact with the nucleosome assembly factor CAF-1, and these interactions inhibit the CAF-1 mediated nucleosome assembly on mispair-containing DNAs [123–125]. The formation of the MutSα-CAF-1 complex is mediated by residues within the MSH6 subunit of MutSα [123]. Investigations to date have focused on MutSα function in the context of chromatin dynamics. However, because MSH3 does not possess a PWWP domain [126], and since no studies have been carried out on MutSβ interactions with nucleosome assembly factors, the mechanisms that recruit MutSβ to chromatin are not known.
In the cellular environment, the MMR machinery is also regulated by post-translational modifications. Phosphorylation of several MMR proteins (including MutSα, MutLα, PCNA, RFC, EXO1, and Polδ) has been documented, and these modifications modulate protein-protein interactions, protein stability, and nuclear-cytoplasmic distribution [127]. A full mechanistic understanding of these diverse effects is still lacking; nevertheless, based on available data, it is expected that future investigations will unravel novel aspects of MMR regulation by post-translational modification.
Other functions of mismatch repair
In addition to its role in mismatch correction, the MMR machinery also mediates the cellular response to several types of DNA damage [8, 128] and prevents illegitimate recombination [10]. However, these functions of MMR are not currently thought to play a role in triplet repeat instability [3, 129], and will therefore not be considered further.
Mismatch repair in the central nervous system (CNS)
Our current understanding of canonical post-replicative DNA mismatch repair is based on functional studies in dividing cells complemented by mechanistic biochemical investigations. Because the primary phenotype of loss of mismatch repair activity is elevation in mutation rate, and since mutation rate measurements are (by definition) not feasible in non-replicating cells, the role of MMR in terminally differentiated cells such as neurons has not been studied extensively. Yet, the functional importance of MMR in the brain is underscored by the observation that ∼50% of CMMR-D patients who harbor biallelic germline mutations in one of four MMR genes (MSH2, MSH6, MLH1, and PMS2) develop brain tumors of various types including glioblastomas, astrocytomas, and oligodendrogliomas [16]. Cells derived from such tumors display low levels of MMR activity [130]. Immunohistochemistry and immunoblot analyses have established that MSH2, MSH6, MSH3, MLH1, and PMS2 are robustly expressed in normal human and rodent brains, suggesting that these proteins have functional significance [80, 131–136]. Insofar as MMR protein levels are reflective of the capacity of the brain to rectify mismatches, the lack of replicative errors in terminally differentiated brain cells suggests that mismatches must occur post-mitotically.
Because the brain consumes ∼20% of the total oxygen budget of the human body to support neuronal activities, it is also subjected to high levels of oxidative stress, a condition that has been correlated with oxidative DNA damage and accumulation of DNA strand breaks [137–140]. Therefore, base excision repair (BER), single-strand break repair (SSBR), and double-strand break repair (DSBR) mechanisms are highly active in the CNS, and in fact, defects in these repair pathways are the cause of several neurodegenerative diseases [139, 142]. Physical and functional cross-talk between MMR and BER proteins has been documented [143], and therefore it is plausible that MMR factors may be recruited to sites of BER activity in neurons. It is also possible that even limited gap-filling DNA synthesis during BER, DSBR, or SSBR may provide ample opportunities over several decades of human life for DNA polymerases to make errors that may then require the attention of the MMR system. Nevertheless, direct evidence for MMR activity in neurons is lacking, and although nuclear extracts prepared from rat brain retain mismatch binding activity [132], the capacity of whole brain or neuronal extracts to repair mismatches or extrahelical extrusions has not been studied, and brain- or neuron-specific MMR factors have not been identified. Therefore, biochemical and cellular studies of MMR function in brain-derived cells and tissues are likely to be a fruitful and instructive avenue of investigation.
ROLE OF MISMATCH REPAIR IN HUNTINGTON’S DISEASE
The diverse genome-stabilizing activities of MMR notwithstanding, this pathway has also been implicated in mutation production in the context of neurodegenerative disease. The most compelling corpus of data in this regard pertains to HD and will be considered here, although recent findings have also suggested a role for MMR in the repeat expansions underlying FXDs and Friedreich’s ataxia [78, 144–147].
HD belongs to a family of slow-progressing neurodegenerative disorders caused by CAG triplet repeat expansions within the coding regions of distinct and unrelated genes [148]. A key feature of HD (and indeed other triplet repeat diseases as well) is the strong inverse correlation between the inherited length of the CAG repeat tract (located within exon 1 of the huntingtin gene, HTT) and the age of disease onset [148, 149]. Moreover, HD gene expanded carriers (HDGEC) display high levels of CAG repeat expansion, not only in the germline, but also in somatic cells wherein such expansions occur in a tissue-specific manner: the greatest degree of expansion is observed in the terminally differentiated neurons of the striatum and cortex [131, 150–153]. Recent studies have established that the inherited length of uninterrupted CAG repeats, rather than the length of the encoded polyQ tract (since both CAA and CAG can encode glutamine), modifies disease onset [103, 154–156]. Because uninterrupted repeats are thought to have a higher propensity to expand than repeat tracts harboring CAA interruptions, it has been suggested that somatic CAG instability is a key driver of CAG length-dependent HD pathogenesis [103, 155]. One implication of these observations is that factors that hasten or attenuate CAG-repeat instability may also be expected to affect disease onset. It is noteworthy that a recent study has suggested a role for huntingtin protein levels in promoting CAG repeat expansion in knock-in mouse models of HD and spinocerebellar ataxia type 2, as well as in HD patient iPS-derived medium spiny neurons, although the pathways underlying these effects are yet to be delineated [157].
Genetic evidence for the role of MMR in HD
Recent human genome-wide association studies (GWAS) have identified genetic modifiers of age at disease onset that map to a constellation of DNA repair genes that include the MMR genes MSH3, MLH1, PMS1, PMS2, as well as LIG1, and FAN1 (Table 2). In the case of MSH3, a genetic variant that is associated with reduced gene expression and/or function appears to not only delay age at onset, but also reduce somatic instability and disease severity in HD and DM1 patients [161]. Curiously, this MSH3 variant itself arises from instability of a 9-bp tandem repeat that codes for an alanine repeat element near the N-terminus of the MSH3 polypeptide [161]. Repeat instability measurements in the blood have revealed effects of polymorphisms in MSH3, MLH1, MLH3, and FAN1 on somatic CAG-repeat variation in HDGEC [155], and of MSH3 genetic variants on CTG repeat instability in DM1 patients [164]. Effects of MSH3 and FAN1 on HD progression have also been documented [148, 163]. These observations made in human HD patients suggest that the MMR pathway influences both disease onset and progression and, taken together with data from HD mouse models, imply that disease manifestation is modulated by somatic CAG expansion.
Summary of genes implicated in HD. The ratios of observed versus expected (o/e) genome variants for each of the genes were derived from the gnomAD database [211], and reflect how tolerant a gene is to genetic variation. A low o/e score is indicative of stronger selection for the gene and lower tolerance for loss of function (LoF). Phenotypic effects of variants, knockout, or knockdown of the listed genes in human GWA studies, HD mouse models, and cellular systems are listed, and pathological effects of LoF in other disease states are summarized. References are shown in parentheses.
A pathogenic role for a genome stabilizing DNA repair system in human disease may seem counterintuitive; nevertheless, multiple lines of evidence have converged in recent years to point to MMR as a likely culprit in CAG-repeat expansion. The earliest evidence that MMR genes may promote triplet repeat instability emerged from studies in E. coli, wherein inactivation of the MMR genes mutS, mutL, or mutH resulted in a dramatic reduction of CTG/CAG repeat instability [165, 166]. The mutagenic role of MMR in mammals was conclusively established by work in rodent models of HD and DM1 wherein inactivation of MMR strongly attenuated both somatic expansion of the CAG/CTG repeats in multiple tissues and intergenerational repeat expansion (although it is unclear whether somatic and intergenerational expansions are mediated by the same molecular mechanism(s)) [79, 167–177]. Germline knockout of either of the MutS homologs Msh2 or Msh3 or the MutL homologs Mlh1 or Mlh3 in an HttQ111 knock-in mouse model of HD blocks somatic CAG expansions in the striatum. Loss of Msh2, Msh3, or Mlh1 also reduces nuclear accumulation of mHTT protein, suggesting not only that CAG somatic expansions are associated with some aspects of mHTT pathology, but also that inactivation of MMR can mitigate such disease-associated signatures [80, 177]. Similarly, Msh2, Msh3, and Pms2 have been shown to drive CTG-repeat expansion in mouse models of DM1, suggesting that MutSβ and MutLα play a role in this process [79, 176]. Interestingly, studies in a mouse model of FXD have identified Msh3 as a driver of somatic CGG repeat expansion [144].
Although the MMR system drives CAG repeat expansion, the overall increase in CAG repeat number may be a net consequence of individual expansion and contraction events. This type of expansion “biased” instability has been documented in the blood of DM1 patients [178, 179]. Bidirectional CAG somatic and intergenerational instability has also been observed in both DM1 and HD mouse models, with longer inherited CAG or CTG repeat tracts subject to higher rates of contraction versus expansion [170, 180]. Thus, it seems likely that factors that alter the balance between expansion and contraction may modulate disease onset. In fact, it has been suggested that genetic or pharmacological approaches to induce CAG repeat contractions might hold therapeutic promise [181, 182]. However, the mechanistic role of MMR proteins in controlling the balance between expansion and contraction remains poorly understood.
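The idea that overall expansion reflects the net balance of individual expansion and contraction events can be illustrated with a toy calculation. The sketch below is not a model drawn from the cited studies: it assumes, purely for illustration, that each instability event changes the tract by exactly one repeat unit, and the function names and per-event probabilities (`p_expand`, `p_contract`) are hypothetical.

```python
import random


def expected_net_change(n_events, p_expand, p_contract):
    """Expected net change in repeat units after n_events.

    Illustrative assumption: each event adds one unit with probability
    p_expand, removes one with probability p_contract, and otherwise
    leaves the tract unchanged.
    """
    if p_expand < 0 or p_contract < 0 or p_expand + p_contract > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return n_events * (p_expand - p_contract)


def simulate_tract(start_len, n_events, p_expand, p_contract, seed=0):
    """Simulate one tract as a biased random walk (hypothetical toy model)."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    length = start_len
    for _ in range(n_events):
        u = rng.random()
        if u < p_expand:
            length += 1                    # expansion event
        elif u < p_expand + p_contract:
            length = max(length - 1, 0)    # contraction event, floored at zero
    return length
```

In this toy picture, any factor that shifts `p_expand` relative to `p_contract` changes the sign or size of the net drift, which is the sense in which altering the expansion/contraction balance could modulate repeat length over time.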
A striking feature of the tissue-specificity of somatic instability in HD is that the striatum and the cortex (the most severely affected parts of the CNS in HD patients) display high levels of CAG expansions; by contrast, CAG repeats are relatively stable in the cerebellum, which is pathologically unaffected [150, 183]. In the periphery, CAG repeat expansions are observed in the liver, although this tissue does not display disease pathology. The simplest interpretation of these findings is that (a) the cellular milieu of striatal medium spiny neurons is more permissive to MMR-dependent CAG repeat expansion than the environment within cerebellar Purkinje or granule cells, and (b) cell vulnerability likely involves tissue- or cell-type specific factors that may render the striatum more susceptible to the consequences of CAG expansion than the liver. Since medium spiny neurons and Purkinje cells are both post-mitotic, differences in CAG instability between these cell types must arise due to factors other than cell division. It has been suggested that transcriptionally active genes are subject to higher levels of MMR than silent genes or non-transcribed regions of the genome [122, 185]. In fact, transcription-mediated destabilization of CAG and CTG repeats has also been documented in in vitro systems [186–188], and a small molecule that induces CAG repeat contractions in a transcription-dependent manner has been reported [182]. Thus, it is possible that locus-specific CAG repeat instability may be transcription-dependent. In summary, studies of MMR in different neuronal sub-types and its interface with transcription in these cells are likely to shed light on the possible mechanisms underlying CAG repeat instability in humans.
Mechanisms of MMR-mediated CAG repeat expansion
It has been known for over half a century that the two DNA strands within repetitive DNA elements have a high propensity to slip relative to one another, and that slippage-mediated mispairing of the DNA strands can give rise to changes in repeat tract length [30, 190]. The probability of strand slippage is governed by the sequence composition as well as the number of repeating units within the repetitive element, meaning that a vital determinant of mutagenic propensity is encoded within the DNA sequence itself. An example of this phenomenon is the tendency of the CAG/CTG repeat in exon 1 of the HTT gene to undergo length changes (thereby driving HD) by mechanisms presumed to involve strand slippage within the repeat tract. This results in the formation of extrahelical extrusions composed of CAG or CTG repeats, structures that are natural substrates for MutSβ. Studies from several laboratories have established that (CAG)n extrahelical extrusions (n = 1, 2, 3, 4, 7, or 13) and (CTG)n extrahelical extrusions (n = 1, 2, 3, 4, or 13) are recognized by MutSβ with high affinity in vitro, with reported dissociation constants ranging from 4–35 nM [51, 191]. Because MutSβ recognizes non-triplet repeat extrahelical extrusions with similar affinities, it is generally believed that the structure of the extrusion (rather than its sequence composition) governs the protein-DNA interaction [50, 173].
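To give a feel for what dissociation constants in the 4–35 nM range imply, the sketch below evaluates the standard single-site binding isotherm, fraction bound = [P] / ([P] + Kd). This is an illustration rather than an analysis taken from the cited work: it assumes one independent binding site per DNA molecule and that free MutSβ concentration is approximately equal to total MutSβ (DNA present at concentrations well below Kd). The function name and the protein concentrations used are hypothetical.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fraction of DNA sites occupied under a single-site binding isotherm.

    Assumes free protein ~ total protein (DNA concentration well below Kd),
    an assumption made here only for illustration.
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein_nM must be >= 0 and kd_nM must be > 0")
    return protein_nM / (protein_nM + kd_nM)


if __name__ == "__main__":
    # Compare the two ends of the reported Kd range at a hypothetical
    # MutSbeta concentration of 10 nM.
    for kd in (4.0, 35.0):
        print(f"Kd = {kd} nM -> fraction bound = {fraction_bound(10.0, kd):.2f}")
```

By construction, half of the sites are occupied when the protein concentration equals Kd, so a lower Kd means that substantial occupancy is reached at lower protein concentrations.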
The general consensus is that extrahelical extrusions composed of 5 or more CTG or CAG units are rectified efficiently by an MMR-independent mechanism (as judged by robust repair of such extrusions by extracts of cells deficient in MSH2 or MLH1) [192–194]. These findings are consistent with previous observations that rectification of large loops (>12 nucleotides) in human cells is independent of canonical mismatch repair, although the molecular mechanisms governing these events have not been fully clarified [195–198]. By contrast, small extrahelical loops (1–4 triplet repeats) are not only high affinity substrates for MutSβ binding but are also subject to robust rectification by the MMR system in cell extracts, and repair activity is dependent on MutSβ [116, 200]. Interestingly, repair of CAG and CTG extrusions also requires an MLH1-containing heterodimer since the reaction is not supported by extracts prepared from cells lacking MLH1 [116, 200]. Repair activity can be restored to MLH1-deficient extracts by supplementation with purified recombinant MutLα [116], indicating that this heterodimer suffices for this purpose. Furthermore, the inability of PMS2-deficient HEC-1A cell extracts to support repair of a (CTG)1 extrusion suggests that a PMS2-containing heterodimer is required for this process [200]. The inability of a PMS2 endonuclease-inactive MutLα derivative to restore (CAG)2 loop repair capacity to MLH1-deficient extract demonstrates that MutLα plays a catalytic role in extrusion repair, and that the strand incisions it catalyzes are critical intermediates in the MutSβ-dependent processing of small triplet repeat loops [116]. Thus, although CAG hairpins harboring 5 or more repeats are routinely invoked as hypothetical substrates that initiate repeat expansion, available evidence suggests that such hairpins are refractory to the MMR pathway.
Because shorter CAG or CTG extrusions (1–4 repeats) not only trigger recognition by the MMR pathway, but also display preferential processing by a MutSβ- (rather than a MutSα-) initiated repair process, such extrahelical extrusions are the presumptive substrates that drive somatic CAG expansions in human cells.
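The size dependence described above can be restated as a simple lookup. The sketch below only encodes the thresholds given in the text (1–4 repeats versus 5 or more); the function name and return labels are hypothetical, and it makes no claim about extrusion sizes or sequence contexts that were not tested in the cited extract experiments.

```python
def presumptive_repair_route(n_repeats):
    """Return the processing route the text assigns to a CAG/CTG extrusion.

    Thresholds follow the review's summary: extrusions of 1-4 repeats are
    processed by MutSbeta-dependent MMR, whereas extrusions of 5 or more
    repeats are repaired by an MMR-independent mechanism.
    """
    if not isinstance(n_repeats, int) or n_repeats < 1:
        raise ValueError("n_repeats must be a positive integer")
    if n_repeats <= 4:
        return "MutSbeta-dependent MMR"
    return "MMR-independent loop repair"


if __name__ == "__main__":
    for n in (1, 4, 5, 13):
        print(n, presumptive_repair_route(n))
```

On this reading, only the first category corresponds to the presumptive substrates for somatic expansion.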
As noted earlier, activation of the MutLα endonuclease on bubble structures occurs without strand bias, presumably due to the disoriented loading of PCNA onto the DNA. Interestingly, triplet repeat extrahelical extrusions behave in much the same way as bubble structures. PCNA is loaded by RFC efficiently on DNAs harboring a (CTG)1–3 or (CAG)1–3 extrahelical extrusion, and in the presence of MutSβ, MutLα, and ATP, strand incision activity is directed to both DNA strands [116]. Thus, these extrahelical extrusions (although restricted to one strand, and lacking a juxtaposed complementary strand extrusion) not only provoke their recognition by MutSβ, but also dysregulate the strand-directionality of the MutLα endonuclease. Such MutLα-catalyzed non-strand-specific breaks stimulate DNA synthetic activity on both DNA strands in cell extracts [116], opening up the possibility for further strand-slippage and expansion. Because these events are not coupled to DNA replication, the “dysregulated strand directionality” model (Fig. 2) has been suggested to play a role in CAG-repeat expansion in post-mitotic neurons in the striatum [26, 116]. In this model, expansions/contractions occur either due to the repair of double-strand breaks caused by MutLα-catalyzed incisions in close proximity to one another on the two strands (not shown), or during DNA re-synthesis of the gaps generated upon excision of the nicked strand. Whereas faithful resynthesis by Polδ results in a contraction (not shown), strand-slippage during error-prone resynthesis by Polβ or Polη results in an expansion. Alternatively, expansion can occur by inclusion of an extrahelical extrusion into the primary structure of the DNA. Overall expansion may be a net result of individual expansion and contraction events, as has been suggested to occur in both HD and DM1 [178, 179].
In fact, there is evidence that recruitment of DNA polymerase β (Polβ) by MutSβ to CAG or CTG extrusions triggers error-prone DNA resynthesis in vitro, resulting in expansions [201]. It has also been suggested that the MutSβ-Polβ complex may promote CAG-repeat expansion during base excision repair (BER) [202, 203], although the interplay between the MMR and BER pathways in CAG-repeat expansion remains to be fully dissected [26]. Also, given the limited processivity of Polβ (1–6 nucleotides) [204], it is unclear as to how it might carry out error-prone synthesis of disease-relevant CAG repeat tracts. Error-prone DNA synthesis has also been invoked to explain non-canonical MMR-mediated mutagenesis [23]. This process has been suggested to involve processive reiterative DNA synthesis of long triplet repeat tracts by Polη. However, the involvement of Polη in CAG strand slippage is yet to be experimentally validated.
An alternate model (Fig. 2) for CAG expansion has invoked PCNA-independent activation of the MutLγ endonuclease by MutSβ on CAG extrahelical extrusions. Whereas PCNA-directed MutLα endonuclease activity on such molecules produces nicks on both DNA strands [116], the MutLγ-catalyzed incisions are restricted to the strand complementary to the extrusion. Based on these observations, it has been proposed that DNA re-synthesis induced by the strand break results in retention of the extrusion, leading to a net expansion of 1 CAG unit [91].
These models provide a mechanistic framework to understand the role of MutLα and MutLγ in CAG-repeat expansion; nevertheless, the GWAS have also suggested a role for PMS1 and FAN1 in modulating the process. Repeat instability measurements in cell lines have established that overexpression of wild type (but not catalytically inactive) FAN1 attenuates CAG-repeat expansion [158] (considered in detail elsewhere in this issue). Recent studies have shown that CAG somatic expansions are enhanced in an HttQ111 knock-in mouse model of HD that also lacks Fan1 [205]. These instabilities are abolished by the additional knockout of Mlh1, suggesting that CAG expansions in the absence of Fan1 require functional Mlh1 [205]. Interestingly, knockout of Fan1 also exacerbates CGG repeat expansion in a mouse model of FXD [206]. These findings suggest a generalized protective role for FAN1 in repeat expansion, although the mechanistic bases remain unclear. Nevertheless, a physical interaction between FAN1 and MLH1 has been documented, and this interaction has been suggested to play a role in the DNA damage response [108, 207–209]. Also, it has been suggested that FAN1 binding to CAG extrusions may block access of MutSβ to such structures, thus preventing CAG expansion [158, 210]. However, the functional relevance of these interactions either in the context of the error correction function of MMR or its mutagenic role in CAG expansion remains to be elucidated. Likewise, there are currently no data available on the role of PMS1 in modulating CAG repeat expansion. Future studies evaluating the roles of both FAN1 and PMS1 will be required before a mechanistic understanding accounting for all the implicated activities can be constructed.
CONCLUSION
Implications for therapeutic interventions in HD and/or other triplet repeat diseases
The convergence of evidence from human genetics, mouse models, and biochemical studies has highlighted the MMR pathway as a potential target for therapeutic intervention in HD. Because MutSβ-initiated processing of CAG and/or CTG extrusions by MutL homologs is the presumptive proximal step of the somatic expansion process, and since human genetics has provided evidence supporting the idea of somatic CAG expansion as a driver of disease onset, inactivation of MutSβ and/or one or more of the MutL homologs is likely to impact disease. Given the wealth of available structural and biochemical information on the MMR proteins, small molecule approaches for attenuation of somatic CAG expansion may be particularly tractable. Alternatively, recent advances in RNA-based gene inactivation approaches also provide hope that this avenue of therapeutically targeting MMR genes may meet with success. Insofar as diseases with such diverse pathophysiologies as HD, DM1, FXD, and Friedreich’s ataxia share related underlying mutational processes involving MMR, targeting this pathway may result in treatments that benefit a wide patient population.
CONFLICT OF INTEREST STATEMENT
The authors have no conflicts of interest.
ACKNOWLEDGMENTS
We thank Vickers Burdett, Seung Kwak, Paul Modrich, Ignacio Munoz-Sanjuan, Simon Noble, Thomas Vogt, and Hilary Wilkinson for comments. AP’s research is supported by grants from the NIH (R03 NS114976), Hereditary Disease Foundation, and the Gies Foundation.
