Neurodevelopmental disorders (NDDs) encompass a diverse group of disorders characterised by impaired cognitive abilities and developmental challenges. Short tandem repeats (STRs), repetitive DNA sequences found throughout the human genome, have emerged as potential contributors to NDDs. Specifically, the CGG trinucleotide repeat has been implicated in a wide range of NDDs, including Fragile X Syndrome (FXS), the most common inherited form of intellectual disability and autism. This review focuses on CGG STR expansions associated with NDDs and their impact on gene expression through repeat expansion-mediated epigenetic silencing. We explore the molecular mechanisms underlying CGG-repeat expansion and the resulting epigenetic modifications, such as DNA hypermethylation and gene silencing. Additionally, we discuss the involvement of other CGG STRs in neurodevelopmental diseases. Several examples, including FMR1, AFF2, AFF3, XYLT1, FRA10AC1, CBL, and DIP2B, highlight the complex relationship between CGG STR expansions and NDDs. Furthermore, recent advancements in this field are highlighted, shedding light on potential future research directions. Understanding the role of STRs, particularly CGG-repeats, in NDDs has the potential to uncover novel diagnostic and therapeutic strategies for these challenging disorders.
Neurodevelopmental disorders and short tandem repeats
In humans, neurodevelopmental disorders (NDDs) represent one of the most clinically and genetically diverse groups of disorders. These disorders arise from disruptions in early brain development and encompass conditions such as autism spectrum disorder (ASD), intellectual disability (ID), motor impairments, seizures, learning difficulties, and attention issues. Symptoms often include cognitive limitations, delays in both gross and fine motor skills, hypotonia, and social and behavioural difficulties [1,2]. Short tandem repeats (STRs), previously known as microsatellites or simple sequence repeats, are short DNA motifs of one to six bp that are consecutively repeated in a head-to-tail fashion [1,2]. To date, multiple human diseases and disorders have been attributed to STRs. There appears to be a link between STRs and neurological disorders, as a large proportion of STR disorders are neuromuscular, neurodegenerative, or neurodevelopmental in nature. Currently, the primary repeat class associated with NDDs is the trinucleotide CGG STR. Several NDDs have been definitively associated with CGG STRs, and CGG STRs have been implicated in several other NDD and neurodegenerative phenotypes (Table 1). The majority of these disorders involve some form of cognitive disorder, including, but not limited to, ID, developmental delay, and ASD. The typical mechanism linked to NDD-causative STRs is repeat expansion, which is discussed further below.
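To make the definition of an STR concrete, the following minimal Python sketch counts the longest uninterrupted head-to-tail run of a motif in a DNA string. The function name `longest_tandem_run` and the toy sequence are illustrative assumptions, not taken from any cited tool. Real genotyping pipelines work on sequencing reads and have to handle interrupted repeats, which this sketch does not.

```python
import re


def longest_tandem_run(sequence: str, motif: str = "CGG") -> int:
    """Return the largest number of consecutive, uninterrupted copies of `motif`.

    Hypothetical helper for illustration only: it scans one strand of a plain
    DNA string and ignores interruptions and sequencing errors.
    """
    if not 1 <= len(motif) <= 6:
        raise ValueError("STR motifs are one to six bp long")
    # One or more back-to-back copies of the motif, e.g. CGGCGGCGG.
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = pattern.findall(sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Toy sequence (not a real locus): five CGG units, a spacer, then two more.
toy = "ATG" + "CGG" * 5 + "TTA" + "CGG" * 2
print(longest_tandem_run(toy))
```

On this toy sequence the longest run is the first one, five CGG units. The shorter second run is ignored because only uninterrupted tracts are counted.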
Folate-sensitive fragile site | Chromosome [19] | Localised gene/s | Gene region | Disease phenotype | Reference |
---|---|---|---|---|---|
FRAXA | Xq27.3 | FMR1 | 5′-UTR | Fragile X syndrome | [20] |
FRAXE | Xq28 | AFF2 | 5′-UTR | Fragile XE syndrome | [21] |
FRAXF | Xq28 | TMEM185A¹ | 5′-UTR | Developmental delay | [22] |
FRA1M | 1p21.3 | ABCD3² | 5′-UTR | Candidate | [23,24] |
FRA2A | 2q11.2 | AFF3 | 5′-UTR/intron | Intellectual disability | [25,26] |
FRA2B | 2q13 | BCL2L11² | 5′-UTR/intron | Candidate | [23,24] |
FRA2B | 2q13 | RGPD1² | 5′-UTR/intron | Candidate | [24] |
FRA2K | 2q22.3 | ACVR2A² | Promoter/5′-UTR | Candidate | [24] |
FRA5G | 5q35 | RASGEF1C² | 5′-UTR | Candidate | [6,7] |
FRA5G | 5q35 | FAM193B² | 5′-UTR | Candidate | [24] |
FRA6A | 6p23 | RANBP9² | Exon | Candidate | [23,24] |
FRA6A | 6p23 | GFOD1² | Promoter | Candidate | [6] |
FRA7A | 7p11.2 | ZNF713¹ | Intron | Autism spectrum disorder | [27] |
FRA8A | 8q22.3 | FZD6² | 5′-UTR | Candidate | [23,24] |
FRA9A | 9p21 | C9orf72² | Intron | Candidate | [23,24] |
FRA9A | 9p21 | ELAVL2² | 5′-UTR | Candidate | [24] |
FRA9B | 9q32 | TMEM268² | 5′-UTR | Candidate | [6] |
FRA9B | 9q32 | ZNF618² | Promoter/5′-UTR | Candidate | [6] |
FRA10A | 10q23.3 | FRA10AC1 | 5′-UTR | Neurodevelopmental disorder | [28–30] |
FRA11A | 11q13.3 | C11orf80¹ | Exon | Candidate | [31] |
FRA11B | 11q23.3 | CBL | 5′-UTR | Jacobsen syndrome | [32] |
FRA12A | 12q13.1 | DIP2B¹ | 5′-UTR | Intellectual developmental disorder | [33] |
FRA12D | 12q24.13 | TPCN1² | 5′-UTR | Candidate | [6] |
FRA12D | 12q24.13 | HECTD4² | 5′-UTR/exon | Candidate | [6] |
FRA16A | 16p13.11 | XYLT1 | 5′-UTR | Baratela–Scott syndrome | [4] |
FRA19B | 19p13 | LINGO3² | Promoter/5′-UTR | Candidate | [23,24] |
FRA19B | 19p13 | SAMD1² | 5′-UTR/exon | Candidate | [6] |
FRA19B | 19p13 | GIPC1² | 5′-UTR | Candidate | [6] |
FRA19B | 19p13 | BRD4² | Promoter/5′-UTR | Candidate | [6] |
FRA19B | 19p13 | CARM1² | Promoter/5′-UTR | Candidate | [6] |
FRA20A | 20p11.23 | RALGAPA2² | 5′-UTR | Candidate | [23,24] |
FRA22A | 22q13 | CSNK1E¹ | 5′-UTR | Autism spectrum disorder | [34] |
¹ CGG STR expansions within the given gene are known to be causative of the associated fragile site and have been observed within individuals with an NDD phenotype; however, the association between the expansion and phenotype is not yet unequivocally linked.
² CGG STR expansions within this gene have been identified through either methylation epigenetic signal or bioinformatic repeat expansion detection analyses, marking these loci/genes as candidates causative for NDDs and folate-sensitive fragile sites.
CGG STR expansions may manifest as folate-sensitive fragile sites (FSFSs). These are cytogenetically visible breaks in the chromosomes, observed throughout the human genome, that are inducible by culturing cells in a folate-deficient medium [3]. To date, 22 FSFSs have been identified. At the molecular level, FSFSs are due to CGG-repeat expansions. While scientific interest in CGG STR expansions was for a long time almost exclusively limited to fragile X syndrome (FXS), interest in this class of STRs has recently exploded for three reasons. First, several FSFSs due to CGG-repeat expansions that had long been considered harmless polymorphisms have now been unequivocally linked to NDDs [4,5]. Second, bioinformatic analyses have revealed that CGG-repeats are much more common than anticipated [6]. Third, an epigenetic survey has uncovered additional disease-relevant CGG expansions [7]. The purpose of this review is to highlight and connect these recent developments in this rapidly emerging field.
CGG-repeat expansion-mediated epigenetic silencing
CGG STR mutations (at least from the perspective of the FMR1 gene) are typically divided into three mutation classes: ‘full mutation’, >200 repeat units [8,9]; ‘premutation’, 56–200 repeat units [10,11]; and ‘gray zone’, 41–55 repeat units [12,13]. Currently, full mutation repeat expansions appear to be the principal mutation mechanism linked to NDDs in humans. While the premutation and gray zone classes have been linked to human disease phenotypes, they are primarily associated with neurodegenerative disorders, such as Fragile X-associated tremor/ataxia syndrome and neuronal intranuclear inclusion disease [14,15]. Furthermore, the pathogenic mechanisms at play differ considerably between the full mutation class and the premutation and gray zone classes. Within this discussion, we shall focus on the full mutation class.
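The class boundaries above can be written as a simple lookup. The sketch below is illustrative only: the function name `classify_fmr1_repeat` is hypothetical, the thresholds are the FMR1 values quoted in this review, and the label "normal" for lengths below the gray zone is an assumption of this sketch. Other CGG loci may have different thresholds, and clinical laboratories may use slightly different cut-offs.

```python
def classify_fmr1_repeat(cgg_units: int) -> str:
    """Assign an FMR1 CGG repeat length to a mutation class.

    Thresholds follow the ranges quoted in the text:
    gray zone 41-55, premutation 56-200, full mutation >200 repeat units.
    Lengths below 41 are labelled "normal" (a label assumed for this sketch).
    """
    if cgg_units < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_units > 200:
        return "full mutation"
    if cgg_units >= 56:
        return "premutation"
    if cgg_units >= 41:
        return "gray zone"
    return "normal"


for units in (30, 45, 120, 350):
    print(units, classify_fmr1_repeat(units))
```

Note that an allele of exactly 200 units falls in the premutation class under these ranges, since the full mutation class is defined as more than 200 units.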
When a CGG STR locus expands beyond its full mutation length (for instance, 200 CGG units in the case of FXS), this triggers an epigenetic methylation event in which the CpG islands within and immediately surrounding the repeat tract all become methylated [16]. This hypermethylation of the repeat locus then causes hyper-condensation of the DNA strand and the surrounding chromatin [17]. If the hypermethylated repeat is localised within a gene or gene promoter region (which is often the case for CGG STRs), the hyper-condensation blocks access to the binding sites that the cellular transcription machinery requires to transcribe the gene [17]. The gene is therefore silenced: no mRNA is produced and, consequently, no gene product can be realised. The effect is similar to a gene loss-of-function or a gene deletion, as no protein is produced from that allele [18]. As stated above, a further consequence of CGG STR expansion may be the manifestation of a cytogenetically visible FSFS that is induced by culturing cells in a folate-deficient medium [3]. All 22 FSFSs that have been described are listed in Table 1. The STRs that have been associated or linked with disease due to repeat full mutations are discussed in the following sections.
FRAXA — FMR1
By far the best-known and most comprehensively studied CGG STR is found within the 5′-untranslated region (UTR) of the Fragile X Messenger Ribonucleoprotein 1 (FMR1) gene on the X chromosome. Full mutation expansions of the FMR1 CGG-repeat result in FXS. Not only was FXS the first defined CGG-repeat expansion disorder, but it is also the most common inherited form of ID and syndromic ASD [8]. FXS occurs in 1 in 4000 males and 1 in 8000 females, with males generally displaying a more severe phenotype. Females may display less severe phenotypes due to X-inactivation [35,36]. Because females have two X chromosomes, most genes on one of the X chromosomes are randomly silenced in each cell during embryonic development [36,37]. Therefore, in many cells, the expansion-containing allele would be silenced regardless and the healthy allele would be available for the transcription of FMR1 [38]. To date, most research on the CGG STR is framed within the context of the FMR1 repeat. In the general population, the FMR1 repeat is typically ∼30 CGG units long, ranging from ∼6 to 40 repeat units. FXS is primarily characterised by mild to moderate ID, ASD, and a particular facial appearance. Other common symptoms of FXS include joint laxity, macroorchidism (in males), hyperactivity, seizures, social and language impairments, and increased anxiety [38,39]. The highest levels of FMR1 expression have been observed within neurons and in the foetal brain during the early stages of proliferation and migration of nervous system cells [40,41]. Broadly, FMR1 plays a role in development, mRNA binding, and translation regulation [42,43]. It seems that, as FMR1 is a crucial gene in the development of the central nervous system (CNS), its absence during development results in FXS.
FRAXE — AFF2
Fragile XE (FRAXE) syndrome is a rare genetic condition of X-linked ID. The syndrome was initially identified in individuals who had been referred for FXS testing [21]. It is named for the FRAXE fragile site observed in patients on chromosome Xq28, distal to the FRAXA site. FRAXE syndrome is characterised by mild ID; however, the phenotype can be highly variable between individuals, and symptoms can include learning difficulties, language delay, hyperactivity, and a shortened attention span [44]. The cause of this disorder is a full-mutation CGG STR in the 5′-UTR of the ALF Transcription Elongation Factor 2 (AFF2) gene (formerly known as FMR2), which results in transcriptional silencing of the gene [45,46]. Although the exact molecular consequence of AFF2 silencing, and why FRAXE syndrome arises, is unknown, it is suggested that AFF2 is an RNA-binding protein that regulates alternative splicing through interaction with G quartet RNA structures, and that silencing of the gene may disrupt crucial gene regulation during development [44,47].
FRA2A — AFF3
ALF Transcription Elongation Factor 3 (AFF3) is, like AFF2, a member of the AF4/FMR2 gene family. Interestingly, AFF3 contains two CGG-repeats that are capable of expanding to large sizes. One CGG-repeat lies within the 5′ promoter region and another lies in intron 2 of AFF3, which is a brain-active alternative promoter [6,25]. Expansion of the brain-specific promoter CGG STR beyond 200 units results in hypermethylation of the repeat and the adjacent CpG islands, which accounts for gene silencing [25]. However, while the FRA2A fragile site was associated with neurodevelopmental phenotypes such as delayed acquisition of motor and language skills, it has not yet been possible to unequivocally link this expansion to neurodevelopmental delay [25]. Recently, an enrichment of the AFF3 STR expansion was observed in 6371 probands with neurodevelopmental problems [26]. cSNP analysis indicated monoallelic expression of the AFF3 gene within FRA2A carriers, which may indicate that AFF3 is a functionally haploinsufficient gene [27]. Haploinsufficiency of AFF3 within the developing brain may cause an NDD phenotype. Two other independent studies, by Shimizu et al. [48] and Voisin et al. [49], have identified heterozygous de novo missense or deletion mutations within the AFF3 gene in individuals displaying KINSSHIP (horseshoe kidney, Nievergelt/Savarirayan type of mesomelic dysplasia, seizures, hypertrichosis, ID, and pulmonary involvement) syndrome. This lends further credence to the possibility that heterozygous silencing of the AFF3 gene through the FRA2A repeat expansion would result in an NDD or ID phenotype. The phenotypic difference between KINSSHIP syndrome patients and FRA2A patients may be due to the localisation of the STR expansion in the alternative brain-specific transcript of the AFF3 gene.
FRA7A — ZNF713
Zinc Finger Protein 713 (ZNF713) is a gene found on chromosome 7p11.2, in relative proximity to the centromere, and its expression is significantly enriched in the brain in comparison with other body tissues [27]. It is a member of the Kruppel-type zinc finger protein family and contains a Kruppel-associated box (KRAB) regulatory domain [27]. While very little is known about the ZNF713 gene, it has been implicated in ASD and is known to contain two large and polymorphic CGG-repeats, one within the 5′-UTR and another within intron 1 [6]. Using FISH analysis, the 5′ repeat was identified as the causative repeat for the FRA7A FSFS [27]. While a definitive link was not established, the full mutation and subsequent silencing of the FRA7A repeat has been associated with ASD within two unrelated families [27]. Interestingly, the relatives of ASD-affected family members who harboured a premutation-sized CGG-repeat allele displayed increased expression of the ZNF713 gene [27], which is consistent with what is known of premutation-length CGG-repeat alleles.
FRA10A — FRA10AC1
The FRA10A-associated CGG-Repeat 1 (FRA10AC1) gene is known to contain the CGG STR responsible for the FRA10A fragile site. The original study that identified the repeat expansion in FRA10AC1 observed that the repeat length was variable in the tested population; however, all individuals that exhibited a FRA10A fragile site were heterozygous, displaying a normal-sized repeat and an expanded repeat of at least 200 CGG units [28]. Furthermore, it was observed that the allele containing the expanded repeat was hypermethylated and that FRA10AC1 was not transcribed from it [30], providing further evidence for the full-mutation mechanism of the CGG STR. Due to the lack of a disease phenotype in these individuals, combined with another study where a FRA10A carrier lacked any discernible phenotype, it was concluded that (at least in the heterozygous state) the CGG-repeat full mutation in FRA10AC1 was not pathogenic [28,50]. This would indicate that FRA10AC1 is not a haploinsufficient gene. However, von Elsner et al. [5] recently determined that bi-allelic loss-of-function mutations of FRA10AC1 result in an NDD with growth disruption, dysmorphic features, and corpus callosum abnormalities. This was further supported by several other studies that identified homozygous loss-of-function mutations within FRA10AC1 in NDD patients [29,30]. As yet, however, an NDD patient exhibiting a loss-of-function mutation along with a full-mutation FRA10AC1 repeat has not been reported. Much like the XYLT1 repeat (to be discussed), if such a case were to arise, we would expect a phenotype similar to that reported by von Elsner and colleagues to manifest.
FRA11B — CBL
The FRA11B CGG-repeat is located on chromosome 11q23.3, immediately upstream of the Cbl proto-oncogene (CBL) gene [51]. The FRA11B CGG-repeat has been implicated in Jacobsen syndrome [52]. This is an interesting case, as the proposed manner in which the full mutation contributes to Jacobsen syndrome is unique in comparison with the other cases of CGG full mutations. Jacobsen syndrome is a contiguous gene syndrome caused by partial deletion of the q arm of chromosome 11 [52]. The symptoms of Jacobsen syndrome can be quite varied, as the breakpoint of the deletion can vary along the q arm of chromosome 11. However, typical characteristics include particular facial features, Paris–Trousseau syndrome (a bleeding disorder), delayed motor and speech development, and cognitive impairment [53]. In a subset of Jacobsen patients, the breakpoint of the 11q arm deletion is localised at the FRA11B fragile site or in proximity to it [32]. At the FMR1 locus, it is thought that secondary structures produced by the expanded CGG-repeat cause replication stress during DNA replication, which stalls the replication fork and results in a double-stranded break (DSB) [54,55]. Mutations then arise downstream of the CGG-repeat through the initiation of break-induced repair [56]. It is suggested that, through a similar mechanism of replication stress, a DSB is induced at the FRA11B site, which then results in the loss of the chromosome 11q arm downstream of the CGG-repeat [57]. While a distinct correlation has been established between the CBL CGG-repeat expansion and Jacobsen syndrome [57], it is unclear whether this mechanism of CGG-repeat-mediated chromosomal breakage is replicated elsewhere. While deletions proximal to the FMR1 repeat have been reported [58,59], large deletions with breakpoints at the other FSFSs have not.
Heterozygous mutations in the CBL gene have been implicated in Noonan syndrome, Noonan syndrome-like disorder, and leukaemia [60,61]; expression of the FRA11B fragile site, however, has not.
FRA12A — DIP2B
Giraud et al. [62] first identified the FRA12A fragile site on chromosome 12q13 in a male patient presenting with ID and congenital complications. It was later determined that this fragile site is caused by a CGG-repeat expansion (full mutation) within the disco-interacting protein 2 homologue B (DIP2B) gene [33]. Because several studies have identified the FRA12A site co-occurring with ID [33,63–65], the disorder has been termed intellectual developmental disorder, FRA12A type. The association of the FRA12A fragile site with ID remains controversial, however, as expression of this fragile site has also been observed in healthy individuals. There may be a dosage effect relating to the DIP2B expansion: in patients, the FRA12A site was observed in 43.7% of lymphocytes, whereas in unaffected carriers of FRA12A it was observed in only 16.6% of lymphocytes [23]. Furthermore, in individuals without ID but with FRA12A, DIP2B expression was reduced to approximately two-thirds of the levels observed in controls, whereas in individuals where both FRA12A and ID were present, DIP2B expression was reduced to half that of controls [33]. DIP2B has been determined to be a regulator of axonal development and is essential for the development of neurons [66]. Therefore, if DIP2B were transcriptionally silenced in a large proportion of cells during early neuronal development, this may be a contributing mechanism towards neurocognitive disorders.
FRA16A — XYLT1
Baratela–Scott syndrome is an interesting case. The syndrome is characterised by short stature, dysmorphic facial features, skeletal dysplasia, and developmental delay [67]. In the past, Baratela–Scott syndrome was thought to be caused by pathogenic variants or by total or partial deletion of the XYLT1 gene, which encodes the XT1 xylosyltransferase enzyme that catalyses the primary step in chondroitin sulfate and dermatan sulfate proteoglycan biosynthesis [4]. In several Baratela–Scott syndrome patients, however, no pathogenic variant or deletion could be detected in the XYLT1 gene by standard sequencing techniques [4]. In these patients, a CGG-repeat expansion was detected that was hypermethylated and silenced expression of XYLT1, the same mechanism as observed in FXS [4]. The genomic position of this CGG STR correlates with the previously reported FSFS FRA16A [4,68]. Interestingly, LaCroix et al. [4] demonstrated that the XYLT1 gene is not haploinsufficient: for a patient to manifest Baratela–Scott syndrome, they must carry a pathogenic variant, a deletion, or a repeat expansion on both alleles of the XYLT1 gene. In this case, the repeat-associated disorder is autosomal recessive, whereas the majority of known repeat-associated disorders are autosomal dominant [4]. This case, coupled with the difficulty of detecting repeat expansions in patients through standard sequencing, illustrates how repeat expansions could be an overlooked disease-causing mechanism that likely contributes towards the missing heritability problem. Furthermore, it demonstrates how STR expansions and gene silencing through CGG hypermethylation can replicate disorders that are typically thought to be caused by either pathogenic variants or deletion events.
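The recessive logic described for XYLT1, and its contrast with a haploinsufficient gene, can be expressed as a small sketch. It is a deliberately simplified model under stated assumptions: each allele is treated as either fully functional or fully lost, and mosaicism, partial silencing, and X-inactivation are ignored. The names `Allele` and `disorder_expected` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    """One copy of a gene and the ways this sketch allows it to be lost."""
    silenced_by_expansion: bool = False  # hypermethylated full-mutation repeat
    pathogenic_variant: bool = False
    deleted: bool = False

    @property
    def functional(self) -> bool:
        return not (self.silenced_by_expansion
                    or self.pathogenic_variant
                    or self.deleted)


def disorder_expected(a: Allele, b: Allele, haploinsufficient: bool) -> bool:
    """Simplified rule: a haploinsufficient gene needs both copies working,
    whereas a non-haploinsufficient gene tolerates the loss of one copy."""
    working = sum(allele.functional for allele in (a, b))
    required = 2 if haploinsufficient else 1
    return working < required


normal = Allele()
expanded = Allele(silenced_by_expansion=True)
deleted = Allele(deleted=True)

# Non-haploinsufficient gene, as described for XYLT1:
print(disorder_expected(expanded, normal, haploinsufficient=False))   # one hit
print(disorder_expected(expanded, deleted, haploinsufficient=False))  # two hits
```

Under this model, a single expanded allele is tolerated when the gene is not haploinsufficient, while an expansion paired with a deletion or a pathogenic variant on the other allele is not. Setting `haploinsufficient=True` gives the dominant behaviour instead.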
FRA22A — CSNK1E
The casein kinase 1 epsilon (CSNK1E) gene is a known member of the circadian rhythm control genes and has previously been implicated in neuropsychiatric disorders such as bipolar disorder, schizophrenia, and opiate addiction [69–71]. Through an epigenetic analysis of over 23 000 individuals, hypermethylation signatures in CSNK1E were detected, and it was determined that CSNK1E contains the CGG STR whose expansion is responsible for the FRA22A fragile site [7]. Recently, a heterozygous de novo variant in CSNK1E (p.Glu117Lys) has been reported in patients presenting with hypotonia, speech apraxia, and ASD [34]. As CSNK1E is expressed in all human brain regions during natal development and is co-expressed with multiple candidate genes for neurological disorders, the FRA22A CGG STR is another strong candidate for an NDD-contributing repeat [34].
The discovery of CGG-STRs beyond fragile sites
In addition to the CGG-repeats that were discovered as FSFSs, recent studies have identified ∼6000 CGG STR loci throughout the human genome [6,24]. Interestingly, hundreds of these CGG STR loci displayed characteristics, such as native repeat length, genetic context, and polymorphism and mutation rates, in line with those observed at the CGG STR loci that have been confirmed to be disease-causing [6]. An independent cohort study that aimed to identify epivariations in the human genome identified 25 loci where rare hypermethylation coincided with the presence of an unstable CGG tandem repeat [7]. In many cases, the presence of CGG expansions at these loci was confirmed.
Taking the bioinformatic approach and the epivariation study together, we are now able to present candidate repeat expansions as the molecular defects underlying most of the known FSFSs in the genome, as included in Table 1. In some cases, multiple CGG STR expansion candidates were identified within the breakpoints of the same FSFS. These may represent different CGG-repeat expansions that lie beyond the resolution of cytogenetic analysis. For instance, while FRAXA, FRAXE, and FRAXF are now known to be distinct repeat expansion loci, they cannot be discriminated cytogenetically, and in the past this has led to misidentification. The identification of multiple repeat candidates within a single FSFS may be suggestive of a phenomenon similar to that observed on chromosome X.
Many of these repeat-containing genes present as additional candidate genes for NDD. Altogether, the comparison of the genomic location of these genes with known disease genes illustrates that there is an overwhelming enrichment of CGG STRs within genes in which mutations are known to be causative or linked to NDDs (Figure 1). Overall, of thousands of gene-localised CGG STRs, Figure 1 highlights the genes that contain many of the largest and most polymorphic CGG STRs within the human genome. These were detected through the bioinformatic genotyping of a large cohort of whole genome samples [24].
Figure 1. Genome distribution of all known CGG short tandem repeats localised within genes known to contribute to or be associated with NDDs.
Genomic positioning (GRCh38) and HGNC names of NDD-linked genes can be observed lining the outside of the ideogram. Track A displays the largest repeat length recorded at that given locus, while Track B demonstrates the rates of heterozygosity of the given repeat locus. The chromosomes and their displayed data are separated by colour for differentiation. The data to generate this figure was taken from Annear et al. [24] where large-scale CGG STR genotyping of NDD patients and their family members was performed.
Additionally, through the comparison of CGG STR mutation rate and mutation size in ASD-affected individuals and their unaffected siblings, multiple CGG STR-containing genome regions were determined to display higher rates and degrees of STR mutation in the ASD-affected individuals [24]. Specifically in ASD, multiple other studies have found a higher incidence of rare, large repeat expansions in ASD patients than in controls [72,73], further linking the mutation of STRs to NDDs.
Beyond neurodevelopmental disorders, CGG STRs have been implicated in other forms of congenital and developmental disorders, mostly due to the expansion of exon-based repeats encoding polyalanine tracts [74–77]. In these cases, the pathogenic repeat length thresholds appear far lower than those seen in repeat expansion-mediated epigenetic silencing, with lengths of 20 to 30 repeat units being observed. Surprisingly, trinucleotide repeats appear to be over-represented within developmental and congenital disorders [78]. While other STR classes, such as the GAA, CAG and CCGGGG STRs, have been associated with neuromuscular and neurodegenerative disorders, few classes beyond the CGG class have been definitively associated with NDDs. STRs are fascinating in the sense that, although they are grouped as a single class of genetic features, they can exhibit wildly distinct pathogenic mechanisms depending on their nucleotide composition and the context of their position within the genome. Moreover, the pathogenic mechanism at play can vary greatly within the same class of STR depending on repeat length; the CGG STR is the classical example of this.
The CGG STR remains the primary class of STR linked to NDDs, and the evidence for this mounts with each passing year. Beyond the known disease-causing CGG STRs, hundreds of CGG loci localised within NDD-related genes have now been identified, with many of these repeat loci reflecting the characteristics exhibited by the disease-causative repeats. Furthermore, several studies have now identified higher rates and degrees of STR mutation within individuals presenting with an NDD, such as ASD, which further solidifies the link between neurological function and repeat expansion [24,72,79]. As many STR loci are now strong novel candidates that may contribute towards NDDs, it may only be a matter of time before these candidates are matched with patient phenotypes. These efforts are surely bolstered by the development of sequencing technologies, such as long-read sequencing platforms, and by the increased availability of sophisticated DNA analysis pipelines. With the rise of these technologies, we may detect new classes of STRs that contribute to NDDs through mechanisms similar to or differing from those of the CGG STRs.
Summary
CGG trinucleotide STRs are the primary repeat class that is associated with NDDs.
The principal mechanism behind CGG STR-associated NDDs is repeat expansion-mediated epigenetic silencing.
Repeat expansion-mediated epigenetic silencing is where a given repeat expands to or beyond the full mutation threshold (>200 repeat units for the FXS CGG STR) and the CpG islands within and flanking the repeat become hypermethylated, consequently preventing gene transcription.
22 folate-sensitive fragile sites have been reported. These are cytogenetically visible breaks on the metaphase chromosomes, caused by STR expansions. Approximately half have had the causative CGG STR identified. Several of these CGG STRs are known to be causative of NDDs.
Recently, hundreds of new CGG STRs have been categorised, which reflect the characteristics of known NDD-causing repeats. Therefore, this raises the question of to what degree additional CGG STRs contribute towards NDDs.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Abbreviations
- AFF2: ALF Transcription Elongation Factor 2
- ASD: autism spectrum disorder
- CSNK1E: casein kinase 1 epsilon
- DIP2B: disco-interacting protein 2 homologue B
- DSB: double-stranded break
- FMR1: Fragile X Messenger Ribonucleoprotein 1
- FRA10AC1: FRA10A-associated CGG-Repeat 1
- FRAXE: Fragile XE
- FSFS: folate-sensitive fragile site
- FXS: Fragile X syndrome
- ID: intellectual disability
- NDDs: neurodevelopmental disorders
- STR: short tandem repeat
- UTR: untranslated region
- ZNF713: Zinc Finger Protein 713