DNA methylation involves the addition of a methyl group to the fifth carbon of the pyrimidine cytosine ring (5-methylcytosine, 5mC). 5mC is widespread in vertebrate genomes where it is predominantly found within CpG dinucleotides. In mammals, 5mC participates in long-term silencing processes such as X-chromosome inactivation, genomic imprinting, somatic silencing of germline genes, and silencing of repetitive DNA elements. The evidence for 5mC as a dynamic gene-regulatory mechanism is mostly limited to specific examples, and is far from being completely understood. Recent work from diverse model systems suggests that 5mC might not always act as a dominant repressive mechanism and that hypermethylated promoters and enhancers can be permissive to transcription in vivo and in vitro. In this review, we discuss the links between 5mC and enhancer activity, and evaluate the role of this biochemical mechanism in various biological contexts.
Chemical modifications to DNA have the potential to encode transient and permanent changes in gene-regulatory activity that are heritable through cell division and across generations [1,2]. In eukaryotes, the major form of DNA methylation involves the addition of a methyl group to the fifth position of the cytosine ring (5-methylcytosine, 5mC) by DNA methyltransferase enzymes (DNMTs). 5mC is established de novo by DNMT3A and DNMT3B and is maintained by DNMT1, which recognises hemi-methylated CpGs through the UHRF1 protein. This maintenance activity is essential for the faithful re-establishment of 5mC on the newly synthesised strand following DNA replication. 5mC can be removed from DNA through two major mechanisms: passive and active DNA demethylation. During passive DNA demethylation, 5mC is diluted over several rounds of DNA replication because maintenance methylation is absent (absence of DNMT1) [5,6]. Conversely, in active DNA demethylation, 5mC is removed enzymatically in a stepwise manner. Active DNA demethylation involves the iterative oxidation of 5mC by Ten-Eleven Translocation (TET) dioxygenase enzymes to the intermediates 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC), followed by their removal via the base-excision repair (BER) pathway.
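The replication-dependent dilution that underlies passive demethylation can be illustrated with a toy calculation. The sketch below is a simplification under stated assumptions: methylation is tracked per DNA strand, every nascent strand starts unmethylated, and the nascent strand copies the parental methylation state with a fixed probability. The function name and the `maintenance_efficiency` parameter are hypothetical and are not taken from the cited studies.

```python
def methylated_strand_fraction(initial_fraction, divisions, maintenance_efficiency=0.0):
    """Expected fraction of methylated strands at a CpG after some replications.

    Toy model (assumption, not a measured rate): after each replication every
    parental strand is paired with a nascent strand, and the nascent strand
    acquires the parental methylation state with probability
    `maintenance_efficiency` (0.0 mimics the absence of DNMT1 activity,
    1.0 mimics perfect maintenance).
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")

    fraction = initial_fraction
    for _ in range(divisions):
        # Half of all strands are parental (state retained); the other half are
        # nascent and are methylated only if maintenance succeeds.
        fraction = 0.5 * fraction + 0.5 * fraction * maintenance_efficiency
    return fraction


if __name__ == "__main__":
    # Illustrative starting level of 80%, followed over five divisions.
    for n in range(6):
        print(n, round(methylated_strand_fraction(0.8, n), 4))
```

Under these assumptions, the methylated fraction halves with every division when maintenance is absent and remains constant when maintenance is perfect, which is the contrast the passive mechanism relies on.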
In the animal kingdom, 5mC is predominantly found within CpG dinucleotides. However, 5mC in CpH dinucleotides (where H is A, C, or T), which is common in plant genomes, has also been identified in invertebrate genomes, as well as in mammalian embryonic stem cells (ESCs) and brain tissue [9,10]. It is worth noting that 5mC levels in the CpH context might be somewhat overestimated due to biases associated with whole-genome bisulphite sequencing (WGBS) library preparation. The levels of 5mC in the CpG context can vary greatly between and within animal phyla [12,13]. For example, in insects, genomic CpG 5mC levels can range from 0% (Diptera) to 14% (Blattella asahinai, the Asian cockroach). Invertebrates, on average, display lower global 5mC levels than vertebrates, with the majority of 5mC localised to gene bodies of active genes. Such intragenic 5mC has also been observed in highly transcribed genes in mouse ESCs, where it is thought to prevent spurious transcription initiation resulting from cryptic entry of RNA polymerase II. Unlike invertebrate genomes, vertebrate genomes are characterised by genomic 5mC levels that often reach 70–80% in the CpG context. As a consequence of such high CpG 5mC content, CpG sites have been evolutionarily depleted from vertebrate genomes, a phenomenon that can be explained by the increased deamination of 5mC (C to T base change). The exceptions to this global hypermethylated state are CpG islands (CGIs), short stretches of high CpG density that are mostly unmethylated and that often coincide with gene promoters.
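CpG depletion and CpG density, as discussed above, are commonly quantified with the observed/expected CpG ratio, i.e. the number of CpG dinucleotides divided by the number expected from the C and G content of the sequence. The following is a minimal sketch of that calculation; the function name and the example sequences are illustrative only, and no particular CGI-calling threshold is implied.

```python
def cpg_stats(sequence):
    """Return (G+C content, observed/expected CpG ratio) for a DNA sequence.

    observed/expected = (number of CpG * length) / (number of C * number of G)
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")

    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_content = (c_count + g_count) / length
    expected_cpg = c_count * g_count / length
    obs_exp = cpg_count / expected_cpg if expected_cpg > 0 else 0.0
    return gc_content, obs_exp


if __name__ == "__main__":
    # Hypothetical toy sequences: one CpG-rich, one CpG-depleted.
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("CATGCATG"))
```

A ratio well below 1 indicates that CpGs are rarer than expected from base composition, as in the bulk of vertebrate genomes, whereas CpG-dense regions such as CGIs give values closer to 1.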
Initial experiments that demonstrated the gene-repressive potential of 5mC involved the transfection of in vitro methylated templates into cultured cells and the subsequent comparison of their activity with that of unmethylated templates [20,21]. These experiments revealed that methylated plasmids are predominantly silenced when transfected into cultured cells. This was followed by the description of precise biochemical mechanisms of 5mC-mediated gene repression, such as the recruitment of methyl-CpG-binding domain proteins (MBDs) and histone deacetylase (HDAC)-containing complexes [22,23]. These important observations led to speculation that 5mC might act as a dynamic gene-regulatory mechanism, with broad-reaching implications for embryogenesis and cell differentiation. The advent of massively parallel sequencing technologies a couple of decades later allowed for unbiased insights into the dynamics of 5mC in diverse model systems. Collectively, this work revealed that CGI promoters (i.e. the majority of promoters in vertebrates) are by and large unmethylated, independently of gene activity [24–26]. Those observations notwithstanding, it is important to note that, when methylated, CGIs are strongly associated with transcriptional repression. For example, in both zebrafish and mammals, a set of germline CGI promoters becomes methylated at early embryonic stages, resulting in permanent somatic silencing of germline genes [27,28]. Distal regulatory elements such as enhancers display much greater differences in 5mC patterning between different tissues, cell types, and developmental stages [29–31]. In general, enhancer activity as defined by transcription factor (TF) binding is anti-correlated with 5mC content [32,33]. However, the relationship between 5mC removal and enhancer activation is still not well understood. Here we discuss the functional links between 5mC and the activity of gene-regulatory elements, with a particular focus on transcriptional enhancers.
5mC and the regulation of CGI promoters
CGI promoters are the most common promoter type in vertebrate genomes. They are characterised by high CpG richness and elevated G+C content. Notably, the majority of CGI promoters are unmethylated independently of gene activity, even though some CGI promoters become methylated and silenced during cellular differentiation and X-chromosome inactivation [27,34,35]. It is also worth noting that forced methylation using targeted epigenome editing technologies does not result in permanent CGI 5mC and silencing. The constitutively unmethylated state of CGIs is established and maintained by active histone marks, TF binding, recruitment of CxxC domain proteins, and TET demethylases [37–44]. When methylated, CGIs can act as recruitment sites for MBDs such as MBD1, MBD2, MBD4, and MeCP2. These proteins appear to be constitutively bound to a selection of methylated CGIs in both pluripotent and differentiated states, in line with the absence of major 5mC changes during this developmental transition. On the other hand, Polycomb group proteins display more extensive dynamics at CGIs during development and differentiation. Overall, CGIs act as platforms for transcriptional regulation that are largely independent of 5mC dynamics. In the modest number of cases where 5mC is targeted to CGIs, it appears that this process acts as an additional silencing lock rather than as the primary determinant of gene activity [27,46].
5mC levels at enhancers
Enhancers are traditionally defined as distal cis-acting regulatory elements that enhance transcription through recruitment of TFs and co-activator proteins. Enhancers activate and regulate expression of their target genes over distances that range from several hundred base pairs to a few megabases. In the absence of enhancers, transcription of associated genes is often weakened. On average, enhancers display low CpG density, even though CpG-rich enhancers have also been described [25,31,48]. Based on their activity, enhancers can broadly be classified as active, primed, poised, or silent. Active enhancers are characterised by nucleosome depletion and the presence of active histone modifications such as H3K4me1 and H3K27ac [47,49,50]. Poised enhancers are depleted of nucleosomes and decorated with the permissive H3K4me1 mark and the repressive Polycomb mark H3K27me3, whereas primed enhancers bear H3K4me1 and also display nucleosome depletion. Enhancers exhibit variable 5mC levels (10–50%) that correlate with the extent of TF binding [32,33]. An ENCODE study explored these trends globally by performing linear regression analysis between 5mC and chromatin accessibility (as measured by DNase I hypersensitivity) at >34000 genomic sites, and revealed a significant association: 5mC was negatively correlated with chromatin accessibility at 20% of the examined sites. The same study compared TF expression levels with average 5mC levels at their corresponding binding sites across the genome. Similarly, a significant negative correlation was observed between TF expression and 5mC content at TF binding sites for 70% of the TFs examined, arguing that 5mC deposition might be a direct consequence of the absence of TF expression. Altogether, data from the studies discussed above suggest that active enhancers on average display less 5mC than poised or silent enhancers, yet their 5mC levels are not as low as those observed in constitutively unmethylated CGIs.
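The per-site regression between 5mC and chromatin accessibility mentioned above can be sketched as an ordinary least-squares fit across samples. This is only an illustration of the general approach, not a reproduction of the ENCODE analysis: the function name is hypothetical, the values are invented toy data, and no significance testing or multiple-testing correction is included.

```python
def least_squares(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, pearson_r).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary across samples")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, pearson_r


if __name__ == "__main__":
    # Toy data (made up): one genomic site measured in five cell types.
    methylation = [0.1, 0.3, 0.5, 0.7, 0.9]    # fraction of methylated CpGs
    accessibility = [9.0, 7.0, 5.0, 3.0, 1.0]  # arbitrary accessibility signal
    slope, intercept, r = least_squares(methylation, accessibility)
    print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

A negative slope and a negative correlation coefficient at a site correspond to the inverse relationship between 5mC and accessibility described in the text; a genome-wide analysis would repeat such a fit at every site and assess significance.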
5mC, 5hmC, and TF binding
While TF binding at enhancers is necessary for transcriptional regulation of the associated genes, it is not yet established whether TF binding induces the formation and maintenance of hypomethylation at enhancers or whether hypomethylation at enhancers promotes TF binding. Initial work suggested that 5mC might repel TFs from their target sites; however, more recent work suggests that this mechanism might not be as common as previously thought. A high-throughput study examined the effects of 5mC on the binding of human TFs (n=542) in vitro and found that a substantial fraction of the TFs assayed (33%) were not repelled by 5mC, whereas 34% and 23% of TFs displayed increased and decreased binding to methylated sequences, respectively (Figure 1A). Moreover, recent work has demonstrated that 5mC, as well as its oxidised derivatives 5hmC, 5fC, and 5caC, can recruit transcriptional activators in vitro.
Figure 1. 5mC turnover at gene-regulatory regions
The insulator protein CTCF has been studied extensively in terms of its interactions with 5mC. Data from mouse ESCs suggest that CTCF can mediate active DNA demethylation at bound regions and that the introduction of CTCF sites into stably integrated reporter constructs results in decreased 5mC. Similarly, reintroduction of a TF can revert a hypermethylated regulatory region to a hypomethylated state once the region is bound. This has been demonstrated in mouse ESCs that lack the TF REST: REST−/− cells display increased 5mC and reduced 5hmC at REST binding sites compared with wild-type cells. Reintroduction of REST leads to a decrease in 5mC and an increase in 5hmC at REST binding sites, indicative of an active, TET-dependent mechanism of demethylation. This is further supported by studies of glucocorticoid receptor binding to tissue-specific enhancers, in which binding of the receptor induces DNA demethylation, resulting in enhancer activation. Given the modest number of reports describing such examples, it is currently difficult to propose a unifying molecular mechanism of TF-mediated demethylation of regulatory regions. It is likely, however, that TET methylcytosine dioxygenases play a role in at least a subset of such interactions (Figure 1B,C). This is supported by a recent screen that identified a handful of demethylation-inducing TFs, including RUNX3, GATA2, CEBPB, MAFB, NR4A2, MYOD1, CEBPA, and TBX5.
Enrichment of 5hmC, the major oxidised derivative of 5mC, has been identified on predicted enhancers in mammalian ESCs [57,58]. In human ESCs, 5hmC-enriched enhancers coincide with binding sites of pluripotency regulators such as NANOG and OCT4. TET2 knockouts in mouse ESCs result in extensive loss of 5hmC at enhancers, accompanied by 5mC enrichment (hypermethylation) and a reduction in enhancer activity. Moreover, a recent report interrogated the requirements for TET2 in induced pluripotent stem cell (iPSC) reprogramming. This work demonstrated that active DNA demethylation via the 5hmC intermediate precedes the formation of open chromatin, whereas the deletion of the demethylase TET2 almost completely inhibits reprogramming. This is indicative of robust 5mC oxidation and TF turnover on key regulatory elements associated with mammalian pluripotency. It is also worth noting that DNMT1 has low binding affinity for 5hmC; 5hmC may therefore further promote passive DNA demethylation by excluding DNMT1 and thus preventing methylation of cytosines on the nascent DNA strand. Mechanistically, 5hmC has been shown to disrupt nucleosome stability in vitro and to decrease the binding affinity of a major transcriptional repressor, MeCP2 [61,62]. Taken together, these findings suggest that 5hmC might contribute to enhancer activity by promoting both active and passive DNA demethylation and by influencing chromatin structure.
Compatibility of 5mC and active chromatin signatures
Despite the insights discussed above, the relationships between 5mC levels and enhancer activity are far from being fully understood. For example, in mouse genomes, distal regulatory elements display tissue-specific differential methylation. A subset of these differentially methylated regions are hypomethylated yet lack the active histone marks H3K4me1 and H3K27ac, and are consequently inactive in adult tissues (Figure 2A). However, these enhancers (‘vestigial’ enhancers) are active during embryonic development, when they display hypomethylation as well as H3K4me1 and H3K27ac, and their putative targets are developmental genes. This suggests that a form of ‘epigenetic memory’ is retained from development to adulthood in the form of DNA hypomethylation [29,63]. More notably, however, these findings demonstrate that hypomethylation at these regions is not sufficient for enhancer activity. Moreover, a recent study exploring the correlation between chromatin structure and 5mC has challenged the typical inverse relationship between 5mC content and enhancer activity, demonstrating that 5mC can coexist with active histone marks at distal regulatory sequences. This report suggests that chromatin can exist in a bivalent state, characterised by the coexistence of 5mC and active histone marks such as H3K27ac in human cells (Figure 2B). Specifically, high levels of 5mC were detected on the same nucleosome as H3K27ac at regular and super-enhancers in the human genome, a relationship that is not observed at promoters. Moreover, some of these regions were highly CpG dense, displaying CpG content comparable with that of CGIs. As these bivalent regions appear to be located outside TF-binding sites in the majority of cases, it is possible that 5mC in this context does not repress enhancer activity, but rather stabilises DNA and defines demethylated TF-binding sites.
A similar observation regarding the compatibility of 5mC and active histone marks was also recently made in zebrafish embryos. These observations notwithstanding, it is important to note that the majority of vertebrate enhancers are CpG poor and that it is not clear what impact 5mC has on their function. Moreover, whether there is a strict causal relationship between 5mC and enhancer activity remains unresolved.
Figure 2. Non-canonical chromatin configuration at enhancers
Enhancer methylation and embryonic development
In adult tissues, 5mC is a relatively static mark. However, during early mammalian embryogenesis it undergoes two rounds of global reprogramming. The first wave of genome-wide demethylation occurs shortly after fertilisation, whereas the second wave takes place in primordial germ cells (PGCs) [1,2]. While such global reprogramming appears to be a mammalian-specific phenomenon, active DNA demethylation at distal regulatory elements during vertebrate body plan formation is an evolutionarily conserved trait [31,66]. A wave of active DNA demethylation at thousands of enhancers associated with conserved regulatory pathways such as Wnt, Notch, and transforming growth factor (TGF)-β results in coordinated transcriptional changes and morphological similarities during the vertebrate phylotypic period, the most highly conserved stage of vertebrate embryogenesis. The necessity of this epigenetic remodelling event is highlighted by triple TET zebrafish morphants, which exhibit embryonic lethality and severe developmental phenotypes, including short and blended axes, reduced pigmentation, and impaired head structures, that coincide with increased 5mC at phylotypic stage enhancers. Notably, these >2400 phylotypic stage enhancers are highly CpG rich, displaying a CpG density comparable with that of CGIs, which might make them more prone to regulation via 5mC. Importantly, triple TET morphants also display reduced chromatin accessibility at phylotypic enhancers, as measured by ATAC-seq, suggesting that 5mC is an upstream regulatory mark, the removal of which might be required for proper enhancer function. A similar impact on chromatin accessibility at regulatory regions has been demonstrated in mammalian systems where TET2/TET3 function was ablated [67,68].
Studies of mouse and human organogenesis have also revealed conserved and elaborate methylation reprogramming of enhancers that is strongly associated with the development of almost all human organs and that displays the canonical inverse correlation between 5mC levels and enhancer activity. This requirement for enhancer demethylation is not limited to embryogenesis but is also observed postnatally. One such example involves Early growth response protein 1 (EGR-1), which is able to recruit TET1, resulting in demethylation of thousands of regulatory regions during postnatal frontal cortex development. Moreover, recent work suggests that developmental removal of enhancer methylation might have preceded the origin of vertebrates. The invertebrate chordate amphioxus (Branchiostoma lanceolatum) has been shown to display robust demethylation of regulatory regions (as defined by ATAC-seq) during tissue differentiation. Thus, developmental enhancer demethylation, together with the expression of the associated developmental genes, appears to be an evolutionarily conserved feature required for proper embryonic development of vertebrates and beyond.
While 5mC is one of the most well-studied epigenetic modifications, the functional consequence of this mark in certain genomic contexts remains enigmatic. The significance of 5mC at distal regulatory elements is underscored by the fact that its remodelling is an evolutionarily conserved feature that must be carefully orchestrated for normal embryonic development and cell differentiation [31,66,68,71]. A number of studies support the notion that enhancer methylation influences transcription of the associated gene; however, there is currently no definitive causal link that explains how 5mC regulates enhancer activity. Furthermore, the chromatin environment that is necessary for 5mC to influence gene expression is not entirely understood. Studies performed in diverse biological systems typically suggest that active enhancers display lower 5mC levels than poised or silent enhancers, and that they are characterised by TF binding, 5hmC enrichment, and the presence of the active histone marks H3K4me1 and H3K27ac [32,47]. Conversely, hypermethylation at enhancers is generally associated with reduced chromatin accessibility and decreased TF binding. This consensus has been challenged by recent studies on the epigenetic environment of enhancers. Enhancers can exist in a bivalent state, displaying the coexistence of characteristic active and inactive chromatin features. It is also important to note that enhancer CpG density might play a major role in determining the regulatory inputs and outputs of enhancers. Thousands of CGI enhancers identified in human tissues display considerable regulatory differences when compared with their CpG-poor counterparts. For example, CGI enhancers (i) are more highly conserved, (ii) are more highly enriched in R loops and 3D genomic contacts, (iii) on average provide stronger signal in functional assays, and (iv) are prone to hypermethylation in cancer.
A major unresolved question related to enhancer demethylation is the mechanism of TET protein targeting to regulatory regions. A number of studies have reported biochemical interactions between TET proteins and diverse TFs such as Tex10, C/EBPα, Klf4, Tfcp2l1, and others. Alternatively, differential isoform usage might also contribute to Tet genomic targeting. This has been described in the case of Tet1, where a long isoform, which contains the CxxC domain, is expressed during early embryogenesis and in the developing germline, whereas the short isoform (without CxxC) is expressed in somatic cells. Interestingly, only the long isoform appears to be associated with active demethylation processes, even though both isoforms are able to bind CGIs. As the CxxC domain has high affinity for CGIs, even though in the case of Tet proteins this is not sufficient for CGI binding, it is plausible that Tet1 might be associated with active demethylation of promoter CGIs and CGI enhancers, whereas Tet2 is targeted to CpG-poor enhancers via diverse TFs. Future work involving technologies such as precise epigenome editing and single-cell genomic and transcriptomic assays should aim to conclusively define how and to what extent 5mC presence at distinct enhancer classes is necessary for interaction with the linked promoter and consequently for the control of gene expression.
In general, enhancer activity is correlated with DNA hypomethylation in vertebrates. Some studies, however, suggest that 5mC and active chromatin marks such as H3K27ac are compatible at enhancers.
DNA hypomethylation is not sufficient to drive enhancer activity as evidenced by embryonic enhancers that remain hypomethylated and yet are inactive in adult mammalian tissues.
A number of studies have shown that TF binding to enhancers is sufficient to promote 5mC turnover that can be mediated by TET-dependent mechanisms of active demethylation.
Developmental 5mC remodelling at enhancers appears to be a conserved feature of vertebrate and perhaps even chordate and to some extent invertebrate development.
The figures were made with BioRender software.
This work was supported by the ARC Discovery Project [grant number DP190103852 (to O.B.)].
A.A. and O.B. wrote the paper and drafted the figures.
The authors declare that there are no competing interests associated with the manuscript.
CpH, cytosine followed by adenine, thymine or cytosine
DNMT, DNA methyltransferase enzyme
ENCODE, Encyclopedia of DNA elements
ESC, embryonic stem cell
H3K27ac, histone H3 lysine 27 acetylation
H3K4me1, histone H3 lysine 4 monomethylation
REST, RE1-silencing transcription factor
TET, Ten-Eleven Translocation methylcytosine dioxygenase
UHRF1, ubiquitin-like, containing PHD and RING finger domains 1