Imprinted genes provide an attractive paradigm to unravel links between transcription and genome architecture. The parental allele-specific expression of these essential genes, which are clustered in chromosomal domains, is mediated by parental methylation imprints at key regulatory DNA sequences. Recent chromatin conformation capture (3C)-based studies have revealed differential organization of topologically associating domains between the parental chromosomes at imprinted domains, in both embryonic stem cells and differentiated cells. At several imprinted domains, differentially methylated regions show allelic binding of the insulator protein CTCF, and associated focal retention of cohesin, on the non-methylated allele only. This generates differential patterns of chromatin looping between the parental chromosomes, already in the early embryo, and thereby facilitates allelic gene expression. Recent research also suggests the opposite scenario, in which allelic transcription contributes to differential genome organization, similar to what has been reported for imprinted X-chromosome inactivation. This may occur through epigenetic effects on CTCF binding, through structural effects of RNA Polymerase II, or through imprinted long non-coding RNAs with chromatin-repressive functions. The emerging picture is that epigenetically controlled differential genome architecture precedes and facilitates imprinted gene expression during development, and that, conversely, at some domains mono-allelic gene expression also influences genome architecture.
Introduction
During interphase, the mammalian genome is organized into compartments and ‘topologically associating domains’ (TADs). Sequence elements within a TAD associate more frequently with each other than with sequences in flanking domains [1–3]. An important mechanism underlying this level of chromatin organization is cohesin-mediated loop extrusion, which can transiently bring regulatory sequences into close proximity and can thereby control gene transcription [4,5]. Boundaries between mammalian TADs show increased association of nuclear proteins, including the zinc-finger insulator protein ‘CCCTC-binding factor’ (CTCF) [2].
Despite tremendous progress in recent years, stimulated by technological advances [6], it remains poorly understood how genome architecture and gene transcription are linked [7]. Various model systems have been used to explore this important question. Amongst these, imprinted genes are of particular interest. These unusual genes show a mono-allelic pattern of expression that depends entirely on the parental origin of the allele [8]. Consequently, expressed and repressed states are present within the same cell and can be directly compared to explore how gene expression states might be linked to chromatin structure [9].
In mammals, ∼200 genes are known to be imprinted, and these play important roles in development, homeostasis and behavior [10–12]. The majority of imprinted genes are clustered in domains that span up to several megabases of DNA. Each of these chromosomal domains is regulated by an ‘imprinting control region’ (ICR), an essential CpG island that is marked by allele-specific, germline-inherited DNA methylation imprints [13]. The allelic DNA methylation at these so-called germline differentially methylated regions (gDMRs) is stably maintained in the embryo. Its maintenance involves specialized nuclear proteins and possibly also repressive histone methylation [14], and the allelic methylation in turn mediates the allele-specific gene expression at imprinted domains [15,16].
How gDMRs bring about the allelic expression of nearby genes during development differs between domains, and for many imprinted genes this process occurs in a lineage-specific manner [15]. At some imprinted domains, the process involves the formation of a secondary ‘somatic DMR’ (sDMR) that also has essential gene regulatory functions [15,17]. A recurrent theme, observed at many gDMRs and sDMRs, is that parental allele-specific DNA methylation leads to allelic recruitment of the insulator protein CTCF [9]. Specifically, CTCF recruitment is restricted to the non-methylated parental allele because methylation of CpGs within the CTCF binding motif impairs its binding [18,19]. Since CTCF binding induces focal stalling of the cohesin complex, and thus influences loop extrusion and long-distance chromatin interactions [20,21], its allelic binding can potentially generate differential chromatin architecture between the parental chromosomes. This review describes how CTCF and associated cohesin complexes control imprinted gene expression through their effects on chromatin structure. It also discusses non-exclusive, opposite scenarios in which gene transcription influences genome architecture and thereby contributes to the maintenance of genomic imprinting. The emerging picture is that differential chromatin architecture is both a cause and a consequence of imprinted gene expression.
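The chain of reasoning above (CpG methylation blocks CTCF binding, bound CTCF stalls cohesin, and stalled cohesin shapes chromatin contacts) can be illustrated with a deliberately minimal toy model. The Python sketch below is not a published simulation method: it assumes a deterministic loop extrusion process on a one-dimensional lattice, treats a CTCF site as a barrier only when it is unmethylated, ignores CTCF motif orientation, and counts only loop anchors as contacts. The functions ctcf_barriers and extrusion_contacts, the lattice size and the site positions are all hypothetical.

```python
from collections import Counter


def ctcf_barriers(ctcf_sites, methylated_sites):
    """Toy rule: a CTCF site acts as a barrier only if its motif is unmethylated."""
    return {pos for pos in ctcf_sites if pos not in methylated_sites}


def extrusion_contacts(n_sites, barriers, steps):
    """Deterministic toy loop extrusion on a 1D lattice of n_sites positions.

    One cohesin is loaded at each lattice position in turn. At every step, each
    leg moves one site outward unless it sits on a CTCF barrier or at a lattice
    end (CTCF motif orientation is ignored). After each step, the pair of loop
    anchors (left, right) is counted as one contact.
    """
    contacts = Counter()
    for load in range(n_sites):
        left = right = load
        for _ in range(steps):
            if left > 0 and left not in barriers:
                left -= 1
            if right < n_sites - 1 and right not in barriers:
                right += 1
            contacts[(left, right)] += 1
    return contacts


# Hypothetical toy locus: a single DMR-like CTCF site at position 10,
# flanked by two regulatory elements at positions 3 and 17.
N_SITES, CTCF_SITES, STEPS = 21, {10}, 30

# Unmethylated allele: CTCF binds and stalls cohesin.
unmethylated = extrusion_contacts(N_SITES, ctcf_barriers(CTCF_SITES, set()), STEPS)
# Methylated allele: CTCF binding is impaired, so no barrier forms.
methylated = extrusion_contacts(N_SITES, ctcf_barriers(CTCF_SITES, {10}), STEPS)

print("3-17 anchor contacts, unmethylated allele:", unmethylated[(3, 17)])
print("3-17 anchor contacts, methylated allele:", methylated[(3, 17)])
print("10-17 anchor contacts, unmethylated allele:", unmethylated[(10, 17)])
```

In this sketch, the elements at positions 3 and 17 are never paired as loop anchors on the unmethylated (CTCF-bound) allele, whereas they are on the methylated allele. Conversely, cohesin stalled at the bound CTCF site pairs position 10 with the flanking region only on the unmethylated allele, a crude analogue of allele-specific looping.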
Differential DNA methylation mediates allelic recruitment of CTCF
Studies on different imprinted domains have reported that CTCF is recruited to the unmethylated allele of imprinted DMRs, and have shown how this controls imprinted gene expression [9,22]. A first example was provided by the growth-regulating Igf2-H19 domain on mouse chromosome 7, which is controlled by an intergenic gDMR that is methylated on the paternal allele. In the mouse, this ∼2-kb gDMR comprises four CTCF binding motifs and, in the embryonic stem cells (ESCs), mesodermal and endodermal tissues analyzed, is bound by CTCF on the unmethylated maternal allele only [23–26] (Figure 1A). The CTCF binding sites are essential for keeping the maternal allele unmethylated in the embryo, and this in turn contributes to maintaining the allelic CTCF binding [27].
CTCF controls allelic sub-TAD organization at several imprinted domains.
(A) The H19-ICR controls imprinting at the Igf2-H19 domain. It comprises four CTCF binding sites (labeled m1-m4 as in [23]) that are methylated on the paternal chromosome and bound by CTCF on the maternal chromosome. The allele-specific CTCF chromatin immunoprecipitation (ChIP) profiles originate from [33]. Filled lollipops, methylated CpG dinucleotides; open lollipops, unmethylated CpGs. (B) The Igf2-H19 domain comprises the maternally expressed lncRNA gene H19 (imprinted in mesoderm/endoderm), the paternally expressed genes Igf2 (imprinted in mesoderm/endoderm) and Ins2 [imprinted in yolk sac, [109,110]], and the lncRNA gene Nctc1, which shows paternally biased expression in muscle [92]. Endodermal (EE) and mesodermal enhancers (ME) [green ovals, [92,111]] activate Igf2 on the paternal chromosome through long-distance functional interactions (green arrows). On the maternal chromosome, these interactions are prevented by CTCF binding to the H19-ICR, which acts as a chromatin boundary [23]; on this chromosome, the enhancers activate H19 instead (orange arrows). Because of these and other parent-specific chromatin interactions, the domain is organized into two sub-TADs on the maternal chromosome (red pyramids), whereas on the paternal chromosome it lies within a single large TAD (blue pyramid) [2,33]. (C) The Dlk1-Dio3 domain comprises the maternally expressed Meg3, Rian and Mirg non-coding RNAs, the paternally expressed Dlk1 and Rtl1 protein-coding genes and, at its 3′ end, the Dio3 protein-coding gene, for which conflicting imprinting statuses have been reported. Maternal CTCF binding occurs at the Meg3-DMR, ∼10 kb downstream of the locus’ ICR (IG-DMR). On the maternal allele, CTCF binding at the Meg3-DMR allows the formation of maternal-specific sub-TADs, which are important for silencing the maternal Dlk1 and Rtl1 alleles [33,47].
(D) The Peg13-Kcnk9 domain comprises the Kcnk9 and Trappc9 protein-coding genes, expressed from the maternal allele, and the paternally expressed Peg13 lncRNA. The locus’ ICR overlaps the Peg13 promoter and is bound by CTCF on the paternal allele. The formation of two sub-TADs on the paternal chromosome prevents functional interaction between enhancers (green) and the Kcnk9 promoter [31]. (E) The Grb10-Ddc domain comprises the maternally expressed Grb10 gene, the paternally expressed Ddc gene and the biallelically expressed Cobl gene. CTCF binds to the ICR and to the CBR2.3 sDMR on the paternal chromosome exclusively. The formation of two sub-TADs on the paternal chromosome prevents the functional interaction between a putative ME (PME) and the Grb10 promoter that occurs on the maternal chromosome, while favoring interaction with the paternal Ddc allele [34].
At the Kcnq1 locus, located adjacent to the Igf2-H19 locus on mouse chromosome 7, CTCF binds to the unmethylated allele of its intragenic gDMR in embryonic cells and neonatal tissues [28,29]. Other developmental gene domains that show parental allele-specific CTCF binding to their gDMR, or to an sDMR, include the Peg13-Kcnk9 locus on mouse chromosome 15 [30,31], the Dlk1-Dio3 domain on mouse chromosome 12 [32,33], the Grb10-Ddc domain on mouse chromosome 11 [34] and the Mest/Klf14 domain on mouse chromosome 6 [22]. In all cases, CTCF, presumably together with locally stalled cohesin complexes, is present on the unmethylated parental allele. Functional studies in murine cell and animal models have shown, for different loci, that this allelic CTCF binding is essential for the correct expression of adjacent imprinted genes [23,27,28,31,33–35].
In humans, similarly, allelic binding of CTCF has been reported at specific gDMRs and sDMRs. Genetic mutations that affect the corresponding CTCF binding sites can perturb the allelic expression of imprinted genes, and may thereby give rise to different imprinting-related congenital disorders [35–42].
CTCF-mediated chromatin structural interactions precede imprinted expression
The allelic recruitment of CTCF at imprinted DMRs has been linked to differential 3D chromatin organization between the parental chromosomes [9]. CTCF-mediated proximities between regulatory sequences are particularly interesting to explore in the context of genomic imprinting, which provides the unique opportunity to compare opposite epigenetic states of the parental chromosomes within the same cells. Furthermore, at different imprinted domains, mono-allelic expression arises during embryonic or foetal development, and is pronounced in some tissues but not detected in others [11,16,43]. Comparative analyses across distinct developmental stages can thus reveal whether the observed differences in 3D chromatin architecture precede, or arise from, imprinted gene expression. Such comparative studies have been performed for several imprinted domains, as described below.
The gDMR of the Igf2-H19 domain is located upstream of the H19 gene, comprises four CTCF binding sites and is methylated on the paternal allele [44]. In all cell types and tissues studied, CTCF binds to the unmethylated maternal allele of this intergenic gDMR [23,33,35,45]. This brings about allelic chromatin looping, creating close proximity with multiple other (biallelic) CTCF-binding regions elsewhere in the domain [33], within a sub-TAD specific to the maternal chromosome. As a consequence, this gDMR acts as an insulator on the maternal chromosome that separates the Igf2 gene from mesodermal and endodermal enhancers on the other side of the domain (Figure 1B). Importantly, the differential chromatin configuration is already present in ESCs, which are representative of the preimplantation embryo [33]. The imprinted expression of Igf2 and H19, however, arises much later in development, in mesodermal and endodermal tissues, and requires the allelic association of CTCF with the gDMR [27,35,46]. These observations indicate that allelic CTCF binding and the resulting differential chromatin architecture are established before the onset of imprinted expression, and are subsequently maintained to control imprinted expression at later developmental stages.
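The enhancer-blocking logic described above for the Igf2-H19 domain can be summarized as a simple rule-based sketch. This is a toy model with hypothetical coordinates, not a quantitative description of the locus. It assumes that an enhancer can activate a promoter only if no CTCF-bound ICR lies between them, that CTCF binds the ICR only when it is unmethylated, and, as a simplifying assumption not stated above, that a methylated promoter is silent (the paternal H19 promoter is treated as methylated). The Element class and the active_genes function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    pos: int  # hypothetical coordinate along the toy chromosome


def active_genes(promoters, enhancers, icrs, methylated):
    """Return the names of promoters that at least one enhancer can reach.

    Toy rules (assumptions of this sketch, not measured behavior):
    - CTCF binds an ICR only if the ICR is unmethylated.
    - A CTCF-bound ICR between an enhancer and a promoter blocks activation.
    - A methylated promoter is silent.
    """
    boundaries = [icr.pos for icr in icrs if icr.name not in methylated]
    active = set()
    for promoter in promoters:
        if promoter.name in methylated:
            continue
        for enhancer in enhancers:
            lo, hi = sorted((promoter.pos, enhancer.pos))
            if not any(lo < b < hi for b in boundaries):
                active.add(promoter.name)
                break
    return active


# Hypothetical layout roughly following the gene order of Figure 1B:
# Igf2 ... H19-ICR, H19, then the endodermal and mesodermal enhancers.
IGF2, H19 = Element("Igf2", 0), Element("H19", 92)
ICR = Element("H19-ICR", 90)
ENHANCERS = [Element("EE", 100), Element("ME", 110)]

maternal = active_genes([IGF2, H19], ENHANCERS, [ICR], methylated=set())
# Paternal chromosome: methylated ICR and (simplification) methylated H19 promoter.
paternal = active_genes([IGF2, H19], ENHANCERS, [ICR], methylated={"H19-ICR", "H19"})
# Maternal chromosome lacking the ICR (hypothetical deletion).
maternal_no_icr = active_genes([IGF2, H19], ENHANCERS, [], methylated=set())

print("maternal:", sorted(maternal))
print("paternal:", sorted(paternal))
print("maternal without ICR:", sorted(maternal_no_icr))
```

In this toy model, only H19 is reached on the maternal chromosome and only Igf2 on the paternal chromosome. Removing the ICR from the maternal chromosome makes both genes reachable, a crude illustration of why allelic CTCF association with the gDMR is needed for imprinted Igf2 expression.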
In ESCs, the Dlk1-Dio3 domain shows a CTCF-driven preferential interaction on the maternal chromosome between the Meg3 promoter (an sDMR bound by CTCF on the maternal allele) and CTCF binding sites close to the developmental gene Dlk1 [33]. In addition, a second maternal-specific boundary has been observed downstream of the maternally expressed genes; it coincides with another CTCF binding site, which is, however, biallelically bound [47] (Figure 1C). In ESCs, however, Dlk1 is not yet imprinted and is expressed only at low levels, from both alleles [48]. The allelic CTCF binding and associated structural features are maintained during ESC differentiation into neuronal progenitor cells (NPCs), during which Dlk1 becomes imprinted and is expressed predominantly from the paternal chromosome [33,48]. CRISPR-mediated ablation of the maternal CTCF binding led to a loss of the maternal-specific structural interactions and, during ESC differentiation into NPCs, Dlk1 no longer became imprinted [33]. Conversely, a recent study showed that CRISPR-dCas9-TET-mediated demethylation of the paternal Meg3 sDMR gives rise to biallelic CTCF binding and, hence, to a maternal-type 3D chromatin structure on both parental chromosomes [47]. Together, these studies indicate that at the imprinted Dlk1-Dio3 locus, too, CTCF-mediated chromatin structure precedes imprinted gene expression, and becomes functionally important only in differentiated cells, where it facilitates the allelic expression of nearby genes.
At the murine Kcnk9-Peg13 imprinted domain, similarly, allele-specific and biallelic CTCF-driven structural interactions are already observed in ESCs (Figure 1D), in which this locus is still largely silent and shows no pronounced allelic expression asymmetries [31,49]. At this locus, too, the structural features are largely maintained during neural differentiation [and are also observed in brain, [30,31]], during which the expression of the domain's genes is induced and becomes imprinted [31,49]. Targeting studies show that the allelic CTCF binding to the domain's intragenic gDMR is essential for the acquisition of allelic expression of the Kcnk9 and Peg13 genes in neuronal tissue [31]. These observations indicate that at this domain, too, CTCF-mediated differential chromatin structure precedes and facilitates imprinted gene expression.
At the murine Grb10-Ddc locus, the gDMR is located within the 5′ portion of the Grb10 gene and is methylated on the maternal allele [50]. This essential ICR is bound by CTCF on the unmethylated paternal allele, which may control the expression of genes within the domain through differential chromatin loop extrusion [34,51,52]. Allelic CTCF binding also occurs on the unmethylated paternal allele of an sDMR located within an intron of the Grb10 gene, and brings about differential chromatin looping between the two parental chromosomes [34]. Through its allelic CTCF binding, this intragenic sDMR acts as a chromatin insulator that separates the Grb10 promoter from a heart-specific enhancer on the paternal chromosome (Figure 1E) [34]. Deletion of the CTCF-bound paternal sDMR phenocopies the maternal allele: it leads to biallelic Grb10 expression, together with loss of the Ddc expression that would otherwise occur from the paternal allele [34]. Together, the studies on this locus indicate that allelic CTCF binding to different DMRs, and the resulting chromatin looping and boundary functions, most likely pre-empt imprinted expression in the heart.
At the Kcnq1 domain, and at its human counterpart, the gDMR shows maternal methylation and contains multiple CTCF binding sites. CTCF binding has been reported to be either paternal-allele-specific [28,29] or enriched on the paternal allele [32,53], which may depend on the exact region of the gDMR analyzed or on the cell type studied. Together with biallelically bound CTCF sites, allele-specific looping events within the locus prevent Kcnq1 imprinted expression in the early embryo [28,54]. In contrast to the H19 gDMR, which constitutes a strong boundary between sub-TADs [33], the KCNQ1 gDMR behaves only as a weak sub-TAD boundary [55]. Whether CTCF-mediated differential chromatin organization precedes imprinted gene expression at this domain is unknown.
In conclusion, the different studies so far indicate that CTCF-mediated chromatin structural interactions and sub-TAD organization do not merely reflect the allelic expression of imprinted genes. Rather, at the above loci this process precedes, and facilitates, the acquisition of allelic gene expression during development. Whether this is a general rule is unknown, since other CTCF-bound imprinted domains have not been explored in enough detail to draw conclusions.
Role of imprinted lncRNAs in chromatin architecture
Diverse studies in different model species have shown that long non-coding RNAs (lncRNAs, defined as being >500 bases in length) can influence the expression of nearby genes [56]. For many lncRNAs, it is unclear how these effects arise, and it often remains unresolved whether the act of lncRNA transcription or the lncRNA molecule itself is involved [56,57].
The lncRNA Xist, which mediates X-chromosome inactivation (XCI) in mammals, undergoes imprinted expression during early mouse development, being expressed from the paternal allele only. This imprinted expression persists in the extra-embryonic lineages. Although imprinted XCI and the subsequent random XCI in the epiblast/embryo proper each have unique features [reviewed in [58]], they share fundamental principles. Once expressed, Xist lncRNA accumulates in cis and triggers extensive remodeling of the epigenetic landscape of the future inactive X-chromosome. This notably involves the recruitment of Polycomb repressive complexes (PRCs), leading to enrichment of repressive H3 lysine-27 trimethylation (H3K27me3, by the PRC2 complex) and H2A lysine-119 mono-ubiquitination (H2AK119u1, by the PRC1 complex) across the entire inactive X-chromosome [reviewed in [59]]. The process also involves the recruitment of histone deacetylase (HDAC) complexes via a protein called Spen, which binds a repeat element located at the 5′ end of Xist lncRNA [60,61]. Spen binding to Xist, and the subsequent HDAC recruitment, are crucial for the cis-silencing of X-linked genes, both in extra-embryonic lineages with imprinted Xist expression [62] and in the embryo proper, where Xist is expressed from either parental chromosome [60–63]. The recruitment of PRC1 and PRC2 also contributes to cis-silencing, notably in imprinted XCI [64,65]. In addition, in the context of random XCI, the inactive X-chromosome coated by Xist lncRNA undergoes a global loss of TADs and instead forms two large megadomains hinged at the Dxz4 macrosatellite region [66–68]. Whether these two megadomains also form in extra-embryonic tissues with imprinted Xist expression remains to be determined. In conclusion, Xist lncRNA accumulates in cis, modulates the histone modification landscape, rewires 3D genome organization (at least in random XCI) and represses most genes on the X chromosome.
Intriguingly, almost all imprinted domains on autosomes express at least one lncRNA, from one of the two parental chromosomes only [69,70]. Studies on two imprinted lncRNAs have been particularly instructive: Kcnq1ot1 at the Kcnq1 domain on mouse chromosome 7 and Airn at the Igf2r domain on mouse chromosome 17 [71]. Below, we describe how these two nuclear lncRNAs induce allelic gene repression and how their expression influences chromatin architecture at their respective domains, processes that show similarities to the prototypical X-chromosome inactivation mediated by Xist lncRNA.
At the Kcnq1 locus, the expression of Kcnq1ot1 lncRNA is driven by a maternally methylated gDMR, and this lncRNA is transcribed from the paternal chromosome only [72]. It is retained in cis at the locus and is relatively stable [73]. In the extra-embryonic lineages, this >90-kb lncRNA exerts long-range chromatin repressive effects on multiple genes. This process involves interaction with components of PRCs, which are initially recruited to the chromatin through specific target sequences in the locus [73–79]. Whether the interaction of this lncRNA with Polycomb group (PcG) proteins is RNA sequence-specific or promiscuous [as is the case for many RNAs, [80]] remains to be determined; so far, there are no in vivo insights into which part of the RNA is involved.
The histone lysine methyltransferase G9A (also called EHMT2) is recruited by Kcnq1ot1 RNA as well, and this mediates repressive H3K9me2 along the domain in the extra-embryonic lineages [78]. In addition, Kcnq1ot1 lncRNA has been reported to interact with the nuclear matrix RNA-binding protein hnRNP-K, and this may enhance the PRC-mediated chromatin repression at the locus via tethering of PRC1 complexes [74,81]. The emerging picture is that Kcnq1ot1 lncRNA interacts with, and enhances the local recruitment of, different chromatin repressive complexes through RNA-protein interactions. This process likely initiates at specific promoters that are recognized by PRC complexes, which would explain the local retention of the lncRNA, and has been suggested to occur within phase-separated RNA-protein aggregates [71,74]. Within such a scenario, Kcnq1ot1 lncRNA levels would not only influence the extent of chromatin repression, but would influence chromatin architecture as well. Indeed, fluorescence in situ hybridization studies have shown that Kcnq1ot1 locally associates with chromatin bound by PcG proteins (PRC complexes), and that this forms a repressive nuclear compartment that is contracted in the three-dimensional space [73] (Figure 2A).
Effects of imprinted nuclear lncRNAs on 3D chromatin architecture.
(A) Imprinted lncRNAs can mediate chromatin repression in cis through the recruitment of chromatin modifiers such as PRC1, PRC2 and G9A (orange ovals) that bring about repressive histone modifications (violet pentagons). This occurs together with the formation of a local 3D chromatin architecture (blue parabolas), in a process that may involve liquid-liquid phase separation. The example depicts the reported in-cis effects, in the trophoblast, of the lncRNA Kcnq1ot1 (green wave and green shapes) at the Kcnq1 imprinted domain on the paternal chromosome [73–76,78]. Filled lollipops, methylated CpG dinucleotides; open lollipops, unmethylated CpGs (at the domain's gDMR). (B) The cis-retention of imprinted lncRNAs (green) can affect the degree of compaction of the parental chromosome that is ‘coated’ by the lncRNA (green waves) [71]. This is illustrated for the Igf2r imprinted domain (with the Airn lncRNA), at which long-range contacts are enriched on the paternal chromosome (bottom, in blue) in trophoblast cells, compared with the maternal chromosome (top, in red), which shows more frequent short-range contacts [88]. These differential contacts contribute to the differential TAD organization (pyramids) between the parental chromosomes.
In somatic tissues, Kcnq1ot1 silences only a few genes located in the central portion of the domain [72]. In mouse embryos that express a truncated form of the lncRNA, chromatin interactions on the paternal chromosome are altered in heart and brain, indicating changes in chromatin structure [82]. Interestingly, the repressive chromatin domain formed by Kcnq1ot1 lncRNA tends to localize around the nucleolus or towards the nuclear periphery, and this seems to require the first 890 bp of the RNA [75,83]. This ‘5′-silencing domain’ comprises several conserved repeat sequences and is essential for the cis-retention of the RNA at the locus [84]. The emerging picture is that Kcnq1ot1 also plays roles in the spatial organization of the imprinted locus in somatic tissues and, like many other lncRNAs, contributes to local chromatin structure and its higher-order organization [reviewed in [85]].
At the Igf2r domain, Airn lncRNA induces spreading of PRC- and G9A-mediated repressive histone methylation in the trophoblast lineage [71,74,86,87]. In trophoblast stem cells (TSCs) and placenta, the Airn-mediated chromatin repression extends across a 15-Mb genomic region [16,74], making this the largest imprinted domain in the mouse. In TSCs, the focal accumulation of Airn lncRNA is determined by its level of expression, with increased Airn expression leading to spreading of repressive chromatin across a larger chromosomal region [74]. A recent study in TSCs explored whether Airn lncRNA contributes to the structural organization of the paternal chromosome from which it is expressed [88]. The authors inserted a polyadenylation signal several kilobases downstream of the Airn transcription start site, so as to strongly reduce Airn levels while keeping the promoter active. Hi-C experiments on wild-type TSCs showed strong long-range interactions along the domain on the Airn-expressing paternal chromosome, across a region that overlapped the imprinted domain [88] (Figure 2B). In TSCs expressing the truncated Airn lncRNA, these paternal chromosome-specific proximities were largely lost, showing that Airn lncRNA controls chromatin architecture. Other chromatin assays showed that Airn lncRNA, PRC complexes and PRC-mediated histone modifications are located close to each other in space, forming a nuclear volume within which gene repression occurs [88]. Which part(s) of Airn mediate its effects on chromatin architecture is unknown. As for Kcnq1ot1, these RNA sequences might overlap with those that bring about gene repression in cis.
Whether other imprinted lncRNAs control chromatin architecture at their imprinted domains is unclear. Chromatin structural effects presumably depend on the lncRNA's intracellular localization (many lncRNAs are cytoplasmic rather than nuclear [56]) and on the sequence and the secondary and tertiary structure of the lncRNA [70]. For instance, the Peg13 gene at the Kcnk9-Peg13 imprinted domain expresses a lncRNA from the unmethylated paternal allele only. This lncRNA is mostly cytoplasmic, and oligonucleotide-mediated knock-down studies did not provide evidence for a cis-regulatory role in gene expression [31]. The lncRNA Meg3 at the Dlk1-Dio3 domain, by contrast, is strictly nuclear, and its maternal expression is required to repress the developmental gene Dlk1 in cis [47,48]. Since this lncRNA also interacts with PcG proteins [89,90] and stably accumulates in cis at the locus [91], as do Airn and Kcnq1ot1, it could potentially influence higher-order chromatin architecture. Finally, at the Igf2-H19 domain, a distally located lncRNA gene called Nctc1 shows paternally biased expression (Figure 1B). A muscle-specific enhancer within Nctc1 is important for its expression and also interacts preferentially with the (active) Igf2 promoter on the paternal chromosome, which may explain the allelic Nctc1 expression [92,93]. Whether the lncRNA itself plays a role in differential chromatin interactions is unclear.
Interestingly, several imprinted lncRNAs also have trans-regulatory roles [reviewed in [70]], but it is not known whether these effects on loci elsewhere in the genome involve changes in chromatin architecture.
Does gene transcription influence chromatin architecture?
For many imprinted protein-coding genes, allelic expression is developmentally controlled. In published Hi-C studies, interaction patterns measured across imprinted domains are broadly comparable between different cell types [2,31,33,34,74]. However, specific interactions appear during differentiation, as recently highlighted for the Kcnk9-Peg13 domain [31]. Potentially, such novel structural interactions could be a consequence of developmental gene activation, particularly if cell-type-specific transcription factors (TFs) influence chromatin looping [7]. Recent studies have shown that RNA polymerase II itself contributes to the 3D organization of the genome [discussed in [94]]. In one study that used degron technology to rapidly deplete the protein in a timed manner, chromatin-bound RNA polymerase II was shown to prevent bidirectional loop extrusion, as CTCF does [95]. In another study, paused RNA PolII complexes were found to influence the strength of chromatin loops that bring enhancers and promoters close to each other [96]. These findings indicate that RNA PolII contributes, in part, to the formation of the structural proximities required for gene transcription. It would be interesting to explore, at imprinted domains, to what extent the allele-specific, developmentally controlled recruitment of RNA PolII to promoters and enhancers, and its potential pausing, influences 3D chromatin architecture on the maternal versus the paternal chromosome (Figure 3A). Imprinted genes expressed in specific lineages are often flanked by other imprinted genes with broader expression patterns. For those genes as well, the recruitment of RNA PolII (and its pausing) may contribute to setting up structural proximities. This may partly explain the relatively similar interaction patterns between stem cells and differentiated cells or tissues (see above).
Models for how allelic transcription and RNA Polymerase II may influence chromatin architecture.
(A) RNA Polymerase II (Pol.II) complexes can influence cohesin-mediated loop extrusion [95,96]. Since many imprinted genes are transcribed strictly from one parental allele only, at imprinted domains this process may confer allele-specific looping events. (B) Transcription factors (green) that associate with DMRs on the unmethylated allele protect against aberrant de novo DNA methylation, thereby maintaining the allelism of CTCF binding and the allelic 3D chromatin architecture. This function could be particularly important in early embryonic cells. (C) The elongating form of RNA Polymerase II facilitates H3K36me3 deposition (green pentagons) within the gene body and, subsequently, the recruitment of the de novo DNA methyltransferases [100,101]. This process might help to maintain methylation at intragenic DMRs, and could thus contribute to the allelic CTCF association at imprinted domains. Filled lollipops, methylated CpG dinucleotides; open lollipops, unmethylated CpGs.
In addition, at imprinted DMRs, other specific TFs that associate with the unmethylated allele may protect against aberrant de novo DNA methylation, thereby maintaining the allelic nature of CTCF binding and the associated differential chromatin looping. Such a role would be particularly important in pluripotent cells, in which methylation profiles are still labile and not yet fixed. One example is the pluripotency-associated zinc-finger protein ZFP42 (also called REX1). This TF associates with the gDMRs of the Peg3 and Gnas domains, and its depletion leads to aberrant gains of de novo DNA methylation on the normally unmethylated parental allele [97]. At the ICR of the murine Igf2-H19 domain, OCT4/SOX2 binding contributes to maintaining the hypomethylated state of its CTCF binding sites on the maternal allele [98]. Consistent with this, in some patients with the human imprinting disorder Beckwith–Wiedemann syndrome, aberrant gains of methylation at this gDMR are associated with mutations at OCT4/SOX2 binding sites [15]. Other TFs could similarly be important for maintaining the allelic methylation status of DMRs [99], and hence the allelic CTCF binding, thereby linking transcriptional regulation to chromatin architecture (Figure 3B).
Another model is that transcriptional elongation helps to maintain allelic CTCF binding. The elongating form of RNA PolII facilitates the recruitment of chromatin modifiers that deposit H3K36me3, a covalent histone mark that recruits the de novo DNA methyltransferases DNMT3A/DNMT3B through their PWWP domain, which reads this modification [100,101] (Figure 3C). Through this mechanism, the bodies of actively transcribed genes show relatively high levels of DNA methylation [101]. This mechanism likely operates at the Grb10 imprinted domain, where it could influence not only imprint maintenance but also CTCF allelism. In particular, the elongating RNA PolII, through H3K36me3 deposition and subsequent recruitment of DNMT3A/DNMT3B, might help to stabilize DNA methylation levels within the Grb10 gene body, and thus at its important sDMR. The sustained methylation on the maternal allele would then impede CTCF binding, leading to the observed preference for CTCF binding on the paternal allele [34] (see also Figure 1E).
Somatic maintenance of parental chromosome-specific chromatin architecture
Although 3D chromatin architecture is clearly important for gene expression, it is not, as such, heritable from one somatic cell generation to the next. Chromatin loops are not static entities; they form transiently, and at any given time point a specific loop is present in only a minority of cells [102,103] [reviewed in [5]]. In addition, the chromatin looping patterns present during interphase are largely lost during mitosis and are reconstituted only after mitosis [104]. The potential to form specific loops might persist in part in mitotic cells, possibly through bookmarking TFs [reviewed in [105]]. Similarly, the persistence of histone modification patterns during M-phase may contribute to the rapid reassembly, and possibly the maintenance, of (phase-segregated) structural domains [106,107]. Overall, however, 3D chromatin architecture is not maintained through mitosis and is re-established during G1-phase.
DNA methylation patterns, on the other hand, are stably maintained throughout the cell cycle. This includes germline-derived and somatically acquired allele-specific DNA methylation, which, as outlined above, can create pronounced structural differences between the parental chromosomes through allelic CTCF binding. These differential binding patterns, in turn, bring about the differences in chromatin architecture and gene expression between the parental chromosomes. In this scenario, allelic DNA methylation constitutes the heritable information that is maintained throughout the cell cycle and that, at interphase, instructs the differential gene expression and 3D chromatin architecture at imprinted gene domains. For imprinted gene domains, the re-establishment of allelic 3D chromatin architecture in daughter cells thus depends primarily on the faithful transmission of differential DNA methylation at DMRs (Figure 4).
Epigenetically-determined somatic maintenance of allele-specific CTCF binding.
The genomic 3D chromatin architecture observed in interphase cells is largely lost at M-phase [104]. In contrast, DNA methylation patterns persist throughout the cell cycle, including at the DMRs of imprinted domains. The faithful transmission of allelic DNA methylation at DMRs through M-phase instructs the restoration of interphase-specific allelic CTCF binding in the daughter cells, together with its associated (sub-)TAD organization. It thereby provides the epigenetic memory for the developmental maintenance of allelic CTCF binding and the resulting (sub-)TAD organization. Filled lollipops, methylated CpG dinucleotides; open lollipops, unmethylated CpGs.
Perspectives
Exploring how 3D chromatin architecture and gene expression influence each other is important for understanding development and disease. Imprinted genes have provided a powerful paradigm, since the expressed and repressed states of a gene can be compared directly in the same cells.
At several imprinted domains, the epigenetically-controlled binding of CTCF, and possibly other TFs, conveys differential chromatin architecture, and this facilitates appropriate gene expression during development. Conversely, allelic gene expression patterns influence genome architecture, for instance through the action of imprinted lncRNAs.
Imprinted gene domains may help to explore how RNA PolII complexes and allelic histone modification patterns influence differential 3D chromatin architecture. More broadly, they may provide an attractive paradigm for exploring the links between 3D chromatin architecture and other nuclear processes, for instance to test a proposed link between TAD organization and the timing of DNA replication during S phase [108].
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Author Contribution
R.F. and B.M. drafted the manuscript; B.M. made the figures; R.F., B.M. and Y.I. edited the manuscript and the figures.
Acknowledgements
We thank members of our teams for helpful discussion, and acknowledge grant funding from the Agence Nationale de la Recherche (ANR) through project ANR-22-CE12-0016-03 (IMP-DOMAIN to R.F.), and the Fondation pour la Recherche Médicale (FRM, grant EQU202103012763 to R.F.). Y.I. acknowledges postdoctoral fellowship funding from the Japan Society for the Promotion of Science (JSPS).
Abbreviations
- ChIP
chromatin immunoprecipitation
- CTCF
CCCTC-binding factor
- CpG
cytosine-phosphate-guanine
- DMR
differentially methylated region
- ESC
embryonic stem cell
- gDMR
germline DMR
- ICR
imprinting control region
- NPC
neuronal progenitor cell
- PcG
Polycomb group
- PRC
Polycomb Repressive Complex
- sDMR
somatic DMR
- TAD
topologically associating domain
- TF
transcription factor
- TSC
trophoblast stem cell