The replication of DNA in chromosomes is initiated at sequences called origins, at which two replisome machines are assembled to form replication forks that move in opposite directions. Interestingly, in vivo studies show that the two replication forks remain fastened together, an arrangement often referred to as a replication factory. Replication factories containing two replisomes are well documented in cellular studies of bacteria (Escherichia coli and Bacillus subtilis) and of the eukaryote Saccharomyces cerevisiae. This basic twin-replisome factory architecture may also be preserved in higher eukaryotes. Despite many years of documenting the existence of replication factories, the molecular details of how the two replisome machines are tethered together have remained unknown in any organism. Recent structural studies shed new light on the architecture of a eukaryotic replisome factory, and bring with them a new twist on how a replication factory may function.
Introduction
DNA is duplicated by a multi-protein ‘replisome’ machine in all three domains of life: bacteria, archaea and eukaryotes [1–4]. The single circular chromosomes of Escherichia coli and Bacillus subtilis each have one origin from which two replication forks emanate in opposite directions. Several microscopy studies performed over the past 20 years demonstrate that the two replication forks produced by a single bacterial origin remain close to one another, indicating the presence of a twin-replisome ‘replication factory’ that stays stationary during replication [5–11]. Similar findings were observed in Caulobacter crescentus [8]. However, one study indicated that the replication forks do not stay together and cast doubt on the replication factory concept [12]. Recent studies have conclusively shown that the two replisomes formed at a single origin of replication remain together in both Escherichia coli and Bacillus subtilis, providing overwhelming support for the replication factory concept [9,13,14].
Eukaryotic chromosomes contain many origins, each of which forms bidirectional replication forks as in E. coli [1–4,15]. Thus, one may ask whether eukaryotic forks are also organized in twin replication factories as in bacteria. In eukaryotes, it is well known that replication proteins concentrate in localized regions of the nucleus during S phase, referred to as replication foci (Figure 1A) [16,17]. Nuclear replication foci are visualized by probing a replication protein, either with antibodies against a known replication factor or with a GFP-tagged replication factor. The Tanaka laboratory has demonstrated that in budding yeast most of these replication foci contain two replication forks that pull flanking DNA inward, indicating that they are replication factories containing two linked replisomes (Figure 1B) [17,18]. Additional studies also demonstrate that sister replication forks remain together during S phase [18–21]. These observations indicate that replication occurs within factories of dimeric replisomes. Replication in higher eukaryotes is performed in larger replication foci that contain 10–100 replication forks, and recent super-resolution microscopy demonstrates that these large foci are actually composed of numerous smaller foci that likely correspond to the two replication forks produced by a single origin [22]. Hence, there may exist higher-order structures that contain multiple twin-replisome replication factories. Regardless, the molecular structure of a replication factory unit containing two replication forks has remained unknown.
Nuclear replication factories in S-phase.
(A) Replication foci in the nucleus of eukaryotic cells, visualized during S phase using fluorescent PCNA. Adapted from Figure 1 of [22]. (B) The implied twin replication factory in nuclear foci, based on studies in yeast. Connections between the two replisomes are unknown and are therefore indicated by the grey shaded circle. See text for details.
Numerous studies have yielded substantial biochemical and structural knowledge of how the proteins of individual replisomes are organized in both bacteria and eukaryotes [1–4]. However, how two replisomes are tethered together has not been known in any system, bacterial or eukaryotic. This review summarizes recent biochemical and structural studies that provide a molecular explanation for how twin replisomes are tethered together in a eukaryotic replication factory.
CMG helicase is the heart of the eukaryotic replisome
All cellular replisomes contain a minimum of five core components that coordinate their actions to duplicate both strands of parental DNA simultaneously [3,4]. These five core components are: (1) a helicase that separates the two strands of DNA; (2) DNA polymerases that convert the separated strands into two new daughter duplexes; (3) a primase that makes an RNA primer to initiate chain elongation by the DNA polymerases; (4) circular sliding clamps that bind DNA Pols and encircle DNA, tethering the polymerase to DNA for highly processive synthesis; and (5) a pentameric clamp loader that opens and closes sliding clamps around primed sites.
The eukaryotic replicative helicase is an 11-subunit complex referred to as CMG, an acronym for its three sub-components: Cdc45, the Mcm2–7 hexamer, and the GINS tetramer (Psf1, Psf2, Psf3, Sld5) [23,24]. Eukaryotes utilize two different DNA polymerases (Pol): Pol ε for the leading strand and Pol δ for the lagging strand [1,2]. The eukaryotic Pol α-primase is a 4-subunit complex that contains both the RNA primase and the DNA polymerase Pol1, and synthesizes a hybrid RNA–DNA primer. The sliding clamp is the PCNA homotrimer ring, and the clamp loader is the RFC pentamer. The eukaryotic replisome contains several additional factors that organize and regulate the replisome for accurate replication of huge chromosomes (reviewed in [1,2]).
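As a bookkeeping aid, the component lists above can be written out as a small Python data structure. This is a minimal sketch, not a standard format. The dictionary names (CMG_SUBCOMPLEXES, CORE_COMPONENTS) and the helper cmg_subunit_count are hypothetical, and the subunit names are simply those given in the text. Summing the subcomplexes reproduces the 11-subunit count of CMG (1 Cdc45 + 6 Mcm + 4 GINS).

```python
# Illustrative bookkeeping of the eukaryotic core replisome named in the text.
# The data layout (dicts of names) is a hypothetical choice, not a standard format.

# CMG helicase = Cdc45 + Mcm2-7 hexamer + GINS tetramer
CMG_SUBCOMPLEXES = {
    "Cdc45": ["Cdc45"],
    "Mcm2-7": ["Mcm2", "Mcm3", "Mcm4", "Mcm5", "Mcm6", "Mcm7"],
    "GINS": ["Psf1", "Psf2", "Psf3", "Sld5"],
}

# The five core replisome functions mapped to the eukaryotic factors in the text
CORE_COMPONENTS = {
    "helicase": "CMG",
    "DNA polymerases": ["Pol epsilon (leading)", "Pol delta (lagging)"],
    "primase": "Pol alpha-primase",
    "sliding clamp": "PCNA (homotrimer)",
    "clamp loader": "RFC (pentamer)",
}


def cmg_subunit_count(subcomplexes: dict[str, list[str]]) -> int:
    """Total number of distinct polypeptides listed for CMG."""
    return sum(len(subunits) for subunits in subcomplexes.values())


if __name__ == "__main__":
    for name, subunits in CMG_SUBCOMPLEXES.items():
        print(f"{name}: {len(subunits)} subunit(s)")
    print("CMG total:", cmg_subunit_count(CMG_SUBCOMPLEXES))
    print("core component classes:", len(CORE_COMPONENTS))
```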
Within CMG, the six Mcm2–7 subunits are homologous to one another and form a ring that encircles the leading-strand ssDNA [25]. The six Mcm2–7 subunits are the only subunits of CMG that bind and hydrolyze ATP, and thus constitute the helicase motor. The helicase functions by encircling one strand and excluding the other, acting as a wedge as it translocates along ssDNA in an ATP-driven reaction (see Figure 2) [26–30]. Each Mcm subunit is bilobed, having an N-terminal and a C-terminal domain. In the hexamer, the N-terminal domains are adjacent to one another, forming what we refer to as the N-tier. Similarly, the C-terminal domains in the hexamer form a C-tier. Thus CMG has a polarity, with an N-tier face and a C-tier face (Figure 2A) [31–36]. The Mcm2–7 hexamer (without bound Cdc45 and GINS) is not a fully closed ring; it has a gap between Mcm2 and Mcm5 and has little or no helicase activity on its own [37]. The accessory factors Cdc45 and GINS lock the Mcm2–7 motor in the active closed-ring configuration (Figure 2) [32].
CMG helicase and replisome structure.
(A) Left: top-down view of CMG in surface rendering. Each Mcm subunit is a different color, Cdc45 is red and the GINS tetramer is light blue. During helicase action, DNA is accommodated in the center of the Mcm2–7 ring. Right: side view of CMG, showing the N-tier and C-tier rings of the Mcm2–7 heterohexamer. Adapted from Figure 1 of [36]. (B) Replisome structure in which the leading-strand Pol ε-PCNA is on the C-tier of CMG and the lagging-strand Pol α-primase-Ctf4 is on the N-tier of CMG. In both diagrams CMG encircles the leading-strand DNA and travels 3′–5′. Left: CMG tracks C-tier first on DNA, placing Pol ε above CMG. Right: CMG tracks N-tier first on DNA, placing Pol ε below CMG. The Pol ε shown here lacks a mobile domain; the reader is referred to a recent report in which the full Pol ε structure has been determined [41]. Note the asymmetric architecture of the replisome.
The CMG helicase connects to many other replisome factors and can be thought of as the organizing center of the eukaryotic replisome [35,38–41]. In this review we focus on the core replisome components and one scaffolding factor, Ctf4, as explained below. The leading-strand Pol ε has been shown to form a stable complex with CMG [33,35,38,42]. Cryo-EM structural studies show that Pol ε binds to the C-tier face of CMG, mainly through contacts with GINS and Cdc45, but also with parts of the Mcm2–7 ring [35,43]. CMG also binds to a homotrimeric scaffolding protein called Ctf4 (AND1 in higher eukaryotes) [35,39]. Ctf4 binds to the N-tier of CMG, an interaction partially mediated by the GINS complex [35,39]. The lagging-strand Pol α-primase also binds Ctf4 [40], and Pol α-primase forms a complex with CMG in the presence of Ctf4 [35,39]. These interactions have enabled EM visualization of a super-complex that contains CMG, the full-length recombinant leading-strand Pol ε complex, Ctf4 and the lagging-strand Pol α-primase complex [35]. Moreover, functional replisomes that synthesize both leading and lagging strands have been reconstituted in vitro from pure proteins in several labs [44–49].
Cross-linking mass spectrometry and cryo-EM have shown that the leading-strand Pol ε binds to the C-tier face of CMG, and that Ctf4–Pol α-primase binds to the N-tier face of CMG (Figure 2B) [35]. Therefore, CMG is bracketed on either face by the leading- and lagging-strand Pols. Since the DNA passes through CMG perpendicular to these faces, this implies that one Pol rides above the unwinding point and the other Pol follows below CMG [35]. In contrast, textbook drawings of replisomes show the helicase acting at the head of the replication fork, followed by the leading- and lagging-strand polymerases and Pol α-primase [50–56]. Given the unexpected architecture of the eukaryotic replisome, one may ask whether this replisome organization generalizes to bacteria and archaea. Structural studies of bacterial and archaeal cellular replisomes are hindered by the loose interactions between replisome components, and therefore the structure of a bacterial or archaeal cellular replisome remains unknown. However, a recent report on the structure of the bacteriophage T4 replisome reveals that one Pol is above and one Pol is below the helicase, similar in outline to the eukaryotic CMG/Pol ε/Ctf4/Pol α-primase [57]. These findings indicate that individual replisomes are profoundly asymmetric in their structure and function.
CMG tracks on DNA N-tier first at a replication fork
CMG encircles the leading-strand ssDNA and translocates along it 3′–5′ using the energy of ATP hydrolysis [23,24]. Given that Pol ε binds the C-tier face of CMG, one might expect CMG to translocate N-tier first along ssDNA, thereby enabling Pol ε-PCNA to replicate the leading strand as it exits the inner channel of CMG (see Figure 2B, right). Furthermore, as dsDNA is split by CMG, the excluded (displaced) lagging strand would be near the Pol α-primase on the N-tier of CMG for priming. On the basis of these replisome–DNA geometric considerations, it was proposed that CMG may track N-first at a replication fork [58].
In contrast with an N-first tracking direction, CMG has long been thought to track C-tier first along DNA (reviewed in [1]). A main reason for assuming C-first tracking is that two CMGs are assembled at an origin such that they are oriented head-on, N-tier to N-tier, around duplex DNA [1]. Thus, the two CMGs, oriented N-to-N, would appear to block one another from traveling N-tier first. Because of this geometry it has long been assumed that the two CMGs at an origin travel C-tier first, away from one another, to form bidirectional replication forks. However, a C-first CMG tracking orientation is incongruent with the replisome architecture described above, because the unwound leading strand would then need to make a U-turn after exiting the CMG inner channel and loop back up, over the 120 Å thick CMG, to the C-face of CMG where Pol ε resides (Figure 2B, left) [35,58]. Likewise, a considerable length of the lagging strand would need to be unwound at the C-face of CMG in order to loop around the 120 Å thick CMG and reach Pol α-primase on the N-face of CMG for priming [58].
A cryo-EM study first unambiguously demonstrated that CMG is oriented N-tier first at a replication fork [59]. This helicase orientation has since been confirmed in several subsequent studies [29,43,60,61]. While this tracking orientation fits nicely with the operation of a replication fork, it is opposite to the expected C-tier-first direction of CMGs tracking away from each other at a bidirectional origin; instead, it implies that the two head-on CMGs at an origin would translocate toward one another. How can two CMGs move to form bidirectional replication forks if they are headed into one another? This conundrum has recently been solved. Briefly, the two CMGs push inward toward one another by pulling on opposite strands of the duplex, and this generates sufficient torque to separate the duplex into single strands [62,63]. The CMGs then use an ssDNA gate within the Mcm2–7 ring to expel the lagging strand, as demonstrated in recent single-molecule studies [64] and in biochemical ensemble studies [23]. Once they encircle opposite single strands, the two head-on CMGs can motor forward and pass one another to form bidirectional replication forks. This model has also recently been adopted for archaeal replication [65].
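The passing manoeuvre can be illustrated with a deliberately simple one-dimensional sketch. It assumes, hypothetically, that DNA is a line with the origin at x = 0, that each CMG has already expelled the lagging strand and moves 3′–5′ on the strand it encircles at a constant unit speed, and it ignores duplex melting, torque generation and gate opening entirely. The class, function and strand names are illustrative only. Because the two CMGs sit on opposite strands, moving toward each other does not cause a collision: after they pass, the CMG loaded on the left is on the right side moving rightward, and vice versa, giving two bidirectional forks.

```python
from dataclasses import dataclass

# Toy 1D model of two head-on CMGs at an origin (illustrative assumptions):
# - DNA is a line with coordinate x (arbitrary units); the origin is at x = 0.
# - The "top" strand runs 5'->3' toward +x, so 3'->5' motion on it is toward -x.
# - The "bottom" strand runs 5'->3' toward -x, so 3'->5' motion on it is toward +x.
# - Each CMG encircles one strand and moves 3'->5' at a constant, hypothetical speed.

STRAND_DIRECTION = {"top": -1, "bottom": +1}  # sign of 3'->5' motion along x


@dataclass
class CMG:
    name: str
    position: float
    strand: str  # strand encircled after the lagging strand is expelled

    def step(self, speed: float = 1.0) -> None:
        """Advance 3'->5' along the encircled strand."""
        self.position += STRAND_DIRECTION[self.strand] * speed


def simulate(steps: int, start_gap: float = 2.0) -> tuple[CMG, CMG]:
    # Loaded N-to-N: the left CMG's N-tier faces +x, the right CMG's faces -x,
    # so N-first tracking moves them toward each other on opposite strands.
    left = CMG("left-loaded", -start_gap / 2, strand="bottom")  # moves toward +x
    right = CMG("right-loaded", +start_gap / 2, strand="top")   # moves toward -x
    for _ in range(steps):
        left.step()
        right.step()
    return left, right


if __name__ == "__main__":
    for t in (0, 1, 5):
        a, b = simulate(t)
        print(f"step {t}: {a.name} at {a.position}, {b.name} at {b.position}")
```

In this toy run the two CMGs coincide at the origin after one step (on different strands) and have swapped sides by step 5, which is the geometric outcome the model in the text requires.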
What protein(s) connect two replisomes in a replication factory?
The Ctf4 homotrimer is a replication factor unique to eukaryotes; it is known to bind both CMG and Pol α-primase, bringing them together [35,39]. Interestingly, the molecular surfaces on Ctf4 to which CMG and Pol α-primase bind partially overlap, and therefore the two must bind separate subunits of the Ctf4 trimer [39]. This still leaves one extra subunit of the Ctf4 trimer free to bind another molecule. While Ctf4 weakly binds other factors [66], it has been proposed that the Ctf4 trimer binds one CMG and two molecules of Pol α-primase [39]. However, a recent study shows that upon mixing Ctf4 with CMG and Pol α-primase, a large Ctf4-dependent complex forms that contains two CMGs and only one Pol α-primase (Figure 3A,B) [30]. Indeed, cryo-EM 2D averages of protein mixtures revealed both a 2 CMG–1 Ctf4 complex and a complex of 2 CMG and 1 Pol α-primase per Ctf4 trimer (Figure 3C) [30]. The 2 CMG–1 Ctf4 complex was stable to ion-exchange chromatography at high salt, indicating that it would form a tight and stable complex in the cell [30]. High-resolution cryo-EM 3D single-particle reconstructions revealed that the two CMG molecules attach to the sides of the Ctf4 trimer, with the N-tiers of the two CMGs roughly facing one another (Figure 3D) [30]. We note that replisomes can function in vitro as individual entities that are not linked together, and that replisomes linked by Ctf4 are as functional as individual replisomes lacking Ctf4 [30,49,67].
Reconstitution and visualization of a putative replication factory.
(A) Coomassie Blue-stained PAGE of glycerol gradient fractions, revealing formation of a Ctf4-dependent factory complex reconstituted from CMG, Pol ε, Ctf4 and Pol α-primase. (B) Quantitation of the bands in lane 10 of panel A. (C) Averaged EM images of mixtures of CMG, Ctf4 and Pol α-primase (top) or of CMG and Ctf4 (bottom). (D) Cryo-EM model of 2CMG–Ctf4 (left, top), cryo-EM density of Ctf4–Pol α-primase (left, bottom), and superimposition of the 2CMG–Ctf4 and Ctf4–Pol α-primase models. Panels are adapted from figures/movies in [30].
Structural studies of Ctf4 bound to Pol α-primase demonstrate that only one Pol α-primase can bind to the Ctf4 trimer (Figure 3D) [30]. Interestingly, EM analysis of Pol α-primase shows that it consists of two lobes, a Pol lobe (consisting of the Pol1 DNA polymerase subunit) and a Pri lobe, and that these two lobes are highly flexible relative to one another, with up to a 70° range of motion [68]. Hence, one might expect to observe only one lobe in a 3D EM reconstruction. Indeed, cryo-EM analysis of Pol α-primase bound to Ctf4 resolved only the Pol lobe, located on the C-face of the Ctf4 disk beneath the CMG helicases [30]. The Ctf4-bound Pol lobe sterically occludes the binding of additional Pol α-primase molecules, which explains the 1:1 stoichiometry of Pol α-primase to Ctf4. Importantly, the single Pol α-primase does not sterically clash with the CMGs bound to Ctf4 (Figure 3D).
Proposed replisome factory function
The fact that two CMGs tightly bind one Ctf4 trimer indicates that the Ctf4 trimer joins two replisomes together, as illustrated in Figure 4. The leading-strand Pol ε is known to attach to the C-face of CMG, where it is sandwiched between CMG and a ring-shaped PCNA clamp [35,41]. These facts, together with the knowledge that CMG tracks along the leading-strand DNA N-tier first, enable one to propose how two DNA replication forks would thread through the 2 (CMG–Pol ε)-1 Ctf4-1 Pol α-primase factory during genome duplication (Figure 4) [30]. After initiation at origins, the parental strands that flank the origin would be pulled toward the factory. Importantly, this is consistent with light microscopy observations demonstrating that parental DNA arms are pulled into the factory (see Figure 1B) [17,18]. Upon unwinding by CMG, the two leading strands would be extended by two Pol ε molecules at the outer edges of the factory, located at the C-tiers of the two CMGs, with each Pol ε held to DNA by a PCNA clamp. The two lagging strands would be expelled from the N-tier face of each CMG near the center of the factory, between the two CMGs. This brings both lagging strands into proximity with the single Pol α-primase. Given the documented high flexibility of the Pri lobe relative to the Pol lobe [68], the Pri lobe may be able to reach both CMGs and possibly prime both lagging strands, but this hypothetical suggestion requires further study. These actions give rise to two growing DNA loops in which the duplicated origin is propelled away from the replication factory, consistent with light microscopy results (e.g. Figure 1B) [17,18].
Proposed model of replication factory operation.
The Ctf4 trimer (purple) binds two CMGs (blue) and one Pol α-primase (green, with outline). The two arms of parental duplex DNA are brought toward one another as they are drawn into the N-tiers of each CMG. The unwound leading strand is pumped through the CMG central channel for replication by the leading Pol ε-PCNA (orange/light red) at the C-tier of CMG. The lagging strand is separated at the N-tier of each CMG near the center of the factory, where the flexible Pol α-primase alternates in forming primers (red) on each of the two lagging strands. Primed sites are then extended by the lagging Pol δ-PCNA (yellow/red). This results in two growing DNA loops that started at one origin. Note the symmetric architecture of this replication factory model.
One may wonder how the factory could remain stationary given the enormous size of the DNA compared with the protein factory. We suggest that the vector forces that pull in the parental DNA and propel the newly synthesized DNA away from individual replisomes may cancel each other out in the twin Ctf4-coupled replication factory, helping the factory to remain largely stationary. Importantly, because the twin replisomes are tethered internally (by Ctf4) rather than externally (e.g. by anchoring to a fixed cellular structure), our factory model allows stochastic motion or drift as the factory accommodates the gradually increasing amount of chromatin being synthesized. This is consistent with experimental observations that replication foci are largely stationary yet can slowly drift during S phase of the cell cycle.
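The force-cancellation argument can be made concrete with a toy vector sum. The sketch below assumes, hypothetically, that each replisome experiences a single net reaction force from pulling in parental DNA and pushing out daughter DNA, and that the two replisomes in the Ctf4-coupled factory are related by a twofold (180°) rotation. The force values are arbitrary numbers chosen for illustration, not measurements, and the helper names are illustrative. Under these assumptions an isolated replisome carries a non-zero net force, the ideal twin factory carries none, and a small imbalance leaves only a small residual force, in line with the slow drift described above.

```python
import math

Vector = tuple[float, float]


def twofold(v: Vector) -> Vector:
    """Apply a 180-degree rotation about the factory's assumed central axis."""
    return (-v[0], -v[1])


def scale(v: Vector, k: float) -> Vector:
    return (k * v[0], k * v[1])


def net_force(*forces: Vector) -> Vector:
    """Vector sum of the forces acting on one rigid body."""
    return (sum(f[0] for f in forces), sum(f[1] for f in forces))


def magnitude(v: Vector) -> float:
    return math.hypot(v[0], v[1])


# Hypothetical reaction force on one replisome (arbitrary units, not measured).
f_a = (1.0, 0.3)

isolated = net_force(f_a)                               # a single, untethered replisome
coupled = net_force(f_a, twofold(f_a))                  # ideal twin factory
imbalanced = net_force(f_a, scale(twofold(f_a), 0.95))  # partner force 5% weaker

if __name__ == "__main__":
    print("isolated  :", round(magnitude(isolated), 3))
    print("coupled   :", round(magnitude(coupled), 3))
    print("imbalanced:", round(magnitude(imbalanced), 3))
```

The internal (Ctf4) tether is what allows the two forces to be summed on one body; with an external anchor, each replisome's force would instead be borne by the anchoring structure.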
Finally, we note that Ctf4 coupling in our factory model converts two asymmetric replisomes into a symmetric replication factory. This is conceptually satisfying because one might expect a symmetric replication apparatus for the elegantly symmetric DNA duplex. This structural model and proposed function build upon a substantial amount of previous work and may be operational during uninterrupted DNA replication. While the model holds promise for important new discoveries, it must remain tentative until it is tested by further investigation.
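The symmetry argument can likewise be caricatured in one dimension by listing replisome modules in order along the axis of the leading-strand channel. In this sketch, an illustrative assumption that ignores three-dimensional geometry, DNA and Pol δ, a single replisome reads PCNA, Pol ε, CMG C-tier, CMG N-tier and then the Ctf4/Pol α-primase hub, which does not read the same from both ends. Joining two copies N-tier to N-tier through the shared hub, as in the factory model, yields a layout that reads identically from either end. The tuple names and helper functions are hypothetical.

```python
# One-dimensional caricature of module order along the leading-strand channel axis
# (illustrative assumption: 3D geometry, DNA and Pol delta are ignored).

REPLISOME = ("PCNA", "Pol epsilon", "CMG C-tier", "CMG N-tier")
HUB = ("Ctf4 trimer + Pol alpha-primase",)  # shared by both replisomes in the model


def is_symmetric(layout: tuple[str, ...]) -> bool:
    """True if the layout reads the same from either end (mirror symmetry)."""
    return layout == layout[::-1]


def twin_factory(replisome: tuple[str, ...], hub: tuple[str, ...]) -> tuple[str, ...]:
    """Join two copies of one replisome N-tier to N-tier through a central hub."""
    return replisome + hub + replisome[::-1]


single = REPLISOME + HUB
factory = twin_factory(REPLISOME, HUB)

if __name__ == "__main__":
    print("single replisome symmetric?", is_symmetric(single))
    print("twin factory symmetric?   ", is_symmetric(factory))
    print(" | ".join(factory))
```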
Perspectives
Defects in DNA replication can lead to pathological states, and an understanding of the mechanistic details of DNA replication may lead to new treatments.
Replication factories are observed in bacterial and eukaryotic cells, but the physical basis for how replisomes are linked together has been unknown. The new twin replication factory structure explains how two replisomes are connected in eukaryotes.
It is important to determine whether the replisomes within a factory ‘communicate’ with one another, perhaps for repair, recombination or epigenetic inheritance of nucleosomes. The ability to reconstitute a twin factory is only the beginning of what might become a ‘new opening’ in the field of replication research.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
This work was supported by the US National Institutes of Health (NIH) (GM131754 to H.L. and GM115809 to M.E.O.) and the Howard Hughes Medical Institute (to M.E.O.).
Author Contribution
H.L., N.Y.Y. and M.E.O. conceived the topic; H.L. and N.Y.Y. drew the figures; H.L., N.Y.Y. and M.E.O. wrote the manuscript.
Acknowledgements
The authors are indebted to the members of the Li and O'Donnell labs for valuable discussions on the topic of this review.