The powerful combination of genetic and biochemical analysis has provided many key insights into the structure and function of the chromosomal DNA replication machineries of bacterial and eukaryotic cells. In contrast, in the archaea, biochemical studies have dominated, mainly due to the absence of efficient genetic systems for these organisms. This situation is changing, however, and, in this regard, the genetically tractable haloarchaea Haloferax volcanii and Halobacterium sp. NRC-1 are emerging as key models. In the present review, I give an overview of the components of the replication machinery in the haloarchaea, with particular emphasis on the protein factors presumed to travel with the replication fork.
In all forms of life, high-fidelity chromosomal DNA replication requires the complex interplay of a variety of essential and non-essential protein factors in a temporally and spatially co-ordinated manner. Understanding how these proteins perform their diverse functions, how they interact with one another and how these interactions are regulated presents a major challenge that has a significant impact on our overall understanding of genome stability.
In archaeal cells, the chromosomal DNA replication apparatus (replisome) is broadly eukaryotic in character (see [1–3] for reviews). A number of archaeal proteins have been identified as having strong protein sequence similarity to eukaryotic replication proteins. These proteins are generally found encoded by all archaeal genomes.
In the present paper, I review current knowledge of the protein components that are likely to be present at the haloarchaeal replication fork. Six archaeal genomes, representing five of 25 currently recognized haloarchaeal genera, have been sequenced completely [4–8] and form the basis of this review. Partial sequences available for several additional species have been excluded for purposes of clarity. The six completely sequenced genomes include both Halobacterium sp. NRC-1 and Halobacterium salinarum R1. The relatedness of these two organisms is a matter of debate [9,10], but, for the purposes of this review, they are treated as one species. Tools for analysis of the genome sequences can be found at the UCSC Archaeal Genome Browser and the HaloLex databases.
Origins and Cdc (cell-division cycle) 6/ORC (origin-recognition complex) proteins
In eukaryotic cells, chromosomal DNA replication begins at multiple discrete sites on each chromosome known as replication origins. Origins are bound by the ORC, which acts as a landing pad for the related Cdc6 protein. In collaboration with Cdt1, Cdc6 then promotes further assembly of replisome components at the origin, including the MCM (minichromosome maintenance) helicase (discussed below). In archaeal cells, chromosome replication is also initiated at specific sites, a number of which have now been identified through a combination of bioinformatic, genetic, molecular biological and biochemical approaches (see [14–16] and references therein). These origins are specifically bound by homologues of the initiation factor Cdc6/Orc1. In the haloarchaea, replication origins have been identified in Haloferax volcanii and Halobacterium sp. NRC-1 [17–19].
Genes encoding Cdc6/Orc1 homologues are abundant in the haloarchaea; a detailed analysis of the phylogeny of the encoded proteins lies outside the scope of the present review, but can be found elsewhere. Interestingly, some Cdc6/Orc1 genes are located adjacent to known origins, suggesting a functional link between the Cdc6/Orc1 proteins and replication initiation at the origin. However, genetic analysis has shown that, of ten Cdc6/Orc1 homologues encoded by Halobacterium sp. NRC-1, eight are non-essential for cell viability and only two are essential. Haloferax volcanii, meanwhile, encodes 14 Cdc6/Orc1 homologues, of which at least three are non-essential. Clearly much work will be required to elucidate the functions of these different proteins.
MCM, GINS, primase and RPA (replication protein A)
The MCM complex is most likely to be the enzyme responsible for unwinding duplex DNA ahead of the moving replication fork in eukaryotic and archaeal cells. In eukaryotic cells, MCM is heterohexameric in nature. The archaeal MCM complex is similar, but, in this case, the complex is a homohexamer with a single gene encoding all six subunits (reviewed in ).
The five sequenced haloarchaeal genomes encode a total of eight MCM-like proteins (Figure 1). Five of these (one encoded by each organism) form a core group of phylogenetically closely related proteins (Figure 1A) that can be assumed to be functioning as the replicative helicase. Consistent with this, in Halobacterium sp. NRC-1, the core group MCM protein is required for cell viability. The same is true of the Haloferax volcanii MCM protein (T. Kristensen and S.A. MacNeill, unpublished work). The functions of the three MCM homologues shown as outliers in Figure 1(A) are not yet known, neither is it known whether these proteins are capable of interaction with the core group MCM proteins.
Figure 1. Haloarchaeal MCM and GINS proteins
Four of the core group MCM proteins are interrupted by inteins: one each in the case of the MCM proteins from Halobacterium sp. NRC-1, Haloarcula marismortui and Natronomonas pharaonis, and four in the case of Haloquadratum walsbyi (see Figure 1). Inteins are a relatively common feature of archaeal replication factors (see the InBase database at http://www.neb.com/neb/inteins.html for further details), with examples also being found in DNA polymerase and RFC (replication factor C) subunits (discussed below). The inteins are generally found in key catalytic motifs, making their excision a prerequisite for enzyme function.
Evidence from higher-eukaryotic cells suggests that the active form of MCM at the replication fork is present in a complex (called the CMG complex) with two other protein factors, Cdc45 and GINS [23–25]. To date, no homologues of Cdc45 have been found in the archaea, but GINS proteins are widespread, if not universal. Eukaryotic GINS is a tetramer of four related proteins: Sld5, Psf1, Psf2 and Psf3. The first archaeal GINS complex to be characterized was that of the crenarchaeon Sulfolobus solfataricus. Like its eukaryotic counterpart, S. solfataricus GINS is also a tetramer, but comprises two molecules of Gins15 (which is most closely related to the eukaryotic Psf1 and Sld5 proteins) and two of Gins23 (most closely related to Psf2 and Psf3). The genes encoding S. solfataricus Gins15 and Gins23 are located immediately downstream of (and are co-transcribed with) the genes encoding the catalytic subunit of primase and the MCM helicase respectively, and the Gins23 protein is capable of interacting with both subunits of primase and with MCM. Pyrococcus furiosus GINS has a similar structure to that of S. solfataricus: P. furiosus Gins23 also interacts directly with (and stimulates) MCM, and Gins15 interacts with Orc1/Cdc6.
The haloarchaea, in common with many other euryarchaea, appear to encode a single GINS homologue only [26,27]; the gene encoding this protein (which most closely resembles Gins15) is located immediately downstream of that encoding the primase catalytic subunit PriS. Strikingly, the haloarchaeal GINS proteins are all significantly larger than other eukaryotic and archaeal GINS proteins. This is the result of a large sequence insertion between the conserved N- and C-terminal A- and B-domains [28–30]. At 474 amino acids, for example, the Haloquadratum walsbyi GINS protein is almost three times longer than the S. solfataricus Gins15 and Gins23 proteins. The inserted sequences bear little similarity to one another, or to sequences in current databases, and what, if any, function they perform remains unknown. We have shown that the Haloferax volcanii ginS gene cannot be deleted without loss of viability, consistent with it playing an essential role in replication (A. Norman and S.A. MacNeill, unpublished work).
The archaeal primase is a two-subunit enzyme that closely resembles its eukaryotic counterpart (see [1–3] and references therein). As noted above, primase interacts directly with GINS in S. solfataricus and P. furiosus [26,27]. Each of the five haloarchaea encodes the two subunits of primase (PriL and PriS), and, in each case, as noted above, the gene encoding the PriS subunit is adjacent to that encoding GINS. Both primase genes appear to be essential for cell survival in both Haloferax volcanii and Halobacterium sp. NRC-1, as neither gene can be deleted in either organism [20,31]. In addition, it is noteworthy that the haloarchaea also encode a protein with significant similarity to the bacterial DnaG primase. Whether this enzyme has any part to play in chromosome replication remains to be seen; deletion analysis in Haloferax volcanii indicates that the gene is non-essential.
In eukaryotic cells, the ssDNA (single-stranded DNA)-binding factor RPA plays a vital role during the replication process by binding and protecting ssDNA and co-ordinating protein–protein interactions among components of the replisome. Each of the three subunits of the eukaryotic complex contains at least one OB (oligonucleotide/oligosaccharide-binding) fold, the basic unit of ssDNA binding in all three domains of life. The identity and molecular composition of the haloarchaeal SSB (ssDNA-binding protein) is still unclear, but a number of proteins harbouring OB folds are encoded by each of the haloarchaeal genomes; some of these proteins also contain zinc-binding motifs, another feature shared with eukaryotic RPA. Recent work in Halobacterium sp. NRC-1 indicates an important role for two potential RPA proteins in radiation-resistance.
Polymerases, sliding clamps and clamp loaders
The haloarchaea encode two DNA polymerase enzymes that are believed to function primarily in chromosome replication: the single-subunit family B polymerase PolB1 and the two-subunit family D enzyme PolD [1–3]. Each genome encodes a single PolB1 protein; in four species, the protein is interrupted by one or more inteins (Figure 2A). At the C-terminus of each protein lies a short conserved sequence motif termed a PIP box [PCNA (proliferating-cell nuclear antigen)-interacting protein box] (Figure 2B) that mediates interactions between the polymerase and the toroidal sliding clamp processivity factor PCNA [36–38]. In addition to these enzymes, further family B DNA polymerase homologues (designated PolB2 and PolB3) have been identified in three species. Interestingly, on the basis of comparative sequence analysis, it has recently been proposed that the PolB2 enzymes are inactive; consistent with this, the Halobacterium sp. NRC-1 PolB2 protein has been shown to be non-essential for cell viability.
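As a rough illustration of how a C-terminal PIP box might be located computationally, the sketch below scans the last residues of a protein sequence for the canonical consensus Q-x-x-[L/I/M]-x-x-[F/Y]-[F/Y]. The consensus pattern, the 30-residue window, the function name and the toy sequences are all assumptions made for illustration; they are not taken from the analysis behind Figure 2, and archaeal PIP boxes often deviate from the canonical consensus, so a real search would probably need a more permissive pattern.

```python
import re

# Canonical PIP-box consensus (an assumption here): Q-x-x-[LIM]-x-x-[FY]-[FY].
PIP_BOX = re.compile(r"Q..[LIM]..[FY][FY]")


def find_c_terminal_pip_box(sequence, window=30):
    """Return (start, motif) for the last consensus match that lies
    entirely within the C-terminal `window` residues, or None.

    `start` is the 0-based position in the full sequence.
    """
    seq = sequence.upper()
    offset = max(0, len(seq) - window)   # where the C-terminal window begins
    tail = seq[offset:]
    matches = list(PIP_BOX.finditer(tail))
    if not matches:
        return None
    last = matches[-1]                   # match closest to the C-terminus
    return offset + last.start(), last.group()


# Hypothetical toy sequences, not real haloarchaeal proteins.
with_motif = "M" + "A" * 40 + "QATLDDFF" + "GK"
without_motif = "M" + "A" * 50

print(find_c_terminal_pip_box(with_motif))
print(find_c_terminal_pip_box(without_motif))
```

Restricting the search to a C-terminal window reflects the position of the motif described above; a motif elsewhere in the protein is deliberately ignored by this sketch.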
Figure 2. Domain structures and PIP boxes in haloarchaeal DNA polymerase, clamp loader and nuclease proteins
The two-subunit PolD enzyme is specific to the euryarchaea [40,41]. The large catalytic subunit of this enzyme (designated PolD2) has no counterpart in bacteria or eukaryotes, whereas the small exonuclease subunit (PolD1) shares sequence similarity with the B-subunits of eukaryotic replicative polymerases. Each haloarchaeal genome encodes a single PolD2 (which can be interrupted by one or more inteins, see Figure 2A) and a single PolD1 protein. In the haloarchaea, the genes encoding these proteins are generally not linked to one another in the genome and thus are not co-transcribed. Like PolB1, the PolD2 protein possesses a PIP box at its extreme C-terminus (Figure 2A). Deletion of the genes encoding either PolD1 or PolD2 is not possible in Halobacterium sp. NRC-1, indicating an essential function, but there is still no direct evidence that PolD is actually required for chromosomal replication.
The trimeric sliding clamp PCNA is a ring-shaped complex that encircles and can slide along dsDNA (double-stranded DNA), thereby tethering the polymerase to its substrate. Each haloarchaeal genome encodes a single PCNA protein [1–3], as do the haloarchaeal viruses ΦCh1 and ΦH. In order to load PCNA on to dsDNA, the PCNA ring must be opened; this is achieved by the action of a clamp loader complex called RFC. In eukaryotic cells, RFC is a pentameric complex comprising five different subunits, one large and four small. Each is a member of the AAA+ (ATPase associated with various cellular activities) family of proteins that are able to bind and hydrolyse ATP. Many archaea encode only two RFC proteins, one large and one small, that form pentameric RFC complexes with 1:4 stoichiometry [1–3]. The large subunit possesses a C-terminal PIP box.
Haloarchaeal RFC is probably also a pentamer of one large and four small subunits, but, interestingly, each haloarchaeal genome encodes not two, but three, distinct RFC proteins (Figure 2A). One of these (RfcB) is the large subunit and carries a C-terminal PIP box, whereas the others (designated RfcA and RfcC) are small subunits (Figure 2B). A similar situation exists in methanogenic euryarchaea. The composition and stoichiometry of the haloarchaeal RFC complex remain untested, but none of the three RFC genes can be deleted from the Haloferax volcanii genome, suggesting that each encodes an essential protein (A. Zhao and S.A. MacNeill, unpublished results). The same is true of the single Haloferax volcanii pcnA gene and the corresponding gene in Halobacterium sp. NRC-1. In common with PCNA from other euryarchaea, haloarchaeal PCNA is homotrimeric. Recently, the crystal structure of Haloferax volcanii PCNA has been solved (K. Morgunova and R. Ladenstein, personal communication) and reverse genetic analysis is underway in our laboratory (M. Agerbæk, F.C. Gray and S.A. MacNeill, unpublished results).
Okazaki fragment processing
The discontinuous nature of lagging-strand DNA synthesis dictates that Okazaki fragments must be processed and joined before the new DNA strand is complete. DNA ligase enzymes play an essential role in the final joining reaction. All archaeal genomes encode an ATP-dependent ligase (LigA); in addition, four of the five haloarchaeal genomes (Haloferax volcanii, N. pharaonis, Haloarcula marismortui and Haloquadratum walsbyi) also encode an NAD+-dependent DNA ligase enzyme (LigN) apparently acquired by lateral gene transfer from a bacterial organism prior to the divergence of the haloarchaeal lineage [48,49]. To analyse the function of these enzymes, the ligA and ligN genes have been separately deleted from the Haloferax volcanii genome: neither gene is essential, but simultaneous deletion of both is lethal, indicating that the two enzymes, native and acquired, share an essential function [48,49]. In addition to LigA and LigN proteins, Haloquadratum walsbyi also encodes a second ATP-dependent ligase homologue, the function of which is unknown.
Nuclease activity is also required for Okazaki fragment processing, and the PIP-box-containing Fen1 enzyme may play a role in this. Genetic analysis in Haloferax volcanii shows Fen1 to be non-essential, but cells lacking the enzyme grow more slowly than wild-type and show reduced ability to incorporate nucleotides into DNA. In Halobacterium sp. NRC-1, Fen1 appears to be essential. The RNase HII homologue Rnh2 might also have a minor role in the replication process. Like Fen1, this protein also binds PCNA via a C-terminal PIP box and is non-essential in Haloferax volcanii.
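The reasoning applied to the ligase deletions can be written out as a simple decision rule. The sketch below is purely illustrative: the function name, the category labels and the boolean encoding of viability are assumptions, and the only experimental outcome used as input is the ligA/ligN result described above.

```python
def classify_gene_pair(viable_without_a, viable_without_b, viable_without_both):
    """Classify two genes from the viability of single and double deletions.

    Each argument is True if cells survive the corresponding deletion.
    """
    # A viable double deletion alongside a lethal single deletion is
    # not covered by this simple model.
    if viable_without_both and not (viable_without_a and viable_without_b):
        return "not covered by this model"
    if not viable_without_a and not viable_without_b:
        return "both essential"
    if not viable_without_a:
        return "first essential"
    if not viable_without_b:
        return "second essential"
    # Both single deletions are viable: the double deletion decides.
    if not viable_without_both:
        return "redundant for an essential function"
    return "both dispensable"


# ligA and ligN in Haloferax volcanii: each single deletion is viable,
# the double deletion is lethal.
liga_lign = classify_gene_pair(True, True, False)
print(liga_lign)
```

The same rule makes explicit why a single viable deletion, taken alone, does not show that a gene's function is dispensable.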
Progress through genetics
The contribution made by genetic studies in yeast to our understanding of how eukaryotic chromosomes are replicated cannot be overstated. A number of key replication enzymes were first identified through genetic screens for replication-deficient mutants (examples include the MCM helicase, Cdc6 and GINS), whereas the in vivo roles of others have been clarified. A detailed understanding of archaeal chromosome replication will require a similar application of genetic techniques. Progress in this area clearly requires the availability of a broad range of molecular genetic tools and, in this regard, Halobacterium sp. NRC-1 and Haloferax volcanii are excellent candidates as model systems. Reverse-genetic approaches are readily applicable to these organisms: both can be transformed with plasmid DNA, selectable (and counter-selectable) markers and regulated promoters are available, and homologous recombination is efficient and reliable [50,51]. Working with Haloferax volcanii, for example, we have recently had considerable success generating mutant alleles of replication factor genes, including mcm and pcnA, allowing us to address detailed questions of replication protein structure and function in vivo, the first time that this has been done in any archaeal organism (M. Agerbæk, K. Ellermann Jensen, R.M. Cherian, M. Madsen, K. Jensen, T. Kristensen, F.C. Gray and S.A. MacNeill, unpublished work). The information derived from these studies complements ongoing biochemical analyses of replication protein function perfectly and will allow us to gain significant insights into the way in which chromosomes are replicated in the third domain of life.
Molecular Biology of Archaea: Biochemical Society Focused Meeting held at University of St Andrews, U.K., 19–21 August 2008. Organized and Edited by Stephen Bell (Oxford, U.K.) and Malcolm White (St Andrews, U.K.).
Abbreviations used: PCNA, proliferating-cell nuclear antigen; PIP box, PCNA-interacting protein box; RFC, replication factor C; RPA, replication protein A.
This work was supported by Forskningsrådet for Natur og Univers, Lundbeckfonden and Novo Nordisk Fonden.