All cellular life depends on multisubunit RNAPs (RNA polymerases) that are evolutionarily related across the three domains of life. Archaeal RNAPs encompass 12 subunits that contribute in different ways to the assembly and stability of the enzyme, nucleic acid binding, catalysis and specific regulatory interactions with transcription factors. The recent development of methods to reconstitute archaeal RNAP from recombinant materials, in conjunction with structural information on multisubunit RNAPs, presents a powerful opportunity to investigate the molecular mechanisms of transcription.
Evolution of RNAP (RNA polymerase) architecture
The structure and function of RNAP ancestors in the deep and distant past of the ‘RNA world’ are the subject of much speculation. Eugene Koonin and colleagues have argued that the ‘ur-RNAP’ was a replicase ribozyme that facilitated replication and transcription of an RNA genome. The ribozyme later acquired a protein scaffold, including a homodimeric (RNA-binding) protein cofactor, which eventually evolved into the large RNAP subunits that are characteristic of the present-day enzymes (Figure 1). Subsequent events included the acquisition of a DNA, rather than RNA, template and the loss of the ribozyme component. A likely reflection of this distant past is that RNAPs specifically interact with, and are regulated by, small RNA molecules and are still capable of using RNA as a template for RNA polymerization [3–6].
Structure and function of multisubunit RNAPs
Extant bacteria and archaea employ a single type of RNAP that transcribes all genes across the entire genome. In the eukaryotic domain, animals use three and plants use four distinct types of RNAP (I, II, III and IV) that transcribe non-overlapping subsets of genes. The archaeal RNAP has structural and functional homologues of all subunits of eukaryotic RNAPII, as do all other types of eukaryotic RNAP. Considering the static environments of hyperthermophilic archaea (e.g. deep-sea vents) and the evolutionary conservation of RNAP subunits, it seems likely that the precursor RNAP that existed before the split of the archaeal and eukaryotic lineages approx. 2 billion years ago was highly similar to the present-day archaeal enzyme. All multisubunit RNAPs contain a core of conserved subunits that are closely related in structure and function, exemplified by the bacterial enzyme (Figures 1A–1C and inset). The archaeal and eukaryotic RNAPs additionally contain a subset of homologous subunits that have no bacterial counterparts. We do not completely understand the evolutionary significance of these subunits, and their function is currently the subject of intense research. It is often assumed that a larger RNAP subunit repertoire enables more sophisticated regulation of RNAP activity, and thus of gene expression, but a simple comparison of archaeal and bacterial RNAPs and their cognate regulatory and metabolic circuits contradicts such an oversimplified view. The 12-subunit RNAPs from Methanocaldococcus jannaschii and Methanosarcina mazei are nearly indistinguishable. Yet the former inhabits volcanic vents (considered to be a static environment) and has a genome size of 1.7 Mb (1772 annotated genes), whereas the latter can be found in rice paddies (considered to be a dynamic environment) and has a 4.1 Mb genome (3439 genes).
Methanosarcina is one of the major global sources of the greenhouse gas methane and, in contrast with Methanocaldococcus, it can ferment a range of carbon sources. Escherichia coli, by comparison, has a 4.6 Mb genome encoding 4377 genes; it is metabolically highly adaptable and regulates its gene expression accordingly, yet it has a simple four-subunit RNAP architecture. These examples demonstrate that a greater number of RNAP subunits does not necessarily correlate with more complex regulation. Two scenarios could explain the RNAP subunit composition in the three domains: either (i) the archaeo-eukaryote-specific subunits were gained during evolution of the archaeal/eukaryotic branch by ‘capture’ of polypeptides that were reversibly associated with RNAP (e.g. transcription factors), or (ii) the subunits were present in RNAPs before the split of the bacterial and archaeal/eukaryotic branches and were subsequently lost during the evolution of Bacteria [7,9]. Sequence analysis of present-day genomes cannot unequivocally determine which of the two scenarios is true. However, many scientists favour the first view on the basis of fundamental mechanisms of evolution, such as the generation of paralogous proteins by gene duplication and subsequent speciation, and of the principle of Occam's razor. The exact RNAP subunit composition differs slightly between Euryarchaea and Crenarchaea. Only the latter contain subunit G, the homologue of eukaryotic RPB8 (RNA polymerase B 8), and it has therefore been proposed that a crenarchaeal-like ancestor gave rise to present-day eukaryotes [9,11]. In order to understand the evolution of RNAPs, it is important to characterize the structure and function of the individual subunits.
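The comparison above can be laid out explicitly. The short Python sketch below simply tabulates RNAP subunit number against genome size and gene count for the three organisms; the numbers are the ones quoted in the text (for E. coli, the four distinct subunit types of α2ββ′ω), while the dictionary layout and the helper `gene_ratio` are illustrative names, not part of any established tool.

```python
# Figures quoted in the text: RNAP subunit types, genome size (Mb) and annotated genes.
ORGANISMS = {
    "Methanocaldococcus jannaschii": {"rnap_subunits": 12, "genome_mb": 1.7, "genes": 1772},
    "Methanosarcina mazei": {"rnap_subunits": 12, "genome_mb": 4.1, "genes": 3439},
    # alpha2 beta beta' omega: four distinct subunit types
    "Escherichia coli": {"rnap_subunits": 4, "genome_mb": 4.6, "genes": 4377},
}


def gene_ratio(a: str, b: str) -> float:
    """Ratio of annotated gene counts, organism a over organism b."""
    return ORGANISMS[a]["genes"] / ORGANISMS[b]["genes"]


for name, d in ORGANISMS.items():
    density = d["genes"] / d["genome_mb"]  # rough genes per Mb
    print(f"{name}: {d['rnap_subunits']} subunit types, {d['genes']} genes, ~{density:.0f} genes/Mb")

# The organism with the most genes is also the one with the fewest RNAP subunit types.
most_genes = max(ORGANISMS, key=lambda n: ORGANISMS[n]["genes"])
fewest_subunits = min(ORGANISMS, key=lambda n: ORGANISMS[n]["rnap_subunits"])
print("Most genes:", most_genes, "| fewest RNAP subunit types:", fewest_subunits)

# Same 12-subunit RNAP, roughly twofold difference in gene number.
ratio = gene_ratio("Methanosarcina mazei", "Methanocaldococcus jannaschii")
print(f"M. mazei / M. jannaschii gene-count ratio: {ratio:.2f}")
```

Gene count is only a crude proxy for regulatory complexity, so this three-organism table illustrates the argument rather than testing it statistically.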
Functional dissection of RNAP
The best understood multisubunit RNAP is the bacterial enzyme, owing to its simple α2ββ′ω organization (Figure 1C and inset) and its ability to be reconstituted from individual recombinant subunits in vitro. In addition, it has been possible to complement these in vitro experiments with classical genetics in vivo for almost three decades. It has not yet been possible to assemble eukaryotic RNAPs in vitro, and most of the available information is based on genetics and is thus often limited by the viability of the mutants generated. The application of genetics in the archaea is still in its infancy, but it is progressing rapidly [12,13]. Recently, two laboratories have reported the successful in vitro reconstitution of active archaeal RNAPs from recombinant subunits [14,15]. This currently makes the archaeal RNAP the only recombinant in vitro model system for the eukaryotic enzymes. A functional dissection strategy in which recombinant RNAP variants were assembled by omitting individual subunits has yielded intriguing insights into their contributions to RNAP activities, such as enzyme assembly, nucleic acid (DNA/RNA) binding, catalysis and interactions with basal transcription factors [14–18]. From these studies, a picture has emerged in which RNAP subunits can be roughly divided into three groups concerned with (i) catalysis (Figure 1D), (ii) assembly of the catalytic subunits (Figure 1F) and (iii) auxiliary functions (Figure 1G).
The large subunits A and B make up about two-thirds of the total mass of the RNAP and form what is often likened to the two halves of a crab claw, whose pincers interact with the downstream DNA duplex (Figure 1D). Two active-site Mg2+ ions are positioned in the centre of the claw. In archaea, the large A and B subunits are each split into two polypeptides (annotated as A′/A″ and B″/B′ according to size) whose structure and sequence are closely related to their (un-split) bacterial β′ and β and eukaryotic RPB1 and RPB2 counterparts (Figure 1 and inset). The large subunits harbour the binding sites for substrate NTPs, the duplex DNA template and a 9 bp DNA–RNA hybrid, and provide the catalytic centre, including three catalytic aspartic acid residues and two Mg2+ ions, that facilitate RNA polymerization. The efficient assembly of the large A and B subunits depends on the small subunits L, N, D and P, which form a stable complex located opposite the DNA entry site and the pincers (Figure 1). The bacterial RNAP uses a homodimeric assembly platform consisting of α subunits that are homologous with both archaeal subunits D and L (corresponding to eukaryotic RPB3 and RPB11). This suggests that the D/L subunits arose by gene duplication of the α subunit following the split of the bacterial and archaeal/eukaryotic lineages, and subsequently acquired the two additional subunits N and P (RPB10 and RPB12) that form an interface with the B′ subunit. The combination of assembly platform (L/N/D/P) and catalytic subunits (A′/A″/B′/B″) is the minimal subunit configuration necessary and sufficient for promoter-directed transcription. This implies that these minimal RNAPs can undergo factor-dependent recruitment to the promoter, DNA melting and template strand loading, promoter escape and productive transcription elongation. It also demonstrates that the other RNAP subunits, H, K, F and E, are not strictly required for these processes (Figure 1G).
However, in the yeast Schizosaccharomyces pombe, the genes encoding the homologues of all four of these subunits are essential for cell viability, and three of these subunits (H, F and E) have no bacterial homologues. Subunits F and E form a stable subcomplex that binds the nascent RNA transcript during transcription and is anchored to the RNAP core via subunit K [19,20]. In yeast and bacteria, the homologues of K (RPB6 and ω, respectively) also stabilize the largest RNAP subunit (RPB1 and β′, respectively).
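The three functional groups described above can be summarised as a lookup table. The Python sketch below encodes each archaeal subunit with its group and the RNAPII and bacterial counterparts named in this section; the RPB5, RPB4 and RPB7 assignments for H, F and E follow standard nomenclature rather than the text itself. The helper names `has_minimal_core` and `lacking_bacterial_homologue` are hypothetical, and the first simply restates the reconstitution result that the catalytic plus assembly subunits suffice, together with the basal factors, for promoter-directed transcription.

```python
# Archaeal RNAP subunits: (functional group, RNAPII homologue, bacterial homologue or None)
SUBUNITS = {
    "A′": ("catalysis", "RPB1", "β′"),
    "A″": ("catalysis", "RPB1", "β′"),
    "B′": ("catalysis", "RPB2", "β"),
    "B″": ("catalysis", "RPB2", "β"),
    "D": ("assembly", "RPB3", "α"),
    "L": ("assembly", "RPB11", "α"),
    "N": ("assembly", "RPB10", None),
    "P": ("assembly", "RPB12", None),
    "H": ("auxiliary", "RPB5", None),
    "K": ("auxiliary", "RPB6", "ω"),
    "F": ("auxiliary", "RPB4", None),
    "E": ("auxiliary", "RPB7", None),
}

# Catalytic subunits plus the L/N/D/P assembly platform form the minimal core.
MINIMAL_CORE = frozenset(
    name for name, (group, _, _) in SUBUNITS.items() if group in {"catalysis", "assembly"}
)


def has_minimal_core(subunits) -> bool:
    """True if a reconstituted subunit set contains the minimal core that, with the
    basal factors, supports promoter-directed transcription in vitro."""
    return MINIMAL_CORE <= set(subunits)


def lacking_bacterial_homologue(group=None):
    """Subunits with no bacterial counterpart, optionally restricted to one group."""
    return [n for n, (g, _, b) in SUBUNITS.items() if b is None and (group is None or g == group)]


print(has_minimal_core(set(SUBUNITS) - {"H", "K", "F", "E"}))  # minimal 8-subunit enzyme
print(has_minimal_core(set(SUBUNITS) - {"D"}))                 # assembly platform incomplete
print(lacking_bacterial_homologue("auxiliary"))
```

Representing omission experiments as set differences mirrors the functional dissection strategy: each variant is the full subunit set minus the omitted subunits.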
RNAP function through the transcription cycle
Transcription is a cyclic process in which the molecular machinery makes repeated use of the same template (Figure 2A). During the initiation phase of transcription, the archaeal RNAP interacts with three basal factors: TBP (TATA-box-binding protein), transcription factor B (TFB) and transcription factor E (TFE) [14,18] (Figure 2B). The molecular mechanisms of archaeal RNAP during transcription initiation are illustrated in Figure 2(C). Archaeal TBP, like its eukaryotic counterpart, nucleates the formation of transcription initiation complexes by binding to the TATA element of archaeal promoters with intermediate affinity (Kd ≈ 10–100 nM), which bends the promoter DNA by 50–90° ([22,23] and D. Grohmann and F. Werner, unpublished work). TFB binds to the TBP–TATA complex by sequence-specific recognition of the BRE (TFB-responsive element), and this stabilizes the pre-initiation complex. RNAP is subsequently recruited through interactions between the N-terminal zinc-ribbon domain of TFB and the dock domain of RNAP subunit A′. In this ‘closed’ complex, RNAP makes little or no direct contact with the promoter DNA, which remains in duplex form. During ‘closed-to-open’ complex formation, the topology of the initiation complex changes dramatically: the double-stranded DNA is melted, and the template strand is loaded into the DNA-binding channel and the active site. The RNAP clamp domain is closed over the DNA-binding channel, while the non-template strand lies on the outside of the clamp (Figure 2C). The TFB B-finger domain is inserted into the active site and stimulates catalysis, most likely by interacting with the template DNA strand and/or the NTP substrates. Open complex formation does not depend on, but is stimulated by, RNAP subunits F/E, which are thought to promote the clamp closure mentioned above.
TFE is not strictly required for this process, but stabilizes the open complex through interactions with the non-template strand, in a manner that depends on RNAP subunits F/E (Figure 2C) [14,25]. These results have led to the hypothesis that TFE and RNAP subunits F/E co-operate via a clamp-closure mechanism to facilitate the transition from the closed to the open complex. The RNAP in the open initiation complex is still anchored to the promoter DNA through interactions with TFB, and initially synthesizes ‘abortive’ products (3–11 nt transcripts), most likely by scrunching DNA into the enzyme, as described for bacterial RNAP. The energy stored in the scrunched DNA may be used to disrupt the interactions with the promoter-bound basal factors, and eventually the enzyme escapes the promoter.
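The initiation pathway described above can be sketched in two parts. First, for simple 1:1 binding, the fraction θ of TATA elements occupied by TBP is θ = [TBP]/(Kd + [TBP]), which shows what the quoted ‘intermediate affinity’ implies; this assumes free TBP is approximately equal to total TBP and ignores stabilization by TFB, and the 50 nM TBP concentration used below is hypothetical. Second, the ordered steps are encoded as a simple state walk that separates strictly required components from stimulatory ones. This is a didactic simplification, not a kinetic model: "RNAP" stands for the minimal core enzyme, "F/E" for the presence of those subunits in the enzyme, the NTP requirement for RNA synthesis is added as an obvious prerequisite the text does not spell out, and all function names are hypothetical.

```python
from enum import Enum


def tata_occupancy(tbp_nm: float, kd_nm: float) -> float:
    """Fraction of TATA elements bound by TBP for simple 1:1 binding.

    Assumes free TBP ~ total TBP (TBP in excess over promoter DNA) and
    ignores the stabilizing effect of TFB.
    """
    if tbp_nm < 0 or kd_nm <= 0:
        raise ValueError("[TBP] must be >= 0 and Kd > 0")
    return tbp_nm / (kd_nm + tbp_nm)


class Stage(Enum):
    """Ordered initiation steps as described in the text."""
    TBP_TATA = 1             # TBP binds and bends the TATA element
    TBP_TFB_PROMOTER = 2     # TFB binds TBP-TATA via the BRE
    CLOSED_COMPLEX = 3       # RNAP recruited via TFB zinc ribbon / A' dock domain
    OPEN_COMPLEX = 4         # DNA melted, template strand loaded
    ABORTIVE_INITIATION = 5  # 3-11 nt abortive products, DNA scrunching
    PROMOTER_ESCAPE = 6


CORE = frozenset({"TBP", "TFB", "RNAP"})  # "RNAP" = minimal 8-subunit enzyme
REQUIRED = {
    Stage.TBP_TATA: frozenset({"TBP"}),
    Stage.TBP_TFB_PROMOTER: frozenset({"TBP", "TFB"}),
    Stage.CLOSED_COMPLEX: CORE,
    Stage.OPEN_COMPLEX: CORE,
    Stage.ABORTIVE_INITIATION: CORE | {"NTPs"},
    Stage.PROMOTER_ESCAPE: CORE | {"NTPs"},
}
# Stimulatory but not required; "F/E" means the enzyme carries subunits F and E.
STIMULATORY = {Stage.OPEN_COMPLEX: frozenset({"TFE", "F/E"})}


def furthest_stage(available):
    """Walk the pathway in order; return the last stage whose requirements are met."""
    reached = None
    for stage in Stage:
        if not REQUIRED[stage] <= set(available):
            break
        reached = stage
    return reached


def missing_stimulators(stage, available):
    """Stimulatory components absent for a stage (none of them is required)."""
    return sorted(STIMULATORY.get(stage, frozenset()) - set(available))


for kd in (10.0, 100.0):  # bounds of the Kd range quoted in the text
    print(f"Kd = {kd:.0f} nM: occupancy at 50 nM TBP = {tata_occupancy(50.0, kd):.2f}")
print(furthest_stage({"TBP", "TFB"}))                  # Stage.TBP_TFB_PROMOTER
print(furthest_stage({"TBP", "TFB", "RNAP", "NTPs"}))  # Stage.PROMOTER_ESCAPE
print(missing_stimulators(Stage.OPEN_COMPLEX, {"TBP", "TFB", "RNAP"}))
```

At [TBP] equal to Kd half of the sites are occupied, so the tenfold spread in the quoted Kd range shifts the TBP concentration needed for half-occupancy by the same factor.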
The elongation phase of transcription is not uniform, but is prone to pausing, stalling or even arrest, which can displace the RNA 3′-terminus from the active site. This problem can be overcome by backtracking of the elongating enzyme, extrusion of the misaligned RNA 3′-end through a pore that during polymerization allows the entry of substrate NTPs, and retuning of the active site from RNA polymerization to an endonucleolytic mode. The resulting ‘transcript cleavage’ generates a new RNA 3′-terminus in the active site that is conducive to continued RNA polymerization. Because RNA polymerases cannot resume elongation once they have disengaged from the template, this ‘rescue’ mechanism for stalled complexes is crucial for the processivity of transcription. In archaea and eukaryotes, the basal transcription factors TFS and TFIIS, respectively, stimulate transcript cleavage by inserting a finger-like domain into the active site and repositioning a catalytic Mg2+ ion [29–31]. Three other archaeal factors are likely candidates for regulating transcription elongation by virtue of sequence homology with their bacterial and eukaryotic counterparts: the eukaryote-like heterodimeric Spt4/5 complex, a homologue of the eukaryotic chromatin-remodelling Elongator factor Elp3, and the bacterial-like NusA. However, it has not been established experimentally whether these factors regulate transcription elongation in archaea, and little is known about the underlying molecular mechanisms of their homologues in eukaryotes and bacteria.
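The rescue of a stalled complex by transcript cleavage amounts to bookkeeping on the 3′ end of the transcript. The Python toy model below is a deliberate simplification under stated assumptions: the elongation complex is reduced to a transcript string plus the number of 3′ nucleotides extruded during backtracking, and TFS-stimulated cleavage is assumed to remove exactly that extruded fragment, leaving a new 3′ end in the active site. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElongationComplex:
    """Toy elongation complex: transcript (5'->3') and backtrack distance in nt."""
    rna: str
    backtracked: int = 0  # 3' nucleotides extruded from the active site

    def add_nucleotide(self, nt: str) -> None:
        # Polymerization needs the RNA 3' end in the active site.
        if self.backtracked:
            raise RuntimeError("complex is backtracked; 3' end is not in the active site")
        self.rna += nt

    def backtrack(self, n: int) -> None:
        # Slide upstream by n nt, extruding the 3' end through the NTP-entry pore.
        if n < 1 or self.backtracked + n >= len(self.rna):
            raise ValueError("invalid backtrack distance")
        self.backtracked += n

    def cleave(self) -> str:
        """Remove the extruded 3' fragment (stimulated by TFS/TFIIS) and return it."""
        keep = len(self.rna) - self.backtracked
        fragment = self.rna[keep:]
        self.rna = self.rna[:keep]
        self.backtracked = 0
        return fragment


ec = ElongationComplex("AUGGCUAC")
ec.backtrack(2)          # stalled complex slides back 2 nt
fragment = ec.cleave()   # "AC" released; new 3' end is the U of AUGGCU
ec.add_nucleotide("A")   # elongation resumes from the new 3' end
print(fragment, ec.rna)  # AC AUGGCUA
```

Raising an error on `add_nucleotide` while backtracked captures why the rescue matters: without cleavage, the complex cannot extend the transcript.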
Archaeal genomes are characterized by a high gene density and short intergenic regions, necessitating efficient termination of transcription to prevent readthrough of RNAPs into neighbouring genes. In vitro, termination of the archaeal RNAP is efficiently induced by transcription of a stretch of seven consecutive U residues (U7), by a hitherto uncharacterized mechanism (Figure 2C) [33,34]. This mechanism is fundamentally different from bacterial factor-independent intrinsic termination because it does not require a strong RNA hairpin secondary structure. Transcription of A7 does not lead to termination; however, although the rU:dA hybrid formed on transcription of U7 is less stable than the rA:dT hybrid formed on transcription of A7, weakened DNA–RNA interactions alone are unlikely to provide a full explanation of the termination mechanism. An alternative model explains termination as a consequence of prolonged pausing of RNAP while transcribing a U7 stretch. The paused RNAPs could subsequently undergo conformational changes, such as opening of the clamp domain, facilitating release of the RNA transcript and dissociation of RNAP from the DNA template by a mechanism similar to bacterial intrinsic termination. However, difficulties arise when trying to reconcile the in vitro data with biologically relevant transcription termination in vivo. Not all archaeal genes or operons harbour a T7 stretch terminator at their 3′-end, and T7 sequences can also be found within archaeal open reading frames. There are no discernible archaeal homologues of bacterial and eukaryotic termination factors such as rho or Xrn2, but the existence of a novel archaeal termination factor that facilitates accurate and regulated termination in vivo is conceivable. A recent report describes preferential recycling of RNAPs to the same template molecule in a manner that depends on a terminator sequence.
However, the underlying molecular mechanisms that physically link the terminating RNAPs with the basal factors assembled on the promoter located upstream in the same gene are not understood.
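Because the in vitro terminator is a U7 stretch in the RNA, candidate sites correspond to runs of seven or more T residues on the non-template (coding) strand. The Python sketch below finds such runs with a regular expression; the sequence and the ORF coordinates are invented for illustration, and the function name `oligo_t_sites` is hypothetical. A7 runs are deliberately not reported, since transcription of A7 does not terminate. As the text emphasizes, a hit is only a candidate: oligo-T runs also occur inside open reading frames, so sequence alone does not predict termination in vivo.

```python
import re


def oligo_t_sites(seq: str, min_run: int = 7):
    """Return 0-based, half-open (start, end) spans of runs of >= min_run T
    on the non-template (coding) strand, i.e. stretches transcribed as U-runs."""
    if min_run < 1:
        raise ValueError("min_run must be >= 1")
    pattern = re.compile(f"T{{{min_run},}}")  # e.g. "T{7,}"; greedy, so whole runs
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


# Invented example: one T7 inside a hypothetical ORF, one T8 downstream of it.
seq = "ATGCCG" + "TTTTTTT" + "GCATAA" + "GGAC" + "TTTTTTTT" + "GCA"
orf = (0, 19)  # hypothetical ORF coordinates (0-based, half-open)

for start, end in oligo_t_sites(seq):
    inside = orf[0] <= start and end <= orf[1]
    print(start, end, "within ORF" if inside else "downstream of ORF")
```

Both runs are reported identically by the scan; only external annotation (here the hypothetical ORF) distinguishes the intragenic T7 from the downstream one, which is exactly the ambiguity described above.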
Archaeal RNAP transcription systems offer many advantages. Owing to their subunit architecture and requirement for basal factors, they are highly reminiscent of their eukaryotic siblings and serve as valid model systems for many eukaryotic RNAP functionalities, including nucleic acid interactions (e.g. RNA binding by F/E), large-scale conformational changes (e.g. clamp closure during template DNA loading), catalytic mechanisms (e.g. RNA polymerization and cleavage) and interactions with and regulation by basal transcription factors (e.g. TBP, TFB, TFE and Spt4/5). At the same time, they offer the biochemical and biophysical tractability of the much simpler bacterial RNAPs, which the eukaryotic enzymes lack. The successful reconstitution of archaeal RNAP from individual recombinant components has opened many exciting possibilities that we have only just started to explore.
Molecular Biology of Archaea: Biochemical Society Focused Meeting held at University of St Andrews, U.K., 19–21 August 2008. Organized and Edited by Stephen Bell (Oxford, U.K.) and Malcolm White (St Andrews, U.K.).
We thank the Biotechnology and Biological Sciences Research Council [grant number BB/E008232/1], the Wellcome Trust [grant number 079351/Z/06/Z] and University College London for financial support.