CRISPR (cluster of regularly interspaced palindromic repeats)/Cas and CRISPR/Cmr systems of Sulfolobus, targeting DNA and RNA respectively of invading viruses or plasmids, are complex and diverse. We address their classification and functional diversity, and the wide sequence diversity of RAMP (repeat-associated mysterious protein)-motif-containing proteins encoded in Cmr modules. Factors influencing maintenance of partially impaired CRISPR-based systems are discussed. The capacity for whole CRISPR transcripts to be generated despite the uptake of transcription signals within spacer sequences is considered. Targeting of protospacer regions of invading elements by Cas protein–crRNA (CRISPR RNA) complexes exhibits relatively low sequence stringency, but the integrity of protospacer-associated motifs appears to be important. Different mechanisms for circumventing or inactivating the immune systems are presented.
The discovery of the widespread occurrence of CRISPR (cluster of regularly interspaced palindromic repeat)-based immune systems in archaea and bacteria has provided important insights into how hosts can inactivate and/or regulate invading foreign DNA and, probably, RNA genetic elements. In addition, these systems are likely to affect how co-invading genetic elements influence one another [1,2]. The two main molecular apparatus involved are structurally complex, partially independent and have diversified functionally. Moreover, their capacity to facilitate the continual uptake of foreign DNA into host chromosomes, and their propensity for transfer between organisms, has important implications for cellular evolution.
The genus Sulfolobus provides an important model system for studying these immune systems. Most Sulfolobus species carry complex and diverse CRISPR-based systems and appear to be particularly active in the uptake of foreign DNA inserts into their CRISPR loci. Furthermore, a broad collection of Sulfolobus genetic elements is available that can be used to challenge the CRISPR-based systems. It includes numerous diverse viruses, many of which have been classified into eight new viral families [4,5], as well as a family of plasmids encoding an archaeal-specific conjugative apparatus [6,7].
Many insights into the complexity of the CRISPR-based immune systems, and their mechanistic diversity, have emerged from detailed experimental studies of CRISPR/Cas and CRISPR/Cmr systems of the archaeal genera Sulfolobus and Pyrococcus respectively, and from investigation of bacterial CRISPR/Cas systems of Streptococcus thermophilus [8,9], Staphylococcus epidermidis [10,11] and Escherichia coli. In the present article, we focus primarily on current knowledge and ideas deriving from, and relating to, the Sulfolobus immune systems.
CRISPR/Cas families: complexity, classification and versatility
At an early stage, it was clear that the CRISPR/Cas and Cmr systems were highly complex when approx. 45 different proteins were implicated in their function, and the number has continued to rise. Genes of the two systems are clustered into cas and cmr cassettes which are sometimes linked physically. These cassettes encode a few core proteins, but they also carry different combinations of other genes, some occurring more commonly than others. Thus cassettes vary markedly in their overall gene contents. To illustrate this, core gene structures of the archaeal cas cassettes are shown together with a more complex family I cas cassette from Sulfolobus islandicus HVE10/4 (Figures 1A and 1B). The core cas genes classify into cas group 1, implicated in CRISPR acquisition of foreign DNA and insertion into CRISPR loci, and cas group 2, associated with crRNA (CRISPR RNA) processing and guidance (Figure 1A).
Figure 1. Core genes of archaeal cas cassettes
Families of CRISPR/Cas modules have been classified on the basis of gene content and gene order within cas cassettes, and on the basis of conserved sequences of cas genes, leader regions and repeats within CRISPR/Cas modules. For archaea, about eight families have been proposed, whereas among the Sulfolobales, three are common (I–III) and one less so (IV) [2,15,16,17].
Cmr modules carry two conserved core genes, cmr2 and cmr5 (Figure 2A), and a variable number of genes encoding diverse proteins which carry RAMP (repeat-associated mysterious protein) motifs. The Cmr modules can be classified into five main families A, B, C, D and E for archaea on the basis of phylogenetic tree building using sequences of Cmr2 and its homologues Csm1 and Csx11 (Figure 2B), where most Sulfolobus Cmr modules fall within families B or D. Further classification is complicated by the presence of multiple diverse copies of genes coding for RAMP-motif-containing proteins. Although these proteins can be classified into families on the basis of these motifs, the remainder of the protein sequences tend to be highly divergent, as illustrated for four proteins encoded in a Cmr family B module of Sulfolobus solfataricus P2 (Figure 2C).
Figure 2. Classification of archaeal Cmr modules
Most Sulfolobus species carry multiple CRISPR/Cas and/or Cmr modules and, given the high energy cost of maintaining and expressing them, they must confer major advantages on the cell. Clearly, given the molecular and mechanistic complexities of the systems, they can be inactivated readily by incurring a defect in a component or critical sequence motif. Moreover, the systems are potential targets for incoming genetic elements, which may attempt to integrate into essential cas or cmr genes, as has been observed for a viral integration in a csa3 gene of S. islandicus strain M.16.4 (see below), or to modify their protein products or otherwise interfere with transcription or maturation of crRNAs. Therefore multiple systems will provide added security against unwanted invasion. The pairing of many family I CRISPR/Cas modules may reflect a compromise between providing added security and generating more compact and efficient systems which can potentially be mobilized and transferred between organisms as single units.
A further advantage may arise from the presence of different families of CRISPR/Cas modules, which is commonly observed for Sulfolobus (e.g. S. solfataricus carries family I and II modules, whereas Sulfolobus acidocaldarius carries those of family II and III). Their presence may increase versatility in both the uptake of spacers and targeting of protospacers with different PAMs (protospacer-associated motifs).
The presence of multiple Cmr modules is also likely to confer functional versatility, although they are subject to the constraint that some encoded proteins must be able to recognize part of the repeat sequence of the co-inhabiting CRISPR/Cas module [18,19]. Cmr modules are sometimes linked directly to CRISPR/Cas modules on chromosomes and, given their functional interdependence, there is likely to have been some co-evolution of the coupled systems. Consistent with this view, analysis of the Sulfolobales suggests that Cmr family D modules (Figure 2B) are commonly, but not exclusively, found together with family II CRISPR/Cas modules.
CRISPR loci: structural and functional complexity
CRISPR loci consist of regularly spaced direct repeat sequences with intervening spacers deriving from invading foreign DNA elements. Archaeal repeats fall in the size range 23–37 bp and most spacers are 25–50 bp long. CRISPR loci are preceded by a leader region which varies in size from approx. 150 to 550 bp and shows levels of sequence conservation which are only considered significant within specific families of CRISPR/Cas modules. CRISPR locus sizes can also vary considerably, suggesting that rates of spacer turnover differ markedly for different CRISPR loci within a given archaeon. But there is no support for differences occurring between the CRISPR/Cas families of the Sulfolobales, since large and small clusters exist for the most common families I, II and III.
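The repeat-spacer organization described above can be illustrated with a minimal Python sketch. It assumes that the repeat sequence is already known and perfectly conserved along the locus, which is a simplification; the function names and all sequences are hypothetical toy values, not data from any Sulfolobus genome.

```python
# Toy model of a CRISPR locus: leader, then repeat-spacer units, ending in a repeat.
# All sequences below are invented for illustration only.

def extract_spacers(locus, repeat):
    """Return the sequences lying between consecutive copies of `repeat`."""
    parts = locus.split(repeat)
    # parts[0] precedes the first repeat (leader side) and parts[-1]
    # follows the final repeat, so neither is a spacer.
    return parts[1:-1]

def within_typical_sizes(repeat, spacers,
                         repeat_range=(23, 37), spacer_range=(25, 50)):
    """Check lengths against the size ranges quoted in the text (in bp)."""
    repeat_ok = repeat_range[0] <= len(repeat) <= repeat_range[1]
    spacers_ok = all(spacer_range[0] <= len(s) <= spacer_range[1]
                     for s in spacers)
    return repeat_ok and spacers_ok

REPEAT = "GATTAATCCCAAAAGGAATTGAAAG"               # 25 bp, hypothetical
SPACERS = ["ACGT" * 7, "TTGCA" * 6, "CAG" * 10]    # 28, 30 and 30 bp
LEADER = "AT" * 20                                 # shortened stand-in for a leader

locus = LEADER + "".join(REPEAT + s for s in SPACERS) + REPEAT
found = extract_spacers(locus, REPEAT)
print(len(found), within_typical_sizes(REPEAT, found))
```

Real loci carry occasional repeat variants, so an exact string split of this kind would need to be replaced by an approximate-matching step before being applied to genome data.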
In organisms carrying several CRISPR/Cas modules, including S. solfataricus strains P1 and P2 with six, and S. acidocaldarius with five, they may not all be fully functional. The CRISPR/Cas system exhibits two partially independent functions with one group of Cas proteins responsible for uptake of invader DNA into CRISPR loci and the other for generating crRNAs and guiding them to the invading genetic element (Figure 1). Only the latter proteins are essential for the CRISPR/Cas system to function. Thus non-extending CRISPR loci may still be useful to cells as long as crRNAs are generated. S. acidocaldarius carries two large loci and three smaller ones of 11, five and two spacer-repeat units. All five clusters were transcribed and processed to mature crRNAs, but possibly the spacer addition functions are defective for the small clusters. Similarly, for S. solfataricus P1 and P2, of the six CRISPR loci, only four appear to be active in elongation. Of the other two, the smallest (locus E) carries six spacer-repeat units with a leader and no cas genes and does not appear to be transcribed. It carries spacers matching rudiviruses and a conjugative plasmid and is conserved in three S. solfataricus strains (two from Naples, Italy) with only the final downstream spacer differing between the P1/P2 strains and strain 98/2 (Figure 3). Moreover, it is also found on a highly conserved 36 kb chromosomal fragment (99% sequence identity) in the S. islandicus strain L.D.8.5 (from Lassen, CA, U.S.A.), with an almost identical leader region (one mismatch) and identical repeat sequence but different spacers (Figure 3). The maintenance and spreading of locus E, lacking a cas cassette, would suggest that the CRISPR module can be activated and generate crRNAs. The inference that Cas proteins encoded in one CRISPR/Cas module can activate other CRISPR loci would also be consistent with the inference that the group 1 cas genes (Figure 1A) can exchange between CRISPR/Cas modules.
Figure 3. A map of the CRISPR locus E
The large inactive locus F, with 88 spacer-repeat units, is completely conserved in sequence between S. solfataricus strains P1 and P2, but it lacks a leader region and, although transcription occurs internally within the CRISPR locus, mature crRNAs are not generated [21,23]. Thus locus F, which has been lost from S. solfataricus strain 98/2, may be of little use when a viral infection occurs.
Generally for Sulfolobus species, loss of mobile DNA elements is difficult; thus IS (insertion sequence) elements tend to degenerate rather than be deleted. This may also apply to CRISPR/Cas and Cmr modules, and explain the maintenance of defective CRISPR systems over long periods, although in a variant strain of S. solfataricus P2 (P2A), four physically linked CRISPR/Cas modules (A–D) were apparently lost via a single recombination event between bordering IS elements.
Transcription of CRISPR loci and processing
Processed CRISPR transcripts were first observed for the euryarchaeon Archaeoglobus fulgidus and crenarchaeon S. solfataricus, and these studies revealed the regular pattern of the RNA processing, using probes specific for repeat sequences [26,27]. Subsequently, the smallest Sulfolobus RNA product of approx. 40 bp was identified covering primarily a single spacer sequence. S. acidocaldarius CRISPR loci are transcribed upstream from the first repeat within the leader region and termination occurs downstream from the final repeat. Even for the locus carrying 78 spacer-repeat units (4930 bp), a substantial proportion of transcripts were approx. 5000 nt long with another large portion in the size range 3000–3500 nt.
This raised an important question as to how transcription continues throughout CRISPR loci apparently unimpeded by the presence of spacers carrying archaea-specific promoter or terminator motifs, given that the DNA uptake mechanism is essentially statistically random. A compilation of potential promoter and terminator motifs on the leader (crRNA) strand of the available Sulfolobus genomes revealed, for a total of 4505 spacers, 2560 carrying archaeal-type hexameric TATA boxes (at least six consecutive A and Ts with at least two As) and 725 with T-rich pyrimidine motifs (at least six consecutive T and Cs with at least five Ts) [28,29]. Although many of these may at best be weakly effective, nevertheless, given the high gene density in the Sulfolobus viral and plasmid genomes and the low frequency of operon structures, the probability of taking up such active motifs is significant. The conclusion that transcripts do not normally start within CRISPR loci is also supported by examination of CRISPR transcripts from S. solfataricus P2 transcriptome data, which indicate that most of the detectable 5′-ends are attributable to processing sites within repeats. A possible explanation for the unimpeded transcription through the CRISPR loci could be the presence of the CRISPR-binding protein of Sulfolobus and other crenarchaea; it could act as a transcription factor inhibiting transcriptional starts and stops within the spacer sequences, and repeats.
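The two motif definitions quoted above (2560 and 725 of 4505 spacers, i.e. roughly 57% and 16%) can be expressed as a simple sliding-window scan. The Python sketch below is one possible reading of those definitions, in which a motif is scored if any 6-nt window satisfies the criteria; it is not the procedure used in the cited compilation, and the function names and example sequences are hypothetical.

```python
# Sliding-window scan for the two motif classes defined in the text.
# Assumption: a spacer is counted if ANY 6-nt window meets the criteria.

def has_window(seq, alphabet, base, min_count, width=6):
    """True if some `width`-nt window uses only `alphabet` letters and
    contains at least `min_count` copies of `base`."""
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= alphabet and window.count(base) >= min_count:
            return True
    return False

def has_tata_like(spacer):
    """Six consecutive A/T positions with at least two As."""
    return has_window(spacer, {"A", "T"}, "A", 2)

def has_t_rich(spacer):
    """Six consecutive T/C positions with at least five Ts."""
    return has_window(spacer, {"T", "C"}, "T", 5)

# Hypothetical toy spacers, read on the leader (crRNA) strand.
toy_spacers = ["GGCTTTATAAGGC", "GGTTTTTTGG", "GGTCTCTTGG"]
tata_count = sum(has_tata_like(s) for s in toy_spacers)
t_rich_count = sum(has_t_rich(s) for s in toy_spacers)
print(tata_count, t_rich_count)
```

Applied to a real spacer collection, the two counts divided by the number of spacers would give fractions comparable to those quoted above, subject to the window interpretation chosen here.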
Full-length transcripts are also produced from the opposite DNA strand of CRISPR loci of S. acidocaldarius which yield discrete 50–60 bp fragments carrying spacer sequences, albeit at lower molar levels than for the crRNAs, and antisense RNA transcripts also were detected for CRISPR loci of S. solfataricus P2. Failure to detect similar transcripts in the euryarchaeon Pyrococcus and bacterium E. coli [12,19] suggests that this may be a specific property of Sulfolobus or crenarchaea. Analyses of cDNA libraries of S. solfataricus demonstrated previously that antisense RNAs are commonly produced, especially against transposase mRNAs, and several other antisense RNAs have been detected for this organism. Given that mature crRNAs are produced in the absence of infecting genetic elements in different Sulfolobus species [16,20,23], one possible explanation is that these antisense RNAs protect at least a fraction of the crRNAs against degradation before their activation.
Maturation of crRNAs and stringency of targeting mechanisms
Details of the RNA-processing mechanisms have been elucidated for a euryarchaeal CRISPR/Cmr system and an E. coli CRISPR/Cas system where Cas6 homologues cut in the repeat, 8 nt 5′ from the start of the spacer sequence, whereas the 3′-processing sites differ [12,18]. For S. solfataricus, many 5′-ends, and putative processing sites, are detectable 6–8 nt from the spacer start, suggesting that a similar mechanism operates. Processing at the 3′-end of the crRNA is less clearly defined, but for the CRISPR/Cmr system of Pyrococcus, a 14 nt ruler mechanism enables the processing ribonuclease to generate dual cuts at 5 and 11 nt into the spacer sequence. Presumably, crRNA-binding Cas and Cmr proteins distinguish between the different crRNA products before targeting the foreign DNA or RNA respectively.
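The 5′-processing step described above, a single cut within each repeat 8 nt upstream of the following spacer, can be sketched in Python as follows. The sketch models only this first cleavage and ignores the less well-defined 3′-trimming; the function name and all sequences are hypothetical, and an exactly conserved repeat is assumed.

```python
# Toy model of Cas6-like cleavage: one cut per repeat, 8 nt before the
# 3'-end of the repeat (i.e. 8 nt 5' of the next spacer). Only the 5'
# processing step is modelled; 3' trimming is deliberately left out.

def cut_within_repeats(transcript, repeat, tag_len=8):
    """Return the fragments lying between consecutive cut sites."""
    cut_offset = len(repeat) - tag_len
    cut_sites = []
    start = transcript.find(repeat)
    while start != -1:
        cut_sites.append(start + cut_offset)
        start = transcript.find(repeat, start + len(repeat))
    # Each fragment = 8-nt repeat tag + spacer + remainder of the next repeat.
    return [transcript[a:b] for a, b in zip(cut_sites, cut_sites[1:])]

RNA_REPEAT = "GAUUAAUCCCAAAAGGAAUUGAAAG"    # 25 nt, hypothetical
RNA_SPACERS = ["ACGU" * 8, "CCAUG" * 7]     # 32 and 35 nt, hypothetical

pre_crrna = "".join(RNA_REPEAT + s for s in RNA_SPACERS) + RNA_REPEAT
fragments = cut_within_repeats(pre_crrna, RNA_REPEAT)
for frag in fragments:
    print(len(frag), frag[:8])
```

In this model every intermediate carries the same 8-nt repeat-derived tag at its 5′-end, which is consistent with the observation that most detectable 5′-ends map to processing sites within repeats.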
Until recently, attention focused on targeting of double-stranded DNA elements, but probably single-stranded DNA will also be targeted by the CRISPR/Cas system. It remains an open question whether the CRISPR/Cmr system targets both mRNA and viral RNA, and incorporation of viral RNA into CRISPR loci would require reverse transcriptase activity. Nevertheless, all evidence suggests that the primary targets of the Sulfolobus immune systems are viruses and plasmids and, probably, their mRNAs. There is no support for a general targeting of transposable elements. Spacers matching transposase genes are occasionally found in CRISPR loci [16,20,32], but they can generally be attributed to transposase genes present in viruses or plasmids, in particular orphan orfB elements (family IS605/200) for Sulfolobus [2,15].
Effective targeting of genetic elements requires that the mature crRNA anneals to the protospacer DNA region. Although, for the bacterium S. thermophilus, a perfect sequence match was required to elicit a response from the CRISPR/Cas system, studies on different Sulfolobus strains have shown that a less stringent recognition system prevails. Challenging Sulfolobus cells with viral genes carrying one to three mismatches still produced a strong response from the CRISPR/Cas system. Another important factor is the motif known as PAM. Targeted genetic elements carry this short sequence motif which creates a mismatch with the 5′-end of the crRNA [16,33,34]. For Sulfolobus, this was defined as a family-specific dinucleotide, displaced 1 nt from the spacer sequence [15,16]. Potentially, this can be involved in both selection of protospacers for excision by Cas proteins and crRNA targeting. Whereas a study of the bacterium S. epidermidis concluded that the PAM was not important for protospacer targeting and that any mismatched base pairing would suffice, for S. islandicus strain REY15A, altering the PAM led to a loss of crRNA targeting.
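The two requirements discussed above, tolerance of a few mismatches and an intact PAM dinucleotide displaced 1 nt from the protospacer, can be combined in a toy Python model. Several assumptions are made purely for illustration: the PAM is taken to be "CC", it is placed upstream of the protospacer on the strand written in the same sense as the spacer, and three mismatches is used as a cut-off even though the source reports only that one to three mismatches were tolerated, not a measured threshold. All names and sequences are hypothetical.

```python
# Toy protospacer search: PAM dinucleotide, 1-nt gap, then a spacer-like
# region with a limited number of mismatches. PAM identity, orientation
# and the mismatch cut-off are illustrative assumptions.

def count_mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def find_targets(target, spacer, pam="CC", gap=1, max_mismatches=3):
    """Return (position, mismatches) for each acceptable protospacer."""
    hits = []
    offset = len(pam) + gap
    for pos in range(offset, len(target) - len(spacer) + 1):
        if target[pos - offset:pos - gap] != pam:
            continue  # PAM absent or altered: no targeting in this model
        mismatches = count_mismatches(target[pos:pos + len(spacer)], spacer)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

spacer = "ACGTTGCAAGCTTAGGCTAACGTTAGC"                    # 27 nt, hypothetical
two_mm = spacer[:3] + "A" + spacer[4:10] + "G" + spacer[11:]
four_mm = two_mm[:17] + "A" + two_mm[18:24] + "C" + two_mm[25:]

perfect_hits = find_targets("GGGCCA" + spacer + "TTT", spacer)
tolerant_hits = find_targets("GGGCCA" + two_mm + "TTT", spacer)
too_divergent = find_targets("GGGCCA" + four_mm + "TTT", spacer)
pam_altered = find_targets("GGGTTA" + spacer + "TTT", spacer)
print(perfect_hits, tolerant_hits, too_divergent, pam_altered)
```

The last case mirrors the S. islandicus REY15A observation that altering the PAM abolishes targeting even when the protospacer itself matches perfectly.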
Although a few archaeal viruses have been shown to be lytic and to elicit strong immune responses, many Sulfolobus viruses and plasmids coexist in a stable relationship, at low copy numbers, over longer periods. Although these genetic elements do not appear to be targeted by the host CRISPR systems, the latter could nevertheless have a regulatory role possibly by targeting mRNAs.
Another special feature of archaeal genetic elements is that they often carry an integrase gene which partitions on chromosomal integration. Consequently, the integrated element can only be excised when the free element is present to generate an intact integrase/excision enzyme. Thus targeting and degradation of the free genetic element by the host CRISPR/Cas system could actually favour entrapment of the integrated element, and such a process could enhance viral and plasmid evolution in archaea. The Redder Model for archaeal viral evolution hypothesized that, since more than one type of fusellovirus can integrate at a given att site within a tRNA gene, the encaptured concatenated viruses would tend to recombine thereby generating, and subsequently releasing, hybrid fuselloviruses. A similar process may occur for Sulfolobus-specific conjugative plasmids. They are also integrative, and their DNA is regularly incorporated into CRISPR loci as spacers [16,20]. Moreover, this could explain why some of the different Icelandic conjugative plasmids cultivated in Wolfram Zillig's laboratory often carry large regions of almost identical nucleotide sequence [6,7]. Thus, indirectly, the CRISPR/Cas systems could be fuelling production of new viral and plasmid variants which they may subsequently be required to inactivate.
Some insights into how genetic elements undermine or avoid the CRISPR immune systems were gained by passing the rudivirus SIRV1 (Sulfolobus islandicus rod-shaped virus 1) through a series of closely related S. islandicus strains. This generated many sequence changes in the viral genes, but striking was the frequent occurrence of genes that were altered by 12 bp indels, probably deletions. When similar 12 bp indels were observed among related lipothrixviruses, it was inferred that these might occur at crRNA-targeting protospacers on the viral genomes. In another study of a hyperthermophilic archaeal virus, HAV1 (hyperthermophilic archaeal virus 1), cultured in a bioreactor over a 2-year period, samples taken at different times showed genome sequence changes, not unlike those observed earlier for SIRV1, but also a series of recombination sites were detected along the linear genome at which frequent rearrangements had occurred to generate viral variants with altered sequences.
Although accumulating specific sequence changes in genetic elements is an effective way of avoiding, at least temporarily, crRNA targeting, more direct methods must also have evolved. Thus, for the S. islandicus strain M.16.4, an M164 provirus 1 has inserted into, and disrupted, the csa3 gene considered to encode the transcriptional regulator of the group 1 cas genes (Figure 1A) associated with new spacer uptake. This has the advantage for the virus that other infecting viruses will still be attacked by crRNAs if matching spacers are already present in the CRISPR locus, but new spacers cannot be generated from M164 provirus itself.
Other possible mechanisms were discerned from a study in which CRISPR systems of Sulfolobus were challenged directly by vectors carrying viral genes or protospacers showing various degrees of matching to host CRISPR spacers which mimicked, to a degree, the continual infection of a host cell with a given virus. In many viable transformants, CRISPR locus deletions, including the matching spacer, had occurred, whereas in others, whole CRISPR/Cas cassettes were lost. However, several transformants revealed no changes in either CRISPR/Cas modules or vector constructs, suggesting that other unknown regulatory mechanisms can inactivate the immune system.
CRISPR/Cas and Cmr module mobility
Sulfolobus CRISPR/Cas and Cmr modules generally occur within variable chromosomal regions where extensive gene shuffling has occurred [2,41], often attributable to high levels of transposable elements. Recombination at bordering IS elements can also lead to loss of CRISPR/Cas or Cmr modules. There is also strong evidence in support of the transfer of whole modules between organisms based on comparative studies of CRISPR/Cas module families and their locations, although the transfer mechanisms remain unclear. For bacteria, evidence was provided for transfer of these modules on large plasmids, but many archaeal CRISPR/Cas modules are large, up to 25 kb, and the largest conjugative plasmids are only approx. 40 kb. Chromosomal conjugation may provide a vehicle, possibly facilitated by encaptured Sulfolobus conjugative plasmids [43,44], or presently unknown mechanisms may operate, possibly within biofilms. Finally, although phylogenetic analyses support the transfer of CRISPR/Cas and Cmr modules between archaea and bacteria, the basic differences in archaeal and bacterial transcriptional and translational mechanisms and in the unique cell wall, membrane structures and conjugative system of archaea provide formidable barriers to transfer between domains.
Molecular Biology of Archaea II: A Biochemical Society Focused Meeting held at Robinson College, Cambridge, U.K., 16–18 August 2010. Organized and Edited by Stephen Bell (Oxford, U.K.) and Finn Werner (University College London, U.K.).
Research was supported by grants from the Danish Natural Science Research Council [grant number 272-08-0391], the Danish Research Council for Technology and Production [grant number 274-07-0116] and the Danish National Research Foundation.