In bacteria and archaea, RNA-Seq deep sequencing methodology allows for the detection of abundance and processing sites of the small RNAs that comprise a CRISPR (clustered regularly interspaced short palindromic repeats) RNome. Comparative analyses of these CRISPR RNome sets highlight conserved patterns that include the gradual decline of CRISPR RNA abundance from the leader-proximal to the leader-distal end. In the present review, we discuss exceptions to these patterns that indicate the extensive impact of individual spacer sequences on CRISPR array transcription and RNA maturation. Spacer sequences can contain promoter and terminator elements and can promote the formation of CRISPR RNA–anti-CRISPR RNA duplexes. In addition, potential RNA duplex formation with host tRNA was observed. These factors can influence the functionality of CRISPR–Cas (CRISPR-associated) systems and need to be considered in the design of synthetic CRISPR arrays.
The CRISPR (clustered regularly interspaced short palindromic repeats)–Cas (CRISPR-associated) defence systems facilitate prokaryotic immunity against invading genetic elements [1–3]. Genomic CRISPR loci comprise repetitive identical DNA elements (repeats) that are interspaced by unique spacer sequences that are often derived from viral genomes or conjugative plasmid DNA. CRISPR arrays are transcribed from a promoter within an adjacent leader region into long pre-crRNAs (precursor CRISPR RNAs) that are subsequently processed by an associated Cas endonuclease into small interfering crRNAs (CRISPR RNAs) [2,4–10]. The mature crRNAs are then bound by Cas proteins to form effector ribonucleoprotein complexes that target and degrade complementary invader DNA or RNA [3,11–16]. Members of the diverse family of Cas6 endonucleases have been shown to be responsible for crRNA maturation [17–25]. RNA-Seq deep sequencing methodology has emerged as a method of choice to detect and quantify crRNAs among total RNA populations. Obtained RNA-Seq sequencing reads are mapped to reference genomes. At the location of genomic CRISPR loci these mappings provide single-base resolution information about the overall abundance, the distribution and the processing patterns of crRNAs in an organism.
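As a minimal sketch of how such mappings yield coverage and processing-site information, the following Python snippet assumes that the mapped reads of one strand have already been reduced to 0-based, half-open (start, end) intervals on the reference. The function names and the toy coordinates are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter


def per_base_coverage(reads, locus_start, locus_end):
    """Count the reads covering each position in [locus_start, locus_end)."""
    coverage = [0] * (locus_end - locus_start)
    for start, end in reads:
        # Clip every read to the locus before counting.
        for pos in range(max(start, locus_start), min(end, locus_end)):
            coverage[pos - locus_start] += 1
    return coverage


def read_termini(reads):
    """Tally 5' and 3' read ends; sharp peaks suggest processing sites."""
    five_prime = Counter(start for start, _ in reads)
    three_prime = Counter(end for _, end in reads)
    return five_prime, three_prime


# Toy data: three reads, two of which share the same 5' end.
toy_reads = [(10, 20), (10, 18), (15, 25)]
coverage = per_base_coverage(toy_reads, 10, 25)
five_prime, three_prime = read_termini(toy_reads)
```

In this toy example the shared 5′ end at position 10 stands in for a defined processing site, whereas the scattered 3′ ends mimic the variable 3′-terminal trimming of crRNAs.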
Common features of CRISPR RNomes
Recent RNA-Seq small RNA profiling analyses for different archaea and bacteria (e.g. for Methanococcus maripaludis, Methanopyrus kandleri, Methanosarcina acetivorans, Nanoarchaeum equitans, Pyrococcus furiosus, Sulfolobus solfataricus, Clostridium thermocellum, Clostridium difficile and Thermus thermophilus) have shown that CRISPR Type I and II RNomes display various common features. CRISPR loci are often constitutively transcribed in vivo and processed within the repeat sequence into small crRNAs. Mature crRNAs generally comprise a defined 5′-terminal 8 nt tag derived from cleavage of the upstream repeat element, the spacer guide sequence and the residual 3′-terminal repeat sequence, which is often gradually shortened and can thus vary in length. The initial Cas6 cleavage reaction creates crRNAs with 5′-hydroxy and 2′,3′-cyclic phosphate termini [12,26]. Therefore T4 polynucleotide kinase treatment of isolated crRNAs is mandatory to ensure proper RNA adapter ligation during RNA-Seq library preparation [32,33]. Differential RNA-Seq approaches that compare RNA-Seq libraries prepared with and without RNA termini treatment allow the distinction between mature crRNA molecules and the transcription start sites of long pre-crRNAs. The comparison of RNA-Seq read coverage for the individual crRNAs of one CRISPR array highlights a surprisingly large degree of abundance variability. Nevertheless, a commonly observed trend is a gradual decline in crRNA abundance from the leader-proximal region to the leader-distal site of a CRISPR locus. It is plausible that the crRNAs closest to the promoter in the leader region are most abundant, as an increased transcript length coincides with an increased chance of (premature) termination and RNA polymerase stalling.
From an evolutionary perspective, this set-up is advantageous for the cell as the crRNAs that arise from the leader-proximal area of the CRISPR cluster contain spacers that target the most recent invaders [2,34]. In the following sections, we discuss factors that contribute to the various amounts of crRNAs in the cell.
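The crRNA anatomy described above (8 nt 5′ tag, spacer, variable 3′ repeat remnant) can be sketched as a simple read dissection. This is an illustration under stated assumptions rather than a published pipeline: the function name, the repeat sequence and the spacer are hypothetical, the read is assumed to be given in DNA letters as mapped to the genome, and the cleavage site is assumed to lie exactly `tag_len` nucleotides upstream of the repeat end.

```python
def dissect_crrna(read, repeat, tag_len=8):
    """Split a crRNA read into 5' tag, spacer and 3' repeat remnant.

    Returns None if the read does not start with the expected tag.
    """
    tag = repeat[-tag_len:]
    if not read.startswith(tag):
        return None
    body = read[tag_len:]
    # The 3' remnant is a prefix of the downstream repeat and cannot
    # extend past the cleavage site of that repeat.
    max_remnant = min(len(body), len(repeat) - tag_len)
    for k in range(max_remnant, -1, -1):
        # Caveat: a spacer end that happens to match the repeat start
        # is attributed to the remnant by this greedy search.
        if body.endswith(repeat[:k]):
            split = len(body) - k
            return {"tag": tag, "spacer": body[:split], "remnant": body[split:]}


# Hypothetical 24 nt repeat and 12 nt spacer, for illustration only.
repeat = "GGCCAAATTTCCCGGGATTGAAAC"
read = "ATTGAAAC" + "ACGTACGTTGCA" + "GGCCA"
parts = dissect_crrna(read, repeat)
```

Applied to many reads of one crRNA, the distribution of `remnant` lengths would reflect the gradual 3′-terminal shortening mentioned in the text.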
Spacer sequences that promote and terminate transcription
Promoter elements within the leader sequences drive CRISPR transcription. In some cases, these promoters are regulated: in Escherichia coli, for example, the DNA-binding H-NS (histone-like nucleoid-structuring) protein is involved in silencing of the CRISPR–Cas promoters (cas operon and pre-crRNA). However, CRISPR promoters can also be located within spacer and repeat sequences. In the CRISPR–Cas Type II system of Neisseria meningitidis, crRNA 5′-termini are produced by transcription from repeat-embedded promoters and not by RNase III in the presence of Cas9. One example of spacer-mediated promotion of transcription is found in a large 170-repeat CRISPR cluster from C. thermocellum. RNA-Seq analyses revealed that the crRNA abundance gradually declines from crRNA 1 to crRNA 103 and then peaks again with crRNA 104 (Figure 1A). Close analysis of the transcription start site of this internal promoter revealed a potential −35 region (TTGAAA) encoded in the repeat element and a putative −10 Pribnow box (TAATAT) provided by the sequence of spacer 103 (Figure 1A). Surprisingly, this internal promoter appears to be stronger than the promoter located within the leader region. It is unclear how this promoter/spacer evolved, but the resulting increase in the amount of leader-distal crRNAs would produce an immediate selective advantage for an organism that is attacked by a virus with complementary DNA. It should be noted that the final uridine base of the 5′-terminal 8 nt tag changes to cytidine in the repeat elements located upstream of spacer 115, which indicates that CRISPR clusters might have recombined.
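A repeat–spacer junction can be screened for such a composite promoter with a simple motif search. The sketch below uses the −35 (TTGAAA) and −10 (TAATAT) hexamers named above and looks for exact matches only. The 15–19 bp spacing window is an assumption based on typical bacterial promoters and is not a value reported for C. thermocellum. The function names and the toy sequence are hypothetical.

```python
def find_all(seq, motif):
    """Return start positions of all (possibly overlapping) exact matches."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def promoter_like_sites(seq, box35="TTGAAA", box10="TAATAT",
                        min_gap=15, max_gap=19):
    """List (pos35, pos10) pairs whose spacing falls in the allowed window."""
    sites = []
    starts10 = find_all(seq, box10)
    for pos35 in find_all(seq, box35):
        end35 = pos35 + len(box35)
        for pos10 in starts10:
            # Gap = number of bases between the two hexamers (assumed window).
            if min_gap <= pos10 - end35 <= max_gap:
                sites.append((pos35, pos10))
    return sites


# Toy junction: -35 box, 17 nt filler, -10 box (not a real CRISPR sequence).
toy = "CC" + "TTGAAA" + "C" * 17 + "TAATAT" + "GG"
print(promoter_like_sites(toy))  # [(2, 25)]
```

Exact matching is deliberately strict. A realistic screen would tolerate mismatches or use position weight matrices, and any hit would still need support from mapped transcription start sites.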
Figure 1: RNA-Seq coverage data for CRISPR loci
The investigation of potential promoter and terminator motifs in Sulfolobus genomes revealed that more than 50% of all spacer sequences contain putative archaeal TATA boxes and approximately 16% of all spacers contain T-rich pyrimidine motifs that represent potential terminator sequences [37,38]. However, significant internal promotion or premature termination of CRISPR transcription was not observed in S. solfataricus. One possible explanation for the steady CRISPR transcription in S. solfataricus could be the presence of Cbp1 (CRISPR DNA repeat-binding protein 1), which was shown to bind specifically to repeat elements and could inhibit internal transcription initiation and termination events.
RNA-Seq data of CRISPR RNomes from C. thermocellum and a few other organisms also provided evidence for reverse strand transcription starting both from the leader-distal regions and within a CRISPR cluster (Figure 1B). This production of anti-crRNA transcripts is mediated by highly AT-rich spacers or leader-distal sequences. These analyses also revealed the presence of small processed anti-crRNAs that, following comparison, show a conserved processing pattern within the repeat elements. Whereas the general abundance of anti-crRNAs is significantly lower than that of their crRNA counterparts, it was observed that the presence of distinct processed anti-crRNAs correlates with the highly reduced abundance of the complementary crRNA (Figure 1B). This indicates that an anti-crRNA could neutralize the matching crRNA in the absence of invaders. Anti-CRISPR transcripts were also observed for P. furiosus, Sulfolobus acidocaldarius and S. solfataricus. The downstream regions of the CRISPR clusters from S. acidocaldarius revealed potential BRE (transcription factor IIB-recognition element)/TATA promoter regions that could initiate anti-CRISPR transcription. In P. furiosus, a spacer sequence was found that contains a putative promoter driving antisense transcription within the CRISPR array. In contrast with the observations made in bacteria, it was suggested that these anti-crRNAs could instead function to protect crRNAs from degradation. In P. furiosus, in vitro, the Cas protein effector complex CMR targets RNA and was shown to cleave the antisense RNA. The contrasting effects of antisense RNA transcripts on crRNA stability in bacteria and archaea could be explained by the analysis of the phylogenetic distribution of dsRNA endonucleases. RNase III enzymes are found in bacteria and were shown to play a key role in RNA processing within Type II CRISPR–Cas systems.
In archaea, however, enzymes with RNase III-like domains are not readily detectable and dsRNA might not be subject to degradation.
In addition to promoters, spacers can also introduce termination signals into CRISPR arrays. In archaea, poly-T stretches have been shown to mediate intrinsic termination [41,42]. In agreement, archaeal repeat sequences have been observed to avoid poly-T sequences, whereas poly-A regions (which might influence termination of antisense transcription) are often present.
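Spacers can be screened for such poly-T stretches with a short regular-expression search, sketched below. The minimum run length of five is an illustrative assumption, not a threshold established in the cited work, and the function name and example spacers are hypothetical.

```python
import re


def poly_t_runs(spacer, min_len=5):
    """Return (start, length) for every T run of at least min_len bases.

    Accepts DNA or RNA letters; U is treated as T.
    """
    seq = spacer.upper().replace("U", "T")
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq)]


# Hypothetical spacers: only the first carries a run of six T residues.
spacers = {"spacer_1": "ACGTTTTTTGCA", "spacer_2": "ACGTTTGCA"}
flagged = {name: runs for name, seq in spacers.items()
           if (runs := poly_t_runs(seq))}
print(flagged)  # {'spacer_1': [(3, 6)]}
```

Searching the reverse complement of each spacer in the same way would flag the poly-A regions that might affect antisense transcription.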
Spacer sequences can form RNA duplexes with a host tRNA
Analysis of RNA-Seq data from C. thermocellum revealed that crRNAs not only target invading DNA, but can also show complementarity to other crRNAs and host tRNAs. C. thermocellum contains five CRISPR loci and two of these loci (with different repeats) harbour spacers with complementary sequences (locus 2, crRNA19 and locus 5, crRNA3) (Figure 1C). These two crRNAs can form an RNA duplex, which might be the reason for the observed elimination of crRNA19. Interestingly, the spacer sequences match the sequence of the single host tRNAPro(GGG) isoacceptor (Figure 2). This implies that either a part of the host tRNAPro gene was incorporated into CRISPR clusters or that these sequences stem from viral attachment sites that target tRNA genes. Consequently, base complementarity with crRNA3 provides two target regions in the cell: the tRNAPro(GGG) molecule and the crRNA19 (Figure 2). The crRNA targeting of the essential tRNA molecule might have regulatory implications, but is most likely to be problematic due to inhibited precursor-tRNA folding, if not induction of dsRNA cleavage. Therefore crRNA3 might act as an RNA sponge that was maintained to bind the problematic crRNA. This scenario is in agreement with RNA-Seq data that show the near elimination of crRNA19, whereas crRNA3 is present in abundance.
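Potential duplex partners of this kind can be detected by comparing one RNA with the reverse complement of another. The following sketch reports the longest perfectly paired antiparallel stretch between two RNAs. It is a deliberately simple illustration: only Watson–Crick pairs are counted (G–U wobble pairs and bulges are ignored), and the function names and test sequences are hypothetical rather than the actual C. thermocellum spacers.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (ACGU alphabet)."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_duplex(rna_a, rna_b):
    """Length of the longest perfect antiparallel pairing between two RNAs.

    Computed as the longest common substring of rna_a and the reverse
    complement of rna_b (dynamic programming, O(len_a * len_b)).
    """
    target = reverse_complement(rna_b)
    best = 0
    prev = [0] * (len(target) + 1)
    for base_a in rna_a:
        cur = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, start=1):
            if base_a == base_t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical spacer and a fully complementary partner.
spacer_a = "GGGAUCCAUGC"
spacer_b = reverse_complement(spacer_a)
print(longest_duplex(spacer_a, spacer_b))  # 11
```

Running such a comparison for all spacer pairs of a genome, and for each spacer against the host tRNA set, would list candidates for crRNA–crRNA or crRNA–tRNA duplexes. Whether a duplex forms in vivo cannot be decided from sequence alone.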
Figure 2: Complementary crRNAs contain host tRNAPro sequences
Influence of spacer elements on crRNA maturation
In the absence of spacer elements that promote or terminate transcription, other factors need to be considered to assess why individual crRNAs can be found to be highly underrepresented in the cell. It is possible that the adjacent spacer sequences have an impact on endonucleolytic repeat cleavage efficiency. This scenario was tested for the single subtype I-B CRISPR cluster of M. maripaludis. In vitro Cas6 maturation analysis of a set of consecutive spacer–repeat–spacer substrates showed that these spacer sequences can influence Cas6 processing activity. For example, it was observed that the two substrates harbouring the longest spacer were cleaved less efficiently. However, the observed effects were relatively minor and did not directly correlate with the crRNA abundance pattern observed by RNA-Seq analysis of the investigated CRISPR cluster. Spacer sequences might influence repeat cleavage not only via direct effects of RNA sequence and length recognition by Cas6, but also by folding of intra- and inter-spacer secondary structures within the pre-crRNA transcript. Additional parameters that have to be considered include the loading efficiency of crRNAs into the respective Cas protein interference complexes and the stability and turnover of these crRNAs within these ribonucleoprotein assemblies.
The investigation of crRNA abundance and processing patterns from several organisms using RNA-Seq methodology has demonstrated that single spacer sequences can have an extensive impact on pre-crRNA transcription and crRNA maturation. This should affect the functionality of entire CRISPR regions, even though the crRNA abundance threshold that is required to mediate an effective defence reaction is not known. Several applications of CRISPR–Cas systems involve the design of artificial CRISPR modules that include synthetic spacer sequences. To ensure functionality of these crRNAs, the additional roles of spacer sequences discussed above need to be considered. Designed or selected spacers should avoid sequences that might form secondary structures with other spacers within a pre-crRNA transcript, with other crRNAs or with undesired host RNA targets. Additionally, potential termination sites should be excluded to favour full-length CRISPR transcription. Length and sequence of spacer elements can influence the stability of the crRNAs and the efficiency of Cas6 endonuclease processing. In contrast, the inclusion of spacer sequences that contain promoter elements can prove to be beneficial in a scenario where leader-distal crRNA production needs to be increased. Finally, anti-crRNA production was shown to influence the abundance of complementary crRNA molecules and should also be considered.
Future research will address the effect of spacer sequences on preferential crRNA loading into different Cas protein interference complexes. Ultimately, these fully assembled ribonucleoprotein complexes are responsible for the successful defence against viral attacks, which places a special focus on investigating the correlation between cellular crRNA abundance and effective immunity in an organism's natural environment.
CRISPR Evolution, Mechanisms and Infection: A Biochemical Society Focused Meeting held at the University of St Andrews, U.K., 17–19 June 2013. Organized and Edited by Emmanuelle Charpentier (Laboratory for Molecular Infection Medicine Sweden, Sweden), John van der Oost (Wageningen University, The Netherlands) and Malcolm White (University of St Andrews, U.K.).
We thank André Plagens for advice and discussions.
This work was supported by grants from the Deutsche Forschungsgemeinschaft (DFG) (research group FOR1680) and the Max-Planck Society.