CRISPR-Cas adaptive immunity and the three Rs

In this summary, we focus on the fundamental biology of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas (CRISPR-associated proteins) adaptive immunity in bacteria. Emphasis is placed on emerging information about functional interplay between Cas proteins and proteins that remodel DNA during homologous recombination (HR), DNA replication or DNA repair. We highlight how replication forks may act as ‘trigger points’ for CRISPR adaptation events, and the potential for Cascade interference complexes to act as precise roadblocks to DNA replication of an invading MGE (mobile genetic element), without the need for DNA double-strand breaks.


Figure 1. Overview of the stages of CRISPR-Cas adaptive immunity.
During 'adaptation', DNA fragments ('protospacers') from an MGE are captured and integrated into a CRISPR locus, where each becomes a new spacer, coupled to synthesis of a new DNA repeat. Transcription of a CRISPR locus into RNA leads to loading of crRNA fragments into 'interference' complexes, which are called 'Cascades' in many bacteria and archaea. Cascades and analogous interference complexes catalyse base pairing between crRNA and MGE DNA, leading to further adaptation and to nuclease activity directed against the MGE.
In this short survey, we focus on emerging ideas about the fundamental biology of CRISPR-Cas in bacteria and archaea. In particular, we consider interactions of CRISPR-Cas proteins with non-Cas host cell proteins that drive DNA replication, repair and HR, in the context that replication forks recruited for MGE replication, and R-loops formed by CRISPR-Cas interference, are trigger points for capturing MGE DNA for the acquisition of new spacers, building CRISPR-based immunity.

CRISPR-Cas immunity: adaptation and interference
Adaptation (also called 'spacer acquisition') drives CRISPR-Cas immunity by depositing new spacers derived from MGE DNA into a CRISPR locus. Spacer crRNA transcripts then provide the means for R-loop-based interference, when crRNA base pairs with the complementary DNA sequence in an MGE. Physical and functional linkage between adaptation and interference, first identified in Escherichia coli [22], is a potent feedback circuit for updating CRISPR-Cas immunity, described later. Adaptation can be subdivided into two major events, target 'DNA capture' and its 'integration' into a CRISPR locus, extensively reviewed recently in [2]. In most bacterial and archaeal type I CRISPR systems, a 'Cascade' ribonucleoprotein complex delivers a payload of crRNA to MGE DNA to catalyse CRISPR interference. Cascades from different phyla differ significantly in structural detail and protein composition, but they share the underlying principle, first elucidated in E. coli [23], of assembling crRNA into the complex for pairing with DNA to form an R-loop. Interaction of Cascade with a nuclease, typically a Cas3-family translocase-nuclease, completes interference by degrading MGE DNA, and this may present new substrates for adaptation. Cas9 also forms R-loops for interference reactions [24], but removes the requirement for a Cas3-family translocase-nuclease because it possesses intrinsic nuclease activity that creates a DNA double-strand break within the R-loop. The combination of sequence-specific R-loop formation and nuclease activity within a single Cas9 polypeptide has made it especially well suited to genome editing.

Adaptation: integration of MGE DNA as CRISPR spacers
Two proteins widely distributed across CRISPR-Cas systems, Cas1 and Cas2, catalyse the capture and integration of new spacers into CRISPR loci. E. coli Cas1-Cas2 so far provides the most complete picture of the molecular events of protospacer integration and, to a lesser extent, protospacer capture. In two crystal structures, Cas1 homodimers are bound on either side of a Cas2 dimer in a butterfly-like (Cas1)₄-(Cas2)₂ complex bound to a splayed DNA duplex [25,26]. The 3′-ssDNA ends are accommodated in two of the four possible Cas1 active sites. DNA ends are bound with sequence specificity because Cas1 recognizes short 'PAM' sequences (protospacer adjacent motifs) as part of the DNA capture process, detailed in the next section. The reaction mechanism of integration by E. coli Cas1 has been detailed in recent papers [27,28]. Briefly, the 3′-OH ends of protospacer DNA bound in Cas1 active sites carry out metal-dependent nucleophilic attack on DNA in a CRISPR locus. The first integration event occurs at the boundary between the leader and the first CRISPR repeat, and the second at the repeat 1-spacer 1 boundary. This covalently attaches the protospacer to CRISPR DNA but generates ssDNA gaps flanking the integration site, which must be filled, as discussed below. In the E. coli Cas1-Cas2 structure, Cas2 acts as a scaffold to position Cas1 subunits for catalysis; no catalytic activity is required from Cas2 [29]. This topology ensures that E. coli Cas1-Cas2 integrates a DNA protospacer of defined size, identified in vitro as a 23-bp duplex with 3′ overhangs of five nucleotides, providing consistency in the length of spacers integrated into the CRISPR locus. However, detailed knowledge of E. coli Cas1-Cas2 structure and function should not obscure the significant variation in how Cas1 and its associated proteins achieve adaptation across prokaryotic phyla.
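The order of integration events described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of Cas1-Cas2 chemistry: the CRISPR array is represented as a hypothetical list of labels (the leader, repeats 'R' and spacers), and the helper names `prespacer_footprint` and `integrate` are illustrative inventions. The footprint function simply adds the two 5-nt 3′ overhangs to the 23-bp duplex of the in vitro substrate, and ignores any further processing of spacer ends.

```python
from __future__ import annotations

# Hypothetical label for a CRISPR repeat; spacers are any other string.
REPEAT = "R"


def prespacer_footprint(duplex_bp: int = 23, overhang_nt: int = 5) -> int:
    """Length (bp) spanned by a splayed prespacer with a 3' overhang at each end."""
    return overhang_nt + duplex_bp + overhang_nt


def integrate(array: list[str], new_spacer: str) -> list[str]:
    """Return a new array with new_spacer added at the leader end.

    Toy ordering of events:
      1. one 3'-OH end attacks the leader/repeat-1 boundary;
      2. the other 3'-OH end attacks the repeat-1/spacer-1 boundary;
      3. the original repeat is now single-stranded, one strand on each side
         of the new spacer; gap filling and ligation yield two repeat copies.
    """
    if len(array) < 2 or array[0] != "leader" or array[1] != REPEAT:
        raise ValueError("array must start with the leader followed by a repeat")
    # Leader, duplicated repeat, new spacer, original repeat, then the rest.
    return ["leader", REPEAT, new_spacer, REPEAT] + array[2:]


if __name__ == "__main__":
    array = ["leader", REPEAT]
    for spacer in ["S1", "S2", "S3"]:
        array = integrate(array, spacer)
    # ['leader', 'R', 'S3', 'R', 'S2', 'R', 'S1', 'R']
    print(array)
    print(prespacer_footprint())  # 33
```

Because each new spacer is placed next to the leader, repeated calls leave the most recently acquired spacer at the leader end of the array, and each integration adds one repeat copy, mirroring the coupling of spacer integration to synthesis of a new repeat.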
Among the type I CRISPR-Cas systems, Sulfolobus solfataricus Cas1 does not seem to form a complex with Cas2, yet its reaction mechanism is similar to that of E. coli Cas1 [28], and in Legionella species Cas2 is a DNA/RNA nuclease that links CRISPR immunity with virulence [30]. Other type I systems contain Cas1 proteins fused with Cas4 [31], an iron-sulphur 5′-3′ DNA nuclease [32], indicating that nucleolytic DNA processing is tightly coupled to interference and integration, with or without involvement of Cas2. An ancient CRISPR-Cas precursor, the 'Casposon' from Aciduliprofundum boonei, has no Cas2 protein, and its Cas1 alone is proficient at integrating protospacers into CRISPR as 14-15 bp spacers, much shorter than their counterparts in E. coli [33,34]. In the type II CRISPR-Cas system of Streptococcus pyogenes, which contains the interference enzyme Cas9, a dimer of Cas1 interacts with a tetramer of Csn2 protein [35], suggesting an integration complex that has diverged considerably from E. coli type I Cas1-Cas2. Cas9 is required for protospacer integration in this system [36], in contrast with E. coli, in which Cas1-Cas2 alone are sufficient for an integration reaction, albeit one that lacks specificity and would require subsequent gap filling and DNA ligation, as discussed later.

Adaptation and interference: priming and capturing of MGE DNA protospacers
The mechanics of capturing protospacer DNA from an MGE are less well understood than those of its subsequent integration as a new spacer. Protospacer capture can be subdivided into 'naïve' and 'primed' pathways [37,38]. In naïve adaptation in E. coli, spacers from an MGE are generated with absolute dependence on Cas1 and Cas2, but independently of Cascade-catalysed interference. Naïve adaptation could therefore generate immunity against an MGE not previously encountered by the cell. However, the physiological relevance of naïve adaptation in many species may be questionable, especially if, as described later, primed adaptation can also create new spacers even in the absence of escape mutations in an MGE. It is not clear how, in E. coli, Cas1-Cas2 would gain access to fragments of protospacer DNA for capture, but the involvement of non-Cas nucleases is a possibility, discussed later.
Primed adaptation manifests as elevated frequencies of spacer incorporation, compared with naïve adaptation, in response to previously encountered MGEs [22,39]. In principle, this 'priming' phenomenon is analogous to that observed in metazoan cellular immune systems. Priming is the major driving force for new spacer generation in CRISPR immunity. E. coli priming requires Cas1 and Cas2, Cascade and Cas3. Analysing the interplay between these proteins, and perhaps other non-Cas host enzymes, is central to establishing how priming occurs during the dynamic operations of an established immune response to MGE attack, amid the more routine nucleic acid processing events of the cell.
CRISPR-Cas interference complexes, exemplified by Cascade and Cas9, form nucleoprotein R-loops in which crRNA is base-paired to MGE protospacer DNA (Figure 2). A crucial element for recognition of the protospacer is a trinucleotide 'PAM' on MGE DNA, the significance of which was proposed years before its mechanistic importance was elucidated [40,41]. The nature of PAM recognition dictates that PAMs lie directly adjacent to the target site of an interference complex. However, their exact positioning varies: the E. coli PAM is located on the non-complementary strand, whereas the S. pyogenes PAM lies directly upstream of the target sequence on the complementary strand. Cascades can often accept multiple PAMs, for example in the archaeon Haloferax volcanii [42], but in E. coli there is a preference for the PAM sequence 5′-A[A/T]G-3′ [43]. Recognition of the PAM by Cascade subunits (in E. coli, by Cse1, also called CasA [44]) stimulates R-loop formation and recruits the ATP-dependent Cas3 nuclease-translocase [5,23,45,46], resulting in nicking of non-target strand DNA in the R-loop followed by unidirectional degradation of MGE DNA. In contrast with interference by the E. coli Cascade complex, S. pyogenes Cas9 itself catalyses both R-loop formation and nuclease activity, the latter via its HNH and RuvC-like active sites [47,48]. Cas9 cleaves both strands of the target DNA, generating a DNA double-strand break, and remains bound to DNA until it is presumably released by as yet unknown cellular factors. Priming of adaptation is triggered by mutations in the PAM and/or in the protospacer that result in mismatches with crRNA in the interference R-loop [22,49].
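PAM adjacency and crRNA mismatches, the two features discussed above, can be screened computationally. The sketch below assumes the E. coli type I-E convention described here, with a 5′-A[A/T]G-3′ PAM on the non-target (non-complementary) strand immediately 5′ of the protospacer; the function `scan_protospacers` and its `max_mismatches` cut-off are hypothetical, and the mismatch count is descriptive only. It makes no claim about which combinations of PAM and protospacer changes lead to interference rather than priming.

```python
import re

# Assumption: E. coli type I-E convention, PAM 5'-A[A/T]G-3' on the
# non-target (non-complementary) strand, immediately 5' of the protospacer.
# The zero-width lookahead reports overlapping PAM candidates as well.
PAM_PATTERN = re.compile(r"(?=(A[AT]G))")


def scan_protospacers(non_target: str, spacer: str, max_mismatches: int = 5):
    """Return (pam_start, pam, mismatches) for PAM-adjacent candidate sites.

    non_target:     MGE strand not base-paired by crRNA, written 5'->3'.
    spacer:         spacer written as DNA; it matches the non-target strand
                    of a perfectly matched protospacer.
    max_mismatches: arbitrary reporting cut-off, not a biological threshold.
    Mismatches are positions where crRNA would mispair in the R-loop.
    """
    non_target, spacer = non_target.upper(), spacer.upper()
    hits = []
    for match in PAM_PATTERN.finditer(non_target):
        start = match.start()
        window = non_target[start + 3 : start + 3 + len(spacer)]
        if len(window) < len(spacer):
            continue  # protospacer would run off the end of the sequence
        mismatches = sum(a != b for a, b in zip(window, spacer))
        if mismatches <= max_mismatches:
            hits.append((start, match.group(1), mismatches))
    return hits
```

For example, with a toy 10-nt spacer `ACGTACGTAC` (real E. coli spacers are longer), the sequence `TTAAGACGTACGTACGGATGACGTACCTACC` yields a perfectly matched site after the AAG at 0-based position 2 and a one-mismatch site after the ATG at position 17. Only the strand given is scanned; the other strand would have to be reverse-complemented and scanned separately.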
Research into the molecular mechanisms of primed adaptation is currently moving fast, and has recently highlighted that alterations in Cascade conformation change its properties, such that Cas3 nuclease may not be directly recruited to a Cascade interference complex, thereby preventing the onset of the nuclease stages of interference [50,51]. However, in this scenario, Cas1-2 may facilitate recruitment of Cas3 by an unknown mechanism [52]. Bidirectional Cas3 helicase/translocase activity has also been observed, leading to the suggestion that E. coli Cas3 may transport Cas1-2 along DNA to sample MGE DNA for capture [52,53]. Cascade may also recruit Cas3 in a PAM-associated manner, resulting in cleavage of target DNA into short fragments, 30-100 bp in length, which Cas1-2 can then process as new protospacers for integration [50]. The observation of a physical interaction between Cas1 and Cas9 in a type II-B system suggests a general principle: adaptation may be facilitated by physical and functional interaction between the proteins of protospacer capture and interference in an 'adaptasome' machine. We now look at how non-Cas proteins may contribute to CRISPR-Cas immunity, with emphasis on the roles of proteins hitherto associated mainly with DNA repair and HR, especially in bacteria.

Examples of non-Cas host proteins in CRISPR-Cas immunity
Non-Cas proteins are essential players in promoting, controlling and inhibiting the three stages of CRISPR-Cas immunity (Figure 3). The diversity of CRISPR-Cas systems is mirrored in emerging knowledge of the networks that regulate transcription of CRISPR-Cas systems in different organisms, reviewed recently [54]. H-NS consistently acts as a transcriptional repressor of CRISPR-Cas systems across species, but many other transcriptional regulators have more unique or specialized effects. In E. coli, the histone-like DNA binding protein IHF (integration host factor) has a crucial role in promoting adaptation in the type I-E CRISPR-Cas system by guiding protospacer-bound Cas1-Cas2 to the leader end of a CRISPR locus for integration of a new spacer [55]. E. coli IHF was discovered for its ability to promote integration of phage λ, and was subsequently identified as a modulator of recombination, DNA replication and transcription, via sequence-specific manipulation of DNA structures and by interaction with other nucleic acid processing enzymes [56]. The importance of IHF in CRISPR adaptation, at least in E. coli, highlights how fundamental nucleic acid processing mechanisms can be multitasked to underpin more specialized activities. Genetic data from analyses of primed and naïve adaptation in E. coli suggest that multiple enzymes of recombination and DNA repair may also support CRISPR adaptation catalysed by Cas1 and Cas2.

Interactions between CRISPR-Cas and DNA replication, recombination and repair
Experiments in E. coli have identified interesting interactions between effectors of HR or DNA repair and Cas1, or adaptation processes more generally [57][58][59]. In one analysis [59], E. coli cells exposed to genotoxic agents showed reduced viability when the gene encoding Cas1 (ygbT) alone was deleted, a sensitivity that was epistatic to deletion of genes encoding the HR 'resolvasome' proteins RuvABC [60]. This was the first suggestion of a potential functional interaction between Cas1 and HR. In the same analysis, purified Cas1 protein physically interacted with RuvB, an ATP-dependent DNA helicase that, with RuvA and RuvC, branch-migrates and resolves Holliday junction DNA molecules formed during late stages of HR [61]. HR can be initiated from dsDNA breaks by resection of the DNA ends by the RecBCD protein complex, a bipolar DNA translocase and nuclease [62]. This generates ssDNA that is bound by RecA recombinase and directed into sequence homology searches to generate synaptic products called 'displacement loops' (D-loops), which are similar to the aforementioned R-loops but have an invading DNA strand rather than an RNA strand. Interestingly, E. coli Cas1 protein also reportedly interacts physically with RecB and RecD [59]. The functional significance of this, or of the Cas1 interaction with RuvB, is unclear. Genetic analyses of CRISPR adaptation in E. coli have identified that RecBCD can be a driver of naïve adaptation [57,58], although it is not needed for primed adaptation [57]. A role for RecBCD in naïve adaptation has been proposed [58], exploiting its ability to recognize a specific DNA sequence on the E. coli chromosome called Chi (5′-GCTGGTGG-3′). This is proposed to help distinguish 'self' from 'non-self' DNA, so that protospacer capture is targeted to an MGE rather than the host chromosome. In this model, double-strand breaks in DNA would be necessary to provide a substrate for canonical RecBCD resection activity.
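Locating Chi sites in a sequence is a simple string search, sketched here for plain A/C/G/T input. Because RecBCD responds to Chi in an orientation-dependent manner, the hypothetical helper `find_chi` reports matches to 5′-GCTGGTGG-3′ on the strand given separately from matches to its reverse complement (that is, Chi on the other strand). It does not model which DNA end RecBCD enters from, nor how Chi occurrence differs between chromosomal and MGE DNA.

```python
from __future__ import annotations

CHI = "GCTGGTGG"  # Chi, written 5'->3' as in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_chi(seq: str) -> dict[str, list[int]]:
    """0-based start positions of Chi on each strand of seq.

    "forward": Chi read 5'->3' on the strand given.
    "reverse": Chi on the complementary strand, found as matches to the
               reverse complement of Chi (CCACCAGC) on the strand given.
    """
    seq = seq.upper()

    def positions(motif: str) -> list[int]:
        # Every start index where motif occurs, overlaps included.
        return [i for i in range(len(seq) - len(motif) + 1)
                if seq.startswith(motif, i)]

    return {"forward": positions(CHI),
            "reverse": positions(reverse_complement(CHI))}
```

Reporting the two strands separately keeps orientation explicit, which any model of RecBCD behaviour at Chi would need to take into account.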
Although these might arise randomly as a result of chemical damage to DNA, this would seem a haphazard way of beginning the CRISPR adaptation process, especially if dsDNA breaks are limited to an estimated 10-25 occurrences per cell per day [63]. It is unclear how the DNA products of RecBCD, fragmented ssDNA, even after engagement with Chi, could become the splayed DNA duplexes that are bound by Cas1-Cas2. Other factor(s) may influence DNA break formation, or recombination enzymes may aid and abet CRISPR adaptation via other modes of supporting DNA replication forks [64]. The observation that RecBCD is not needed for primed adaptation [57] suggested that targeting of MGE DNA by Cascade, followed by Cas3 and/or other nuclease activity, is sufficient for a 'self'/'non-self' distinction brought about by the crRNA of Cascade targeting MGE DNA.
In contrast with the differing requirements for RecBCD in naïve and primed adaptation in E. coli, deletion of the gene encoding DNA polymerase I (polA) resulted in no detectable expansion of CRISPR loci in either adaptation pathway [57]. This requirement is compatible with DNA 'gap filling', which is required after integration of a new protospacer into a CRISPR locus, and is consistent with the known role of DNA polymerase I in DNA repair pathways. In addition to potential roles for Ruv, RecBCD and DNA polymerase I proteins in adaptation, the DNA repair helicases RecG and PriA (sometimes also called primosomal protein N′), which are noted for remodelling DNA at blocked replication forks and ter sites, have also been shown to be important for primed, but not naïve, adaptation [57]. In one model, it was proposed that Cascade interference complexes might act as roadblocks to MGE replication, which relies on the E. coli host replisome. In this model, Cascades form site-specific barriers to MGE replication, dictated by the crRNA sequence, and these barriers are trigger points for adaptation through interaction with Cas1-Cas2. Roles for RecG and PriA in this model are unclear, although we speculate that the known ability of RecG to dissociate R-loops may be important for controlling access of Cas enzymes to DNA, and potentially for preventing R-loop-primed DNA replication, which is a known cause of genome instability. Overall, the physical and functional interplay between effectors of DNA replication, repair and recombination and CRISPR-Cas proteins (summarized in Figure 4) is significant but poorly understood, and further work is required to establish mechanisms, especially whether CRISPR interference antagonizes DNA replication.

Funding
This work was supported by the BBSRC [grant number BB/M020541/1] and the BBSRC DTP scheme. The work on CRISPR-Cas was carried out in the Bolt lab.