In this summary, we focus on the fundamental biology of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas (CRISPR-associated proteins) adaptive immunity in bacteria. Emphasis is placed on emerging information about functional interplay between Cas proteins and proteins that remodel DNA during homologous recombination (HR), DNA replication or DNA repair. We highlight how replication forks may act as ‘trigger points’ for CRISPR adaptation events, and the potential for Cascade interference complexes to act as precise roadblocks to DNA replication of an invading MGE (mobile genetic element), without the need for DNA double-strand breaks.
Overview: from CRISPR-Cas biology to genome editing and back to biology
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR-associated proteins) systems provide prokaryotes with adaptive immunity against MGEs (mobile genetic elements). Immunity is effected through two major components, which interact with non-Cas host proteins in ways that are still being elucidated: (i) a specialized DNA locus called CRISPR and (ii) cas genes, encoding Cas proteins. The overarching mechanisms by which CRISPR-Cas provides defence against MGEs are very similar (Figure 1) across the diversity of CRISPR-Cas types and subtypes: (i) MGE DNA fragments (‘protospacers’) are captured by Cas1 enzymes working in concert with Cas2 or analogous proteins, for integration of the captured DNA into CRISPR as new ‘spacers’, a process called spacer acquisition or adaptation; (ii) transcription of CRISPR from a ‘leader’ promoter generates RNA that is processed into ‘crRNA’ (CRISPR RNA), which is the cargo in ribonucleoprotein complexes that target MGE DNA; (iii) base-pairing of crRNA and DNA within the ribonucleoprotein complex forms an ‘R-loop’ (RNA-loop), in which an RNA–DNA duplex is formed, leaving one DNA strand unpaired. In some instances, an R-loop can be an intermediate that provokes homologous recombination (HR), but in the context of CRISPR-Cas immunity, it is a target for nucleolytic degradation of MGE DNA that may also stimulate further adaptation. R-loop formation and nucleolytic processing of MGE DNA are collectively called ‘interference’. Readers requiring more mechanistic detail of these stages of CRISPR-Cas systems are directed to recent review articles [1,2].
Figure 1. Overview of the stages of CRISPR-Cas adaptive immunity.
There are two major classes of CRISPR-Cas system [3,4]. In class 1 systems, exemplified by type I, interference requires a complex of multiple proteins bound to crRNA, often called ‘Cascade’, that acts with a nuclease enzyme. Class 2 systems, exemplified by type II, utilize a single-protein interference complex, most famously Cas9. Each class is further divided into multiple types and subtypes, described in [3-7]. Mechanistic details of the events shown in Figure 1 are given below where required, but the interested reader is also guided towards [1,2,8]. The history of how CRISPR-Cas was discovered, including the derivation of CRISPR-associated jargon, and its subsequent development into a tool for biotechnology are detailed in [9-12]. Research into the fundamental biology of CRISPR-Cas has been essential to underpin the ‘genome editing revolution’, which is now available in kit form from retailers and is widely reported in the scientific literature, mainstream media, courtrooms and some interesting niches of the blogosphere. Discoveries of new naturally occurring CRISPR-Cas interference proteins with properties that differ from those of Cas9, such as Cpf1 and C2c1 [13-18], suggest new markets for novel gene editing tools. Novel chimeric protein fusions of Cas9 are also generating new ways to manipulate nucleic acids [19-21].
In this short survey, we focus on emerging ideas in the analysis of the fundamental biology of CRISPR-Cas in bacteria and archaea. We concentrate on interactions of CRISPR-Cas proteins with non-Cas host cell proteins that drive DNA replication, repair and HR, in the context that replication forks recruited for MGE replication, and R-loops formed by CRISPR-Cas interference, are trigger points for the capture of MGE DNA and acquisition of new spacers, building CRISPR-based immunity.
CRISPR-Cas immunity: adaptation and interference
Adaptation (also called ‘spacer acquisition’) drives CRISPR-Cas immunity by depositing new spacers from MGE DNA into a CRISPR locus. Spacer crRNA transcripts then provide the means for R-loop-based interference, when crRNA base pairs with the complementary DNA sequence in an MGE. Physical and functional linkage between adaptation and interference, first identified in Escherichia coli, is a potent feedback circuit for updating CRISPR-Cas immunity, described later. Adaptation can be subdivided into two major events, target ‘DNA capture’ and its ‘integration’ into a CRISPR locus, both of which have been extensively reviewed recently. A ‘Cascade’ ribonucleoprotein complex targets a payload of crRNA to MGE DNA to catalyse CRISPR interference in most bacterial and archaeal type I CRISPR systems. There are significant differences in the structural details and protein composition of Cascades from different phyla, but they share an underlying principle of assembling crRNA into the complex for pairing to DNA, forming an R-loop, first elucidated in E. coli. Interaction of Cascade with a nuclease, typically a Cas3-family translocase-nuclease, completes interference by degradation of MGE DNA, and this may present new substrates for adaptation. Cas9 also forms R-loops for interference reactions, but obviates the requirement for a Cas3-family translocase-nuclease by possessing intrinsic nuclease activity that creates a DNA double-strand break within the R-loop. This combination of sequence-sensitive R-loop formation and nuclease activity within a single Cas9 polypeptide has made it especially suited to genome editing reactions.
Adaptation: integration of MGE DNA as CRISPR spacers
Two proteins widely distributed across CRISPR-Cas systems, Cas1 and Cas2, catalyse the capture and integration of new spacers into CRISPR loci. E. coli Cas1–Cas2 so far provides the most complete picture of the molecular events of protospacer integration and, to a lesser extent, protospacer capture. In two crystal structures, Cas1 homodimers are bound on either side of a Cas2 dimer in a butterfly-like (Cas1)4–(Cas2)2 structure bound to a splayed DNA duplex [25,26]. The 3′-ssDNA ends are accommodated into two of the four possible Cas1 active sites. DNA ends are bound with sequence specificity because Cas1 recognizes short ‘PAM’ sequences (protospacer adjacent motifs) as part of the DNA capture process, detailed in the next section. The reaction mechanism of integration by E. coli Cas1 has been detailed in recent papers [27,28]. Briefly, 3′-OH ends of protospacer DNA bound in Cas1 active sites are used for metal-dependent nucleophilic attack of DNA in a CRISPR locus. The initial integration event occurs at the boundary of the first CRISPR repeat and the leader, and the second occurs at the repeat 1–spacer 1 boundary. This covalently attaches the protospacer to CRISPR DNA but generates ssDNA gaps flanking the integration site, which require filling, as detailed below. In the E. coli Cas1–Cas2 structure, Cas2 acts as a scaffold to position Cas1 subunits for catalysis; no catalytic activity is required from Cas2. This topology ensures that E. coli Cas1–Cas2 integrates a DNA protospacer of defined size, identified in vitro as a 23-bp duplex with 3′ overhangs of five nucleotides in length, providing consistency in the length of spacers integrated into the CRISPR locus. Detailed knowledge of E. coli Cas1–Cas2 structure–function belies the significant variation in how Cas1 and its associated proteins achieve adaptation across prokaryotic phyla.
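The protospacer geometry quoted above can be made concrete with a little arithmetic. The following is a minimal sketch in Python, assuming only the in vitro dimensions given in the text (a 23-bp duplex with five-nucleotide 3′ overhangs); the function name and constants are hypothetical and serve illustration only.

```python
# Dimensions taken from the text for the E. coli Cas1-Cas2 substrate (in vitro).
DUPLEX_BP = 23    # length of the base-paired region
OVERHANG_NT = 5   # length of each single-stranded 3' overhang


def protospacer_dimensions(duplex_bp=DUPLEX_BP, overhang_nt=OVERHANG_NT):
    """Return (strand length, end-to-end span) in nucleotides.

    Each strand contributes the duplex region plus its own 3' overhang;
    the end-to-end span runs from one 3' tip to the other.
    """
    strand_nt = duplex_bp + overhang_nt
    span_nt = duplex_bp + 2 * overhang_nt
    return strand_nt, span_nt


strand_nt, span_nt = protospacer_dimensions()
print(strand_nt, span_nt)  # 28 33
```

Under these assumptions, each strand of the captured substrate is 28 nucleotides long and the splayed duplex spans 33 nucleotides from one 3′ end to the other.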
Among the type I CRISPR-Cas systems, Sulfolobus solfataricus Cas1 does not seem to form a complex with Cas2, yet its reaction mechanism is similar to that of E. coli Cas1, and in Legionella species, Cas2 is a DNA/RNA nuclease that links CRISPR immunity with virulence. Other type I systems contain Cas1 proteins fused with Cas4, an iron–sulphur 5′–3′ DNA nuclease, indicating that nucleolytic DNA processing during interference and integration are tightly coupled, with or without involvement of Cas2. An ancient CRISPR-Cas precursor ‘Casposon’ from Aciduliprofundum boonei has no Cas2 protein, and Cas1 alone is proficient at integrating protospacers into CRISPR as 14–15 bp spacers, much shorter than their counterparts in E. coli [33,34]. In the type II CRISPR-Cas system of Streptococcus pyogenes, which contains the interference enzyme Cas9, a dimer of Cas1 interacts with a tetramer of Csn2 protein, suggesting an integration complex that is highly diverged from E. coli type I Cas1–Cas2. Cas9 is required for protospacer integration in this system, in contrast with E. coli, in which Cas1–Cas2 alone are sufficient for an integration reaction, albeit one that lacks specificity and would require subsequent gap-filling and DNA ligation, as discussed later.
Adaptation and interference: priming and capturing of MGE DNA protospacers
The mechanics of protospacer DNA capture from an MGE are less well understood than those of subsequent protospacer integration as a new spacer. Protospacer capture can be subdivided into ‘naïve’ and ‘primed’ pathways [37,38]. In naïve adaptation in E. coli, spacers from an MGE are generated with absolute dependency on Cas1 and Cas2, but independently of Cascade-catalysed interference. Naïve adaptation could therefore generate immunity against an MGE not previously encountered by the cell. However, the physiological relevance of naïve adaptation in many species may be questionable, especially if, as described later, primed adaptation can also create new spacers even without any escape mutation in an MGE. It is not clear how, in E. coli, Cas1–Cas2 would gain access to fragments of protospacer DNA for capture, but the involvement of non-Cas nucleases is a possibility, discussed later.
Primed adaptation manifests as elevated frequencies of spacer incorporation compared with naïve adaptation, in response to previously encountered MGEs [22,39]. In principle, this phenomenon, ‘priming’, is the same as that observed in metazoan cellular immune systems. Priming is the major driving force for new spacer generation in CRISPR immunity. E. coli priming requires Cas1, Cas2, Cascade and Cas3. Analysis of the interplay between these proteins, and perhaps other non-Cas host enzymes, is a fascinating route to establishing how priming occurs as part of the established immune response during an MGE attack, amidst more routine cellular nucleic acid processing events.
CRISPR-Cas interference complexes, exemplified by Cascade and Cas9, form nucleoprotein R-loops in which crRNA is base-paired to MGE protospacer DNA (Figure 2). A crucial element for recognition of the protospacer is a trinucleotide ‘PAM’ on MGE DNA, the significance of which was first proposed years before its mechanistic importance was elucidated [40,41]. The nature of PAM recognition dictates that PAMs are directly adjacent to the target site of an interference complex. However, the exact positioning varies: the E. coli PAM is located on the non-complementary strand, whereas the S. pyogenes PAM is directly upstream of the target sequence on the complementary strand. Multiple PAMs are often acceptable for Cascades, for example in the archaeon Haloferax volcanii, but in E. coli there is a preference for the PAM sequence 5′-A[A/T]G-3′. Recognition of PAM by Cascade subunits, in E. coli Cascade by Cse1 (CasA), stimulates R-loop formation and recruits the ATP-dependent Cas3 nuclease-translocase [5,23,45,46], resulting in nicking of non-target strand DNA in the R-loop followed by unidirectional degradation of MGE DNA. In contrast with interference by the Cascade complex from E. coli, S. pyogenes Cas9 catalyses both R-loop formation and DNA cleavage, the latter via HNH and RuvC-like nuclease active sites [47,48]. Cas9 cleaves both strands of the target DNA, generating a DNA double-strand break, and remains DNA-bound until its release, presumably by as yet unknown cellular factors. Priming of adaptation is triggered by mutations in the PAM, and/or in the DNA of the protospacer, that result in mismatches with crRNA in the interference R-loop [22,49].
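The two sequence features discussed above, the preferred E. coli PAM 5′-A[A/T]G-3′ and mismatches between crRNA and protospacer DNA, can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not a description of how Cascade recognizes its target: the function names and example sequences are hypothetical, only the motif itself comes from the text, and no mismatch threshold for priming is implied.

```python
import re

# Preferred E. coli PAM from the text, written 5'->3': A, then A or T, then G.
PAM_PATTERN = re.compile(r"A[AT]G")

# Watson-Crick partner in DNA for each RNA base (hypothetical helper table).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def find_pams(dna_5to3):
    """Return (0-based start, motif) for each A[A/T]G motif in a DNA strand."""
    return [(m.start(), m.group()) for m in PAM_PATTERN.finditer(dna_5to3.upper())]


def count_mismatches(crrna_5to3, target_dna_3to5):
    """Count positions at which the crRNA fails to base-pair with the target.

    The target strand is written 3'->5' so that position i of the crRNA
    faces position i of the DNA.
    """
    if len(crrna_5to3) != len(target_dna_3to5):
        raise ValueError("crRNA and target strand must be the same length")
    pairs = zip(crrna_5to3.upper(), target_dna_3to5.upper())
    return sum(RNA_TO_DNA_PARTNER[rna] != dna for rna, dna in pairs)


print(find_pams("CCAAGTTATGC"))          # [(2, 'AAG'), (7, 'ATG')]
print(count_mismatches("GAUC", "CTAG"))  # 0
print(count_mismatches("GAUC", "CTGG"))  # 1
```

In this sketch a mutated PAM would simply no longer appear in the output of find_pams, and a protospacer mutation would raise the mismatch count, the two kinds of change that the text associates with priming.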
Investigating molecular mechanisms of primed adaptation is currently a fast-moving field of research that has most recently highlighted that alterations to Cascade conformation change its properties, such that Cas3 nuclease may not be directly recruited to a Cascade interference complex, thereby preventing the onset of the nuclease stages of interference [50,51]. However, in this scenario Cas1–Cas2 may facilitate recruitment of Cas3 by an unknown mechanism. Bidirectional Cas3 helicase/translocase activity is nonetheless observed, leading to the suggestion that E. coli Cas3 may transport Cas1–Cas2 along DNA for sampling of MGE DNA for capture [52,53]. Cascade may also recruit Cas3 in a PAM-associated manner, resulting in the cleavage of target DNA into short fragments 30–100 bp in length, which can then be processed as new protospacers by or for Cas1–Cas2 to undergo integration. Observations of physical interaction between Cas1 and Cas9 in a type II-B system suggest that there is a general principle of facilitating adaptation through physical and functional interaction between proteins of protospacer capture and interference in an ‘adaptasome’ machine. We now look at how non-Cas proteins may contribute to CRISPR-Cas immunity, with emphasis on the roles of proteins hitherto more associated with DNA repair and HR, especially in bacteria.
Figure 2. Structural representation of nucleoprotein R-loop complexes.
Examples of non-Cas host proteins in CRISPR-Cas immunity
Non-Cas proteins are essential players in promoting, controlling and inhibiting the three stages of CRISPR-Cas immunity (Figure 3). The diversity of CRISPR-Cas systems is mirrored in emerging knowledge of networks for transcriptional regulation of CRISPR-Cas systems in different organisms, reviewed recently. H-NS is a consistent performer across species in regulating CRISPR-Cas systems, acting as a transcriptional repressor, but there are many more unique or specialized effects of other transcriptional regulators. In E. coli, the histone-like DNA binding protein IHF (integration host factor) has a crucial role in promoting adaptation in the type I-E CRISPR-Cas system by guiding protospacer-bound Cas1–Cas2 to the leader end of a CRISPR locus for integration of a new spacer. E. coli IHF was discovered for its ability to promote integration of phage λ, and was subsequently identified as a modulator of recombination, DNA replication and transcription, via sequence-specific manipulation of DNA structures and by interaction with other nucleic acid processing enzymes. The importance of IHF in CRISPR adaptation, at least in E. coli, highlights how fundamental nucleic acid processing mechanisms can be multitasked to underpin more specialized activities. Genetic data from analyses of primed and naïve adaptation in E. coli suggest that multiple enzymes of recombination and DNA repair may also support CRISPR adaptation catalysed by Cas1 and Cas2.
Figure 3. Cas and non-Cas proteins that interplay during CRISPR immunity.
Interactions between CRISPR-Cas and DNA replication, recombination and repair
Experiments in E. coli have identified interesting interactions between effectors of HR or DNA repair, and Cas1 or adaptation processes more generally [57-59]. In one analysis, E. coli cells provoked by genotoxic agents showed reduced viability when the gene encoding Cas1 (ygbT) alone was deleted, a sensitivity that was epistatic to deletion of genes encoding the HR ‘resolvasome’ proteins RuvABC. This was the first suggestion of a potential functional interaction between Cas1 and HR. In the same analysis, purified Cas1 protein physically interacted with RuvB, an ATP-dependent DNA helicase that, with RuvA and RuvC, branch migrates and resolves Holliday junction DNA molecules formed during late stages of HR. HR can be initiated from dsDNA breaks by resection of the DNA ends by the RecBCD protein complex, a bipolar DNA translocase and nuclease. This generates ssDNA that is bound by RecA recombinase and directed into sequence homology searches to generate synaptic products called ‘displacement loops’ (D-loops), which are similar to the aforementioned R-loops, but with an invading DNA, not RNA, strand. Interestingly, E. coli Cas1 protein also reportedly physically interacts with RecB and RecD. The functional significance of this, or of Cas1 interaction with RuvB, is unclear. Genetic analyses of CRISPR adaptation in E. coli have identified that RecBCD can be a driver of naïve adaptation [57,58], although it is not needed for primed adaptation. A role for RecBCD in naïve adaptation has been proposed that exploits its ability to recognize a specific DNA sequence on the E. coli chromosome called Chi (5′-GCTGGTGG-3′). This is proposed to help distinguish ‘self’ from ‘non-self’ DNA, leading to protospacer capture targeted to an MGE rather than the host chromosome. In this model, double-strand breaks in DNA would be necessary to provide a substrate for canonical RecBCD resection activity.
Although such breaks might arise randomly as a result of chemical damage to DNA, this would seem a haphazard way of beginning the CRISPR adaptation process, especially if dsDNA breaks are limited to only an estimated 10–25 occurrences per cell, per day. It is also unclear how the DNA products formed by RecBCD, fragmented ssDNA, even after engagement with Chi, could become the splayed DNA duplexes that are bound by Cas1–Cas2. Some other factor(s) may be able to influence DNA break formation, or recombination enzymes may aid and abet CRISPR adaptation via other modes of supporting DNA replication forks. The observation that RecBCD is not needed for primed adaptation suggested that targeting of MGE DNA by Cascade interference, and subsequent Cas3 and/or other nuclease activity, are sufficient for a ‘self/non-self’ distinction brought about by the crRNA of Cascade targeting MGE DNA.
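The role proposed for Chi in distinguishing host from MGE DNA rests on where the octamer 5′-GCTGGTGG-3′ occurs in a sequence. As a minimal sketch, assuming nothing beyond the Chi sequence quoted above, the following Python code lists Chi occurrences on both strands of a DNA sequence; the function names and the example sequence are hypothetical, and the code makes no claim about how RecBCD itself reads Chi.

```python
CHI = "GCTGGTGG"  # Chi octamer from the text, written 5'->3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_chi_sites(dna, motif=CHI):
    """Return sorted (position, strand) pairs for every occurrence of motif.

    Positions are 0-based and refer to the leftmost base of the match on the
    supplied (top) strand; '-' marks a match on the opposite strand.
    """
    dna = dna.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)
    return sorted(hits)


print(find_chi_sites("AAGCTGGTGGTTCCACCAGCAA"))  # [(2, '+'), (12, '-')]
```

Applied to a host chromosome and to an MGE sequence, counts from a function like this would allow Chi frequencies in the two to be compared, which is the kind of difference the proposed model relies on.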
In contrast with the differing requirements for RecBCD in naïve and primed CRISPR adaptation in E. coli, deletion of the gene encoding DNA polymerase I (polA) in E. coli resulted in no detectable expansion of CRISPR loci in either adaptation pathway. This requirement is compatible with DNA ‘gap filling’, which is required after integration of a new protospacer into a CRISPR locus, and is consistent with the known role of DNA polymerase I in DNA repair pathways. In addition to potential roles for Ruv, RecBCD and DNA polymerase I proteins in adaptation, the DNA repair helicases RecG and PriA (sometimes also called primosomal protein N′), which are noted for remodelling DNA at blocked replication forks and ter sites, have also been shown to be important for primed, but not naïve, adaptation. In one model, it was proposed that Cascade interference complexes might act as roadblocks to MGE replication, which relies on the E. coli host replisome. In this model, Cascades form site-specific barriers to MGE replication dictated by the sequence of crRNA, and these barriers are trigger points for adaptation through interaction with Cas1–Cas2. Roles for RecG and PriA in this model are unclear, although we speculate that the known ability of RecG to dissociate R-loops may be important for controlling access of Cas enzymes to DNA and potentially for preventing R-loop-primed DNA replication, which is a known cause of genome instability. Overall, physical and functional interplay between effectors of DNA replication, repair and recombination, and CRISPR-Cas proteins (summarized in Figure 4) is significant but poorly understood, requiring further work to establish mechanism, and especially whether CRISPR interference antagonizes DNA replication.
Figure 4. The interplay between the effectors of DNA replication, repair and recombination, and CRISPR-Cas proteins.
This work was supported by the BBSRC [grant number BB/M020541/1] and the BBSRC DTP scheme. The work on CRISPR-Cas was carried out in the Bolt lab.
The authors declare that there are no competing interests associated with the manuscript.