CRISPR (clustered regularly interspaced short palindromic repeats) together with cas (CRISPR-associated) genes form the CRISPR–Cas immune system, which provides sequence-specific adaptive immunity against foreign genetic elements in bacteria and archaea. Immunity is acquired by the integration of short stretches of invasive DNA as novel ‘spacers’ into CRISPR loci. Subsequently, these immune markers are transcribed and generate small non-coding interfering RNAs that specifically guide nucleases for sequence-specific cleavage of complementary sequences. Among the four CRISPR–Cas systems present in Streptococcus thermophilus, CRISPR1 and CRISPR3 have the ability to readily acquire new spacers following bacteriophage or plasmid exposure. In order to investigate the impact of building CRISPR-encoded immunity on the host chromosome, we determined the genome sequence of a BIM (bacteriophage-insensitive mutant) derived from the DGCC7710 model organism, after four consecutive rounds of bacteriophage challenge. As expected, active CRISPR loci evolved via polarized addition of several novel spacers following exposure to bacteriophages. Although analysis of the draft genome sequence revealed a variety of SNPs (single nucleotide polymorphisms) and INDELs (insertions/deletions), most of the in silico differences were not validated by Sanger re-sequencing. In addition, two SNPs and two small INDELs were identified and tracked in the intermediate variants. Overall, building CRISPR-encoded immunity does not significantly affect the genome, which allows the maintenance of important functional properties in isogenic CRISPR mutants. This is critical for the development and formulation of sustainable and robust next-generation starter cultures with increased industrial lifespans.
Streptococcus thermophilus and dairy products
S. thermophilus is a domesticated lactic acid bacterium widely used in the formulation of industrial dairy starter cultures for the fermentation of milk into yogurt and cheese. This important food-grade species is notably used in the production of cheddar and mozzarella, as well as hard cooked cheeses such as emmental. It is a critical ingredient in the ~US$50 billion annual dairy products business, and it is estimated that annual human consumption of S. thermophilus exceeds 10²¹ cells [1,2]. The important functional features inherent to S. thermophilus include rapid acidification of milk through the production of lactic acid, as well as the ability to enhance the textural and organoleptic properties of fermented dairy products through the production of exopolysaccharides, acetaldehyde and diacetyl. The inclusion of this organism in many food products probably explains its relatively high abundance in the gastrointestinal tract of humans, where it may play additional roles yet to be determined. This is consistent with the extensive historical use of S. thermophilus in the fermentation of dairy products for hundreds of years, if not in the conservation of milk by humans for millennia [2,4]. Recent advances in food science and microbiology have furthered the rigorous selection of highly desirable strains for their exploitation on an industrial scale, their formulation in complex and high-performance starter cultures, and their functional and genomic characterization. Accordingly, S. thermophilus has been the subject of extensive research efforts over time. Genomic analyses of this industrially relevant bacterium have provided insights into the genetic basis for several of its physiological functions carried out throughout the fermentation processes, as shown in genomic studies of CNRZ1066, LMG 18311 [1,2,5] and LMD-9 [6,7], and the recent genome sequencing of additional strains including ND03, JIM8232 and MN-ZLW-002.
Industrial phages and defence systems
Of critical importance, the ubiquitous presence of bacteriophages (phages) in the environment has negatively and repeatedly affected fermentation processes in industrial settings [11,12], as phage predation can interfere with the acidification of milk by starter cultures in (very) large fermentation tanks. Phages can actually persist in manufacturing facilities given their resistance to pasteurization, airborne dissemination and the practical challenges inherent to sanitation strategies in food-grade manufacturing settings. Consequently, dairy manufacturers and providers of starter cultures have relied on the exploitation of bacterial phage-resistance systems to safeguard the production of important and traditional food products. The formulation of starter cultures often relies on combining multiple phage-resistance mechanisms and strains in rotation strategies that allow sustainable use of the most effective and resistant strains that carry critical functional properties [4,13].
A plethora of phage defence systems occur in lactic acid bacteria, including prevention of adsorption, blocking of injection, abortive infection, R-M (restriction–modification), toxin–antitoxin systems and the recently characterized CRISPR (clustered regularly interspaced short palindromic repeats) loci [13,14]. These defence systems may be combined and occasionally engineered to rein in phage populations in industrial settings. Whereas traditional defence strategies relied originally on systems such as the prevention of adsorption and blocking of DNA injection, and subsequently on R-M systems and abortive infection, the importance of the newly described CRISPR–Cas (CRISPR-associated) systems in providing phage resistance in S. thermophilus has had a dramatic impact on the management of phage-related issues [14–16].
CRISPR–Cas immune systems
CRISPR loci, together with cas genes, form the CRISPR–Cas immune system, which is present in ~46% of bacteria and almost 90% of archaea [17,18]. CRISPR loci typically consist of several non-contiguous short DNA repeats separated by stretches of non-repeated elements, called spacers, that derive from invasive nucleic acids such as viruses and plasmids [14–16,19–21]. These hypervariable loci differ widely across genera and species in terms of number of CRISPR–Cas systems and core elements such as the cas genes, repeat sequence and spacer content. Three distinct types of CRISPR–Cas systems have been established on the basis of the sequence of universal cas1 and cas2 genes, as well as the occurrence of signature genes, namely cas3, cas9 and cas10 for Types I, II and III respectively. Mechanistically, CRISPR–Cas systems provide immunity in three distinct steps: (i) adaptation, where immunity is acquired by integration of new spacers from invasive elements into CRISPR loci; (ii) expression, where CRISPR loci are transcribed and processed into mature non-coding interfering crRNAs (CRISPR RNAs); and (iii) interference, where crRNAs guide Cas proteins for sequence-specific cleavage of complementary nucleic acids [22–29]. Originally, CRISPR–Cas systems were used for genotyping purposes [6,30], on the basis of their hypervariable nature, which has been exploited for epidemiological studies and for studying the interplay between hosts and viruses in natural systems [16,31–35]. Subsequently, their ability to interfere with foreign genetic elements has been used for building immunity against phages and to preclude plasmid uptake. More recently, the ability to reprogramme the Cas9 endonuclease using small synthetic guide RNAs has revolutionized genome editing [36–39].
Four distinct CRISPR–Cas systems have been identified in S. thermophilus, including CRISPR1 and CRISPR3, which both have the ability to rapidly acquire novel spacers in response to phage challenge [15,22,40,41] or plasmid exposure. Although the occurrence of these four loci can vary widely in this species, the model strain DGCC7710, in which CRISPR–Cas systems were first shown to provide adaptive immunity in prokaryotes, carries all four. An overview of CRISPR–Cas systems present in S. thermophilus genomes is provided in Table 1. CRISPR1 and CRISPR3 are both subtype II-A CRISPR–Cas systems; however, each locus is associated with a distinct set of four cas genes, including the universal cas1 and cas2 genes, the Type II cas9 signature gene involved in target DNA cleavage [21,42–44], and csn2, a gene exclusively found in this particular subtype. The subtype III-A CRISPR2–Cas system encodes several cas genes including the universal cas1 and cas2 genes, the Type III signature gene cas10 and cas6, which is important in crRNA biogenesis. Lastly, the subtype I-E CRISPR4–Cas system is associated with eight cas genes including the cas1 and cas2 universal genes, the Type I signature gene cas3, which is involved in target nucleic acid degradation [45,46], and the Cascade (CRISPR-associated complex for antiviral defence)-encoding genes. Among the S. thermophilus strains sequenced to date, this locus uniquely occurs in DGCC7710 (Table 1).
|CRISPR1 (subtype II-A)||CRISPR2 (subtype III-A)||CRISPR3 (subtype II-A)||CRISPR4 (subtype I-E)|
|Strain||Repeats||cas genes||Repeats||cas genes||Repeats||cas genes||Repeats||cas genes|
The S. thermophilus genome
The genome sequence of the industrial S. thermophilus DGCC7710 (DuPont Global Culture Collection) strain was determined by 454 pyrosequencing using Roche GS FLX technology. A total of 220377 reads totalling 52211361 nt was used for assembly, representing approximately 29× coverage of the genome. Primary assembly of raw sequencing data was performed using Newbler's gsAssembler program (Roche), which generated 155 contigs ranging in size between 100 and 194234 bp. Large contigs above 500 bp were subsequently ordered with the ProgressiveMauve software using the LMD-9 complete genome as a template. Several gaps were closed by Sanger sequencing of PCR amplicons, and the assembly was validated further using optical mapping (OpGen). The assembled DGCC7710 draft genome (accession number AWVZ00000000) consists of 17 contigs totalling 1798341 bp, which encode at least 2124 ORFs. The relatively high synteny observed across all the S. thermophilus genomes sequenced to date is illustrated by the small number of blocks provided by the ProgressiveMauve alignment, with notably two large blocks spanning two-thirds of the genome (Figure 1). Apart from a small number of genes missing in some genomes, it is noteworthy that most genes are widely conserved across genetically distinct strains. With ‘only’ five blocks spanning the other one-third of the genome, it is also clear that, overall, gene content and organization are highly conserved in the S. thermophilus species (Figure 1).
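As a quick consistency check on the sequencing figures quoted above, the approximate coverage and mean read length can be recomputed from the read count, the total read length and the draft assembly size. The following is a minimal sketch only: the variable names are illustrative, and using the draft assembly size as the denominator is an assumption, since the text does not state which genome size was used for the coverage estimate.

```python
# Figures quoted in the text for the DGCC7710 sequencing run.
n_reads = 220_377            # number of 454 reads used for assembly
total_read_nt = 52_211_361   # combined length of those reads (nt)
assembly_bp = 1_798_341      # size of the 17-contig draft assembly (bp)

# Coverage = total sequenced bases / genome size.
# Assumption: the draft assembly size stands in for the genome size.
coverage = total_read_nt / assembly_bp

# Mean read length = total sequenced bases / number of reads.
mean_read_length = total_read_nt / n_reads

print(f"coverage ~ {coverage:.1f}x")
print(f"mean read length ~ {mean_read_length:.0f} nt")
```

Under this assumption the computed coverage rounds to the ~29× reported in the text.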
Comparative analysis of S. thermophilus genomes
Consistent with the functional properties of the strain, the genome of DGCC7710 includes numerous important genes involved in the fermentation of milk into dairy products such as yogurt. Specifically, given that milk acidification relies on lactose fermentation, the presence of a dedicated lacS permease associated with the lacZ β-galactosidase, which feeds into the glycolysis and Leloir pathways for eventual lactic acid production, is a staple of the S. thermophilus genome [4,7]. Additionally, the ability to generate texture during yogurt fermentation is a key functional feature of selected S. thermophilus starter strains. The eps gene cluster within the DGCC7710 genome illustrates the ability of this strain to generate exopolysaccharides that yield desirable viscosity and mouth feel. An important feature encoded within the DGCC7710 genome is the molecular machinery involved in natural competence, which has been described as a new molecular biology tool instrumental in manipulating and investigating this organism [49–51].
Another important feature of S. thermophilus genomes is the widespread presence of transposases and IS (insertion sequence) elements that reflect genomic plasticity and further complicate the assembly and genome closing process. Nevertheless, a careful analysis of the differential content between genomes, and especially the unique content in the genome of DGCC7710, reveals the presence of multiple genomic islands that encode functionally important genes (Figure 1). The identified diverse and unique genomic islands are consistent with previous reports showing that differential content between various strains of S. thermophilus primarily consists of mobile genetic recombinases, exopolysaccharide biosynthesis enzymes, bacteriocins and phage-resistance mechanisms including R-M and CRISPR. Diversity between strains in the eps gene cluster has repeatedly been shown in S. thermophilus, and may involve horizontal transfer of genes in connection with transposable elements such as the IS3, IS6 and ISL3 families [48,52]. Similar lateral gene transfer events may also be mediated by ICEs (integrative conjugative elements), and the spread of the important prtS cell-wall proteinase likewise exemplifies the diversity, plasticity and adaptability of S. thermophilus genomes [9,51,53].
Overall, the genome sequence of DGCC7710 illustrates further the evolutionary path of the S. thermophilus species towards adaptation to milk and highlights the importance of phage-resistance systems, notably R-M and CRISPR–Cas.
Building CRISPR immunity through iterative phage challenges
In order to enhance the phage resistance in DGCC7710, four iterative cycles of phage exposure followed by CRISPR BIM (bacteriophage-insensitive mutant) selection were performed as outlined previously [15,21], using a set of biodiverse lytic phages chosen for their virulence spectra. The iterative rounds of phage exposure chronologically generated DGCC9705 (following challenge with phage 2972), DGCC9726 (following challenge with phage 3821), DGCC9733 (following challenge with phage 3288) and, eventually, the fourth-generation BIM DGCC9836 (following challenge with phage 4753) which was insensitive to all four phages (Figure 2).
CRISPR immunization using iterative phage challenges
A detailed sequence analysis of CRISPR loci revealed polarized insertion of phage protospacer sequences at the leader end of both CRISPR1 and CRISPR3. All acquired protospacers were systematically associated with a PAM (protospacer-adjacent motif) [30,40]. This is consistent with previous reports characterizing the PAMs in active S. thermophilus CRISPR–Cas systems, and their involvement in both spacer acquisition and invasive nucleic acid interference [21,44,54]. In contrast, no CRISPR acquisition was detected in CRISPR2 or in CRISPR4. Notwithstanding the propensity of inactive CRISPR loci for evolution by internal spacer deletion through homologous recombination between identical CRISPR repeats, we observed strict conservation of CRISPR spacer content and CRISPR repeat sequences, suggesting that even inactive CRISPR–Cas systems in S. thermophilus can be relatively stable, at least within the timeframe that generated these BIMs. We cannot rule out the possibility that these CRISPR–Cas systems are active or that they may be involved in biological roles that go beyond immunity against foreign genetic elements, which would be consistent with the detection and occasional induction of cas genes and the proteins they encode in this strain. The extent and rapidity of novel spacer acquisition observed here in a laboratory system correspond to similar findings derived from natural microbial communities exposed to phages, where only the most recently acquired spacers match coexisting phages, which is consistent with modelling predictions.
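The notion of polarized spacer addition described above can be illustrated with a short sketch that splits a repeat-spacer array into its spacers and checks that a BIM differs from its parent only by new spacers at the leader end. This is not the analysis pipeline used in the study: the function names are hypothetical, and the repeat, leader, trailer and spacer sequences are toy strings far shorter than real S. thermophilus repeats and spacers.

```python
def split_spacers(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`.

    The first and last fragments are the leader-side and trailer-side
    flanks, so they are discarded.
    """
    parts = array.split(repeat)
    return parts[1:-1]


def acquired_at_leader(wild_type: list[str], bim: list[str]) -> list[str]:
    """Return spacers added at the leader end of `bim`.

    Raises ValueError if the ancestral spacers are not conserved, i.e. if
    the change is not a pure polarized addition.
    """
    n_new = len(bim) - len(wild_type)
    if n_new < 0 or bim[n_new:] != wild_type:
        raise ValueError("ancestral spacer content is not conserved")
    return bim[:n_new]


# Toy sequences (assumptions for illustration only).
REPEAT = "GGCCGC"
LEADER, TRAILER = "TATA", "ATTT"

wt_array = LEADER + REPEAT + "TTAATA" + REPEAT + "AATTAT" + REPEAT + TRAILER
bim_array = (LEADER + REPEAT + "ATATTA" + REPEAT + "TTAATA"
             + REPEAT + "AATTAT" + REPEAT + TRAILER)

wt_spacers = split_spacers(wt_array, REPEAT)
bim_spacers = split_spacers(bim_array, REPEAT)
new_spacers = acquired_at_leader(wt_spacers, bim_spacers)
print(new_spacers)
```

An exact string split is a deliberate simplification: it assumes every repeat copy is identical and that the repeat does not occur inside any spacer, which real arrays do not guarantee.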
The propensity of both active CRISPR–Cas systems to readily acquire novel spacers following phage exposure provides a convenient and efficient option to develop strains with a broad range of phage resistance through iterative challenges. Furthermore, the sequential addition of spacers that collectively confer increasing levels of resistance provides a molecular basis for enhancing phage resistance depth. Accordingly, we anticipate that building CRISPR immunity iteratively using a diversity of phages will allow the biogenesis of novel strains with increased phage resistance in terms of both spectrum breadth and resistance depth, rapidly extending the lifespan of commercial cultures for perennial use in industrial settings where phages are ubiquitously problematic.
A practical advantage of generating BIMs with multiple spacers is that they provide a unique set of chromosome-encoded sequences that can be used as natural genetic tags theoretically rare and unique to selected, thus proprietary, strains. Likewise, these hypervariable sequences and dynamic genetic loci provide a molecular basis for high-resolution genotyping of even very closely related isolates [16,30]. Genomically, the argument could be made that the biogenesis of a diverse population of CRISPR genotypes derived from the exposure of a single original wild-type strain establishes novel genetic biodiversity or strains. This natural approach can be readily replicated or synthetically implemented in laboratory settings using molecular biology techniques. Furthermore, the recent development of small guide RNAs that drive Cas9-mediated interference provides a molecular basis for CRISPR-mediated, but CRISPR spacer-independent, (re)programmable immunity [36,57].
Impact of CRISPR immunization on the host genome
A comparative analysis of the terminal BIM (DGCC9836) draft genome (1762882 bp) compared with that of the wild-type revealed the presence of 552 bp of differential content. It was readily determined that 526 bp were associated with CRISPR-related insertions resulting from the addition of novel CRISPR spacers into the two active CRISPR loci, namely CRISPR1 and CRISPR3 (Figure 2). This is consistent with the experimental design whereby intermediates were specifically selected following PCR screening of locus (CRISPR1 and/or CRISPR3), with size increase reflecting CRISPR adaptation events. Overall, the differences consisted of 16 putative INDELs (insertions/deletions) and three SNPs (single nucleotide polymorphisms) (Table 2). Each difference predicted in silico was subjected to Sanger re-sequencing both in the wild-type and final BIM, but also in the three intermediate BIMs (Table 2). Given the anticipated impact difference between an INDEL (loss and frameshifts) as opposed to SNPs (non-synonymous at worst), we were initially surprised by the high ratio of predicted INDELs/SNPs. Intriguingly, a majority of the INDELs (ten of 16) actually corresponded to single nucleotide deletions. Careful analysis of the genomic context of these sequences revealed that these single nucleotide INDELs occurred primarily within homopolynucleotidic sequences, a caveat of next-generation pyrosequencing technologies. Likewise, sequencing results revealed that SNP2 was derived from a pyrosequencing error. In contrast, INDEL2 was detected within the first round of phage challenge (Table 2). Similarly, SNP3 was also validated and documented to occur during the initial phage exposure. The anticipated insertion of novel spacers within the active CRISPR1–Cas and CRISPR3–Cas systems, INDEL8 and INDEL14 respectively, were validated. 
These results confirm the high rate of incorrect SNP and INDEL predictions arising from next-generation sequencing technologies, and highlight the need to systematically re-sequence putative mutations using Sanger sequencing, so as not to overestimate mutation rates.
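Because single-nucleotide INDELs predicted within homopolymeric stretches are a known weak point of pyrosequencing, one simple way to prioritize candidates for Sanger re-sequencing is to measure the length of the homopolymer run at each predicted position. The sketch below is an illustration under stated assumptions: the function names and the `min_run` threshold of 4 are hypothetical choices, not values taken from the study. The flag marks a call as suspect rather than false, since a single-nucleotide deletion within a homopolymer can also be genuine.

```python
def homopolymer_run(seq: str, pos: int) -> int:
    """Length of the run of identical bases that contains seq[pos]."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end < len(seq) - 1 and seq[end + 1] == base:
        end += 1
    return end - start + 1


def homopolymer_suspect(seq: str, pos: int, indel_len: int,
                        min_run: int = 4) -> bool:
    """Flag a predicted INDEL as a possible pyrosequencing artefact.

    Only 1-nt INDELs inside a run of at least `min_run` identical bases
    are flagged; `min_run` is an arbitrary illustrative threshold.
    """
    return indel_len == 1 and homopolymer_run(seq, pos) >= min_run


# Toy reference fragment (assumption): a 6-nt poly(A) run at positions 4-9.
reference = "ACGTAAAAAAGCT"

print(homopolymer_run(reference, 6))         # run containing position 6
print(homopolymer_suspect(reference, 6, 1))  # 1-nt INDEL inside the run
print(homopolymer_suspect(reference, 1, 1))  # 1-nt INDEL outside any run
```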
Analysis of INDEL2 revealed the in-frame insertion of three nucleotides in the first-generation BIM DGCC9705, leading to the insertion of an aspartic acid residue within the predicted protein sequence of a putative bacterial capsule synthesis protein/poly-γ-glutamate biosynthesis enzyme. Analysis of the occurrence of this gene in other S. thermophilus genomes, namely STER_0153 in LMD-9 or stu0110 in LMG 18311, revealed annotations as uncharacterized, unknown or hypothetical proteins. This sequence seems to be highly conserved across Streptococcus spp., with up to 96% similarity in most species, including S. pneumoniae, S. suis, S. equi, S. mutans, S. pyogenes and S. agalactiae.
In silico analysis of INDEL13 indicated the deletion of a single nucleotide (A) within a poly(A) stretch during the fourth phage challenge, generating the DGCC9836 BIM. This mutation yields a premature stop codon, leading to a predicted truncated protein (36 instead of 666 amino acids). This mutation is located towards the 5′ end of a gene encoding a putative ABC (ATP-binding cassette) transporter/permease. This gene is also present in CNRZ1066 (str1333) and LMG 18311 (stu1333), predicted to encode a peptide-4 ABC exporter, whereas it is annotated as an antimicrobial peptide transporter in LMD-9 (STER_1307). Likewise, this sequence has orthologues in many Streptococcus spp., including S. pneumoniae, S. sanguinis, S. salivarius, S. suis and S. gordonii.
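The effect described above, where loss of a single A from a poly(A) stretch shifts the reading frame and introduces a premature stop codon, can be reproduced on a toy ORF. The sequence below is invented for illustration and is unrelated to the actual ABC transporter gene; the helper names are hypothetical, and the codon table is the standard genetic code written in the conventional TCAG order.

```python
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(seq: str, pos: int) -> str:
    """Return `seq` with the base at index `pos` removed."""
    return seq[:pos] + seq[pos + 1:]


# Toy ORF (assumption): ATG, a 6-nt poly(A) run, then four more codons.
wild_type = "ATGAAAAAATACGTAGCTGGTTAA"
mutant = delete_base(wild_type, 5)  # drop one A from the poly(A) run

print(translate(wild_type))  # full-length toy protein
print(translate(mutant))     # truncated after the frameshift
```

In this toy case the frameshift brings a TAG codon into frame shortly after the poly(A) run, which mirrors in miniature the predicted truncation from 666 to 36 amino acids.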
The first SNP (SNP1, Table 2) is a T>A silent mutation in strain DGCC9705, within the STER_0096 (LMD-9) orthologue, predicted to encode a leucyl aminopeptidase (aminopeptidase T).
The third SNP (SNP3, Table 2) is a C>T mutation in strain DGCC9836, at the 3′ end of the STER_1849 (LMD-9) orthologue, predicted to encode the small regulatory subunit of an acetolactate synthase. This non-synonymous mutation is predicted to change the last residue of the 158-amino-acid protein, replacing an asparagine residue with aspartic acid.
Overall, the comparative analysis of the draft genome sequences of the parental strain with that of a fourth-generation CRISPR BIM initially suggested the presence of 16 INDELs and three SNPs (Table 2), of which two were selected for (spacer acquisitions in the active CRISPR1 and CRISPR3 loci), and ‘only’ four validated (two SNPs and two INDELs), since the majority of predicted differences were sequencing artefacts. We are mindful that these sequences may ‘only’ represent 97% of the complete genome size, and that other mutations may have occurred in the remaining gaps, which consist primarily of ribosomal DNA sequences and transposons. Altogether, these results indicate that multiple iterative phage challenges primarily give rise to novel spacer insertions within active CRISPR loci, and that there are occasional mutations consistent with natural evolutionary events giving rise to SNPs and small INDELs. Surprisingly, the stress inherent to phage exposure does not seem to significantly affect the mutation rate or evolutionary pattern of S. thermophilus genomes, other than CRISPR immunization events, indicating that CRISPR-mediated processes arguably generate isogenic variants.
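The bookkeeping behind these counts can be made explicit with a short tally. This sketch rests on one reading of the text, stated here as an assumption: the two CRISPR spacer insertions are counted among the 16 predicted INDELs, and the two validated INDELs are distinct from them. The dictionary names are illustrative.

```python
# Counts taken from the text; the grouping is an interpretation (see lead-in).
predicted = {"INDEL": 16, "SNP": 3}
crispr_insertions = {"INDEL": 2, "SNP": 0}  # spacer gains in CRISPR1 and CRISPR3
validated = {"INDEL": 2, "SNP": 2}          # confirmed by Sanger re-sequencing

# Whatever is neither a CRISPR insertion nor validated is a sequencing artefact.
artefacts = {
    kind: predicted[kind] - crispr_insertions[kind] - validated[kind]
    for kind in predicted
}

total_predicted = sum(predicted.values())
total_artefacts = sum(artefacts.values())

# Differential content outside the CRISPR-related insertions (bp).
non_crispr_bp = 552 - 526

print(artefacts)
print(f"{total_artefacts} of {total_predicted} predicted differences were artefacts")
print(f"{non_crispr_bp} bp of differential content lie outside CRISPR insertions")
```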
Although the two primary forces driving the overall genome evolution of S. thermophilus consist of genome reduction by iterative gene losses in combination with occasional acquisition of beneficial genes through horizontal gene transfer for adaptation to a rich environment (primarily milk) [6,53,58,59], we show in the present article that CRISPR plays a major role in genome evolution following exposure to phages. Indeed, regressive genome evolution by extensive gene loss has been a key driving force shaping the adaptation of S. thermophilus to the rich milk environment, illustrated by the loss of virulence genes widely distributed in most streptococci. Overall, the DGCC7710 genome shares a high degree of synteny with other S. thermophilus genomes, with a few unique genomic islands and hypervariable loci that include the eps operon, the gp operon and CRISPR–Cas systems. Focusing on genome interplay within host–virus dynamics, we propose that the impact of the virus on host genome evolution is relatively limited, primarily consisting of CRISPR immunity build-up, whereas the effects of the resistant host on viral genome evolution are conversely relatively widespread, consisting of extensive protospacer mutations, PAM mutations and occasionally recombination.
The development of model laboratory spacer-acquisition systems, together with knowledge inferred from CRISPR adaptation in natural systems and the development of mathematical models for CRISPR locus evolution, is rapidly expanding our limited understanding of the adaptation/acquisition phase [25,34,56]. This makes DGCC7710 an appropriate model organism to fundamentally investigate the balance between CRISPR spacer acquisition and occasional loss, and to investigate the impact of CRISPR immunity build-up on genome evolution. Notwithstanding the expanding understanding of the organization, content, mechanistic and molecular underpinnings of CRISPR-mediated targeting of complementary nucleic acids, relatively little is known about the short- and long-term interplay with viruses. The present article is the first report of iterative build-up of CRISPR-encoded immunity against a series of genetically distinct phages, and shows that CRISPR immunization does not have a significant impact on the host genome.
Practically, the work described in the present article provides a proof of concept for the development of next-generation starter cultures with naturally generated CRISPR immunity, optimally developed through multiple iterative rounds of exposure to a diversity of industrially relevant phages. Overall, these results confirm that active CRISPR loci are subject to rapid evolution by acquisition of novel spacers derived from protospacers associated with PAMs. Moreover, the data indicate that both active and inactive loci are stable with regard to spacer loss, which is consistent with a previous metagenomic study of this system on a similar timeframe. Eventually, iterative addition of multiple spacers derived from different phages provides increased immunity in terms of both level and spectrum of resistance. This natural process then relies on the availability of virulent phages, and the ability to readily screen CRISPR BIMs using PCR monitoring the polarized insertion of novel spacers in active CRISPR loci. The concurrent use of multiple CRISPR–Cas systems that recognize and target different PAMs should provide additional pressure on phage genomes.
Exploiting CRISPR-based strategies to enhance phage resistance in various S. thermophilus strains and using multiple isogenic variants in rotation schemes will ensure sustainable and perennial use of the most efficient and desirable strains with extended lifespans. Harnessing CRISPR immunity may be enhanced further by combination with other efficient and compatible phage-resistance mechanisms. These results further our understanding of virus–host dynamics, especially with regard to the impact of CRISPR immunity, and provide an evolutionary framework for the analysis of the interactions between bacteria and their phages, and their ecological impact.
CRISPR Evolution, Mechanisms and Infection: A Biochemical Society Focused Meeting held at the University of St Andrews, U.K., 17–19 June 2013. Organized and Edited by Emmanuelle Charpentier (Laboratory for Molecular Infection Medicine Sweden, Sweden), John van der Oost (Wageningen University, The Netherlands) and Malcolm White (University of St Andrews, U.K.).
We are thankful for the many insightful conversations we have had with Sylvain Moineau and team members in his laboratory at Université Laval (Québec, Canada). We also thank Mickaël Charron, DuPont Nutrition and Health, for his technical support in gap closure.
This work was supported by funding from DuPont Nutrition and Health.