The fields of molecular genetics, biotechnology and synthetic biology are demanding ever more sophisticated molecular tools for programmed precise modification of cell genomic DNA and other DNA sequences. This review presents the current state of knowledge and development of one important group of DNA-modifying enzymes, the site-specific recombinases (SSRs). SSRs are Nature's ‘molecular machines’ for cut-and-paste editing of DNA molecules by inserting, deleting or inverting precisely defined DNA segments. We survey the SSRs that have been put to use, and the types of applications for which they are suitable. We also discuss problems associated with uses of SSRs, how these problems can be minimized, and how recombinases are being re-engineered for improved performance and novel applications.
The Genetic Revolution is upon us. Just as in its Industrial Revolution predecessor, new processes, tools and materials are taking over at an astonishing pace, and old ways of working are being swept away. Following the advent of the DNA double helix structure in 1953, decades of painstaking basic research into the enzymes that make, maintain and process DNA molecules have provided us with machinery that we can now apply to characterize and manipulate DNA on a scale that would have been hardly credible only a few years ago. At the dawn of the Genetic Revolution, restriction enzymes, ligases and polymerases were the essential ‘molecular machines’, but more recently we have transformed our capacity to make specific changes to genomic DNA with new enzyme systems such as targeted nucleases [e.g. zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs)], clustered regularly interspaced short palindromic repeats-CRISPR-associated gene 9 (CRISPR-Cas9) and recombineering [1,2]. Another group of these ‘second generation’ systems, the site-specific recombinases (SSRs), has become established as a mainstay for highly specific manipulations of DNA, and is reviewed here.
WHAT ARE SITE-SPECIFIC RECOMBINASES?
Site-specific recombination was first identified in bacteria, the original example being the system that integrates and excises the bacteriophage λ genome from a specific site in its Escherichia coli host genome [3,4]. Since then, thousands of site-specific recombination systems have been identified, the vast majority from bacteria but also some from archaea and microbial eukaryotes (especially yeasts) [5–7]. In site-specific recombination, two short DNA sequences (sites) at separate locations within a single DNA molecule or in separate molecules are brought together; the DNA strands are then broken at specific phosphodiester bonds within the sites, and the broken ends are rejoined in a new arrangement to form recombinants (Figure 1A). The sequence recognition and chemical catalysis steps of this process are implemented by a dedicated enzyme of the system, the SSR. The process does not involve any DNA synthesis or degradation, or any cofactors such as ATP; it is often therefore referred to as conservative site-specific recombination to distinguish it from processes such as transposition, homologous recombination (HR) and non-homologous end-joining (NHEJ) (see below). Another distinction from HR is that site-specific recombination does not depend on long stretches of DNA identity or similarity in the two sites. In the simplest site-specific recombination systems, each site is recognized and bound by a SSR dimer (sometimes formed by co-operative on-site assembly from monomers). In more complex systems, the ‘crossover site’ (or ‘core site’) bound by the SSR dimer is adjacent to ‘accessory’ sequences that are recognized and bound by the SSR itself and/or other proteins (Figure 1B) [5,8]. The complexity of the system (need for accessory sequences and factors) can affect its usefulness as a practical DNA manipulation tool. 
Of the large number of SSRs known from DNA sequence analysis, only a small fraction has been characterized biochemically, and even fewer (the main subjects of this review) have been used for DNA manipulations.
Figure 1. Site-specific recombination: the basics
SSRs have close mechanistic affinities with another large group of DNA-rearranging enzymes, the transposases. The function of a transposase is to move a defined segment of DNA (a transposon) from one genomic location to another. To do this, the transposase recognizes and binds to sequences at each end of the transposon, brings the ends together, then cuts the DNA strands at precisely defined positions, analogous to the action of a SSR. However, in transposition a third ‘target’ DNA sequence is involved, which may or may not be recognized specifically by the transposase. The cut DNA ends of the transposon attack the strands of the target site to form new phosphodiester bonds, thereby transferring the transposon sequence to this new location. We will not go into further details on transposition here; the reader is directed to recent reviews [5,9] for further information. However, it is worth bearing in mind that transposition can be a viable alternative to site-specific recombination for some applications (such as integration of a transgene into genomic DNA) [10,11].
OUTCOMES OF SITE-SPECIFIC RECOMBINATION
Site-specific recombination reactions can occur in vivo, in cells expressing the SSR, or in vitro, using purified SSR proteins. Reactions always involve two DNA recombination sites (see above), which are often (although not necessarily) identical or very similar in sequence. In general, sites have a polarity; that is, their left and right ends are distinct (this is often a consequence of asymmetry in the central overlap sequence of the site; see below). Typically, the left end of one site will be rejoined to the right end of the partner site, and vice versa; left–left and right–right recombinants are not formed. As a consequence, the arrangement of the recombining sites affects the outcome (Figure 1C). In one type of reaction, a site on a circular DNA molecule (such as a plasmid) recombines with a site on another molecule (which might, for example, be another plasmid, or genomic DNA). The result is that the circular DNA is integrated into the second molecule, with directly repeated (head to tail) sites flanking the integrated DNA. In a reversal of this type of reaction, a segment of DNA flanked by directly repeated sites is excised from the remainder of the molecule (this is sometimes called deletion or, in some circumstances, resolution). A third type of reaction occurs when a segment of DNA is flanked by sites in inverted repeat (head to head). In this case, recombination does not cut the DNA out, but instead inverts its orientation relative to the remainder of the molecule. The length of the DNA between the sites in these intramolecular reactions is not a critical factor; typical SSRs will tolerate lengths from 200 bp or even less, up to hundreds of kilobase pairs. A final possibility, less commonly observed in Nature, is a reaction between sites on two linear molecules, generating two recombinant linear products.
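The dependence of the outcome on site arrangement can be illustrated with a toy model. The Python sketch below is an illustration only, not a description of the molecular mechanism or of any existing software. It assumes two identical sites (so that product sites look the same as the parent sites), represents a DNA molecule as a list of tokens, and writes a site as `S>` or `<S` according to its polarity. The function names `flip`, `recombine` and `integrate` are hypothetical.

```python
FWD, REV = "S>", "<S"  # one recombination site, written in either orientation


def flip(token):
    """Return a token as it reads once the segment carrying it is inverted."""
    if token == FWD:
        return REV
    if token == REV:
        return FWD
    # Ordinary segments: a trailing apostrophe marks reversed orientation.
    return token[:-1] if token.endswith("'") else token + "'"


def recombine(dna, i, j):
    """Recombine the sites at list positions i < j of one linear molecule."""
    if not (i < j and dna[i] in (FWD, REV) and dna[j] in (FWD, REV)):
        raise ValueError("i and j must index two sites, with i < j")
    if dna[i] == dna[j]:
        # Direct repeat: the intervening DNA leaves as a circle carrying one site.
        circle = dna[i + 1 : j + 1]
        return "excision", dna[: i + 1] + dna[j + 1 :], circle
    # Inverted repeat: the intervening DNA is reversed in place.
    inverted = [flip(t) for t in reversed(dna[i + 1 : j])]
    return "inversion", dna[: i + 1] + inverted + dna[j:], None


def integrate(target, k, circle):
    """Reverse of excision: insert a circle (written ending with its site) at target[k]."""
    if target[k] != circle[-1]:
        raise ValueError("sites must be written in the same orientation")
    return target[: k + 1] + circle + target[k + 1 :]


if __name__ == "__main__":
    print(recombine(["a", FWD, "b", "c", FWD, "d"], 1, 4))  # direct repeat
    print(recombine(["a", FWD, "b", "c", REV, "d"], 1, 4))  # inverted repeat
    print(integrate(["a", FWD, "d"], 1, ["b", "c", FWD]))   # circle into target
```

In this model, applying `recombine` a second time to the product of an inversion restores the starting arrangement, and `integrate` undoes an excision, which reflects the reversibility expected when the two sites are identical.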
WHY USE SITE-SPECIFIC RECOMBINASES?
Site-specific recombination is most suitable for insertion, deletion or reconfiguration of substantial ‘chunks’ of DNA sequence (e.g. hundreds or thousands of base pairs), rather than for highly targeted changes such as point mutations [12,13]. SSRs can thus be regarded as tools for ‘cut-and-paste’ genome editing. Advantages of using SSRs rather than other methodologies for DNA rearrangement are summarized below.
As the name suggests, site-specific recombination is strictly confined to specific DNA sequences. A typical crossover site of 30–40 bp is very likely to be unique even in a large genome. In practice, some minor variations of the site sequence may be tolerated by SSRs, but, even so, off-site activity is usually very low and may compare favourably to other DNA targeting systems such as meganucleases, ZFNs, TALENs and CRISPR-Cas9 [2,14].
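A rough calculation shows why a site of this length is expected to be unique. The sketch below is a back-of-envelope estimate under strong assumptions that are not taken from the studies cited here: the genome is treated as a random sequence with four equally frequent, independent bases, only exact matches are counted, and the genome size of 3 × 10^9 bp is a round figure chosen for illustration. The function name `expected_chance_matches` is hypothetical.

```python
def expected_chance_matches(site_length_bp, genome_size_bp):
    """Expected number of exact chance matches to one fixed sequence.

    Idealised model: random genome, equal and independent base frequencies.
    The factor of 2 counts both strands; start positions near the ends of
    the genome are not corrected for.
    """
    return 2 * genome_size_bp / 4 ** site_length_bp


if __name__ == "__main__":
    genome = 3_000_000_000  # illustrative round figure, not a measured value
    for length in (12, 16, 34):
        print(length, expected_chance_matches(length, genome))
```

Under these assumptions a 34-bp sequence is expected to occur by chance of the order of 10^-11 times in such a genome, whereas a 12-bp sequence would occur hundreds of times. Because real SSRs tolerate some sequence variation and real genomes are not random, the true frequency of usable pseudosites can be considerably higher than this estimate.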
Under optimal conditions, site-specific recombination can be very fast and efficient, in vitro and in vivo. Some natural systems with directionality (see below) have been shown to approach 100% conversion of substrates to recombinant products in a few minutes in vitro, and at comparable rates in vivo [15–17]. The intrinsic rates of many other SSR-mediated recombination reactions may be similar, although interfering factors in experimental organisms may reduce conversion rates in vivo to ‘half-times’ of hours or days. SSR efficiency is typically much higher than is observed for other DNA-rearranging systems such as targeted nuclease-stimulated HR (see below), or transposition [10,11].
Site-specific recombination normally requires only the two DNA sequences that are to recombine and the SSR protein. The SSR promotes the complete recombination reaction, breaking and rejoining all four DNA strands at the pair of recombining sites. No other cofactors or host systems are required. Furthermore, the recombinant products generally still contain functional sites, so further SSR-mediated manipulations are possible. In contrast, all other commonly used in vivo genetic modification systems require host factors to complete the modification. For example, targeted nucleases (including meganucleases, ZFNs, TALENs and CRISPR-Cas9) just make a DNA double-strand break (DSB), which must then be processed by (error-prone) NHEJ-mediated repair to introduce unspecified mutations at the target site, or by HR with another DNA sequence to make a specific mutation or larger-scale DNA rearrangement [14,18]. HR and NHEJ are complex, highly regulated processes involving many host proteins, and may thus vary dramatically in efficiency and outcome depending on the host organism, cell type, chromosomal location of the DSB and other factors [14,19,20]. Oligonucleotide-based targeted modification methods such as recombineering [21–23] and MAGE (multiplex automated genome engineering) can be used for making specific small changes (e.g. point mutations) of genomic sequences as well as targeted gene rearrangements. These methods rely on cellular expression of bacteriophage λ HR-promoting proteins, but also involve endogenous cellular proteins and are thus subject to complicating factors as mentioned above. The outcome and efficiency of a targeted modification experiment using one of these methods can therefore be very unpredictable, whereas, at least in principle, variations of the intracellular milieu should have lesser effects on site-specific recombination.
The focus of this review is on the SSR enzymes themselves. We will describe the types of SSRs, and summarize the kinds of applications reported for SSRs to date. Our aim is to present an overview of SSR structure, mechanism and applications that will help the researcher to decide whether or not use of SSRs is appropriate, and, if so, how to choose a particular system and strategy.
TWO FAMILIES: STRUCTURES AND MECHANISMS
For all of the SSRs to be discussed here, recombination takes place between two 30–40 bp crossover sites, with (usually imperfect) two-fold DNA sequence symmetry. Each crossover site is recognized and bound by a dimer of SSR subunits. Protein–protein interactions between the SSR dimers bring the two sites together, forming a ‘synapse’. The SSR tetramer in the synapse then promotes strand cleavages by attacking specific DNA phosphodiesters close to the centre of the site with a nucleophilic active site amino acid residue. Rearrangement of the broken DNA strands followed by rejoining (reversal of the cleavage reaction) completes recombination (Figure 2).
Figure 2. Mechanisms of site-specific recombination
The vast majority of the thousands of known (by DNA sequence analysis) site-specific recombination systems fall into one of two families according to the structure of the system-specific SSR. These two families are the serine recombinases and the tyrosine recombinases, named according to the identity of the nucleophilic active site amino acid residue. The serine or tyrosine side chain attacks the scissile DNA phosphodiester to create a strand break, and becomes covalently linked to the end of a broken DNA strand in the process. This intermediate phosphodiester linkage between the protein and the DNA conserves bond energy and thus allows rejoining of recombinant strands without any need for cofactors such as ATP or further processing by a polymerase or ligase. Although the two families have features in common (such as recognition of the crossover site by a SSR dimer, and catalysis within a SSR tetramer), their mechanisms are distinct and the proteins have no sequence or structural similarity. Nevertheless, Nature seems to use the two SSR families rather indiscriminately; there do not seem to be any ‘rules’ restricting a particular biological function to one type of SSR. Likewise, the uses of SSRs as experimental tools cannot be mapped straightforwardly onto the families.
Tyrosine recombinases attack the DNA strands so that the nucleophilic hydroxy group of the active-site tyrosine residue becomes covalently linked to each 3′ end of the broken strands. Tyrosine recombinases break, exchange and rejoin the DNA strands two at a time; their reactions thus proceed via a ‘Holliday junction’ or four-way junction intermediate, in which two strands are recombinant and two are non-recombinant (Figure 2A). The top and bottom strand breaks are separated by a number of base pairs (typically 4–8 bp) which is constant for a specific system but varies between systems (Figure 2A). Tyrosine recombinases comprise a catalytic domain (∼200 or more amino acids), which contains the active site and also participates in subunit interactions and DNA sequence recognition, usually linked to one or more domains which may contribute further DNA recognition, subunit interactions or regulatory functions. Tyrosine recombinase-based systems are found in bacteria, archaea and microbial eukaryotes, where they perform diverse programmed DNA rearrangement functions including integration and excision of bacteriophage genomes from host chromosomal DNA, monomerization of chromosome and plasmid multimers, circularization of bacteriophage replication intermediates, resolution of transposition intermediates, inversion-mediated switching of gene expression and amplification of plasmid copy number. Tyrosine recombinases are structurally and mechanistically related to the Type IB topoisomerases, which include the human topoisomerase I [4,25].
A number of tyrosine recombinases have been applied in experimental genetics and biotechnology; indeed, the two most widely used SSRs, Cre and flippase (FLP), are from this family. Details of their applications are in sections below.
Serine recombinases break DNA strands by attacking the scissile phosphodiester with the hydroxy group of an active-site serine residue, so that the recombinase becomes covalently linked to the DNA 5′ end at the break. Serine recombinases make simultaneous DSBs in the two recombining sites during recombination; there is no Holliday junction intermediate. Recombination is brought about by a unique subunit rotation mechanism which swaps the positions of the cut DNA ends. The breaks in the top and bottom strands are always exactly 2 bp apart, and are symmetrically located at the centre of the crossover site (Figure 2B) [6,26]. A group of ‘small serine recombinases’ has an N-terminal catalytic domain containing the active site (∼150 amino acids), linked to a small helix–turn–helix C-terminal domain (∼40–50 amino acids). This group includes long-established model systems such as the transposon resolvases and DNA invertases [27,28]. Another important structural group, the ‘large serine recombinases’, has a similar N-terminal catalytic domain, but linked to a much larger and more diverse C-terminal part, which can be 500 amino acids or more and may comprise multiple domains or subdomains. This group includes the bacteriophage serine integrases, discussed in more detail below. Natural serine recombinases are found in bacteria and archaea, where they perform the same kinds of functions as the tyrosine recombinases.
Some natural systems possess the remarkable property that recombination is limited to pairs of sites that are connected to each other in a specific way. For example, many resolvases, whose natural functions are to split circular transposition intermediates into two smaller circles or to monomerize plasmid multimers, only recombine sites that are in direct repeat within the same DNA molecule. In contrast, invertases only recombine pairs of sites in inverted repeat in the same molecule. The recombination products of these systems have specific DNA topologies; for example, resolvases make ‘catenanes’ comprising two interlinked DNA circles [6,27,28]. The intrinsic selectivities of some of these systems have been co-opted for various experimental applications, such as probing the higher-order structures involved in the compaction of bacterial chromosomal DNA and preparing specific types of DNA molecules for in vitro studies, as outlined briefly below.
THE OVERLAP SEQUENCE
As noted above, the DNA strand breaks made by SSRs during the recombination reaction are ‘staggered’; the break in the top strand is not directly opposite the break in the bottom strand (Figure 2). The recombinant sites therefore each contain a short region at the centre comprising the top strand of one parent site and the bottom strand of the other. In nearly all site-specific recombination systems, efficient recombination occurs only if there are no sequence differences between the recombining sites in this region which would lead to mismatched base pairs in the recombinants. It is this requirement for overlap sequence homology that forces joining of left and right half-sites to form recombinants in a typical reaction [6–8]. Mutations in the overlap sequence can be used to restrict recombination to specific pairs of sites in situations where three or more sites are present (see below).
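The overlap rule lends itself to a simple check. The sketch below is a minimal illustration, assuming that a crossover site can be summarized as a left arm, a central overlap and a right arm, and that recombination is all-or-nothing depending on overlap identity (real systems show graded efficiencies). The `Site` type, the function names and all sequences are hypothetical placeholders, not real recombination sites.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Optional, Tuple


class Site(NamedTuple):
    left: str     # left arm (recombinase-binding half-site)
    overlap: str  # central sequence between the staggered strand breaks
    right: str    # right arm


def recombine_sites(a: Site, b: Site) -> Optional[Tuple[Site, Site]]:
    """Return the two recombinant sites, or None if the overlaps differ."""
    if a.overlap != b.overlap:
        # Differing overlaps would give mismatched base pairs in the products.
        return None
    # The left arm of each site is joined to the right arm of its partner.
    return Site(a.left, a.overlap, b.right), Site(b.left, b.overlap, a.right)


def compatible_pairs(sites: Dict[str, Site]) -> List[Tuple[str, str]]:
    """List the pairs of named sites expected to recombine with each other."""
    return [
        (m, n)
        for m, n in combinations(sorted(sites), 2)
        if sites[m].overlap == sites[n].overlap
    ]


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    sites = {
        "site1": Site("ACCT", "TG", "GGAA"),
        "site2": Site("TCCA", "TG", "CGTA"),
        "site3": Site("ACCT", "CA", "GGAA"),  # same arms as site1, altered overlap
    }
    print(compatible_pairs(sites))
    print(recombine_sites(sites["site1"], sites["site2"]))
```

In this toy example only `site1` and `site2` are paired, because `site3` differs in its overlap even though its arms match those of `site1`. This is the logic by which overlap mutations can restrict recombination to chosen pairs when three or more sites are present.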
OUR FAVOURITE SSRs
Three SSRs have dominated applications of site-specific recombination to date: Cre, FLP and ϕC31 integrase.
Cre is a 38-kDa tyrosine recombinase from the E. coli bacteriophage P1 which acts on 34-bp loxP sites, and was the first SSR to be applied in higher eukaryotes [31–34]. FLP is a 45-kDa tyrosine recombinase encoded by the yeast Saccharomyces cerevisiae 2μ plasmid, which acts on 34-bp FRT sites [7,35,36]. The functionalities of these two enzymes are quite similar, and their popularity has been such that the terms ‘flox’ and ‘flrt’ have come into common use, meaning to flank a gene by loxP or FRT sites respectively, for gene knockout/knockin or cassette exchange (see below). Cre remains the most frequently applied SSR, especially in mammals.
Most SSRs promote recombination between two identical sites, such that the two product (recombinant) sites are also identical. Therefore, in principle, the recombinase could catalyse a second round of recombination, returning the DNA to the non-recombinant configuration. Cre and FLP conform to this paradigm. These subsequent rounds of recombination can be undesirable for many applications where a clean efficient conversion of substrates to products is required. However, some SSRs recombine non-identical sites, and thereby can selectively promote a single round of recombination. These include bacteriophage integrases of the tyrosine recombinase family (such as λ integrase) and the serine recombinase family (such as ϕC31 integrase). In contrast with the long complex sites used by tyrosine integrases such as λ integrase, the sites for serine integrases are short and simple, without accessory sequences. Serine integrase-catalysed recombination takes place only between two different sites, called attP and attB, forming two recombinant sites attL and attR. This reaction is one-way; the reverse reaction (between attL and attR) is not observed. However, the presence of an additional phage-encoded protein called the recombination directionality factor (RDF) transforms the activity of the integrase so that it preferentially catalyses attL × attR recombination to give product attP and attB sites (Figure 3). The RDF achieves this change by binding directly to the integrase protein, but its modus operandi remains unclear [29,37–39]. The strict unidirectionality of serine integrase-mediated attP × attB recombination and the ability to reverse the reaction with RDF has attracted much interest in these systems and has inspired ideas for a number of potential applications. In particular, ϕC31 integrase has been used extensively in various model organisms and in human cells [39–42].
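The directionality rules described above can be summarized as a small lookup. The sketch below is a deliberately simplified model: it treats the reactions as all-or-nothing, whereas the text only states that the RDF makes the integrase act preferentially on attL × attR, and it takes the absence of attP × attB recombination in the presence of the RDF as an assumption. The function name `integrase_products` is hypothetical.

```python
from typing import Optional, Tuple


def integrase_products(site_a: str, site_b: str, rdf_present: bool) -> Optional[Tuple[str, str]]:
    """Return the product sites for a serine integrase reaction, or None.

    Simplified, all-or-nothing rules:
      attP x attB -> attL + attR   (integrase alone)
      attL x attR -> attP + attB   (integrase plus RDF)
    Every other combination is treated as unreactive (an assumption).
    """
    pair = frozenset((site_a, site_b))
    if pair == frozenset(("attP", "attB")) and not rdf_present:
        return ("attL", "attR")
    if pair == frozenset(("attL", "attR")) and rdf_present:
        return ("attP", "attB")
    return None


if __name__ == "__main__":
    for a, b, rdf in [
        ("attP", "attB", False),
        ("attL", "attR", False),
        ("attL", "attR", True),
        ("attP", "attP", False),
    ]:
        print(a, b, "RDF" if rdf else "no RDF", "->", integrase_products(a, b, rdf))
```

By contrast, a system such as Cre acting on two identical loxP sites would map a pair of sites back onto the same pair, so the same rule could be applied again and the reaction would be reversible.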
Figure 3. Directional site-specific recombination by serine integrases
A few other SSRs have also been put to use, but less frequently (for example, see [43–45]). It may well be that some of these less-used SSRs are functionally superior to Cre, FLP or ϕC31 integrase for certain applications, but the latter systems predominate because the required genetic constructs, strains and techniques are so well established.
OVERVIEW OF CURRENT SSR APPLICATIONS
Microbiology: integrating plasmids, etc.
The most straightforward applications of SSRs adapt the natural integration, excision or inversion functionality of a site-specific recombination system for experimental investigation or biotechnology purposes. Historically, bacteriophage λ integration and excision played a crucial role in the seminal studies of molecular genetic structures and mechanisms at the birth of molecular biology, even though the concept of site-specific recombination was not described until the 1960s [4,25]. Nowadays, both serine and tyrosine bacteriophage integrases, and sometimes other SSRs, are widely used to introduce transgenic DNA into bacterial genomes by recombination between a site on a transgenic plasmid and a genomic site (Figure 4). The integrating sequence may contain a selectable marker gene along with other genes of interest, for example enzymes involved in a biosynthetic pathway. Some strategies involve a second round of site-specific recombination after plasmid integration, which deletes the plasmid backbone along with any marker genes used for selection of successful integrants, and leaves behind just the transgene(s), giving a ‘markerless’ product strain (Figure 4) [11,40]. SSRs have also been used as experimental tools to probe fundamental features of DNA biochemistry in vivo (for example the structures of compacted bacterial genomic DNA) and in vitro (for example DNA supercoiling and topology) [46–50].
Figure 4. Transgene insertion via plasmid integration
Eukaryotic genomic DNA: conditional knockouts, etc.
The use of SSRs as tools for eukaryote genomic modification has become very widespread, and published work in this area runs into thousands of papers and reviews. In nearly all cases, the organism being used or studied must first be genetically modified (for example using HR or recombineering) to introduce constructs containing one or more recombination sites for the SSR to be used, as no such sites are likely to be present in the natural genomic sequence. This step can be difficult, but a suitable construct, once integrated, can then be used efficiently and repeatedly. In fact, in many laboratory organisms, it has become routine to include sites for commonly used SSRs (usually Cre, FLP or ϕC31 integrase) along with integrated transgenes to facilitate future manipulations, so that large collections of site-containing strains are now available [51–57]. The design of the constructs that are introduced in this way can be very elaborate, with several recombination sites for a single SSR, or sites for multiple SSRs [34,41,51,58–63]. The simplest scenario is to introduce a ‘cassette’ comprising a transgene of interest flanked by directly repeated recombination sites (Figure 5A). This gene can then be ‘knocked out’ (deleted from the genomic DNA) upon expression of the SSR, which may be regulated by an inducible promoter. In complex multicellular organisms (mammals, fruitflies, etc.), SSR expression can be made to be specific to a cell type or developmental stage. Gene expression can also be conditionally switched on, for example by SSR-mediated deletion of a cassette containing a transcriptional terminator.
Figure 5. Transgene knockout and RMCE
In some circumstances, it is feasible to introduce purified SSR protein directly into the cells, or to inject mRNA encoding the SSR, thus avoiding the need for intracellular SSR gene expression. Another advantage of these strategies is the transient presence of the SSR, reducing the possibilities for SSR-mediated toxicity or DNA damage [64–66].
RMCE and related methods
An alternative strategy is to exchange a transgene (flanked by recombination sites) for a different gene or other segment of DNA. This popular procedure, called recombinase-mediated cassette exchange (RMCE) [67,68], is illustrated in Figure 5(B). A DNA molecule including the replacement cassette (for example a circular plasmid) is introduced into a cell which already has a suitable cassette resident in its genome. RMCE using a plasmid cassette donor can be regarded as a two-step process: the first recombination reaction integrates the entire plasmid containing the incoming cassette, and the second reaction deletes the original genomic cassette and the plasmid backbone, leaving behind the new cassette. To achieve the desired result, the two sites flanking either the genome-resident cassette or the incoming cassette should not recombine with each other, the new cassette should be integrated in the correct orientation, and only a single cassette should be integrated. A strategy that can be used to favour the desired outcome is to flank the cassettes with different sites at the left and right ends, each of which is recognized by a different SSR. Alternatively, the sequences of the sites flanking the cassette can be altered so that their central overlap sequences are non-identical; recombination between them is then blocked (Figure 5C). A number of these ‘heterospecific’ sites have been created for Cre and FLP, enabling the design of elaborate genetic strategies involving cassettes with multiple pairs of sites which recombine with each other but not with sites of a different type [68–72].
Another refinement aims to prevent secondary reactions between the recombination product sites. As noted above, a typical site has inverted repeat symmetry and binds an SSR dimer. Variant sites with mutations of the bases contacted by the SSR in one ‘arm’ of the site can still support recombination because the SSR dimer binds well enough through co-operative interactions between subunits. Recombination between two of these arm mutant sites, one mutated in the left arm and the other in the right arm, produces one normal recombinant site and a defective site with mutations in both arms, which has low affinity for the SSR (Figure 5D) [68,73,74].
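The arm mutant strategy can be followed with a short bookkeeping model. The sketch below is an illustration under a simplifying assumption, namely that a site with one mutated arm is fully usable and a site with two mutated arms is not usable at all, whereas the text describes this as a matter of binding affinity. The `ArmSite` type and the function name `recombine_arms` are hypothetical.

```python
from typing import NamedTuple, Tuple


class ArmSite(NamedTuple):
    left_mutated: bool
    right_mutated: bool

    @property
    def usable(self) -> bool:
        # Simplifying assumption: one mutated arm is tolerated, two are not.
        return not (self.left_mutated and self.right_mutated)


def recombine_arms(a: ArmSite, b: ArmSite) -> Tuple[ArmSite, ArmSite]:
    """Join the left arm of each site to the right arm of its partner."""
    if not (a.usable and b.usable):
        raise ValueError("a site with both arms mutated is treated as inactive")
    return (
        ArmSite(a.left_mutated, b.right_mutated),
        ArmSite(b.left_mutated, a.right_mutated),
    )


if __name__ == "__main__":
    left_mutant = ArmSite(left_mutated=True, right_mutated=False)
    right_mutant = ArmSite(left_mutated=False, right_mutated=True)
    product_1, product_2 = recombine_arms(left_mutant, right_mutant)
    print(product_1, product_1.usable)  # both arms mutated
    print(product_2, product_2.usable)  # no mutated arms
```

Because one product carries mutations in both arms, a second call to `recombine_arms` on the two products raises an error in this model, which corresponds to the suppression of the reverse reaction.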
A further RMCE option is to use a serine integrase. Because serine integrase-mediated recombination is strictly between attP and attB sites (see above), the desired cassette exchange can be forced by flanking the incoming cassette with one type of site (e.g. two attP) and the resident cassette with the other type (e.g. two attB).
in vivo applications
The methodologies outlined above have been put to myriad uses, and to keep this review within bounds we regretfully cannot detail them all here. For the reader who would like to know more, reviews cited above cover many specific applications as well as wide-ranging surveys of the field (see, for example, [12,36,51,60]). Some previous highlights illustrating the power and potential of SSR-based techniques include the development of sophisticated ‘gene stacking’ and related approaches to facilitate crop plant breeding [54,75]; the ‘Brainbow’ method for labelling individual neurons in distinct colours with combinations of fluorescent proteins; the ‘SCRaMbLE’ method for creation of genetic diversity in yeast strains, by Cre-mediated recombination between large numbers of loxP sites introduced into synthetic chromosomes [77–79]; and methods for comprehensive engineering of bacterial genomes [80,81].
Plasmid construction and ‘gene assembly’
SSR-based systems can often be reconstituted in vitro, allowing ‘unnatural’ substrates to be used, such as PCR products or substrates made by annealing synthetic oligonucleotides. The reaction conditions in vitro can be controlled precisely, enabling complex DNA manipulations. One very popular SSR-based cloning methodology is Life Technologies' Gateway system, which uses bacteriophage λ integrase [82–84]. A series of E. coli plasmids contain cassettes flanked by sites for λ integrase which can be replaced by a user-specified gene (or other DNA segment) in an in vitro reaction analogous to RMCE as described above. The replacement gene cassette, which can be generated by PCR, is first cloned into an ‘entry vector’, and can then be exchanged into a suite of ‘destination vectors’ that support expression in various organisms, or with different types of regulation. The system can also be used for the ordered assembly of several genes at a single locus.
Recent developments in the field of synthetic biology have created a demand for convenient methods to assemble DNA segments into larger arrays. SSRs of the serine integrase family (see above) have been shown to be very useful tools for this type of operation, because the reactions between attP and attB sites are efficient and irreversible in the absence of the RDF, and the integrases have robust activity on PCR- or oligonucleotide-derived substrates [40,41,85,86]. Multiple serine integrases can be used in complex assembly procedures, providing that they are ‘orthogonal’; that is, each integrase protein interacts only with itself and its recombination sites, not with any of the other integrases or their sites (Figure 6) [87,88].
Figure 6. Use of serine integrase-mediated directional recombination (SIDR) for gene assembly
Genetic circuits, logic gates and biocomputing
Synthetic biologists aim to create ‘genetic circuits’ that alter gene expression in a logical way upon one or more input stimuli (such as chemical inducer molecules, heat or light). These devices can be likened to the components and logic gates of electronic circuits. SSRs are potentially very useful for this sort of application, as they can efficiently bring about a permanent change in the cell's DNA sequence. There has been much recent interest in the use of SSRs for implementation of logic gates and memory devices by switching of the orientation (i.e. inversion) of DNA segments flanked by recombination sites [90,91]. Serine integrases may be particularly adaptable for these applications because of their one-way directionality (Figure 3) [92–95]. SSR-based genetic devices might also form the basis of ‘biocomputers’ that can perform digital operations on inputs. Such concepts are very much still in their infancy, but one can readily imagine how logic gates such as those already developed, and memory systems based on invertible DNA segments that can act as binary digits, could be combined into a computing system if the specificities and efficiencies of the SSRs themselves were sufficiently high [96–98].
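The idea of an invertible DNA segment acting as a binary digit can be sketched as a small state machine. The model below is a hypothetical illustration and does not reproduce any published device: it assumes a segment flanked by attP and attB in inverted repeat, ideal all-or-nothing switching, a reset that requires integrase together with the RDF, and two fully orthogonal integrases for the two-input gate. The names `InversionBit` and `and_gate` are invented for this sketch.

```python
class InversionBit:
    """One-bit memory: a DNA segment flanked by inverted attP and attB sites."""

    def __init__(self):
        self.sites = ("attP", "attB")  # unswitched state

    def pulse(self, integrase, rdf=False):
        """Apply an input pulse of integrase, with or without the RDF."""
        if not integrase:
            return
        if self.sites == ("attP", "attB") and not rdf:
            self.sites = ("attL", "attR")  # segment inverted: bit set
        elif self.sites == ("attL", "attR") and rdf:
            self.sites = ("attP", "attB")  # segment restored: bit reset

    @property
    def value(self):
        return 1 if self.sites == ("attL", "attR") else 0


def and_gate(bit_a, bit_b):
    """Hypothetical read-out that is on only when both segments are inverted."""
    return bit_a.value & bit_b.value


if __name__ == "__main__":
    a, b = InversionBit(), InversionBit()  # switched by two orthogonal integrases
    a.pulse(integrase=True)
    print(a.value, b.value, and_gate(a, b))
    b.pulse(integrase=True)
    print(a.value, b.value, and_gate(a, b))
    a.pulse(integrase=True, rdf=True)  # reset the first bit
    print(a.value, b.value, and_gate(a, b))
```

A repeated integrase pulse leaves a set bit unchanged in this model, which is the property that makes one-way systems attractive for memory; with a reversible recombinase acting on identical sites, each pulse could flip the segment again.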
SSRs have obvious potential in the field of nanotechnology, although there is limited published work to date. For example, they could be used to implement molecular-level switches, or to attach specific DNA fragments (perhaps with chemical modifications such as fluorophores or attached enzymes) to specific locations in patterned 2D arrays of synthetic double-stranded oligonucleotides, or to bring about site-specific modifications of 3D structures created by the emerging ‘DNA origami’ technology [99,100], or to link multiple origami modules together in specific ways.
SSRs can come close to being ideal ‘molecular machines’ for rearranging DNA sequences; they can recognize and cut-and-paste specific sequences with very high efficiency and fidelity. Nevertheless, there are potential problems that can reduce their usefulness for particular applications. Some of these issues (for example, reversibility of the reaction) have been discussed above. Three others fall under the headings of off-target activity, toxicity and inefficiency.
SSRs have very high specificity for their canonical target sites. However, we place very high demands on them when, for example, we try to target a unique locus in a large genome. Natural SSRs presumably evolve specificity until their level of off-target activity is so low as to be inconsequential for the survival of the genetic entities encoding them. Nevertheless, many instances are known of SSRs promoting low levels of ‘off-target’ recombination at ‘pseudosites’ that bear some sequence resemblance to their natural target site(s). A notable and well-studied example is ϕC31 integrase, for which a number of pseudosites (∼100) in the human genome have been identified (some targeted much more frequently than others, presumably depending on their similarity to the natural attP or attB target sequences, and their genomic context). These pseudosites can be used opportunistically by researchers to target integration to specific genomic loci, even though recombination at a transgenic bona fide attP or attB recombination site might be much more efficient. However, in other circumstances (for example, when the aim is to integrate specifically at a single transgenic locus), off-target integration could be problematic. It is likely that some SSRs are more strictly sequence-specific than others, and protein engineering (see below) might provide improvements. Further research on this issue will be required, especially if SSRs are to be applied in areas such as gene therapy that demand exquisite specificity.
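A crude way to picture pseudosites is as sequence windows that differ from the natural site at only a few positions. The Python sketch below scans one strand of a sequence for such windows. The sequences are invented for illustration, and counting mismatches is an assumption made for simplicity: real pseudosite usage also depends on which positions differ and on genomic context, as noted above.

```python
def find_pseudosites(genome, site, max_mismatches):
    """List (position, window, mismatches) for windows resembling `site`.

    Only the given strand is scanned; a fuller search would also scan the
    reverse complement. Similarity is scored by simple mismatch counting.
    """
    hits = []
    n = len(site)
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy data: one exact copy of the site and one copy with a single change.
toy_site = "GATTACA"
toy_genome = "CCGATTACACCGATAACACC"
hits = find_pseudosites(toy_genome, toy_site, max_mismatches=1)
# hits == [(2, "GATTACA", 0), (11, "GATAACA", 1)]
```

Raising `max_mismatches` quickly increases the number of hits in a large sequence, which gives an intuition for why even a small relaxation of specificity can matter at the scale of a genome.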
Expression of an SSR in an organism can be toxic (leading to problems such as polyploidy or germline sterility), or even lethal [12,31,102,103]. The molecular basis of toxicity is not well understood, but may be related to the specificity issue discussed above; low-level but genome-wide off-target recombination could lead to multifarious genetic problems such as mutations, insertions/deletions and failure of chromosomes to segregate properly. Toxicity might also be a consequence of persistence of recombination intermediates with strand breaks and covalent SSR–DNA linkages, leading to DNA damage (see also the Inefficiency section below). Abortive intermediates of this type might be more common in off-target reactions. There might also be problems that are not caused by recombination itself, for example by inappropriate SSR-mediated looping interactions between distant sites on the DNA or by SSR binding at places that disrupt normal transcriptional regulation. In reported cases of SSR toxicity, the problems may be exacerbated by factors such as high expression levels and temporal persistence. Alleviation of toxicity problems could involve judicious choice of the SSR to be used in a particular organism or cell type, careful adjustment of SSR expression so that it is just sufficient to promote the desired reaction, transient SSR expression, use of purified SSR protein, addition of targeting signals to the SSR for correct (e.g. nuclear) localization, or modification of the SSR to reduce its lifetime in the cell.
We would usually, of course, like our SSR to perform the intended reaction rapidly (within minutes) and quantitatively (approaching 100%). However, recombination may be slow and/or incomplete. These problems could be due to intrinsic properties of the SSR, or incompatibility with the experimental system it is being used in. For example, wild-type FLP is unstable at higher temperatures (such as 37°C), and tends to perform better in lower-temperature organisms such as fruitflies; to counteract this property, variants more suitable for higher-temperature applications have been selected. Cre is stable at 37°C, but tends to give incomplete conversion of the substrate to recombinants, possibly because it binds tightly and sequesters the substrate in an intermediate state [31,106,107]. Some comparative studies of SSR efficiencies in specific systems have been published, but much more needs to be done. The extent of recombination in eukaryotic cells is often lower than might be expected, perhaps due at least in part to the condensed architecture of chromatinized DNA [108–111], and there may be more specific problems; for example, serine integrases can fail to complete the recombination reaction, leaving behind damaged non-functional sites. Substantial improvements in some of these factors might be achieved by optimization of the SSR type and expression, as outlined in the previous section.
The widespread application and perceived potential utility of SSRs has led to much work to improve and extend their properties for specific tasks.
Currently used SSRs are all derived from microbial organisms (and, except for FLP and its relatives, from prokaryotes). It is therefore not too surprising that they may perform sub-optimally in very dissimilar cell types to which they are not adapted, such as those of mammals and other eukaryotes. Improvements to the stabilities and recombination efficiencies of Cre, FLP and other SSRs have been achieved by various strategies including random or directed mutagenesis, codon optimization, and addition of nuclear localization signals and/or degradation tags [31,105,112–117].
Another avenue is to hunt for new natural SSRs that might perform better than those currently used, and might also provide orthogonal systems for use in combinations to implement complex DNA rearrangement strategies. A number of reports have described the characterization of SSRs related to current favourites such as Cre and FLP [118–123], so far, however, without usurping the hegemony of the archetypal enzymes. Extending the range of available and well-characterized serine integrases, which is currently quite limited, is another focus of attention [40,87,95].
It would seem prudent in general to restrict SSR expression as much as possible to where and when it is needed; continuous cell-wide expression risks the toxicity problems noted above. Regulated expression can be implemented at the transcription or the translation level by standard methodologies, but the SSR protein itself can be modified to allow for additional control. One much-used ingenious approach is to fuse an SSR (for example Cre or FLP) to a steroid ligand-binding domain, which keeps the fusion protein inactive by retaining it in the cytoplasm until exposure to the steroid ligand allows it to enter the nucleus [51,125,126]. Other approaches include the creation of a split version of Cre (the N-terminal and C-terminal parts are expressed separately, and can interact to reconstitute a functional recombinase) [127–129], and a version of Cre that is activated by light due to incorporation of a photosensitive synthetic amino acid analogue at the active site.
An obvious limitation on the uses of SSRs in vivo is that the target recombination sites must first be introduced into the organism being studied. For many years, we have dreamed of the possibilities that would be opened up if we could re-configure the DNA recognition functions of an SSR so that it recombines at a sequence of our choice. Most enticingly, we could then promote recombination at specific sequences in the natural genome of our organism of interest, or even recombination between two different genomic sequences, opening the way to systematic SSR-mediated genetic modifications for biotechnology or therapeutic purposes. Two distinct strategies have been adopted towards this end.
First, SSRs including Cre and FLP have been subjected to directed evolution, selecting for increased activity on pseudosites having some resemblance to the natural recombination site [131–134]. Notable successes of this approach include the targeting of specific sequences in the human genome [135,136] and the HIV long terminal repeat (LTR) [137–139] (recombination between the identical sequences in the LTRs at each end of the HIV proviral genome could excise the viral DNA and thus form the basis of a therapy). This directed evolution strategy tends, however, to lead to the selection of variants with broadened rather than switched specificity. This problem can be alleviated, at least partially, by secondary screens that select for activity on the chosen target site and concomitant loss of activity on the original target site [131,132,140,141]. Recall that both tyrosine and serine SSRs bind to their sites as dimers, each subunit being associated primarily with one ‘half-site’. Therefore, to target a typical natural sequence with high specificity, we would need to use two SSR variants, one for each half-site. Experiments with altered-specificity Cre have shown that this approach is feasible, as have experiments with zinc-finger recombinases (ZFRs), described below.
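The half-site argument can be illustrated with a short Python sketch. It assumes a simplified site layout (two recognition arms of equal length flanking a central spacer) and uses invented toy sequences; the function names are hypothetical. If the right arm, read on the opposite strand, is identical to the left arm, a single SSR variant could in principle recognize both half-sites; otherwise two variants would be needed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def half_sites(site, spacer_len):
    """Split a site into (left arm, spacer, right arm on the opposite strand)."""
    arm_len, remainder = divmod(len(site) - spacer_len, 2)
    if arm_len <= 0 or remainder:
        raise ValueError("site length does not fit two equal arms plus spacer")
    left = site[:arm_len]
    spacer = site[arm_len:arm_len + spacer_len]
    # Read the right arm 5'->3' on the other strand, as its subunit sees it.
    right = reverse_complement(site[arm_len + spacer_len:])
    return left, spacer, right


def variants_needed(site, spacer_len):
    """1 if both half-sites are identical (symmetric site), otherwise 2."""
    left, _, right = half_sites(site, spacer_len)
    return 1 if left == right else 2


symmetric_toy = "ATAACGGGTTAT"    # arms ATAAC / GTTAT are inverted repeats
asymmetric_toy = "ATAACGGCCTAT"   # right arm differs from the left arm
n_symmetric = variants_needed(symmetric_toy, spacer_len=2)    # 1
n_asymmetric = variants_needed(asymmetric_toy, spacer_len=2)  # 2
```

Arbitrary genomic sequences will rarely be symmetric in this sense, which is why the text concludes that two variants would generally be required.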
An alternative strategy is to reprogramme specificity by attaching the catalytically functional parts of an SSR to a heterologous DNA-binding domain that targets a new sequence. The small serine recombinases are particularly amenable to this strategy, as structural analysis has revealed that their catalytic domains are quite distinct and spatially separate from the primary determinant of DNA sequence recognition, a small helix–turn–helix domain at the C-terminus which binds at each end of the recombination crossover site. It therefore seemed feasible to replace the C-terminal domain with a domain of different specificity. Some degree of target specificity change was achieved by swapping the C-terminal domain of a small serine recombinase with the equivalent domain of a related enzyme (reviewed in ), but more wholesale changes had to await the characterization of serine recombinase mutants with catalytic activity that is no longer dependent on accessory sites and factors [8,143–146]. A set of hybrid ZFRs comprising the catalytic (N-terminal) domain of a mutant version of Tn3 resolvase, fused to the zinc-finger DNA-binding domain of the mouse transcription factor Zif268, were shown to have robust activity in vivo and in vitro on ‘Z-sites’ consisting of a central sequence targeted by the resolvase catalytic domain flanked by motifs recognized by the Zif268 domain (Figure 7). Several other types of ZFR have since been characterized, with different serine SSR catalytic domains and zinc-finger domains [148,149]. Zinc-finger domains such as that of Zif268 have been the subject of a massive research effort to systematically reconfigure their DNA-binding specificities, in order to target chosen genomic sequences (most famously as ZFNs, which have a nuclease domain attached to the zinc-finger domain C-terminus). Similarly, ZFRs can be targeted successfully to many sequences.
A remaining hurdle limiting ZFR applicability is that the SSR catalytic domain retains some sequence specificity, so the recombination site (‘Z-site’) must be quite similar to the target site for the natural recombinase in the central 12 bp. Some reductions in this specificity have been achieved by directed evolution [151–154], but more widespread application of ZFRs will depend on effective abolition of catalytic domain specificity and/or full understanding of the structural basis of specificity allowing us to manipulate it at will. Recently, another type of synthetic recombinase has been created by linking a serine recombinase catalytic domain to a TALE DNA-binding domain, whose DNA sequence specificity can be reconfigured particularly easily [149,155]. Also, altered-specificity serine integrases have been created by fusing N-terminal and C-terminal parts of different members of this SSR family.
Chimaeric serine recombinases
The SSR menagerie is becoming very large; there are more and more enzymes, enzyme variants and cleverly designed constructs to choose from. It can all be quite bamboozling. For a newcomer to the field looking for the best SSR to use, a good place to start might be to ask: why not Cre? Cre has been the most used SSR to date, with a long pedigree of successful application in many organisms, including higher eukaryotes that are often the organism of interest, and in vitro. But there may be good reasons to choose a different SSR for particular applications, as we hope that the sections above have illustrated.
We began this review by making a whimsical comparison between the Industrial and Genetic Revolutions and implying that, like the old steam engines, SSRs can be drivers of the ‘manufacturing’ processes involved–‘machines on genes’. We hope to have provided an informative ‘guided tour’ of these remarkable enzymes and their applications, and perhaps indicated how they might provide useful solutions for your research problems.
We are very grateful to Sally Rowland for helpful comments on the review.
This work was supported by the Biotechnology and Biological Sciences Research Council [grant number BB/003356/1].
CRISPR-Cas9, clustered regularly interspaced short palindromic repeats-CRISPR-associated gene 9
LTR, long terminal repeat
RDF, recombination directionality factor
RMCE, recombinase-mediated cassette exchange
TALEN, transcription activator-like effector nuclease