Back in 2003, we published ‘MAX’ randomization, a process of non-degenerate saturation mutagenesis using exactly 20 codons (one for each amino acid) or else any required subset of those 20 codons. ‘MAX’ randomization saturates codons located in isolated positions within a protein, as might be required in enzyme engineering, or else on one face of an α-helix, as in zinc-finger engineering. Since that time, we have been asked for an equivalent process that can saturate multiple contiguous codons in a non-degenerate manner. We have now developed ‘ProxiMAX’ randomization, which does just that: generating DNA cassettes for saturation mutagenesis without degeneracy or bias. Offering an alternative to trinucleotide phosphoramidite chemistry, ProxiMAX randomization uses nothing more sophisticated than unmodified oligonucleotides and standard molecular biology reagents. Thus it requires no specialized chemistry, reagents or equipment, and simply relies on a process of saturation cycling comprising ligation, amplification and digestion for each cycle. The process can encode both unbiased representation of selected amino acids or else encode them in predefined ratios. Each saturated position can be defined independently of the others. We demonstrate accurate saturation of up to 11 contiguous codons. As such, ProxiMAX randomization is particularly relevant to antibody engineering.
Saturation mutagenesis (replacement of wild-type codons with codons for all 20 amino acids) is a core technique within the protein engineer's repertoire. Its importance in engineering non-native ligand-binding domains is undisputed. Thus saturation mutagenesis has played a vital role in creating synthetic zinc-finger-based transcription factors [1,2], antibodies and antibody-derived scaffolds [3,4] for many years, and, more recently, has proved valuable in engineering modified enzymes .
Conventional saturation mutagenesis employs degenerate codons NNN, NNK or NNS. Although technically undemanding, degeneracy leads to significant problems that have provoked a drive for non-degenerate alternatives. Traditionally, non-degeneracy was achieved by using trinucleotide phosphoramidites which add whole codons (rather than single bases) during oligonucleotide synthesis . In 2003, we described ‘MAX’ randomization , which uses standard oligonucleotides and is useful for enzyme engineering, but requires separation between randomized codons and thus cannot saturate more than two contiguous codons. Recently, simpler alternatives of ‘small-intelligent libraries’ designed by the program DC-analyzer  and the ‘22c trick’  have been described, which are optimal methodologies to saturate small numbers of codons effectively and efficiently, irrespective of location; although, owing to multiplex PCR primers, these approaches cannot saturate larger numbers of codons (see below). In the present paper, we describe ProxiMAX randomization, which offers all the advantages of Slonomics™ [10,11] (an automated, non-degenerate, enzyme-based process), but can be performed in a standard molecular biology laboratory. ProxiMAX likewise combines the benefits of non-degeneracy with the ability to saturate larger numbers of contiguous codons, a key requirement for antibody engineering.
Saturation mutagenesis: comparison of techniques
The advantages and disadvantages of the various approaches to saturation mutagenesis are compared in Figure 1. The most critical consequences of degeneracy are the loss of diversity/functionality [12,13] and inherent encoded bias . Diversity is a measure of the percentage of unique species within a library. Even the ‘22c trick’  (NDT/VHG/TGG degeneracy) leads to >60% loss of diversity over 12 saturated codons (Figure 1A). It might be argued that excess screening capacity (e.g. with ribosome or CIS display ) diminishes this issue, but in libraries with more than three randomized positions, the use of degenerate codons will probably have a severe impact on the quality of the output. Since the magnitude of protein–ligand interactions is a function of both affinity and concentration, all display screens are based on a pretext of equivalent concentration of library members. Figure 1(B) demonstrates that the concentration differences between common and rare codon combinations are unworkable beyond three degenerate saturated codons, regardless of methodology. Library diversity is further restricted by NNK and NNN saturations which randomly introduce termination codons (Figure 1C) that may lead to non-functional proteins, possibly leading to aggregation. Practical issues must also influence method choice. Notwithstanding objections of bias, the number of primers required to saturate more than three consecutive codons using either ‘small-intelligent libraries’ or the ‘22c trick’ are impractical to handle manually (Figure 1D). Neither do degenerate methods (including the ‘22c trick’) allow the potential to exclusively eliminate cysteine, which is usually undesired in protein and peptide libraries. Finally, Figure 1(E) compares other desirable attributes of saturation techniques, including the ability to select codons to suit the organism of choice, including codon optimization, ratio-control, subset-selection, etc. Thus we propose that ProxiMAX is the first technology to offer all desirable attributes in a manual setting and, as such, will be an invaluable addition to the protein engineer's toolbox.
Comparison of performance of common saturation mutagenesis techniques
ProxiMAX randomization: the concept
ProxiMAX randomization is based on iterative cycles of blunt-ended ligation, amplification and digestion with the Type IIS restriction endonuclease MlyI. More specifically, oligonucleotide donors (containing the required codons at their termini) are ligated on to a conserved acceptor sequence. Following ligation, the PCR amplification step provides ample ligated product, while concomitantly diluting non-ligated contaminants. Subsequent digestion removes all but the final codon of the donor sequences to generate a new acceptor, which enters the next cycle (Figure 2). By alternating different sets of oligonucleotide donors for each cycle, codon additions can be restricted to a specific saturation cycle.
ProxiMAX development: T4 DNA ligase exhibits some sequence preference in blunt-end ligations
To examine process feasibility, preliminary experiments used a mixture of all 20 donors combined in nominally equal quantities (based solely on manufacturer's data). Results demonstrated that the process worked, but, unsurprisingly, there was bias in the resulting small-scale library (Supplementary Figure S1 at http://www.biochemsoctrans.org/bst/041/bst0411189add.htm). Bias might result from variant oligonucleotide quality, inaccurate estimations of oligonucleotide concentration, sequence preferences of T4 ligase, amplification bias during PCR or even sequence preference of the restriction enzyme MlyI. To examine and/or exclude these factors, ligations and amplifications were performed in individual parallel reactions. Only after purification and quantification were the individual amplicons combined. Analysis at this midpoint in each of six cycles of saturation (i.e. after combination of individual reactions and before MlyI digestion) demonstrated that equimolar mixes (5% of each codon) had been generated successfully during each cycle of addition (Supplementary Figure S2A at http://www.biochemsoctrans.org/bst/041/bst0411189add.htm). However, these data offered no insight as to whether subsequent ligations would proceed in an equimolar fashion when those mixtures acted as acceptors. Therefore, after six cycles of saturation mutagenesis, composition of the final assembled library was assessed by Roche 454 pyrosequencing, which demonstrated that, whereas most codons were introduced into the library at the expected ratio, a few were noticeably over- and under-represented. In particular, the histidine codon 5′-CAT-3′ was favoured, whereas codons for lysine and threonine were under-represented (Supplementary Figure 2B).
By eliminating donor concentration and/or quality and PCR preference as potential sources of bias, these data suggest that T4 DNA ligase exhibits some sequence preference in blunt-ended ligations, when presented with a mixture of 5′ and 3′ ends, so that, in this substrate mixture, the enzyme has a different reactivity rate for different codons. To examine this possibility, data from the preceding 6-mer sequence analysis were studied for expected and observed ratios with respect to the last base in the phosphate donor/acceptor junction. A χ2 statistic with one degree of freedom was calculated for the 16 possible combinations of 3′–5′ preference at the ligation junction and where P<0.0001, this was noted (Supplementary Table S1 at http://www.biochemsoctrans.org/bst/041/bst0411189add.htm). Although there were no discernible rules for the junction preference directly at the point of ligation, there are certainly preferences for 3′-T and 5′-C, which supports the observed over-representation for histidine (CAT), but does not fully explain the reason for histidine being more preferred than arginine (CGT).
Successful application of ProxiMAX randomization
To compensate for these apparent sequence preferences, the six cycles of addition were repeated, with the concentration of each donor sequence being adjusted during mixing, to counteract the over- or under-representation shown in Supplementary Figure S2(B). Thus concentrations of donors carrying over-represented codons were reduced, whereas those carrying under-represented codons were increased. Cysteine and methionine codons were also excluded as these are deemed to have liabilities in peptide libraries by being susceptible to oxidation. It is clear from Figure 3(A) that T4 DNA ligase sequence preference can be compensated for by adjustment of donor concentrations.
Library construction with controlled codon ratios
Having demonstrated successful equimolar representation, we next sought to create a more relevant library from a protein engineering perspective. Saturation of 11 consecutive codons within the loop of a CDR3 (complementarity-determining region 3) domain of an antibody VH (variable heavy) chain domain was undertaken. Each position in the loop required a separate mixture of codons so that favourable protein-binding amino acids were included at the correct positions and those that would encode structural or manufacturing liabilities were excluded. The frequencies of the desired amino acids were rounded to the nearest 5% in order to simplify mixing of the codon oligonucleotides and analysis of the final library.
For this longer loop, both right-hand and left-hand acceptors were employed to build two library components in parallel (Figure 3B). The 11-mer CDR3 loop was then assembled by ligating the two fragments. The loop was sequenced, and remarkable compliance was achieved between design specifications (expected) and the actual library generated (observed; Figure 3C). One notable anomaly was the arginine codon which was poorly represented in the left-hand acceptor ligations. This is apparent in Figure 2(C), where the lack of under-representation of arginine at a particular addition has led to a compensating over-representation of other amino acids, for example aspartate at position 95. Despite this, the majority of codons are remarkably close to their desired frequency.
Although more involved than conventional degenerate saturation mutagenesis, we believe that the benefits of ProxiMAX randomization easily outweigh the increased outlay on reagents and of time. One to two cycles of ProxiMAX randomization can be achieved in a day and thus our exemplar 11-mer library can readily be prepared in 2 weeks. The bottleneck of directed evolution is library screening  and thus the smaller and more accurate the initial library, the greater the efficiency of sampling in selection or screening and the better the likely outcome. As stated by Neugebauer and co-workers (Morphosys AG) “The most critical parameter for all in vitro-based approaches is the quality of the antibody library” . Thus although the requirement to perform individual ligations and PCRs may initially seem unwieldy, performing 20 parallel reactions from a single mastermix is not a significant undertaking. The phrase ‘if you want a good result, preparation is nine-tenths of the work’ was never more apposite.
Protein Engineering: New Approaches and Applications: A joint Biochemical Society/Protein Society Focused Meeting held at the University of Chester, U.K., 10–12 April 2013. Organized and Edited by Ross Anderson (Bristol, U.K.) and Dafydd Jones (Cardiff, U.K.).
We gratefully acknowledge Dr C. Baar for assistance with data collection and analysis, Dr A.J. Sutherland and Dr D.A. Nagel (Aston University) for helpful discussions and Professor A.R. Pitt (Aston University) and Professor J.D. Sutherland (Laboratory of Molecular Biology, Cambridge, U.K.) for a critical reading of drafts before submission.
This work was supported by an Aston University departmental studentship (to M.A.), the Biotechnology and Biological Sciences Research Council [grant number BB/D525756/1 to A.V.H. (M.D.H.)], an Aston Research Centre for Healthy Aging (ARCHA) studentship (to H.R.M.H.) and Technology Strategy Board [grant number 720032 (to Isogenica)].
ProxiMAX randomization is protected by patent numbers EP 1907548, EP 2236612 and US 8357638 B2 filed by Mohammed Ashraf, Marcus Hughes and Anna Hine, and is operated commercially under the trade name Colibra™ (Isogenica).