Protein engineering is at an exciting stage because designed protein–protein interactions are being used in many applications. For instance, three designed proteins are now in clinical trials. Although there have been many successes over the last decade, protein engineering still faces numerous challenges. Often, designs do not work as anticipated and they still require substantial redesign. The present review focuses on the successes, the challenges and the limitations of rational protein design today.
Alternative protein–protein interaction scaffolds
An important goal of protein engineering is to modify, create or inhibit protein–protein interactions. These designs can be used for numerous applications in synthetic biology, diagnostics, therapeutics and imaging. Historically, antibodies have been used as high-specificity reagents for many molecular recognition applications, such as ELISAs and Western blots. However, they are large complex multi-chain proteins with disulfide cross-links, and, as a result, they are extremely sensitive to freeze–thaw cycles and cannot be routinely expressed to high levels in bacterial cells. Production in animals and mammalian cell culture is typical, but expensive and time-consuming. Finally, the generation of antibodies for a particular target is random, such that recognition of a pre-specified epitope in a target is difficult.
The limitations associated with antibodies have stimulated many groups to develop alternative binding scaffolds. Such protein scaffolds typically lack disulfide cross-links, are small, stable and highly soluble, and are easy to produce in large quantities in Escherichia coli. In the present review, we assume that all scaffold proteins have these qualities unless noted otherwise. In comparing the different frameworks that have been used to develop binding modules, we highlight the unique features of each. Questions include the following. Is the scaffold protein a globular or repeat protein? Are the binding residues displayed on loops, on elements of regular secondary structure or both? Is the peptide/protein binding the native function of the scaffold? Is a consensus protein or a natural protein used as a framework for design? What is the potential application of each design?
Many designs for novel binding functionality that we discuss have revealed unexpected features about the properties of the framework protein and/or details of the protein–protein interaction interface. In some examples, the unexpected results have been co-opted and incorporated into next-generation designs. In the present review, we focus on the rational design of alternative protein interaction scaffolds, with a special emphasis on strategies for second-generation designs.
Globular protein scaffolds
Several globular proteins, such as the tenth fibronectin type III domain of human fibronectin (the ‘monobody’ or ‘Adnectin’), the B domain of protein A (the ‘Affibody’), PDZ, SH3 (Src homology 3) and WW domains, have all been used as frameworks on which to display different binding specificities [1–6].
Fibronectin type III domain (monobodies)
Monobodies were chosen as a design scaffold for molecular recognition because fibronectin type III domains possess three loops that are tolerant to mutations, including insertions and deletions. Moreover, these loops connect a β-sheet core in a fashion similar to the complementarity-determining loops of immunoglobulin domains.
First-generation monobody libraries were created by randomizing 10–21 positions in up to three loops (theoretical library sizes ~10¹³–10²⁷). Both in vitro and in vivo selection techniques (e.g. phage, mRNA and yeast display) have been used to identify monobodies with the desired binding properties. However, in all cases, only a small fraction of the theoretical library was covered, so a consensus sequence is rarely discernible [1,7,8].
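The library sizes quoted above follow from simple counting. The sketch below is illustrative only and makes two assumptions: every randomized position may hold any of the 20 natural amino acids, and roughly 10¹² clones can be sampled (the in vitro display limit quoted later in this review). The function names are hypothetical.

```python
import math

AMINO_ACIDS = 20  # assumption: all 20 natural amino acids allowed per position


def theoretical_library_size(randomized_positions: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences when each position is fully randomized."""
    return alphabet ** randomized_positions


def max_coverage(randomized_positions: int, clones_sampled: float = 1e12) -> float:
    """Upper bound on the fraction of the theoretical library that is sampled."""
    return min(1.0, clones_sampled / theoretical_library_size(randomized_positions))


if __name__ == "__main__":
    # 10 and 21 positions are the extremes quoted for first-generation monobodies.
    for n in (10, 21):
        size = theoretical_library_size(n)
        print(f"{n} positions: ~10^{math.log10(size):.0f} sequences, "
              f"coverage at most {max_coverage(n):.1e}")
```

Under these assumptions, 10 positions give about 10¹³ sequences and 21 positions about 10²⁷, so even the smallest of these libraries cannot be fully sampled.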
Unexpected interactions between target proteins and non-randomized monobody β-strands (revealed by the crystal structures of the complexes) motivated second-generation libraries, in which both loop and β-strand positions were randomized (Figure 1A). For monobodies that interact with their targets through both loops and β-strands, this strategy successfully increased monobody-binding affinity and specificity.
Figure 1. Redesign of globular proteins
To improve library coverage and capitalize on the observed abundance of tyrosine and serine at antibody–antigen interfaces, Koide et al. created a minimalist library with only tyrosine or serine allowed in the loops. This strategy was effective for some targets, such as yeast and human SUMO (small ubiquitin-related modifier) variants. However, a slightly expanded set of amino acids (alanine, aspartate, glycine, histidine, leucine, asparagine, arginine, serine and tyrosine) was necessary to increase affinity for MBP (maltose-binding protein). Based on MBP–monobody co-crystal structures, it appears that tyrosine and serine still dominate the binding interface, whereas other amino acids are important for stabilizing different loop conformations.
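Restricting the amino acid alphabet shrinks the sequence space dramatically, which is why it helps library coverage. The following sketch compares the binary Tyr/Ser code, the nine-residue set listed above and full randomization. The number of randomized positions (15) is a hypothetical value chosen for illustration, not a figure taken from the libraries discussed here.

```python
def library_size(positions: int, alphabet_size: int) -> int:
    """Theoretical number of sequences for a given alphabet size."""
    return alphabet_size ** positions


POSITIONS = 15  # hypothetical number of randomized loop positions

ALPHABETS = {
    "Tyr/Ser binary code": 2,
    "expanded nine-residue set": 9,
    "all 20 amino acids": 20,
}

if __name__ == "__main__":
    for label, size in ALPHABETS.items():
        print(f"{label}: {library_size(POSITIONS, size):.2e} sequences")
```

For the same number of positions, the binary library is many orders of magnitude smaller than a fully randomized one, so a far larger fraction of it can be sampled in a selection experiment.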
B domain of protein A (Affibody)
Affibodies are derived from the B domain of protein A, a Staphylococcus aureus cell wall protein. Initial studies showed that mutations are well-tolerated in 13 of 58 positions, including many (but not all) positions critical for IgG recognition.
First-generation Affibody libraries were created by randomization of all 13 positions (theoretical library size ~10¹⁷). Phage display was used to identify library members that bind to various targets. Again, the large theoretical library size precludes significant library coverage. Two strategies were applied to increase the affinity of these ‘first-generation’ hits for the desired targets.
In one strategy, after alignment and analysis of binding sequences, highly conserved binding residues were kept constant, whereas the remaining binding positions were randomized and reselected to achieve higher binding affinity and specificity. In a separate strategy, multivalent Affibodies were constructed from a selected monomeric Affibody to take advantage of avidity effects. Both approaches could be used as strategies to increase affinity on many scaffolds.
Second-generation Affibodies have been generated by scaffold optimization, i.e. site-specific mutagenesis of residues outside the binding surface (Figure 1B). An optimized scaffold was produced by testing each mutation, independently and in combination with other mutations, for improved biophysical properties (e.g. stability and hydrophilicity) and reduced cross-reaction with the native ligand IgG.
PDZ domains
PDZ domains are natural protein–protein interaction modules that bind peptides in an extended conformation. They bind different peptide sequences, but all have in common the recognition features of a free C-terminus and a hydrophobic residue at the C-terminal position in the peptide.
First-generation PDZ designs were constructed by computationally guided mutagenesis of up to 12 positions at the PDZ–peptide interface. Remarkably, although only a few designed PDZ domains were tested, they were found to bind to their target peptides with affinities in the micromolar range, which is typical for natural PDZ domain–peptide interactions.
To increase affinity and specificity, second-generation PDZ affinity reagents, also known as ‘affinity clamps’, were created by fusion of circularly permuted PDZ domains (to allow facile fusion) with randomized monobody domains (Figure 1C). In the newly generated binding cleft between the PDZ and monobody domains, the PDZ domain specifies the extended peptide conformation, whereas the monobody domain improves binding affinity up to 500-fold and increases specificity against a closely related peptide up to 2000-fold. These enhancements probably stem from the fact that affinity clamps bind longer stretches of the cognate PDZ peptide than the PDZ domain alone. However, because both the PDZ and monobody domains are essential for peptide interaction, current designs are still constrained by the conserved C-terminal hydrophobic residue required for PDZ–peptide interaction.
SH3 and WW domains
SH3 and WW domains are protein–protein interaction domains that both recognize short proline-rich peptides. Similar to PDZ domains, the affinity of natural SH3 and WW domain–peptide interactions is in the micromolar range.
First-generation libraries of these domains have been constructed by randomization of the peptide-binding residues. Hck-SH3 was selected as a scaffold because it has high nanomolar affinity for its native ligand HIV-1 Nef, largely due to an accessory loop that increases the size of its protein-binding interface beyond the canonical SH3–PXXP (Pro-Xaa-Xaa-Pro) motif interaction. Six consecutive positions (library size ~10⁸) in this accessory loop were randomized and screened by phage display for increased HIV-1 Nef-binding affinity (Figure 1D).
For WW domains, nine of 38 positions (theoretical library size ~10¹¹) were randomized on the peptide-binding surface of a consensus Pin1 domain (Figure 1E) and screened using CIS display for binding to VEGFR-2 (vascular endothelial growth factor receptor isoform 2).
In these SH3 and WW domain designs, the estimated dissociation constants for domain–target interaction were in the nanomolar range. Also, in each design, a consensus was obtained at the randomized positions because the library sizes are within the limits of in vitro display techniques (~10¹²). However, as with PDZ domains, these scaffolds still have a significant sequence constraint, namely binding to proline-rich sequences.
Repeat protein scaffolds
Repeat proteins are fundamentally different from globular proteins because they are stabilized by local, rather than long-range, interactions. Each repeat motif is characterized by a few signature residues that specify the interactions within repeats and between adjacent repeats. This construction is particularly useful for protein engineering because one can consider each repeat as a Lego™ brick that can be stacked to create modules with different functionalities. Consensus repeats, derived from the alignment of all repeats, have been generated for most motifs [19–23], and arrays of such consensus repeats are frequently used as a framework for design because they are more stable than natural proteins. In this section, we focus on the three most common repeat protein scaffolds for protein design: TPR (tetratricopeptide repeat), LRR (leucine-rich repeat) and ANK (ankyrin repeat).
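The idea of a consensus repeat can be illustrated with a majority vote at each column of an alignment. This is a minimal sketch of that general idea, not the procedure used in the cited studies. The three sequences are invented toy strings, not real repeat sequences, and ties are resolved in favour of the residue seen first.

```python
from collections import Counter


def consensus(aligned_repeats: list[str]) -> str:
    """Return the most common residue at each column of an alignment."""
    if not aligned_repeats:
        raise ValueError("at least one sequence is required")
    if len({len(seq) for seq in aligned_repeats}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # zip(*...) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_repeats)
    )


# Toy alignment (hypothetical sequences, for illustration only).
TOY_REPEATS = ["AEAWYNLG", "AEAYYNLG", "GEAWYKLG"]

if __name__ == "__main__":
    print(consensus(TOY_REPEATS))
```

Real consensus designs draw on alignments of many natural repeats and typically involve further manual refinement. The sketch only shows the counting step.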
TPR proteins
TPR proteins consist of tandem repeats of a 34-amino-acid helix–turn–helix motif that propagate in a superhelical fashion to create a concave peptide-binding face. Most TPR proteins contain three tandem repeats, and most TPR–peptide complex structures show that the bound peptides adopt an extended conformation.
To create a novel TPR–peptide interaction pair, Jackrel et al. modified an existing TPR–peptide interaction. This design retained many elements of the existing interface with the exception of one peptide residue and its corresponding binding pocket. This design strategy, with simultaneous changes to both the protein and the peptide, can be used to create novel TPR–peptide pairs for a variety of applications.
A different strategy involved grafting a set of statistically derived Hsp90 (heat-shock protein 90)-binding residues on to a module composed of three tandem consensus TPR motifs [25–26]. This designed TPR specifically binds its ligand, but approximately 40-fold more weakly than the natural Hsp90-binding protein TPR2A. A second-generation TPR included charge optimization outside of the peptide-binding surface (Figure 2A), resulting in a TPR module with higher affinity and specificity for Hsp90 than TPR2A.
Figure 2. Redesign of repeat proteins
LRR proteins
LRR proteins form horseshoe-like structures that mediate protein–protein interactions. In jawless vertebrates, LRR proteins serve as the equivalent of antibodies. This property has been exploited to identify novel LRRs: antigens are injected into lampreys to trigger an immune response, and cDNAs from the immunized lampreys are then isolated, screened using yeast surface display and selected for binding to a target antigen. Selected LRRs, however, contain disulfide linkages and are purified from inclusion bodies in E. coli, making recombinant expression and purification more challenging [27,28].
To overcome some of these issues, Jung et al. created a hybrid scaffold consisting of eight LRRs from TLR4 (Toll-like receptor 4) and three LRRs from hagfish variable lymphocyte receptor B. The hybrid scaffold, purified from insect cells, retained nanomolar binding to TLR4's natural binding partner MD2 (myeloid differentiation protein 2). For second-generation designs, individual point mutations, guided by the hybrid scaffold–MD2 co-crystal structure, were made to fine-tune the binding interface. Some mutations improved affinity by over an order of magnitude. Combining two point mutations (one charge and one hydrophobic) had a synergistic effect that led to a 3000-fold improvement in affinity (Kd=26 pM) (Figure 2B).
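A fold improvement in affinity can be translated into a change in binding free energy with the standard relation ΔΔG = −RT ln(fold). The sketch below applies this to the 3000-fold improvement and the Kd of 26 pM quoted above. The temperature of 298 K is an assumption, and the starting Kd is only back-calculated from those two numbers rather than taken from the original study.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_improvement: float, temperature_k: float = 298.0) -> float:
    """Change in binding free energy (kcal/mol); negative means tighter binding."""
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold_improvement)


def starting_kd(improved_kd_molar: float, fold_improvement: float) -> float:
    """Kd before optimization implied by the improved Kd and the fold change."""
    return improved_kd_molar * fold_improvement


if __name__ == "__main__":
    fold = 3000.0
    kd_improved = 26e-12  # 26 pM, as quoted in the text
    print(f"ddG = {ddg_from_fold(fold):.2f} kcal/mol (assuming 298 K)")
    print(f"implied starting Kd = {starting_kd(kd_improved, fold) * 1e9:.0f} nM")
```

With these assumptions, the two mutations together contribute nearly 5 kcal/mol of binding energy, and the implied starting Kd is in the nanomolar range, consistent with the nanomolar binding reported for the hybrid scaffold.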
The same group later developed a new binding scaffold (the ‘Repebody’) based on a consensus of 1000 variable lymphocyte receptor sequences. Unlike their previous design, this scaffold is fully soluble and is purified in high yield from E. coli. By grafting residues from TLR4 on to the Repebody, they created a scaffold that bound MD2 with nanomolar affinity. They also created libraries of Repebodies with six randomized positions (theoretical library size ~10⁹) that could be displayed on phage and used to identify library members that bind to IL-6 (interleukin-6) with nanomolar affinity.
ANK proteins (DARPins)
ANK proteins are superhelical structures that mediate protein–protein interactions through the loops and helices on their concave surfaces. Alignment-based consensus ANK proteins, also known as DARPins, are particularly appealing scaffolds for protein engineering because they are far more stable than natural ANK proteins.
Previously, Binz et al. generated DARPin libraries containing seven randomized positions on the loops and helices per repeat. These libraries (on the order of 10¹⁵ for proteins containing two ANKs and 10²³ for proteins containing three ANKs) are typically screened by ribosome display. Again, the large library size makes full library coverage difficult to achieve. Nonetheless, this approach has been used to identify DARPins that specifically bind numerous targets including Her2 (human epidermal growth factor receptor 2), tubulin, MBP, VEGFR-2, caspase 3 and caspase 2 with nanomolar affinity [31–33]. Often, additional mutated residues accumulated during PCR amplification also participate in the binding interfaces [31,32].
Many consensus repeat proteins contain N- or C-terminal capping repeats, which are included for solubility. The DARPin C-terminal capping repeat, borrowed from guanine–adenine-binding protein, is the least stable repeat in solution and appears disordered in a DARPin crystal structure [34,35]. Recently, Interlandi et al. used MD simulations to improve the interrepeat packing in the C-terminal cap. Incorporation of mutations suggested by these studies significantly improved the stability of the capping repeat, and this new C-terminal cap may also be included in second-generation DARPins to create even more stable proteins (Figure 2C).
Practical applications of designed proteins
Researchers have learned much about protein–protein interactions from selection and design efforts of the type we have described. But, more than that, we are now at a point where such ‘designer binding modules’ can be used effectively in many of the same applications as antibodies. Adnectins, DARPins and Affibodies are all in clinical trials as potential therapeutic agents against VEGFR-2 and Her2 [36–38].
Designed protein–peptide interactions have also been used for other applications. For example, one of the grand challenges in biomaterials research is to develop stimuli-responsive materials. For biomedical applications, a useful material must respond to stimuli that are relatively mild and compatible with life. Towards this end, smart hydrogels based on designed tunable TPR–peptide interactions have been used successfully.
Two interrelated processes that are critical to biomedical research are the ability to specifically recognize a protein and purify it from complex biological samples. Designed protein–protein interactions have been used successfully as more robust alternatives to antibodies for affinity purification and Western blotting [21,24,40].
Designed protein–protein interactions can also be used as crystallization ‘chaperones’ of difficult targets in the same way that antibodies and antibody fragments have been used. To date, the DARPin 3H10, which specifically binds Plk1 (Polo-like kinase 1), has been used successfully in crystallization of the apo form of Plk1.
For intracellular applications, designed proteins have an unprecedented advantage over antibodies, namely the ability to probe protein activities in living cells. For example, Cortajarena et al. showed that a designed TPR that competes with the chaperone scaffolding protein Hop (heat-shock organizing protein) for binding to Hsp90 was able to impair proper protein refolding in mammalian cells. Similarly, Amstutz et al. demonstrated that DARPins could inhibit APH [aminoglycoside phosphotransferase (3′)-IIIa] in E. coli and thereby restore kanamycin sensitivity. Thus alternative scaffold proteins can not only replace antibodies in many applications but also be used to dissect complex protein networks in living cells.
Protein Engineering: New Approaches and Applications: A joint Biochemical Society/Protein Society Focused Meeting held at the University of Chester, U.K., 10–12 April 2013. Organized and Edited by Ross Anderson (Bristol, U.K.) and Dafydd Jones (Cardiff, U.K.).
Her2: human epidermal growth factor receptor 2
Hsp90: heat-shock protein 90
MD2: myeloid differentiation protein 2
Plk1: Polo-like kinase 1
SH3: Src homology 3
VEGFR-2: vascular endothelial growth factor receptor isoform 2
We acknowledge members of the Regan laboratory for a critical reading of the paper before submission.
E.B.S. is supported by a National Science Foundation Graduate Research Fellowship. We acknowledge the support of the Raymond and Beverly Sackler Institute for Biological, Physical and Engineering Sciences.
These authors contributed equally to this work.