Most cellular processes are carried out by a multitude of proteins that assemble into multimeric complexes. Thus a precise understanding of the biological pathways that control cellular events relies on the identification and on the biochemical characterization of the proteins involved in such multimeric assemblies. Advances in MS have made possible the identification of multisubunit protein complexes isolated from cell lysates with high sensitivity and accuracy, whereas the TAP (tandem affinity purification) methodology efficiently isolates native protein complexes from cells for proteomics analysis. TAP is a generic method based on the sequential utilization of two affinity tags to purify protein assemblies. During the first purification step, the Protein A moiety of the TAP tag is bound to IgG beads, and protein components associated with the TAP-tagged protein are retrieved by TEV (tobacco etch virus) protease cleavage. This enzyme is a sequence-specific protease cleaving a seven-amino-acid recognition site located between the first and second tags. In the second affinity step, the protein complex is immobilized to calmodulin-coated beads via the CBP (calmodulin-binding peptide) of the TAP tag. The CBP–calmodulin interaction is calcium-dependent and calcium-chelating agents are used in the second elution step to release the final protein complex preparation used for protein identification by MS. The TAP–MS approach has proven to efficiently permit the characterization of protein complexes from bacteria, yeast and mammalian cells, as well as from multicellular organisms such as Caenorhabditis elegans, Drosophila and mice.
More than the proteins themselves, multimeric protein complexes constitute the fundamental and functional units of the proteome [1,2]. Indeed, most cellular processes require protein–protein interactions or the assemblies of large protein complexes. Protein complexes (or molecular machines) are dynamic entities that assemble in an ordered time- and space-dependent fashion to store and relay various cellular signals or to contribute to the cellular architecture. Protein complexes are more than the sum of their components. For example, enzymes very often interact with regulatory subunits or with scaffolding proteins that are required for their activity, subcellular localization or substrate specificity. Different enzymatic activities could also be associated within the same protein complexes. To co-ordinate the various cellular functions efficiently and precisely, many protein complexes interact with one another. Moreover, the same proteins could be associated within different molecular machines that exert distinct cellular functions. Thus the study of protein complex components is essential to unravel the molecular nature of the cellular programmes. In addition, important insights can be revealed by the characterization of the molecular environment of proteins with known functions.
Since protein complexes correspond to a key level of the information encoded by individual genes, the description of the interactome is one of the challenges of the post-genomic era. The composition of protein complexes is not easily inferred by studying pairwise interactions on a large scale. In contrast, the development of protein complex purification strategies combined with protein identification by MS/MS (tandem MS) has emerged as a powerful approach to decipher the interactome. Among the different protein complex purification strategies, protein tagging based on the TAP (tandem affinity purification) tag has proven particularly successful for molecular machine analyses [4,5].
The TAP–MS procedure
The TAP method is an epitope-tagging purification strategy based on a dual purification that represents a major improvement in sample purification for further protein identification by MS. In the original system, the TAP tag fused in-frame with the protein of interest is composed of two tags separated by a TEV (tobacco etch virus) cleavage site: the proximal tag is the CBP (calmodulin-binding peptide), whereas the distal tag is composed of the IgG-binding part of the Staphylococcus aureus Protein A. The TAP-tagged protein is expressed in cells, where it associates with endogenous components to form protein complexes, and is recovered together with its binding partners via a TAP procedure (Figure 1A). First, the TAP-tagged protein is immobilized on IgG–agarose beads via the Protein A moiety of the TAP tag, and gently washed. Protein complexes associated with the TAP-tagged protein are then specifically released from the resin by TEV protease cleavage. TEV is a sequence-specific protease cleaving a seven-amino-acid recognition site which is relatively rare in the mammalian proteome. Most of the proteins bound non-specifically to the IgG beads are left behind because the binding of the tagged protein to the resin and its release from it are based on different chemistries (affinity binding compared with protease cleavage). In the second affinity step, the CBP tag is bound to calmodulin-coated beads in the presence of calcium. TEV protease and further contaminants are washed out. Finally, the recombinant protein and its interacting partners are specifically retrieved by calcium chelation, since the CBP–calmodulin interaction is calcium-dependent.
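Because the first elution depends on TEV cleavage, it can be useful to check that a bait sequence does not itself contain a TEV site. The following Python snippet is a minimal sketch of such a check, not part of the published procedure. It assumes the commonly cited canonical site ENLYFQ(G/S), cut between Q and G/S; the text above does not give the sequence. The function names and the toy sequence are hypothetical.

```python
import re

# Assumption: canonical TEV recognition site ENLYFQ(G/S), cut between Q and G/S.
# Real proteases tolerate some variation; this pattern is deliberately strict.
TEV_SITE = re.compile(r"ENLYFQ[GS]")


def find_tev_sites(sequence):
    """Return the 0-based start positions of canonical TEV sites."""
    return [match.start() for match in TEV_SITE.finditer(sequence)]


def cleave_at_tev(sequence):
    """Cut after the Q of every canonical site and return the fragments."""
    fragments, start = [], 0
    for position in find_tev_sites(sequence):
        cut = position + 6  # index of the residue that follows Q
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical fusion: bait residues, then a TEV site, then further residues.
    toy_fusion = "MAAA" + "ENLYFQG" + "KKK"
    print(find_tev_sites(toy_fusion))
    print(cleave_at_tev(toy_fusion))
```

A bait for which `find_tev_sites` returns an empty list has no strict match to the assumed site. This does not rule out cleavage at related sequences.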
Figure 1. Schematic representation of the TAP–MS methodology
Purified components are then separated by one-dimensional SDS/PAGE (Figure 1B): the protein samples obtained after TAP are complex enough to require a gel separation, but, in principle, two-dimensional gel electrophoresis is not necessary. Proteins are visualized by staining, excised from the gel and cleaved into peptides by trypsin or the endoproteinase Lys-C.
As far as purified protein complexes are concerned, MS/MS combined with online nano-LC (liquid chromatography) peptide separation remains a reliable, robust and efficient approach [6,7]. Online nano-LC–MS is particularly well suited to samples containing an excess of contaminant proteins such as actin or tubulin, since it allows the identification of proteins that constitute as little as 2–5% of the total protein mixture. In MS/MS, the separated peptide ions enter the mass spectrometer and their mass-to-charge (m/z) ratios are measured in real time (in the effective range of the MS, approx. 200–2000 Da). Peptides are then selected for fragmentation within the spectrometer. During this process, peptides are forced to collide with inert gas molecules, leading to cleavage of peptide bonds and generating a set of fragments that differ in length by one amino acid each. The resulting peptide fragments are measured in turn to produce a series of signals (MS/MS spectra) which correspond in mass to adjacent amino acid segments. Information on peptide sequence, peptide mass and fragment mass can be queried against theoretical spectra generated from protein sequences in appropriate protein, EST (expressed sequence tag) or genomic databases using peptide sequencing software (Mascot, SEQUEST, etc.). Thus, even if, in many cases, only part of the amino acid sequence can be read, the searches lead to the identification of the corresponding protein [8–11]. Each peptide analysed identifies a given protein deduced from the peptide sequence. Analysis of different peptides in the digest mixture can confirm the identification of a protein or identify a different protein that is present in the mixture.
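The digestion and fragmentation steps described above can be illustrated with a minimal Python sketch. It is not how Mascot or SEQUEST work internally, and it is not part of the original protocol. It assumes simplified cleavage rules (trypsin cuts after K or R unless the next residue is P; Lys-C cuts after K), no missed cleavages, no modifications, and standard monoisotopic residue masses. All function names and the toy sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free N- and C-termini
PROTON = 1.007276   # charge carrier in positive-ion mode


def digest(sequence, enzyme="trypsin"):
    """Split a sequence into fully cleaved peptides (simplified rules)."""
    cut_after = {"trypsin": "KR", "lys-c": "K"}[enzyme]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Assumed rule: trypsin does not cut when the next residue is proline.
        blocked = enzyme == "trypsin" and next_residue == "P"
        if residue in cut_after and not blocked:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """Mass-to-charge ratio of a peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b (N-terminal) and y (C-terminal) fragment ions."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(
            sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON
        )
    return b_ions, y_ions


def charges_in_window(peptide, low=200.0, high=2000.0, max_charge=3):
    """Charge states whose m/z falls inside the assumed scan window."""
    mass = peptide_mass(peptide)
    return [z for z in range(1, max_charge + 1) if low <= mz(mass, z) <= high]


if __name__ == "__main__":
    toy_protein = "MSGKAWELVRPTNDFKYHQR"  # hypothetical sequence
    for peptide in digest(toy_protein):
        print(peptide, round(peptide_mass(peptide), 4), charges_in_window(peptide))
```

Consecutive b ions (or y ions) of one peptide differ by exactly one residue mass, which is what allows part of the sequence to be read from an MS/MS spectrum. Leucine and isoleucine have the same mass and cannot be told apart this way.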
The TAP–MS approach, originally described for protein complex analysis in yeast, has also been shown to permit efficient protein complex characterization from Escherichia coli, trypanosomatids and mammalian cells [14,15], as well as from multicellular organisms such as plants [16,17], Caenorhabditis elegans, Drosophila and mice.
Application fields of TAP–MS methodology
TAP–MS leads to the identification and characterization of protein complexes that are formed in vivo, and the two successive purification steps confer a high degree of specificity on the technology. Stringent washes are not required, and less stable multiprotein assemblies may be recovered together with a reduced number and a lower amount of contaminating proteins. More importantly, TAP–MS is a generic and reproducible method. Protein complex purification by TAP does not rely on the availability of specific antibodies and is potentially applicable to any protein, provided that its corresponding cDNA can be cloned. Moreover, the same optimized protocol is used for the purification of different protein complexes, making the list of background contaminants across purifications easy to establish. However, in some cases, a protein classed as a purification contaminant could be a true component of a given protein complex, and more stringent statistical analyses of the MS data (number of peptides identified, etc.) are required to determine whether an identified protein is a contaminant or an interacting partner.
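The idea of establishing a background list across purifications can be sketched in Python as follows. This is a simple frequency heuristic, not the statistical analysis referred to above: a protein recovered in more than a chosen fraction of unrelated purifications is flagged as likely background. The threshold, the function names and the bait and prey names are hypothetical.

```python
from collections import Counter


def background_frequency(purifications):
    """Fraction of purifications in which each protein was identified."""
    counts = Counter()
    for prey in purifications.values():
        counts.update(set(prey))  # count each protein once per purification
    total = len(purifications)
    return {protein: count / total for protein, count in counts.items()}


def split_candidates(purifications, max_frequency=0.5):
    """Separate each bait's prey into candidate partners and likely background.

    Assumption: proteins seen in more than `max_frequency` of all
    purifications are treated as background. A true partner that is also
    abundant would be misclassified, so flagged proteins still need review.
    """
    frequency = background_frequency(purifications)
    result = {}
    for bait, prey in purifications.items():
        unique = set(prey)
        candidates = sorted(p for p in unique if frequency[p] <= max_frequency)
        background = sorted(p for p in unique if frequency[p] > max_frequency)
        result[bait] = (candidates, background)
    return result


if __name__ == "__main__":
    # Hypothetical identifications from four unrelated baits.
    example = {
        "bait_1": ["actin", "tubulin", "HSP70", "partner_A"],
        "bait_2": ["actin", "tubulin", "partner_B"],
        "bait_3": ["actin", "HSP70", "partner_C", "partner_A"],
        "bait_4": ["actin", "tubulin", "HSP70"],
    }
    for bait, (candidates, background) in split_candidates(example).items():
        print(bait, "candidates:", candidates, "background:", background)
```

In practice such a filter would be combined with quantitative evidence such as the number of peptides identified per protein, as noted above.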
Because of its numerous strengths, TAP–MS appears to be a method of choice to map protein–protein interactions in large-scale experiments in E. coli and in yeast [22,23]. Without approaching the scale of the bacterial and yeast data, the TAP–MS technology has been used efficiently in mammalian cells to map smaller protein networks such as the Par interactome involved in cell polarity or the interaction network around the human transcription machinery, as well as to decipher biological signalling pathways such as the TNFα (tumour necrosis factor α)/NF-κB (nuclear factor κB) or Wnt/β-catenin pathways. In the case of the TNFα/NF-κB signalling pathway, Bouwmeester et al. applied the TAP–MS methodology to compare protein complex compositions under different physiological conditions (stimulated or non-stimulated by TNFα). Finally, the application of the TAP–MS approach to transgenic mice enables the comparison of protein complex organization between different tissues and has allowed the characterization of novel interacting partners not previously identified in cells in culture.
Thus TAP–MS emerges as a powerful tool in protein network studies for systems biology in a number of different organisms and in different physiological conditions.
Further developments of the TAP–MS technology
Despite a number of strengths, the TAP–MS approach has several limitations associated with epitope-tagging technologies in general. In particular, the fusion of the TAP tag (approx. 21 kDa) to a protein may affect its folding, activity or interaction with other cellular components. In a large-scale study in yeast, Gavin et al. determined that, in 18% of the cases when essential genes were TAP-tagged, viable strains were not obtained, indicating that C-terminal TAP tagging can impair protein function. Tagging the proteins at the other end might often overcome the problem, but the development of smaller tags, based on new combinations of epitope tags, is an alternative way to tackle this issue. In particular, a novel tag (SF-TAP) composed of the streptavidin II tag and the FLAG tag reduces the size of the TAP tag to 4.6 kDa. In this case, protein complex recovery is not based on the addition of a protease and a chelating agent, but on the presence of metabolites. In the first purification step, the streptavidin–biotin binding is disrupted by desthiobiotin, and final protein complexes are eluted with FLAG octapeptides in the second step. Moreover, B-Raf was expressed at significantly lower levels when fused to the original TAP tag than when fused to the SF-TAP tag. Even though this phenomenon was demonstrated with only one tagged protein (B-Raf), this example could indicate a possible instability of proteins when fused to large tags.
TAP is successfully used to purify protein complexes from mammalian cells in culture. However, yields from the tandem purifications remain low, and consequently large amounts of starting material are required. In order to increase purification yields, Bürckstümmer et al. developed an alternative tag, named the GS-TAP tag, composed of two IgG-binding units of Protein G from Streptococcus sp., a TEV cleavage site and SBP (streptavidin-binding peptide). The GS-TAP tag produces a 10-fold increase in efficiency when compared with the conventional TAP tag. The procedure allows the purification of protein complexes that were not amenable to TAP and the use of less starting material, leading to higher success rates and possibly to the identification of less abundant protein assemblies. The technological improvements conferred by the GS-TAP tag are also confirmed by studies performed in Drosophila embryos.
The TAP method may fail to detect transient interactions, low stoichiometric complexes and/or those interactions occurring only in specific physiological states under-represented in exponentially growing cells. In addition, the gentle extraction conditions associated with the procedure make protein complexes that are strongly associated with cell structures such as chromatin difficult to recover. The development of an HB-TAP (histidine-biotin) tag for two-step purification under fully denaturing conditions combined with in vivo cross-linking might overcome some of these current limitations of the technology.
In E. coli and yeast, homologous recombination allows fast and efficient ORF (open reading frame) tagging, and subsequent expression of the tagged proteins at physiological levels under the control of their endogenous promoters [21,22]. In contrast, strategies based on exogenous expression of the TAP-tagged cDNA were mainly used for protein complex studies in mammalian systems. It is quite reasonable to expect that changes in expression of some proteins may alter the stoichiometry of the corresponding protein complexes, making the identification of low-abundance interactors more difficult. In particular, high expression of the TAP-tagged protein using transient expression often results in the recovery of large quantities of chaperones and heat-shock proteins presumably associated with the misfolded overexpressed TAP-tagged protein. Very often, rather than generating false interactions, overexpression of the TAP-tagged proteins leads to the masking of binding partners of low abundance. Thus stable expression from weak or inducible promoters, introduced by virus-mediated gene transfer in which the number of copies is reduced by adjusting the multiplicity of infection, would be favoured in order to identify the binding partners more easily. Ultimately, the expression of the TAP-tagged protein at real endogenous levels can be achieved by the generation of TAP-knockin mouse lines. An interesting alternative to TAP-knockin mice also comes from the use of large genomic clones such as BACs (bacterial artificial chromosomes). Transgenes based on BACs are often large enough to contain complete genes with all their endogenous regulatory sequences. In addition, the development of efficient DNA engineering methods enables rapid and robust modifications of these large DNA sequences, including high-throughput TAP tagging for protein complex purifications from cells in culture and mice.
TAP is a generic method that dramatically reduces the levels of background proteins in the purifications and thereby allows an efficient and accurate purification of protein complexes for identification by MS. In recent years, the TAP–MS strategy has been employed successfully to perform large-scale protein interaction mapping in lower organisms and to decipher smaller interactomes and signalling pathways in mammals, making this technology a central tool in systems biology.
Experimental Approaches to Protein–Protein Interactions: A Biochemical Society Focused Meeting held at University of Sheffield, Sheffield, U.K., 11–12 January 2010. Organized and Edited by Michael Sutcliffe (Manchester, U.K.) and Mike Williamson (Sheffield, U.K.).
We are particularly grateful to Claire Rosnoblet and Julien Vandamme for discussions.
Our work is supported by the Centre National de la Recherche Scientifique (CNRS), l'Université de Lille 1 – Sciences et Technologies, the Région Nord-Pas de Calais and the European Regional Developmental Funds.