Defining structural features of IDPs (intrinsically disordered proteins) and relating these to biological function requires characterization of their dynamical properties. In the present paper, we review what is known about the IDPs of colicins, protein antibiotics that use their IDPs to enter bacterial cells. The structurally characterized colicin IDPs we consider contain linear binding epitopes for proteins within their target cells that the colicin hijacks during entry. We show that these binding epitopes take part in intramolecular interactions in the absence of protein partners, i.e. self-recognition, and consider the structural origins of this and its functional implications. We suggest that self-recognition is common in other IDPs that contain similar types of binding epitopes.
For many years, it has been axiomatic that the physiologically functional states of most proteins are structured. An illustration of this is that most current textbooks used for teaching biochemistry and molecular biology do not discuss disordered states of proteins. However, it is now clear that disordered states of proteins are common, since many proteins are unfolded, or only partially folded, in their native states; these are termed IDPs (intrinsically disordered proteins) [1–4]. Indeed, using computational predictors based on factors such as the complexity of amino acid sequences, it has been suggested that 6–33% of proteins in some prokaryotes have disordered domains of at least 50 amino acids in length, and that the majority of proteins in some eukaryotes are similarly disordered [1,2]. Furthermore, if we consider that most, or all, proteins that are structured in their native states must be disordered after biosynthesis before the onset of folding, then disordered states of proteins are more common than ordered states. However, in contrast with structured proteins, relatively little is known about the conformational properties of disordered proteins. One of the major reasons for this is their dynamic nature, which contributes greatly to the experimental difficulty of characterizing them in atomic detail. X-ray crystallography is not applicable to disordered proteins since they do not crystallize, and when a disordered domain is attached to one or more globular domains in a crystal, the electron density for the disordered region is usually too weak to interpret. It was not until the development of multi-nuclear NMR spectroscopy that residue-specific information about disordered proteins could be obtained. In the present paper, we review what we know about disordered domains of colicins [6,7], protein antibiotics that kill susceptible Escherichia coli cells by forming pores in their cytoplasmic membrane [e.g. Col (colicin) A, ColN and ColIa], by hydrolysing DNA (e.g. ColE9) or RNA (e.g. ColE3), or by inhibiting peptidoglycan synthesis (e.g. ColM). These domains are one of the best characterized families of low-complexity IDPs.
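As an illustration of what 'complexity of amino acid sequences' can mean in practice, the following minimal sketch computes a sliding-window Shannon entropy of residue composition. It is not the predictor used in the cited studies; the function names, the window width of 12 and the toy sequence are assumptions made purely for illustration.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def complexity_profile(sequence: str, width: int = 12) -> List[float]:
    """Entropy of every window of the given width along the sequence.

    Low values flag compositionally biased (low-complexity) stretches.
    The default width is an arbitrary illustrative choice.
    """
    if width <= 0 or width > len(sequence):
        raise ValueError("width must be between 1 and the sequence length")
    return [
        shannon_entropy(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT a real colicin sequence:
    # a Gly/Ser repeat followed by a compositionally diverse stretch.
    toy = "GSGSGSGSGSGSMKVLWAFTHEQR"
    for start, h in enumerate(complexity_profile(toy)):
        print(f"window starting at {start + 1:2d}: {h:.2f} bits")
```

Windows dominated by one or two residue types give entropies near 0–1 bits, whereas windows in which every residue differs approach log2(width) bits.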
Types of IDPs
Considerable progress has been made in understanding why a given amino acid sequence is disordered [1,2]. In part, this is a corollary to our understanding of secondary-structure propensities of amino acid residues and sequences and our growing ability to predict tertiary structures from amino acid sequences. A common feature of IDPs appears to be a relatively low content of bulky non-polar residues [1,2] that could form a structured core in aqueous solution following the kind of hydrophobic collapse [10,11] that has been well explored in the folding of globular proteins. We distinguish three general types of IDPs.
(i) Those with a high proportion of small amino acids, such as glycine, serine, asparagine and aspartate, which are commonly referred to as low-complexity sequences.
(iii) Proteins whose folded states are not sufficiently stable under normal conditions for them to be appreciably populated. There may be many examples of this group since even folded globular proteins are usually only marginally stable. For example, for the 13 kDa ribonuclease at pH 7, the folding free energy is only approximately −30 kJ·mol⁻¹, equivalent to the energies of three to four hydrogen bonds.
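To make the notion of marginal stability concrete, the sketch below converts a folding free energy into the fraction of molecules that are unfolded at equilibrium. It assumes a simple two-state model (unfolded ⇋ folded) and a temperature of 298.15 K; neither assumption comes from the text above, which specifies only pH 7, and the function name is illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def unfolded_fraction(dg_fold_kj_per_mol: float,
                      temperature_k: float = 298.15) -> float:
    """Equilibrium fraction of unfolded molecules in a two-state model.

    dg_fold_kj_per_mol is G(folded) - G(unfolded); negative values mean
    the folded state is favoured. The temperature default is an assumption.
    """
    # K_fold = [folded]/[unfolded] = exp(-dG_fold / RT)
    k_fold = math.exp(-dg_fold_kj_per_mol / (R_KJ * temperature_k))
    return 1.0 / (1.0 + k_fold)


if __name__ == "__main__":
    # -30 kJ/mol is the value quoted in the text; the other two values
    # are hypothetical, chosen to show how quickly the folded state
    # stops being appreciably populated as stability is lost.
    for dg in (-30.0, -5.0, 5.0):
        print(f"dG_fold = {dg:+6.1f} kJ/mol -> "
              f"fraction unfolded = {unfolded_fraction(dg):.3g}")
```

Under these assumptions, a folding free energy of −30 kJ·mol⁻¹ leaves only a few molecules per million unfolded, whereas a protein whose folding free energy is slightly positive is mostly unfolded, which is the situation described for type (iii).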
The T-domains (translocation domains) of colicins contain low-complexity IDPs with linear interprotein binding epitopes
Consistent with the requirements for receptor binding, translocation and cytotoxicity, colicins generally consist of three domains: an N-terminal T-domain, a central receptor-binding domain and a C-terminal cytotoxic domain [6,7]. There is often no sequence identity between colicins of different families, even though they may have similar killing mechanisms or binding partners. A common feature, however, is that their T-domains contain intrinsically disordered regions (Figures 1 and 2) that carry binding epitopes for proteins that assist with their uptake into target cells. Important helper proteins are TolA, TolB and TonB [6,7]. The disorder appears to be associated with the passage of the colicin across the outer membrane of target E. coli cells [22,23].
Figure 1. Order within the otherwise disordered region of the T-domain of ColIa
Figure 2. Order within the otherwise disordered region of the T-domain of ColE9
Significantly, none of the linear binding epitopes in the IDP regions of the structurally characterized colicins is actually wholly disordered. NMR studies of ColN show that its TolA epitope is bound intramolecularly to its globular receptor-binding domain. We infer that the same is true for the TonB epitope of ColIa in solution since, in its X-ray structure (Figure 1), there is an intramolecular interaction of its TonB epitope with an α-helix that reduces the flexibility of residues 23–67. X-ray structures of ColB and ColM show something similar, with their TonB epitopes interacting with structured regions of the same molecule and other N-terminal residues being more flexible. NMR reveals that the TolB epitope of ColA binds intramolecularly to a globular domain of the protein in solution, but the TolB epitope of ColE9 behaves differently [17,28]. The two tryptophan residues in its TolB epitope promote the formation of clusters [17,20] of interacting residues (Figure 2), which then interact with each other instead of with a globular domain of the colicin. It appears that the bulk of the side chains and their polarity play an important role in determining the binding properties of linear epitopes such as the TolB epitope of ColE9. Mészáros et al. found that the molecular interfaces generated upon intermolecular binding of naturally disordered proteins are significantly more hydrophobic than the overall amino acid composition of those proteins, and suggested that the hydrophobic core of IDPs is the interface between them and their binding partner. Consistent with this view, peaks of the AABUF (average area buried upon folding) along the sequence of the ColE9 IDP correlate reasonably well with its intramolecular binding epitope (Figure 2, bottom), and a similar relationship is found for the binding epitopes of other colicins as well.
This ‘self-recognition’ may then be just a consequence of a hydrophobic patch contained within an otherwise disordered sequence, but it has marked consequences both for the desired intermolecular interactions (discussed below) and for undesired ones, such as those with proteolytic agents. It protects against proteolysis, at least for ColN, something that is probably significant for colicins that, having been secreted by producing cells into an extracellular environment, then need to traverse the protease-rich environment of the guts of animal hosts. As Espinoza-Fonseca has pointed out, aromatic residues are common in the binding epitopes of IDPs, so there is a strong likelihood that the kind of order that creates ‘closed’ conformations of colicin IDPs on their own is common in other IDPs.
Clearly, the colicin conformers in which the interprotein binding epitopes are sequestered in intramolecular interactions are not available to their helper proteins, although, in solution, these ‘closed’ conformers must be in equilibrium with one or more ‘open’ conformers that are able to bind their helper proteins. However, the equilibrium constant for the ‘closed’⇋‘open’ exchange has been reported in only one case, the TolA-binding epitope of ColN, where its value of 0.055 means that approximately 95% of the ColN molecules have their TolA-binding boxes sequestered in the absence of other binding partners. The TolA-binding region must be exposed for TolA recognition, and it may be that binding of ColN to its outer-membrane receptor OmpF disrupts the intramolecular T-domain binding. Whatever the mechanism, it is pertinent to note that the ColN R-domain binds to an outer-membrane protein, whereas its T-domain binds to a periplasmic protein, which is in a different spatial compartment; the ‘closed’ conformation of ColN must therefore be disrupted when the colicin interacts with its target E. coli cell.
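The step from an equilibrium constant of 0.055 to a ‘closed’ population of approximately 95% can be checked with the short calculation below. It is a sketch that assumes a two-state exchange with K defined as [open]/[closed]; the accompanying free-energy difference additionally assumes a temperature of 298.15 K, which is not stated in the text, and the function names are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def closed_fraction(k_open: float) -> float:
    """Fraction of molecules in the 'closed' state for closed <=> open.

    k_open is assumed to be [open]/[closed].
    """
    if k_open <= 0:
        raise ValueError("equilibrium constant must be positive")
    return 1.0 / (1.0 + k_open)


def opening_free_energy(k_open: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of opening, dG = -RT ln K, in kJ/mol.

    The temperature default is an assumption, not a value from the text.
    """
    if k_open <= 0:
        raise ValueError("equilibrium constant must be positive")
    return -R_KJ * temperature_k * math.log(k_open)


if __name__ == "__main__":
    k = 0.055  # value reported for the ColN TolA-binding epitope
    print(f"closed fraction: {closed_fraction(k):.3f}")
    print(f"open fraction:   {1.0 - closed_fraction(k):.3f}")
    print(f"dG(opening):     {opening_free_energy(k):.1f} kJ/mol")
```

Under these assumptions, the ‘open’ state lies only a few kJ·mol⁻¹ above the ‘closed’ state, so even a modest perturbation, such as receptor binding, could in principle shift the balance.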
Although the ColE9 TolB epitope is not sequestered in the same way as the ColN TolA epitope, the ‘closed’ form of the ColE9 T-domain is likely to be the dominant one in the absence of binding partners. This follows from the observation of the clusters by ¹⁵N NMR relaxation times and from their predicted order parameters, S² (Figure 2, top). For a peptide amide group of a globular protein with no independent motion, S² = 1, and for a fully disordered peptide amide group, S² = 0. The predicted S² values for the ColE9 clusters are considerably greater than those expected for a disordered protein and greater even than those for some flexible regions of otherwise rigid globular proteins, consistent with a relatively long-lived structural feature.
Important features of colicin IDPs and general considerations of protein–protein interactions
Two of the major current issues with protein-binding reactions of IDPs are whether their disorder leads to an increased rate of association via the so-called ‘fly-casting’ mechanism, and whether the binding reaction follows an induced fit or a conformational selection mechanism [3,33,34]. The rate enhancement accruing from the fly-casting mechanism was originally suggested to arise from the greater Stokes radius of an IDP compared with that of a globular protein of the same size, giving the IDP a greater capture radius, but it is now recognized that IDPs have slower translational diffusion than globular proteins, leading to a reduced capture rate. The faster binding of IDPs to their partners is now proposed to originate from the need for fewer encounters before formation of the final complex. However, the fly-casting mechanism assumes that the IDP can be likened to a fishing line, which will not be the case if a pre-equilibrium step from a ‘closed’ conformation is required to free the interprotein binding site. Where such an interprotein binding site has a high content of non-polar residues, promiscuous interactions of the site should be expected. This suggests that, for colicins, the fly-casting mechanism is not so much a route to an increased rate of association compared with a similarly sized globular protein as a means to compensate for competing binding interactions that reduce the likelihood of the desired interaction occurring.
Whether the interactions of the colicin IDPs follow an induced fit or a conformational selection mechanism [3,33,34] has not been addressed experimentally. In other words, it is not known whether an IDP folds into its bound conformation on binding, or whether the bound conformation is one of the many already available to the IDP in solution and is preferentially selected by its partner, leading to a displacement of the equilibrium. We can, however, offer the following observation on the only colicin IDP epitope that has been structurally studied in its free and bound states: the TolB epitope of ColE9 [17,19]. Although the dominant form of this in solution is the ‘closed’ conformer, ‘open’ conformers will be in equilibrium with it. Using the BTPRED algorithm, we predict that the ‘open’ conformer of the ColE9 TolB epitope will consist primarily of a series of β-turns (Figure 3). Only small changes to such a conformation are required to make it resemble the conformation of the TolB-bound ColE9 epitope observed in an X-ray structure: the backbone RMSD (root mean square deviation) between the model and X-ray structure is only 3.05±0.03 Å. Thus it is likely that a large fraction of the ‘open’ conformer of the unbound ColE9 TolB epitope has a conformation close to that of the TolB-bound epitope, suggesting that binding follows a conformational selection mechanism.
Figure 3. Overlay of Trp39–Trp46 of the X-ray-determined structure (black) for the ColE9T32–47 peptide when bound to TolB [19] with the structure of the unbound peptide predicted to contain four type I β-turns (grey) [17]
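For readers unfamiliar with the backbone RMSD used to compare the predicted and X-ray conformations, the following minimal sketch shows how such a value is computed from two sets of matched atomic coordinates. It assumes the two structures have already been optimally superposed (no fitting step is performed here), and the coordinates in the example are hypothetical numbers, not those of the ColE9 peptide.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation (same units as the input, e.g. angstroms).

    Atoms are paired by position in the two sequences, and the structures
    are assumed to be superposed already; no rotation or translation is
    applied by this function.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
        for (xm, ym, zm), (xr, yr, zr) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Hypothetical coordinates for three backbone atoms, for illustration only.
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
    model = [(0.2, 0.1, 0.0), (1.4, 0.3, 0.1), (3.3, 0.2, -0.2)]
    print(f"backbone RMSD = {rmsd(model, reference):.2f} angstroms")
```

In practice, a least-squares superposition of the two structures would normally precede this calculation, so that the RMSD reflects differences in conformation rather than in position or orientation.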
Implications for other IDPs
The feature of the colicin IDPs that leads to self-recognition is the presence of a linear binding epitope for partner proteins with a high proportion of hydrophobic residues. In aqueous solution, these interact with a suitable region of the surface of the same molecule in what we envisage to be something like the hydrophobic collapse of an unfolded globular polypeptide during the early stages of folding. The surface with which it interacts can be ordered or disordered; the important thing is that hydrophobic residues in the binding epitope are sequestered from solvent. Since both linear epitopes and aromatic residues within epitopes are common in IDPs, it is likely that the conformational ensembles of IDPs in the absence of their binding partners contain a significant contribution from one or more conformers in which the binding epitope is not available for partners.
How Bugs Kill Bugs: Progress and Challenges in Bacteriocin Research: A Biochemical Society Focused Meeting held at University of Nottingham, U.K., 16–18 July 2012. Organized and Edited by Colin Kleanthous (Oxford, U.K.), Chris Penfold (Nottingham, U.K.) and Dan Walker (Glasgow, U.K.).
We thank past and present UEA (University of East Anglia) colleagues Ruth Boetzel, Nick Cull, Angelo Figueiredo and Sara Whittaker for helpful discussions and technical assistance; and Colin Kleanthous (Oxford), Jeremy Lakey (Newcastle), Chris Penfold (Nottingham) and Richard James (Nottingham) for their productive collaborations.
We gratefully acknowledge the Wellcome Trust, the Wolfson Foundation and the Biotechnology and Biological Sciences Research Council for their support of our work.