The Collagen Toolkits are libraries of 56 and 57 triple-helical synthetic peptides spanning the length of the collagen II and collagen III helices. These have been used in solid-phase binding assays to locate sites where collagen receptors and extracellular matrix components bind to collagens. Truncation and substitution allowed exact binding sites to be identified, and corresponding minimal peptides to be synthesised for use in structural and functional studies. 170 sites where over 30 proteins bind to collagen II have been mapped, providing firm conclusions about the amino acid distribution within such binding sites. Protein binding to collagen II is not random, but displays a periodicity of approximately 28 nm, with several prominent nodes where multiple proteins bind. Notably, the vicinity of the collagenase-cleavage site in Toolkit peptide II-44 is highly promiscuous, binding over 20 different proteins. This may reflect either the diverse chemistry of that locus or its diverse function, together with the interplay between regulatory binding partners. Peptides derived from Toolkit studies have been used to determine atomic level resolution of interactions between collagen and several of its binding partners and are finding practical application in tissue engineering.
The human collagens are a family of 28 triple-helical proteins that form, collectively, the most abundant protein group in vertebrates. Collagens contain a defining Gly-x-x′ repeating sequence essential for the assembly of a right-handed triple helix (COL domain); lacking a side chain, glycine (G) alone can occupy the axial position of the triple helix. The x and x′ positions of the constituent α-chains often contain proline (P) and hydroxyproline (O), respectively, that support the left-handed polyproline helix II which is adopted by each strand of the collagen superhelix. Ricard-Blum has reviewed the collagen family, illustrating for each member the location of its COL domains and of other structures occurring in the non-helical regions of the molecule, and the diversity of their supramolecular organisation.
The structural properties of the collagens are fundamental to the integrity of the extracellular matrix. Most obviously, the major fibrillar collagens I, II and III provide tensile strength to skin, bone, tendon, cartilage and blood vessel wall. To fulfil this seemingly inert mechanical role, collagen must offer specific interaction sites for other matrix components, which implies evolutionary diversification from the primordial collagen GPO repeat. Collagens also contribute to the cellular niche, a nidus which offers the cell both anchorage and survival, and within which connective tissue cells fulfil their normal functions. Such regulatory roles of collagen require engagement of diverse cell surface receptors and the evolution of complex signatures within the primary Gxx′ sequence of the collagen α-chains in concert with the maintenance of triple helix stability. A recent review expands on much of the foregoing.
Early methods of mapping binding sites for cells or proteins within the collagens were not straightforward. Typically, purified collagens extracted from tissue would be fragmented using cyanogen bromide (CB), cleaving at Met (M) residues. The resulting defined linear fragments (CB peptides) would be separated chromatographically, then, if sufficiently long to be thermally stable, reassembled as triple helices, and their capacity to bind target proteins established. This reductive approach allowed protein binding to be mapped, but those sites containing M (e.g. some integrin sites and the von Willebrand factor (VWF) locus – see below) were disrupted and so not identifiable. Barnes employed this method extensively 30 years ago to map platelet receptor binding to collagen [5,6]. Further progress towards exact binding sites might use blocking antibodies and epitope mapping [7,8]. Alternatives included rotary shadowing, using transmission electron microscopy to locate target proteins bound to collagen monomers. Unambiguous identification of the tropocollagen molecule orientation required an asymmetric marker (e.g. the N-propeptide of collagen III, or a bound antibody) [9,10]. Rotary shadowing yielded relatively low-resolution data, but, unlike CB peptide mapping, could be applied to intact heterotrimeric species such as collagen I.
Development of the Collagen Toolkits
The synthesis of collagen-like peptides in several laboratories opened the way to rapid and independent verification of putative protein-binding sites [11,12]. The Barnes group pioneered this approach by addressing the platelet-reactivity of blood vessel wall collagens; having located an integrin α2β1-binding site in CB3 from the collagen I α1 chain, they went on to synthesise seven overlapping triple-helical peptides spanning 150 residues of primary sequence, and so located the sequence GFOGER as the minimal integrin-recognition motif. This strategy neglected the α2 chain, and its success depended upon binding activity being expressed by the α1 chain alone. This peptide set was the prototype Toolkit, and identification of GFOGER led to the first integrin-ligand co-crystal (PDB: 1DZI), and by sequence homology, the location of several further integrin-binding GxOGER and related motifs. This breakthrough guided others, notably the Höök group, who used asymmetric binding to three sites in chicken collagen I and recombinant human collagen III, visualised by rotary shadowing, to locate integrin binding motifs, and proved authenticity of these sites by synthesis of triple-helical peptides [13,14].
With Wellcome Trust funding, the synthesis of the systematic collagen III Toolkit commenced in 2003 and was completed after 18 months by Nicolas Raynal and Graham Knight. Collagen III was selected first for its role in blood vessel wall and because it is homotrimeric, promoting authentic peptide self-assembly, not then feasible for the heterotrimeric collagens such as the other key vessel wall species, collagen I. Toolkit peptides (TKPs) contained 27 residues from the sequence of the collagen III COL domain, starting at its N-terminus and advancing successively by 18 residues, allowing a 9-residue overlap between adjacent peptides. Primary (guest) sequence was flanked on each side by five GPP (host) triplets with sufficient propensity to form triple helices to drive this overall conformation regardless of unfavourable guest sequence. At each end, the peptide terminated with a GPC extension to allow cross-linking of peptides as desired (an elaboration required for platelet activation by the GPO polymer, collagen-related peptide, CRP-XL). Thus, each TKP is 63 residues long.
Collagen III is anomalous; alignment of the fibrillar collagens reveals that its COL domain protrudes by 9 and 6 extra residues beyond the N- and C-termini of the corresponding 1014 residues of collagens I and II, so that it contains 1029 residues. Hence, Toolkit III comprises 57 peptides, and Toolkit II, completed a year later, 56 peptides. With hindsight, we would have aligned the peptides in Toolkit II with those of Toolkit III by sequence homology rather than starting at the N-terminus of each of their COL domains. The α1(II) chain enjoys 78.9% identity with α1(I) but rather less (65.5%) with α2(I), so that, unless the α2(I) chain is critically involved in binding, results obtained with Toolkit II are likely to apply to collagen I.
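The tiling arithmetic described above can be checked with a short sketch. The Python below is illustrative only: the function names are hypothetical, the placeholder guest sequence is not a real Toolkit sequence, and the treatment of a final window that overruns the COL domain (here simply truncated) is an assumption, since the text does not say how the last peptide was handled.

```python
import math

GUEST_LEN = 27            # guest residues per peptide (from the text)
STEP = 18                 # advance between successive peptides (from the text)
HOST_FLANK = "GPP" * 5    # five GPP host triplets on each side (from the text)
END_CAP = "GPC"           # terminal GPC extension (from the text)


def guest_windows(col_length):
    """Return 1-based (start, end) residue ranges of the guest sequences."""
    n_peptides = math.ceil((col_length - GUEST_LEN) / STEP) + 1
    windows = []
    for i in range(n_peptides):
        start = i * STEP + 1
        # Assumption: a window running past the domain end is truncated.
        end = min(start + GUEST_LEN - 1, col_length)
        windows.append((start, end))
    return windows


def full_peptide(guest):
    """Assemble cap + host flank + guest + host flank + cap."""
    return END_CAP + HOST_FLANK + guest + HOST_FLANK + END_CAP


print(len(guest_windows(1029)), len(guest_windows(1014)))
print(guest_windows(1014)[:2])
print(len(full_peptide("GPO" * 9)))  # placeholder guest, not a real TKP
```

Under these assumptions the sketch gives 57 and 56 windows for the 1029- and 1014-residue domains, a 9-residue overlap between the first two windows, and a 63-residue peptide, in line with the figures quoted in the text.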
The Toolkit project was envisaged as essentially collaborative; whilst our own focus remained on platelet surface collagen receptors beginning with integrin α2β1, peptides were distributed to colleagues in other laboratories to explore their specific targets. To date, we have mapped 38 different proteins binding to collagen, several as yet unpublished. Our approach has been to locate the TKP exhibiting highest binding of target protein in ELISA-like assays, check whether adjacent peptides showed activity which might reside in their 9-residue overlap, then to synthesise truncated peptides until activity was lost. Next, an Alanine-scan of x and x′ residues within the [Gxx′]n minimal binding motif would identify the residues involved in binding, allowing short GPP- or GPO-flanked triple-helical ligands to be made which could be used in structural studies [15–21], or to manipulate the target protein in biological settings [22–26].
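The Ala-scan step of this strategy can be sketched in a few lines. The function below is a hypothetical helper, not the procedure used in the laboratory: it assumes the motif is written in one-letter code, begins on a Gly, and spans whole Gxx′ triplets, and it simply enumerates the single-substitution variants that would need to be synthesised.

```python
def alanine_scan(motif):
    """List (position, original residue, variant) for each x / x' residue.

    Positions are 1-based. Gly residues (first position of each triplet)
    are left untouched, and residues that are already Ala are skipped.
    """
    if len(motif) % 3 != 0:
        raise ValueError("motif must be a whole number of Gxx' triplets")
    variants = []
    for i, residue in enumerate(motif):
        if i % 3 == 0:
            if residue != "G":
                raise ValueError("every third residue must be Gly")
            continue
        if residue == "A":
            continue
        variant = motif[:i] + "A" + motif[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


for position, residue, variant in alanine_scan("GFOGER"):
    print(position, residue, variant)
```

For GFOGER this yields GAOGER, GFAGER, GFOGAR and GFOGEA; in practice each variant would still need the GPP or GPO flanking triplets to fold as a triple helix.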
This strategy has been applied to the major collagen receptors: the four collagen-binding integrins [17,27–31], the three known immune receptors [32–34], both discoidin domain receptors (DDRs) [35,36] and G protein-coupled receptor 56 (GPR56)¹. Matrix proteins have also been investigated: VWF, SPARC, several matrix metalloproteinases (MMPs), fibronectin (FN), thrombospondin-1 (TSP-1), small leucine-rich repeat proteins (SLRPs) including fibromodulin (FMOD) and chondroadherin (CHAD), multimerin 1¹ and dermatopontin¹. In addition, three bacterial adhesins have been studied [39,40].
Recently, the Baumann group reported a binding site for thrombospondin-4 and -5, using a recombinant collagen II Toolkit, with its triple-helical structure stabilised by a viral foldon domain. This elegant approach suffers some drawbacks, notably the absence of proline hydroxylation in the recombinant Toolkit. Thus, it will necessarily fail to detect binding sites for which O is crucial, such as for VWF and the collagen-binding immune receptors, whilst others, such as the integrin α1β1, display reduced affinity for motifs lacking O, i.e. GFPGER. This approach, however, readily allows the insertion of sequences of any length. A similar use of foldon-templated collagen fragments allowed Zwolanek et al. to identify novel Gxx′GER integrin-binding motifs in collagen XXII. Similarly, a recombinant collagen II Toolkit was used to probe human autoantibodies that may be causal in rheumatoid arthritis. This library suffers from the same drawbacks as the foldon approach, lacking O, and as a consequence, the melting temperature of some members was quite low, limiting their application.
Limitations of the Toolkit approach
Whilst any binding site up to 9 residues will in principle be located using the Toolkits, longer binding sites that happen to span the overlap region will be disrupted and may be missed. Further, hypothetical composite binding sites comprising adjacent helices in a collagen fibre will not be identified using a random TKP coating in a solid-phase ELISA-like assay. We have restricted our focus to the 1014- or 1029-residue COL domains, excluding the contiguous non-helical telopeptides and the more distal propeptides. Our findings should be applicable to homologous motifs in other collagens, at present excluding the heterotrimeric collagens, although recent progress in this area has been made. It should be noted that even some homotrimeric collagens, e.g. the fibrillar XXIV and XXVII, contain short interruptions to the Gxx′ repeat sequence, and it is not yet clear whether synthetic TKPs would properly reflect these anomalies without the support of the underpinning fibrous assembly. The inclusion of GPC terminal triplets in our Toolkits allows us to cross-link the peptides specifically as required, but may result in spontaneous oxidation and limited disulphide bond-mediated polymerisation of triple helices. This may be a drawback in some settings, e.g. for use in kinetic binding studies, but has the unexpected advantage that immobilisation of peptides on plastic ELISA wells is increased by the presence of the GPC extensions, possibly by increasing their avidity for the hydrophobic surface. This makes for more consistent coatings and binding activity.
Periodic distribution of sites across the collagens
Many target proteins bind several peptides across the Toolkits, and a compilation of 170 binding sites mapped for 30 proteins on Toolkit II is shown in Figure 1. Several conclusions can be drawn. Most prominently, TKP II-44 supports the binding of two-thirds of proteins examined to date, most with relatively high affinity. Competition between these species would be expected in vivo. Some of these binding partners are known to be involved in either the assembly of collagen fibres or their proteolysis, whereas others may simply exploit the diverse chemistry of II-44, which displays a hydrophobic N-terminal tract and a charged or polar C-terminus, whilst a GPO triplet close to the cleavage site located in TKP II-43 may introduce flexibility and further enhance binding opportunities for the native collagen.
Figure 1. Binding of proteins to collagen II is periodic.
Binding activity appears periodic rather than uniformly distributed across the tropocollagen molecule. Fourier transform of the binding distribution suggests a periodicity of approximately 10 TKPs, corresponding to a half-wave of approximately 28 nm, or 0.42 D-periods, close to the reported length of the overlap region in collagen II. This might indicate that higher-order fibre structure dictates the evolution of binding activity in collagen.
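The kind of Fourier analysis mentioned above can be illustrated with a minimal periodogram. This is a sketch under stated assumptions, not the analysis performed on the Toolkit data: the function name is hypothetical, only integer trial periods are scanned, and the input is a synthetic cosine profile with a built-in period of 10 peptides, invented purely to show that the method recovers a known period.

```python
import cmath
import math


def dominant_period(counts, min_period=2, max_period=None):
    """Return the integer trial period (in peptides) with the highest power."""
    n = len(counts)
    if max_period is None:
        max_period = n // 2
    mean = sum(counts) / n
    centred = [c - mean for c in counts]  # remove the constant offset
    best_period, best_power = None, -1.0
    for period in range(min_period, max_period + 1):
        # Fourier coefficient of the centred profile at this trial period
        coeff = sum(v * cmath.exp(-2j * math.pi * k / period)
                    for k, v in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_period, best_power = period, power
    return best_period


# Synthetic profile across 56 peptides; NOT measured binding data.
synthetic = [1 + math.cos(2 * math.pi * k / 10) for k in range(56)]
print(dominant_period(synthetic))
```

With real data the per-peptide input would be the number of proteins bound by each TKP, and a peak would need to be judged against the noise expected from so short a series.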
Orgel has proposed that the intrinsic twist of the tropocollagen molecules within each microfibril leads to the burial of much of D-periods 1, 2 and 3 within the assembled fibre [48,49]. However, one might expect collagen-binding proteins crucial for mature tissue function to have evolved to recognise motifs that are available on the fibre surface. We have argued previously that concealment of crucial binding sites, for VWF and for integrin α2β1 located in D1 to D3, for example, would render them unable to perform their essential roles in haemostasis. The presence of binding nodes extending from D1 into D4 is at odds with crucial sites therein being inaccessible within the collagen fibre. Just 4 of the 12 nodes discovered using the Toolkits lie within the D-period overlap that Orgel dubbed the ‘Master Control Region’ where important binding sites were proposed to reside. The number of proteins that bind each D-period does not differ greatly, although the promiscuity of TKP II-44 skews the distribution of sites within D4. Our findings may inform debate on the exposure of the D-periods on the surface of the collagen fibre.
In contrast, sites utilised only in free tropocollagen molecules in situ, either prior to fibre assembly or released from fibres during collagen turnover, suffer from no such evolutionary constraints. Such sites might include those at or close to both the GxKGHR cross-linking motifs (located at residues 87 and 930 of the COL domain, in D1 and D4), where regulatory proteins may operate prior to or during fibre assembly, and the collagenase cleavage site, where proteins may bind collagen fragments after hydrolysis.
Which amino acids predominate in binding sites?
Comparison of the primary sequence of the 12 binding nodes and the 13 inert TKPs reveals significantly different compositions (Figure 2). Of hydrophobic residues, F and L are more abundant in nodes, reflecting their crucial contribution to hydrophobic interactions that have been defined in the structural studies outlined below. I, M, V and Y are relatively sparse but equally-distributed. The polar residues E, N, Q and T are also equally represented, whilst S is more abundant in inert peptides. The imino acids, P and O, are more abundant in inert and node peptides, respectively. Since O is almost confined to the collagens, a role in specific binding would be expected and is confirmed below. Other charged residues show large differences, with D and K markedly under-represented in binding nodes, and R over-represented. It is not surprising that A occurs less frequently in binding peptides, its short side chain, like those of D and S, not protruding far from the collagen helix, minimising binding opportunities. These global conclusions are derived from the full 27-residue guest sequence, extending beyond the binding motifs themselves. Local echoing of the binding residue properties may serve as a means of recruiting targets to the vicinity of the more specific motif, increasing the probability of productive interaction.
Figure 2 shows the numbers of each amino acid found in the x and x′ positions of the 12 peptides that reach the node threshold of five binding proteins (red bars), and the corresponding information for the 13 peptides with no partners (blue bars).
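The tally behind a comparison of this kind is straightforward to sketch. The helpers below are hypothetical and assume that each sequence is in one-letter code and starts on a Gly, so that every third residue can be skipped. Two short sequences quoted in the text (the II-44 collagenase tract and a GPO repeat) stand in for real 27-residue guest sequences; they are not the node and inert peptide sets themselves.

```python
from collections import Counter


def xx_composition(sequences):
    """Count residues at the x and x' positions of Gxx' repeat sequences."""
    counts = Counter()
    for seq in sequences:
        for i, residue in enumerate(seq):
            if i % 3 != 0:  # index 0, 3, 6, ... is the Gly of each triplet
                counts[residue] += 1
    return counts


def composition_difference(group_a, group_b):
    """Per-residue difference in counts (group_a minus group_b)."""
    residues = sorted(set(group_a) | set(group_b))
    return {r: group_a[r] - group_b[r] for r in residues}


# Stand-in inputs taken from motifs quoted in the text, for illustration only.
tract = xx_composition(["GLAGQRGIVGLOGQRGER"])
repeat = xx_composition(["GPOGPOGPO"])
print(dict(tract))
print(composition_difference(tract, repeat))
```

Applied to the real guest sequences of the 12 node and 13 inert peptides, the same tally would reproduce the bars of Figure 2; a statistical test would then be needed before calling any difference significant.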
Some interactions can be examined more precisely: crystallography of proteins in complex with peptides (short motifs identified using Toolkit assays outlined above) yields positive atomic-level data on key interactions, subject to the usual artefacts of crystallisation. In contrast, alanine-scanning of such motifs generally yields negative data: loss of interaction through Ala-substitution. Both datasets are available in some cases, allowing consensus to be reached (see Table 1). However, amino acid substitution may alter peptide conformation, introducing a structural artefact. For example, an O residue destabilises the endo–exo ring pucker of an adjacent P, introducing flexibility into the helix which may be essential for candidate protein binding. It follows that Ala-scanning at O may reduce binding, without O contributing directly. In contrast, we have observed increased binding of both MMP-1 and MMP-13 to a truncated version of TKP II-44, close to the collagenase cleavage site, when E within its RGER motif was replaced with A [51,52]. It may be that structural effects, e.g. elimination of the E–R side-chain interaction altering the effective diameter of the helix, improve contact with the MMP at the apposed hemopexin domain.
|Protein (method)|Peptide sequence|
|FN (crystal contacts)|GQRGIVGLOGQRGERGFOGLOG|
|MMP-13 (Ala-scan)|GPQG∼LAGQRGIVGLOGQRGER|
|MMP-1 (Ala-scan)|GPQG∼LAGQRGIVGLOGQRGER|
|MMP-1 (crystal contacts)|GPOGPQG∼LAGQRGIVGLOGQRGER|
Table 1 shows the residues found to be crucial for binding of FN and MMPs to collagen-derived peptides. Erat et al. used a linear peptide (sequence as indicated) to form a complex (PDB: 3EJH) with the 8-9FnI module pair, and found a series of interactions (highlighted turquoise) that stabilised the β-strand adopted by the peptide on the FN surface. We used Ala-scanning of a Toolkit-derived triple-helical peptide, including the collagenase cleavage site, and found the first Lx′GxR motif to be critical for the binding of full-length FN. The same peptide set was used to test binding of MMP-1 and MMP-13, and similar residues were involved, highlighted in magenta. Notably, some Ala-substitutions of charged or polar residues enhanced binding, highlighted in yellow, perhaps for structural reasons discussed in the text. This applied even to MMP-1 although the hemopexin domain that aligns with the RGER motif makes little contact with the peptide (PDB: 4AUO). The MMP-1 co-crystal showed more extensive contacts, highlighted in turquoise, on both sides of the scissile G∼L bond than were revealed by Ala-scanning.
¹ Represents an unpublished protein–Toolkit interaction.
Long side chains extending away from the axis of the helix dominate protein–TKP interactions, illustrated in Figure 3, which shows the structure (PDB: 1Q7D) of the free integrin-binding peptide, with F, E and R side chains clearly resolved. Thus, F occurs commonly within binding nodes, although under-represented in collagen with just 13 and 8 occurrences in collagens II and III. Random codon usage in collagen genes would result in F occurring 21 times. R also occurs frequently in binding sites, whilst the similarly abundant K has been located just once (see below). E is required in integrin-binding GxOGER motifs, but we have observed only one other contribution of E to binding, in the FN-binding site (TKP II-44) that overlaps with the MMP cleavage site. The over-occurrence of E in collagens II and III (53 and 48 times) relative to F indicates an important role for E in collagen which appears unrelated to its binding activity. This may reflect charge-stabilisation between adjacent α-chains and triple-helices in the collagen fibre.
Figure 3 shows the structure of the free integrin-binding peptide, [GPO]2GFOGER[GPO]3, using PDB file 1Q7D.
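The random-codon figure quoted above can be approximated with simple arithmetic. The sketch below is one plausible way of arriving at a value close to 21 and is not necessarily how the figure was derived: it assumes that only the x and x′ positions (two-thirds of the COL domain) are free to vary, that all 64 codons are equally likely, and that F and E are each specified by two codons. The function name is hypothetical; the observed counts are those given in the text.

```python
def expected_occurrences(col_length, n_codons, codon_pool=64):
    """Expected count of a residue at x / x' positions under uniform codon usage."""
    xx_positions = col_length * 2 // 3  # two of every three residues are x or x'
    return xx_positions * n_codons / codon_pool


# Observed occurrences in collagens II and III, as quoted in the text.
observed = {"F": (13, 8), "E": (53, 48)}

# Assumption: F (UUU, UUC) and E (GAA, GAG) each have two codons.
expected = expected_occurrences(1014, n_codons=2)
for residue, (in_ii, in_iii) in observed.items():
    print(residue, "expected about", round(expected, 1),
          "observed", in_ii, "and", in_iii)
```

On these assumptions both residues would be expected about 21 times in a 1014-residue domain, which makes the contrast explicit: F falls well below expectation and E well above it.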
Hydrophobic interactions of collagen residues other than F frequently stabilise the binding of proteins. Thus, L, V and M are key contributors to several interactions, such as with integrins, DDRs, VWF, SPARC, OSCAR (PDB: 5EIV) and with MMPs and FN. Two aromatic residues, W and Y, form part of the apposed binding pocket in several proteins, with which collagen F in particular can interact. The aliphatic stems of R may also contribute to such hydrophobic pockets.
Integrins represent a special case, where the canonical E carboxylate anion of the GxOGER motif directly coordinates Mg²⁺ in the metal ion-dependent adhesion site of the collagen-binding integrin I-domains, in α1β1, α2β1, α10β1 and α11β1. This interaction is supplemented by the hydrophobic x residues in both the leading and middle strands of collagen which contact the surface of the I domain. These hydrophobic residues determine overall affinity, with L and M able to complete relatively high-affinity motifs. The hydrophobic stem of R in GROGER may assume the same function, although no crystal structure exists. Ala-substitution at x results in near-basal affinity for the resting integrin; GAOGER, naturally occurring in collagens II and III, becomes important only for the activated integrin.
The collagen-binding immune receptors, glycoprotein (GP) VI, LAIR1 and osteoclast-associated receptor (OSCAR), rely on the unique abundance of O residues within collagen for binding specificity. Such tracts are found at the C-terminus, TKPs II-56 and III-57. GPO polymers are sufficient for recognition by GPVI and its subsequent activation, provided higher order structure is introduced into the collagen peptide, whilst LAIR1 and OSCAR [20,32] require a greater diversity, with OSCAR in particular requiring an F residue (PDB: 5EIV). Although both GPVI and LAIR1 bind TKP III-30 with high affinity, there is little overlap between the repertoire of TKPs recognised by the three receptors despite conservation of 3D structure and primary sequence.
Specific binding nodes
Integrin sites: GxOGER
The integrin-binding GxOGER and related motifs occur at conserved loci across collagens I, II and III. In Toolkit II, these occur in peptides 7 and 8, and in peptide 28, with a low-affinity site in peptide 44. These motifs bind few other proteins, indicating the importance of the cell–integrin interaction. Note that an atypical positive charge on its surface prevents α10β1 from binding GROGER, present in collagens I and III, but not II. These sites display selectivity for the different integrins, with α1β1 and α10β1 preferring GLOGEN, in collagen III, whilst α2β1 and α11β1 express higher affinity for GFOGER, in collagens I, II, IV and others. There is only limited evidence for absolute specificity, but analysis in a cellular setting is confounded by the activation state of the integrin, which may override intrinsic selectivity.
Cross-linking sites: GxKGHR
The only observed lysine-containing binding motif within the Toolkits is GxKGHR, the key inter-helix cross-linking site. In nature, covalent cross-links can form through the condensation of a lysine aldehyde located within the telopeptide of an α-chain in one helix with a target lysine residue in a nearby helix COL domain. Thus, in the staggered assembly of the native collagen fibre, the N-telo-lysine aldehyde condenses with K-930, and the C-telo-lysine aldehyde with K-87. The target GxKGHR motif is found in the unique sequence of TKP II-5 (K-87) and II-52 (K-930) and, as a result of the 9-residue offset between Toolkits II and III, in the overlaps of TKP III-5/6 and III-52/53. The lysine complement of native collagens is at least partially hydroxylated, whereas we have included no side-chain modification other than O in the Toolkits. This may account for the under-representation of K in the observed binding sites, since it is plausible that hydroxylysine contributes to binding in nature although absent from our experiments. Acetylation of K compromised binding of FMOD to GxKGHR motifs, suggesting that its natural covalent modification through hydroxylation or further derivatisation, e.g. glycosylation and glycation, may be disruptive. On the other hand, Ala-scanning of the unmodified GxKGHR motif abolished target molecule binding, showing both K and R to be key contributors to the interaction with FMOD. Similarly, Gebauer et al. found that substitution of K by P reduced binding of cartilage oligomeric matrix protein (COMP or TSP-5) to a T4 foldon-guided peptide set containing GxKGHR. No crystal complex between a target protein and a full KGHR-containing motif is yet available, which might elucidate the binding mechanism, although R-containing model peptides have been shown both to bind and yield structures with HSP47 [59,60].
Several molecules known to be involved in the regulation of collagen cross-linking have been shown to bind at, or close to, this motif, including FMOD and TSP-1, both of which may help recruit the lysyl oxidase which effects the oxidation of the telopeptide lysine residues to the aldehydes needed for cross-link formation.
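The placement of the two target lysines within the Toolkits can be verified from the tiling rules alone. The sketch below uses a hypothetical helper and makes one explicit assumption: that collagen III COL-domain numbering is obtained from the K-87 and K-930 numbering simply by adding the 9-residue N-terminal offset mentioned in the text. Residues covered by a single peptide lie in its unique sequence; residues covered by two lie in an overlap.

```python
GUEST_LEN = 27  # guest residues per peptide (from the text)
STEP = 18       # advance between successive peptides (from the text)


def tkps_containing(residue, offset=0, n_peptides=57):
    """Return 1-based indices of the peptides whose guest window holds `residue`.

    `offset` shifts the residue number before lookup, e.g. 9 for Toolkit III
    (an assumption about how the two numbering schemes relate).
    """
    position = residue + offset
    hits = []
    for index in range(1, n_peptides + 1):
        start = (index - 1) * STEP + 1
        if start <= position <= start + GUEST_LEN - 1:
            hits.append(index)
    return hits


for lysine in (87, 930):
    print("K-%d" % lysine,
          "Toolkit II:", tkps_containing(lysine, n_peptides=56),
          "Toolkit III:", tkps_containing(lysine, offset=9))
```

Under that assumption the lookup places K-87 in II-5 alone and in both III-5 and III-6, and K-930 in II-52 alone and in both III-52 and III-53, matching the description above.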
Collagenase site G∼LAGQRGIVGLOGQRGER
This site, comprising much of TKP II-44, contains the unique collagenase cleavage site. Notably, II-44 differs in only two conserved residues from the corresponding tract of collagen α1(I), so Toolkit II results will likely apply to collagen I. II-44 binds both collagenases investigated in detail, MMP-1 and MMP-13 [51,52], along with many other molecules, including low-affinity interactions with the DDRs, FMOD and TSP-1, whilst the GQRGER motif that interacts with the hemopexin domain of full-length MMP-13 is also a weak integrin-binding site. Interestingly, the free hemopexin domain binds to both the N-terminal GLAGQR and the central GLOGQR tracts, as established by Ala-scanning, offering a second nearby locus for recruitment of MMP-13. MMPs and FN each interact with a long tract of collagen at this locus, at least four triplets, so that all three α-chains are likely to be involved [52,62]. Data are summarised in Table 1. A linear collagen peptide adds a strand to a β-sheet of FN, which might imply unwinding of helical collagen as FN binds. Although a similar mechanism has been mooted for MMP-1 binding to this locus, crystallography shows only minor relaxation of the helix in an MMP-1–peptide complex. Comparison with an FN–helical peptide complex is needed. This marked overlap of binding motifs suggests that competition between FN and MMP-1 or MMP-13 for this site in collagens I and II is to be expected.
VWF A3 site: GxRGQOGVMGFO
This site, recently reviewed by Chen and Lin, was identified as binding the VWF A3 domain, along with SPARC [9,19] and DDRs 1 and 2 [16,35,61]. It is found at TKP II-22 and III-23. The DDRs and SPARC utilise only the last two triplets for binding, as indicated by co-crystal structures with DDR2 DS domain (PDB: 2WUH) and with SPARC (PDB: 2V53), whilst VWF A3 also interacts with the preceding two triplets (PDB: 4DMU). This preceding sequence may be required for DDR signalling. As at the extended collagenase site, all three α-chains contribute to the interaction with A3. Importantly, this site provided insight into the chain register of collagen I, since neither α1(I) nor α2(I) contains a complete A3-binding motif, and the likely composite site is best assembled with α2(I) in the trailing position.
Three collagen-binding adhesins have been studied. CNE, an adhesin of Streptococcus equi subsp. equi, binds with highest affinity to TKP II-44, and also to TKP II-1. CNE may compete with the host proteins that recognise II-44, such as FN, perhaps so contributing to the virulence of the pathogen by disrupting endothelial cell αVβ3 interaction with FN. In marked contrast, two other adhesins, Yersinia adhesin A (YadA) and S. pyogenes M31, are promiscuous, binding many sites across both Toolkits. The large number of peptides binding YadA allows statistical analysis of its binding propensity, which revealed a marked preference for hydrophobic residues, and for Pro and Hyp. This lack of specificity across the Toolkits may indicate that non-selective collagen binding has a role in virulence, where escape from the bloodstream and sequestration within the host tissue may be more important, by allowing the pathogen to evade host immune defences, than the specific disruption of host tissue homoeostasis proposed for CNE.
Several proteins have been observed to bind sites in different D-periods that prove to be aligned within the fibrillar collagen assembly. For example, FMOD binds TKPs III-5 and III-44, and TSP-1 binds II-5, II-20 and II-45. Both FMOD and TSP-1 have a role in regulating fibre cross-linking (which involves K in the GxKGHR motif found in II-5 and III-5) and in fibre assembly. It seems likely that one collagen helix provides a platform within a fibre from which these molecular chaperones can deliver the required activity to the site in the adjacent D-period. As well as this functional role, such alignments may also permit co-operative binding of the trimeric TSP-1. CHAD may perform a similar function, binding a single site, II-26, which aligns perfectly with the GxKGHR cross-linking site in II-52, shown in Figure 4. This would imply simultaneous accessibility of the relevant sections of D2 and D4.
Figure 4 shows a simple model of the alignment of tropocollagen molecules and D-periods in a collagen fibre (above) and an expanded section (below) showing the perfect alignment of the sole CHAD-binding sequence in TKP II-26 (at the D2–D3 boundary) with the KGHR cross-linking motif in D4.
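The alignment of sites in different D-periods follows from the stagger of the fibre and can be checked numerically. The sketch below assumes a D-period of 234 residues, a commonly quoted value that is not stated in the text, and uses hypothetical helper names; peptide start positions follow the 18-residue advance described earlier.

```python
STEP = 18        # advance between successive peptides (from the text)
D_PERIOD = 234   # residues per D-period: an assumption, not given in the text


def tkp_start(index):
    """First guest residue (1-based) of the peptide with this index."""
    return (index - 1) * STEP + 1


def d_period(residue):
    """1-based D-period in which a COL-domain residue falls."""
    return (residue - 1) // D_PERIOD + 1


def stagger_offset(tkp_a, tkp_b):
    """Residue offset between two peptides after removing whole D-periods."""
    return (tkp_start(tkp_b) - tkp_start(tkp_a)) % D_PERIOD


print(d_period(87), d_period(930))                          # cross-link lysines
print(d_period(tkp_start(26)), d_period(tkp_start(26) + 26))  # span of II-26
print(stagger_offset(26, 52), stagger_offset(5, 44))
```

With these assumptions K-87 and K-930 fall in D1 and D4, the guest sequence of II-26 runs from D2 into D3, and peptides 26 and 52 (as well as 5 and 44) are separated by a whole number of D-periods, consistent with the alignments described above.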
The identification of specific binding motifs within collagen has provided tools to manipulate cell function in research and other applications. The Leitinger group used integrin and DDR-specific peptides to probe integrin-mediated cell adhesion. We used peptide-coated surfaces to characterise the determinants of thrombus deposition [24,25,65,66], a method being developed elsewhere for diagnostic applications [67,68]. The Garcia group exploited the integrin motif GFOGER as a surface coating for orthopaedic devices, whilst Koide discusses the use of collagen peptide-based biomaterials. Bacterial collagens have enjoyed some popularity as alternatives to synthetic peptides, and Toolkit-derived motifs have been introduced into bacterial collagens with a view to producing materials for tissue engineering [71–73]. In this laboratory, we have used photochemical coupling of specific motifs to enhance the cell-reactivity of collagen scaffolds, a generic technique which could be applied to derivatise many organic substrates [74,75].
The next big step in the field will be to map binding sites in heterotrimeric collagens. Hartgerink and colleagues have designed heterotrimeric peptides using charge-complementarity to align the different collagen strands correctly, to produce collagen I-like self-assembling peptides [76,77], an approach recently brought to fruition by Jalan et al. [78, under review]. The synthetic challenge this strategy presents is substantial, requiring at least three discrete peptides for each heterotrimer along with knowledge of the register prevailing in the corresponding collagen.
We believe the TKP-derived collagenous materials will find application in drug discovery, diagnostics and regenerative medicine as well as being valuable and specific research reagents.
The Collagen Toolkits (systematic synthetic peptide libraries spanning the COL domains of collagens II and III) simplify the mapping of receptor and other protein-binding sites on the collagen triple helix.
Compilation of data from over 30 collagen-binding proteins reveals binding activity across the length of the mature collagen II triple-helical domain.
Binding activity is periodic, being concentrated in nodes with approximately 28 nm spacing (corresponding to approximately five TKPs) across all the D-periods of collagen II.
Amino acids are not equally distributed between nodes and inert peptides; nodes are enriched in F, L, R and O, whilst non-binding peptides contain A, D, K, P and S in excess.
Synthetic peptides derived from the Toolkits underpin structural studies and provide ligands to manipulate cell and protein function, and will in future find application in tissue engineering and regenerative medicine.
The Toolkit project owes much to the pioneering work of Michael Barnes, and to the dedication and perseverance of the peptide chemists, Graham Knight, Nicolas Raynal, Loraine Foley, Dominique Bihan, Jean-Daniel Malcor and Arkadiusz Bonna. Samir Hamaia made a huge contribution through protein expression and the development of solid-phase binding measurement throughout.
This work was supported by the Wellcome Trust [grant numbers 068724/Z/02/Z and 094470/Z/10/Z]; the Medical Research Council [grant numbers G0400701 and G0500707]; and the British Heart Foundation [grant number RG/15/4/31268].
The author declares that there are no competing interests associated with the manuscript.
- COL domain: triple-helical domain of a collagen
- DDR: discoidin domain receptor
- SPARC: secreted protein, acidic, and rich in cysteine
- VWF: von Willebrand factor