The procollagen C-propeptides of the fibrillar collagens play key roles in the intracellular assembly of procollagen molecules from their constituent polypeptide chains, and in the extracellular assembly of collagen molecules into fibrils. Here we review recent advances in understanding the molecular mechanisms controlling C-propeptide trimerization which have revealed the importance of inter-chain disulphide bonding and a small number of charged amino acids in the stability and specificity of different types of chain association. We also show how the crystal structure of the complex between the C-propeptide trimer of procollagen III and the active fragment of procollagen C-proteinase enhancer-1 leads to a detailed model for accelerating release of the C-propeptides from procollagen by bone morphogenetic protein-1 and related proteinases. We then discuss the effects of disease-related missense mutations in the C-propeptides in relation to the sites of these mutations in the three-dimensional structure. While in general there is a good correlation between disease severity and structure-based predictions, there are notable exceptions, suggesting new interactions involving the C-propeptides yet to be characterized. Mutations affecting proteolytic release of the C-propeptides from procollagen are discussed in detail. Finally, the roles of recently discovered interaction partners for the C-propeptides during fibril assembly and cross-linking are considered.
Collagens are the most abundant proteins in mammals (∼30% of total protein mass), where they occur as 28 different genetic types, with each molecule consisting of three polypeptide chains containing triple-helical and globular domains [1,2]. Those that form the regular banded fibrils seen by electron microscopy are synthesized as soluble procollagen molecules with large N- and C-propeptides flanking the triple-helical domain. Procollagen synthesis is a complex process involving several intracellular compartments and accessory proteins including enzymes and chaperones. The C-propeptides have long been known to (i) direct the intracellular assembly of procollagen molecules from their constituent polypeptide chains and (ii) limit further assembly into fibrils by maintaining solubility prior to proteolytic release from the rest of the molecule during secretion into the extracellular matrix (ECM). The main proteinase involved in C-propeptide release is bone morphogenetic protein-1 (BMP-1), which is assisted by another ECM protein, procollagen C-proteinase enhancer-1 (PCPE-1), that binds strongly to the C-propeptides. Despite the important roles of the C-propeptides, a distinct protein module present since the dawn of the metazoa, only in the last few years have the underlying molecular mechanisms been revealed. Here we review these advances, which not only provide answers to long-standing questions but also give insights into further roles of the C-propeptides in health and disease.
C-propeptides in intracellular assembly
Early work showed that, during intracellular assembly of the procollagen molecule, trimerization begins in the rough endoplasmic reticulum (RER) by association of the C-propeptides from its three polypeptide chains, followed by zipper-like folding of the triple-helical region towards the N-terminal end. Depending on the procollagen type, molecules consist of three identical α1 chains (homotrimers, as in types II and III), two identical α1 chains and a third non-identical α2 chain (heterotrimers, as in the usual forms of types I and V), or three non-identical chains (as in type XI). With the exception of some forms of types V and XI collagens, assembly of molecules from chains of different collagen types does not occur. Since the same cell often synthesizes multiple collagen types, ensuring the correct chain composition of each procollagen molecule requires a specific recognition mechanism between different C-propeptide domains. Clues to such a mechanism were provided by Bulleid and co-workers, who identified a region of 23 residues in the C-propeptide of procollagen III that, when used to replace the corresponding region in a mini-procollagen chain normally incapable of homotrimerization, resulted in the formation of homotrimers. This region consists of two variable sequences, of lengths 12 and 3 residues, straddling a conserved 8-residue sequence, in the middle of an otherwise highly conserved C-propeptide sequence (∼245 residues). This discontinuous 15-residue segment, now known as the chain recognition sequence (CRS), therefore appears to play a key role in inter-chain interactions during C-propeptide trimerization.
In the absence of information on the three-dimensional structure, the molecular mechanism by which the CRS specifies inter-chain interactions remained a mystery until the first C-propeptide structure (from procollagen III) was published in 2012 (Figure 1). This revealed a previously unidentified protein fold, with the trimer in the shape of a flower, beginning with a stalk region near the site of proteolytic release from the rest of the procollagen molecule, then a base region containing bound Ca2+ ions and inter-chain disulphide bonds and finally three petals projecting outwards from the core. When viewed down the axis from the top (Figure 1B), the long and short segments of the CRS, at each end of helix α4, interact with the short and long CRS segments, respectively, in adjacent chains. The structure therefore reveals the molecular basis of how sequence changes in the CRS can contribute to the specificity of inter-chain interactions during C-propeptide assembly.
Structure of the C-propeptide trimer of human procollagen III
A surprising observation with the C-propeptide homotrimer from procollagen III (CPIII) was that even though the amino acid sequences of all three chains are identical, close examination shows small structural differences at one of the three inter-chain interfaces, thus making the overall structure asymmetric. Heterotrimeric C-propeptides are also asymmetric, since one of the three chains is the product of a different gene. These observations suggest the possibility that structural asymmetry in homotrimers could be a driving factor in the evolution of heterotrimers. In the case of procollagen I, while usually this is present as a heterotrimer, in embryonic tissues and in diseases (cancer [9–11], fibrosis, genetic disorders [13,14]) a homotrimeric form is observed. Unlike CPIII however, the recently determined structure of the C-propeptide homotrimer of procollagen I (homo-CPI) showed little sign of intrinsic asymmetry. Instead, compared with CPIII, there are relatively few inter-chain interactions involving the CRS in homo-CPI (Figure 2).
Interaction interfaces in different C-propeptide trimers
Concerning the heterotrimeric form of CPI (hetero-CPI), while small angle X-ray scattering showed the overall shape to be similar to CPIII, all attempts to crystallize this form have so far been unsuccessful. It has however been possible to model the structure of hetero-CPI and test this model by site-directed mutagenesis. This analysis revealed that there are just four specific inter-chain interactions (not including inter-chain disulphide bonds, see below), involving both the CRS and elsewhere in the structure, that are required for heterotrimer formation between the α2(I) chain of CPI (where the roman numeral indicates collagen type) and the two adjacent α1(I) chains (Figure 2). One of these interactions involves Glu143 in the α1(I) chain (CRS short) which forms a salt bridge with a lysine in the α2(I) chain (CRS long). Interestingly, there is a naturally occurring Glu143Lys mutation in the C-propeptide α1(I) chain that is associated with the brittle bone disease osteogenesis imperfecta (OI) type IV. As expected from the modelling, insertion of this mutation into the α1(I) chain in the hetero-CPI expression system prevented the formation of heterotrimers, while assembly of homotrimers was unaffected.
Another important structural feature in C-propeptide assembly is the presence of inter-chain disulphide bonds. In the C-propeptides of procollagens II, III and the α1(I) chain of procollagen I, there are eight cysteines per chain, among which cysteines 1, 4, 5, 6, 7 and 8 are involved in intra-chain interactions and cysteines 2 and 3 in inter-chain interactions (indicated by * in Figure 2). Cys2 however is replaced by serine in the α2 chain of procollagen I, resulting in the absence of a disulphide bond at the corresponding inter-chain interface in the heterotrimer (Figure 2D), although Cys3 in the α2 chain still forms a disulphide bond with another α1 chain (Figure 2C). As shown recently by Shoulders and co-workers, using sedimentation equilibrium analysis, the absence of Cys2 prevents the α2(I) C-propeptide chain from forming stable homotrimers. In contrast, when Cys2 is restored by site-directed mutagenesis, stable homotrimers are formed. This contradicts earlier observations [7,18], which used a mini-procollagen version of the α2(I) chain with a restored Cys2 expressed in vitro rather than the isolated C-propeptide region expressed in HEK293 cells. Possible explanations for the discrepancy include differences in the concentration of Ca2+ ions, which bind close to the inter-chain disulphide bond (Figure 2) and are both necessary and sufficient for non-covalent trimer formation, or perturbation of C-propeptide assembly by other regions of the mini-procollagen system. It is clear however that disulphide bonding plays a key role in stabilizing heterotrimer assembly, as demonstrated most strikingly by the finding that co-expression of α1(I) C-propeptides carrying a Cys2Ser mutation with α2(I) C-propeptides carrying a Ser2Cys mutation results in heterotrimers with a 1:2 ratio of α1(I) to α2(I) chains, unlike the normal 2:1 ratio using wild-type chains.
Furthermore, for all types of heterotrimers with 3 distinct chains such as the α1(V)α2(V)α3(V) form of collagen V, chain 1 contains Cys2 and Cys3, chain 2 contains only Cys3 and chain 3 contains only Cys2, as would be expected if chains 2 and 3 both form disulphide bonds with chain 1 (Figure 3). The role of the CRS and other regions containing collagen type-specific residues involved in inter-chain interactions is therefore to fine-tune the specificity of trimerization once inter-chain disulphide bonding has determined the overall stoichiometry (homotrimer 1:1:1, heterotrimer 2:1 or heterotrimer 1:2:3).
Prediction of collagen trimerization propensities based on the pattern of cysteines C2 and C3 involved in inter-chain disulphide bonds (C1 and C4 form an intra-chain S–S bond)
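The cysteine-pattern argument above can be made concrete with a small counting exercise. The sketch below is a toy model, not the analysis used in the cited studies: it assumes that the three chains form a ring in which Cys2 of each chain can only pair with Cys3 of the next chain, and it simply counts how many of the three interfaces could carry a disulphide bond. The chain labels, the function names and the pairing direction are illustrative assumptions; only the presence or absence of Cys2 and Cys3 in each chain is taken from the text.

```python
from itertools import combinations_with_replacement, permutations

# (has_Cys2, has_Cys3) for each chain, as described in the text.
WILD_TYPE_I = {"α1(I)": (True, True), "α2(I)": (False, True)}
# Swapped mutants: α1(I) Cys2Ser loses Cys2, α2(I) Ser2Cys gains it.
SWAPPED_I = {"α1(I) C2S": (False, True), "α2(I) S2C": (True, True)}
# Generic three-chain heterotrimer (chain labels are placeholders).
THREE_CHAIN = {
    "chain 1": (True, True),
    "chain 2": (False, True),
    "chain 3": (True, False),
}


def interchain_bonds(ring, chains):
    """Count bonded interfaces in a cyclic trimer.

    Toy assumption: Cys2 of chain i pairs with Cys3 of chain i + 1.
    """
    n = len(ring)
    return sum(
        1
        for i in range(n)
        if chains[ring[i]][0] and chains[ring[(i + 1) % n]][1]
    )


def best_bond_counts(chains):
    """Return {composition: max bonded interfaces over all chain orderings}."""
    result = {}
    for combo in combinations_with_replacement(sorted(chains), 3):
        orderings = set(permutations(combo))
        result[combo] = max(interchain_bonds(p, chains) for p in orderings)
    return result


if __name__ == "__main__":
    for label, chains in [
        ("wild-type procollagen I", WILD_TYPE_I),
        ("swapped Cys2 mutants", SWAPPED_I),
        ("three distinct chains", THREE_CHAIN),
    ]:
        print(label)
        for combo, bonds in best_bond_counts(chains).items():
            print("  ", " + ".join(combo), "->", bonds, "bonded interfaces")
```

Under these assumptions the wild-type 2:1 heterotrimer has two bonded interfaces and an all-α2(I) trimer has none, while for the swapped mutants the 1:2 composition has more bonded interfaces than the 2:1 composition, in line with the observations described above. The model says nothing about non-covalent interactions, Ca2+ binding or the CRS.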
C-propeptides in extracellular assembly
In addition to their roles in intracellular assembly of procollagen molecules, the C-propeptides also control the assembly of collagen fibrils in the ECM. Release of the C-propeptides from the pC-collagen molecule (procollagen lacking the N-propeptides) decreases its solubility by 1000-fold, thus leading to spontaneous fibril assembly. The proteinase mainly responsible for the cleavage of the C-propeptides is BMP-1 [20,21], though other BMP-1/tolloid-like proteinases (BTPs) can also fulfil this role (Figure 4). Indeed, it has been shown that other members of the astacin family of metalloproteinases, meprins, are also capable of releasing the C-propeptides from procollagens [22,23]. In the case of the BTPs, but not the meprins, cleavage of the C-propeptides is accelerated (by up to 20-fold) by another ECM protein, PCPE-1 [24,25], which lacks intrinsic proteolytic activity but shares with BTPs the presence of CUB domains (Figure 4).
Domain structures of BTPs, meprins and PCPEs
While BTPs are known to act on approximately 30 substrates (mostly ECM proteins or growth factors and their antagonists), acceleration of their activity by PCPE-1 is limited to the procollagen propeptides [26,27]. Because inhibitors of BTPs carry the risk of off-target effects, there is interest in blocking PCPE-1 instead as a way of specifically preventing the excessive accumulation of collagen in fibrotic diseases [28–33]. A necessary prerequisite for this is to understand the molecular mechanism of action of PCPE-1, an area in which there has been considerable progress in recent years. First, it is clear that PCPE-1 acts by binding to the procollagen substrate rather than to the proteinase. More specifically, binding is to the C-propeptide region only and the contiguous CUB1CUB2 fragment of PCPE-1 is both necessary and sufficient for enhancing activity [34,35]. Second, binding requires acidic residues involved in Ca2+ binding in the CUB domains of PCPE-1 which form salt bridges with conserved lysines in the stalk/base region of the C-propeptide trimer [36,37]. Also, despite there being three chains, only one molecule of PCPE-1 is bound to the trimer. Third, the presence of PCPE-1 increases the affinity of BMP-1 for its procollagen substrate as well as increasing the reaction rate, as shown using both procollagen I and procollagen III substrates. This suggests that binding of PCPE-1 to the C-propeptide might create a higher affinity interaction surface for BMP-1 and/or lead to a conformational change (in the substrate, the enzyme, or both) that increases catalytic activity.
Recently, important new insights into the mechanism of enhancement of BMP-1 activity have been obtained with the determination of the crystal structure of the complex between the C-propeptide trimer of procollagen III (CPIII) and the CUB1CUB2 fragment of PCPE-1 (Figure 5A). This confirms that CUB1CUB2 binds to the stalk/base region of CPIII, as previously suggested by mutagenesis and small angle scattering data. Furthermore, it reveals why the stoichiometry of binding is only 1:1, since CUB1 binds mainly to chain A of CPIII and CUB2 binds mainly to chain B, leaving only chain C available for further interactions. As it was previously shown that both the CUB1 and CUB2 domains must be bound for enhancing activity, there are not enough sites on CPIII available for additional binding. Most interestingly however, the structure shows the end of the stalk region of chain A (shown in magenta in Figure 5) to be ‘pulled’ into the interface between the CUB1 and CUB2 domains of PCPE-1, thus separating chain A from the other two chains. This arrangement is stabilized by interactions involving three residues in CUB1CUB2 which, when mutated to alanine, almost abolish enhancing activity. These observations support the idea that the role of PCPE-1 is to introduce a conformational change in the substrate that facilitates cleavage by BMP-1. Using the crystal structure as a template, it was then possible to model the catalytic domain of BMP-1 with the cleavage site of chain A (not seen in the crystal structure) bound to the active site as a result of being pulled into the CUB1CUB2 interface (Figure 5B). Since the active site of BMP-1 is too small to accommodate all three chains, this provides a mechanism for increasing catalytic efficiency. In addition, the model allows the BMP-1 catalytic domain to bind to the CUB1CUB2 region of PCPE-1, thus further contributing to enhancing activity.
Structures of the CPIII:CUB1CUB2 complex
Though the new structural data give important insights into cleavage of procollagen by BTPs, much remains to be learned. For example, how are the remaining chains cleaved following proteolysis of chain A? Also, what are the roles of the non-catalytic domains of BMP-1 and other BTPs? Small angle scattering indicates that these are folded back towards the catalytic domain. The presence of CUB domains in BMP-1 is clearly critical for both catalytic activity and enhancement by PCPE-1, as shown by a mutation in the CUB2 domain of BMP-1 that blocks activity completely, and by the fact that the activities of meprins (which are devoid of CUB domains) are not enhanced by PCPE-1.
C-propeptides in disease and novel functions
So far approximately 300 mutations (including missense, nonsense, frameshift and silent mutations) have been described in the C-propeptides of procollagens I, II, III and V (https://www.le.ac.uk/ge/collagen, https://databases.lovd.nl/shared/genes/COL2A1). Most of these are associated with major disorders such as OI (and other skeletal dysplasias) and Ehlers–Danlos syndrome (affecting the skin and vascular system). Among them, about a third are missense mutations whose positions have been mapped in three dimensions [43–46] using the crystal structures of the C-propeptides of procollagens I and III. For mutations in the α1 chain of procollagen I (COL1A1 gene), in most cases, there is a correlation between the position of the mutation and the severity of the disease. Those that interfere with disulphide bonding, hydrogen bonding, inter-chain interactions, Ca2+ binding or stability of the hydrophobic core are usually associated with severe phenotypes. In contrast, mutations located on the outer surface of the petal region, remote from inter-chain interactions, are generally mild. There are exceptions to this rule however, such as p.Asp1413(195)Asn and p.Asp1441(223)Tyr (where the first number is relative to the translation start site and the second number is from the start of the C-propeptide domain), both of which are on the outer surface of the petal but lead to severe or lethal phenotypes, respectively [46,47]. A possible explanation for this paradox is the recent discovery that these mutations lead to mislocalization of procollagen I to the lumen of the RER, in contrast with the wild-type protein which is membrane bound. Such an interaction with the RER membrane had previously been suspected of being important in collagen biosynthesis [3,48]; these latest results strongly support this hypothesis, though the mechanism of binding remains to be elucidated.
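The two numbering schemes used for these mutations differ by a constant. The sketch below converts between them for the α1(I) chain; the offset of 1218 is not stated in the text but is inferred from the two examples quoted above (1413 − 195 = 1218 and 1441 − 223 = 1218), and the function and constant names are hypothetical. It should not be applied to other chains without checking the offset.

```python
# Offset inferred from the two mutations quoted in the text:
# 1413 - 195 = 1218 and 1441 - 223 = 1218 (α1(I) chain only).
COL1A1_CPROPEPTIDE_OFFSET = 1218


def to_cpropeptide_numbering(full_length_pos, offset=COL1A1_CPROPEPTIDE_OFFSET):
    """Convert full-length residue numbering to C-propeptide numbering."""
    position = full_length_pos - offset
    if position < 1:
        raise ValueError("residue lies before the start of the C-propeptide")
    return position


def to_full_length_numbering(cpropeptide_pos, offset=COL1A1_CPROPEPTIDE_OFFSET):
    """Convert C-propeptide numbering back to full-length residue numbering."""
    if cpropeptide_pos < 1:
        raise ValueError("C-propeptide numbering starts at 1")
    return cpropeptide_pos + offset


if __name__ == "__main__":
    for full_length in (1413, 1441):
        print(full_length, "->", to_cpropeptide_numbering(full_length))
```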
In contrast with COL1A1, for mutations in the C-propeptide of the α2 chain of procollagen I (COL1A2 gene), there is a relatively poor correlation between predicted and observed phenotypes. This could be due to the fact that, for heterozygous mutations, 50% of procollagen molecules will be normal, unlike for COL1A1 mutations where 75% of molecules will have one or two defective chains. Alternatively, defective α2 chains could be replaced by normal α1 chains resulting in proα1(I)3 homotrimers.
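The 50% and 75% figures follow from the number of copies of each chain per molecule. As a minimal sketch, assuming that the normal and mutant alleles are expressed equally and that chains are incorporated into trimers at random and independently (assumptions not stated in the text, with a hypothetical function name), the fraction of molecules carrying at least one defective chain is 1 − (1/2)^n for a chain present in n copies:

```python
from fractions import Fraction


def fraction_with_defective_chain(copies_per_trimer,
                                  mutant_fraction=Fraction(1, 2)):
    """Fraction of trimers containing at least one mutant chain.

    Assumes random, independent incorporation of normal and mutant chains.
    """
    if copies_per_trimer < 0:
        raise ValueError("copies_per_trimer must be non-negative")
    all_normal = (1 - mutant_fraction) ** copies_per_trimer
    return 1 - all_normal


if __name__ == "__main__":
    # Heterotrimeric procollagen I: two α1(I) chains and one α2(I) chain.
    print("COL1A1 mutation:", fraction_with_defective_chain(2))
    print("COL1A2 mutation:", fraction_with_defective_chain(1))
```

With two α1(I) chains per molecule this gives 3/4 (one quarter of molecules with two defective chains and one half with one), and with a single α2(I) chain it gives 1/2, matching the figures above. The calculation ignores the possibility, raised above, that defective α2 chains are replaced by normal α1 chains.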
For both COL1A1 and COL1A2, there has been considerable interest in recent years in missense mutations in the Ala and Asp residues that define the proteolytic cleavage site for release of the C-propeptides from procollagen I [49–53]. These mutations, which are heterozygous, give rise to a relatively mild form of OI similar to type I but distinguished by unusually dense bones. While bone mineral density in these patients is high, paradoxically there is also delayed mineralization as shown by increased amounts of unmineralized osteoid. A similar phenotype has been observed in patients for some [54,55], but not all [56,57], mutations in BMP-1, all of which are either homozygous or bi-allelic. Since both cleavage-site and BMP-1 mutations interfere with cleavage of the C-propeptides, there will be an increase in the population of partially processed procollagen molecules with intact C-propeptides on one or more chains. (Note that not all procollagen chains will be affected in these patients, either because the C-propeptide mutations are heterozygous, or because other proteinases can take over from defective BMP-1, such as mTLL-1 or meprins.) Such mutations may therefore affect the rate of collagen deposition in osteoid and other tissues, and also the regulation of collagen synthesis by C-propeptide-mediated feedback control, as observed previously for both preosteoblasts and fibroblasts [59,60]. Concerning bone mineralization, collagen fibrils play a key role in this process, either by nucleating mineral deposition directly or via non-collagenous proteins [61–63]. In particular, there is a positively charged region in the fibril corresponding to the C-terminal end of the collagen molecule that appears to play a key role in calcium phosphate accumulation.
The increased presence of C-propeptides on the surface of collagen fibrils in patients with defective C-terminal processing may amplify this effect, leading to hypermineralization, particularly as the C-propeptides (also known as chondrocalcin in the case of procollagen II) have themselves been implicated in the mineralization process.
Concerning novel functions, the heterotrimeric form of CPI (unlike CPII) has been reported to be chemotactic for endothelial and mammary carcinoma cells, inducing the expression of VEGF and CXCR4, suggesting a role in tumour growth and angiogenesis [66–69]. Also, in addition to previously known binding partners including PCPEs, integrins and chaperone proteins, the C-propeptides of procollagen I have recently been found to bind to the ECM proteins thrombospondin-1 (TSP1) and lysyl oxidase (Figure 6). In addition, TSP1 binds to PCPE-1 and to a positively charged KGHR sequence in the collagen molecule involved in lysyl oxidase-initiated cross-linking to the C-terminal region of an adjacent molecule in the fibril. Furthermore, the proteoglycan fibromodulin (FMOD) binds to the same KGHR sequence in collagen and also to lysyl oxidase. In view of all these interactions occurring in the vicinity of the C-terminal end of the collagen molecule (including its C-propeptides) and the importance of this region in mineral deposition, this opens the way to further studies on the complexity of the factors controlling the assembly, cross-linking and mineralization of the collagen fibril.
Diagram of the interactions found in the region of the C-terminal end of one collagen molecule within the collagen fibril
Recent years have seen considerable progress in elucidating the molecular mechanisms by which the C-propeptides control intracellular procollagen trimerization and extracellular proteolytic processing. A number of areas remain for future study however, notably the 3D structures of the CPI heterotrimer and the C-propeptides of procollagen II, as well as the interactions and signalling functions of the C-propeptides, with membranes and in cross-linking and mineralization, feedback control of collagen biosynthesis, and tumour growth and angiogenesis.
Procollagen C-propeptides control both intra- and extracellular assembly during collagen biosynthesis.
Disulphide-bonding patterns and specific inter-chain interactions direct C-propeptide trimerization.
Binding of PCPE-1 to the C-propeptide trimer provides a mechanism for accelerated proteolytic release.
Mutations in C-propeptides account for debilitating diseases often with specific phenotypes.
Recently discovered interaction partners of the C-propeptides give new insights into their roles.
I am grateful to the many friends and colleagues who have shared with me this interest in the procollagen C-propeptides over many years.
The author declares that there are no competing interests associated with the manuscript.