Co- and post-translational hydroxylation of proline residues is critical for the stability of the triple-helical collagen structure. In this review, we summarise the biology of collagen prolyl 4-hydroxylases and collagen prolyl 3-hydroxylases, the enzymes responsible for proline hydroxylation. Furthermore, we describe the potential roles of hydroxyproline residues in the complex interplay between collagens and other proteins, especially integrin and discoidin domain receptor type cell adhesion receptors. Qualitative and quantitative regulation of collagen hydroxylation may have remarkable effects on the properties of the extracellular matrix and consequently on cell behaviour.
The members of the large collagen family have characteristic domains with a triple-helical structure formed by three collagen α chains. Special features in the primary α chain structure enable the formation of the collagenous triple helix. These include the location of glycine as every third residue and the presence of co- and post-translationally hydroxylated proline residues .
Proline residues are hydroxylated by three isoenzymes forming the group of collagen prolyl 4-hydroxylases (C-P4Hs) (Figure 1). In addition to abundant 4-hydroxyproline (4-Hyp) residues, collagens also harbour a few 3-hydroxyproline (3-Hyp) residues (Figure 2). Specific collagen prolyl 3-hydroxylases (C-P3Hs) are responsible for this modification.
Figure 1. Prolyl 4-hydroxylases (C-P4H-I, C-P4H-II and C-P4H-III) hydroxylate proline residues in collagens co- and post-translationally in a reaction that requires O2, 2-oxoglutarate, Fe2+ and ascorbic acid.
Figure 2. Prolyl 3-hydroxylases hydroxylate selected proline residues in many, but not all, collagens.
The melting temperature of collagenous triple helix is directly proportional to the 4-Hyp content [3,4]. Thus, prolyl 4-hydroxylation is critical for the stability of the individual tropocollagen molecules. Collagens are known to have numerous interaction partners that can recognise triple-helical motifs in collagens . Here, we review the many putative roles of prolyl 4-hydroxylation in the molecular interactions of collagen including fibril formation (Figure 3). We also introduce the collagen receptor subgroup of the integrin superfamily [6,7]. The structural basis of the interaction of the integrin type adhesion receptors with collagens has been unveiled in detail and their typical recognition sites often contain 4-Hyp residues [8,9], although there are also some exceptions [10,11]. Furthermore, another group of cellular collagen receptors, formed by discoidin domain receptors 1 and 2 (DDR1 and DDR2), also recognises a 4-Hyp containing motif . Finally, we discuss the potential physiological significance of direct cell adhesion to collagens.
Figure 3. Hydroxylated proline residues participate in collagen fibril formation.
Hydroxylation of proline residues in collagen
The primary structure of a collagen α chain is often described as -X-Y-Gly-, which indicates that glycine is every third residue, whereas the X and Y positions can be occupied by any amino acid. C-P4Hs catalyse the formation of 4-Hyp on collagens by modifying proline residues in the Y position, in a process that requires 2-oxoglutarate, Fe2+, molecular oxygen and ascorbic acid (Figure 1). This modification takes place in the endoplasmic reticulum before collagen triple helix formation. Scurvy, a disease caused by a dietary deficiency of ascorbic acid, leads to defects in connective tissue-related physiological mechanisms, such as the healing of tissue injuries.
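The -X-Y-Gly- rule and the Y-position preference of C-P4Hs can be illustrated with a minimal Python sketch. It assumes a one-letter sequence that is a pure collagenous repeat starting at an X position, and it uses O for 4-Hyp. The function names are hypothetical, and the "hydroxylate every Y-position proline" step is a deliberate simplification rather than a model of real enzyme specificity.

```python
from __future__ import annotations


def y_position_prolines(chain: str) -> list[int]:
    """Return 0-based indices of Pro residues in the Y position of X-Y-Gly triplets.

    Assumption: `chain` is a pure (X-Y-Gly)n repeat that starts at an X position.
    """
    if len(chain) % 3 != 0:
        raise ValueError("chain length must be a multiple of three")
    sites = []
    for start in range(0, len(chain), 3):
        x, y, gly = chain[start:start + 3]
        if gly != "G":
            raise ValueError(f"expected glycine at index {start + 2}")
        if y == "P":
            sites.append(start + 1)
    return sites


def hydroxylate_y_prolines(chain: str) -> str:
    """Replace every Y-position Pro with 'O' (4-Hyp); an idealised, complete hydroxylation."""
    residues = list(chain)
    for index in y_position_prolines(chain):
        residues[index] = "O"
    return "".join(residues)
```

For example, `hydroxylate_y_prolines("PPGPPGAPGPAG")` leaves the X-position prolines untouched and converts only the three prolines that directly precede a glycine.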
Both Pro and 4-Hyp stabilise the collagen molecules via imino acid rings, but 4-Hyp provides the additional thermal stability needed at physiological temperature. The molecular basis of how 4-Hyp stabilises the triple helix is not fully understood. It has been shown with collagen-like peptides that the 4R-hydroxylation of Pro stabilises the triple helix by a stereoelectronic effect that causes a defined Cγ-exo proline ring puckering. The stability is also influenced by a water network around collagen molecules. Hydrogen bonding, especially via water bridges, between the collagen chains also has an effect on the stability and this can involve 4-Hyp residues [14–16].
Vertebrates have three C-P4H isoenzymes . These enzymes have an α2β2 subunit composition. The α subunit (α(I) in C-P4H-I, α(II) in C-P4H-II and α(III) in C-P4H-III) is the catalytic subunit, whereas the β subunit in the enzyme tetramer is protein disulphide isomerase (PDI). The role of PDI as a C-P4H β subunit is to render the insoluble α subunit into a soluble and catalytically active conformation in the C-P4H tetramers [1,17]. PDI also employs its KDEL motif to retain the C-P4H tetramers in the ER lumen .
C-P4H-I is abundantly expressed in all tissues studied so far, whereas C-P4H-II is mainly expressed in osteoblasts, chondrocytes and endothelial cells. A low level of C-P4H-III expression has been found in many tissue types, but the significance of its function, when compared with the two other C-P4Hs, is not clear.
In humans, bi-allelic mutations in P4HA1, the gene encoding the catalytic α(I) subunit of C-P4H-I, have been shown to lead to a congenital connective tissue disorder with joint hypermobility, contractures, mild skeletal dysplasia and high myopia. In gene-modified mice, the complete homozygous inactivation of the same gene leads to an 80% decrease in total C-P4H activity and embryonic death at E10.5. Heterozygous mutations in P4HA2, encoding the catalytic α(II) subunit, have been identified as a cause of myopia. Mice lacking C-P4H-II activity have no obvious phenotypic abnormalities, whereas compound P4ha1+/−;P4ha2−/− mice with a 65% reduction of total C-P4H activity in chondrocytes show signs of moderate chondrodysplasia. Heterozygous mutations in P4HB, the gene encoding PDI, underlie Cole-Carpenter syndrome, a bone fragility disorder [22,23].
C-P4H-I and C-P4H-II are likely to differ in their substrate-binding specificity. Based on the mass spectrometric analysis of skin collagen derived from mice lacking C-P4H-II activity or with reduced C-P4H-I activity, we have recently shown that the hydroxylation of individual proline residues is dependent on the neighbouring residue and that the two isoforms have different requirements. For example, there is a clear difference in hydroxylation of triplets with acidic residues in the X position between P4ha1 and P4ha2 mutants when compared with wild-type mice (unpublished). Furthermore, the Km values of C-P4H-II for synthetic collagen-like (Pro-Pro-Gly)10 and Gly-Val-Pro-Gly-Val peptides and a full-length procollagen chain are 3- to 6-fold higher than those of C-P4H-I [17,25]. There is also a marked difference between the two enzymes in their inhibition by the competitive inhibitor poly(L-proline) [17,25]. A peptide-substrate-binding domain has been identified in the C-P4H α subunits and there are distinct differences in the binding properties and structure of this domain between C-P4H-I and C-P4H-II [26,27].
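To give a feel for what a 3- to 6-fold difference in Km could mean for reaction rates, the following worked equation assumes simple Michaelis-Menten kinetics. The equal Vmax for both isoenzymes and the factor k are illustrative assumptions, not values reported in the cited studies.

```latex
\begin{align*}
v &= \frac{V_{\max}[S]}{K_m + [S]}, &
K_m^{\mathrm{II}} &= k\,K_m^{\mathrm{I}}, \quad 3 \le k \le 6 \\
\text{at } [S] = K_m^{\mathrm{I}}:\quad
v_{\mathrm{I}} &= \frac{V_{\max}}{2}, &
v_{\mathrm{II}} &= \frac{V_{\max}}{1 + k}, \qquad
\frac{v_{\mathrm{II}}}{v_{\mathrm{I}}} = \frac{2}{1 + k} \in \left[\tfrac{2}{7},\, \tfrac{1}{2}\right] \\
\text{competitive inhibitor } I:\quad
K_m^{\mathrm{app}} &= K_m\left(1 + \frac{[I]}{K_i}\right)
\end{align*}
```

Under these assumptions, at a substrate concentration equal to the Km of C-P4H-I, C-P4H-II would work at roughly 29% to 50% of the rate of C-P4H-I. The difference would shrink as the substrate approaches saturation, because a competitive inhibitor or a higher Km changes the apparent affinity but not Vmax.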
The differences in the substrate specificity of the C-P4H isoenzymes make it possible to speculate that collagens produced by different cell types may differ in their hydroxylation pattern. It remains to be shown whether such diversity occurs in tissues and whether it has functional consequences.
A recent study on endochondral ossification has shown that hypoxia, and consequently, the activation of hypoxia-inducible transcription factor HIF-1, causes metabolic changes that regulate proline and lysine hydroxylation on collagen. These modifications lead to accumulation of collagenous matrix due to increased resistance to metalloproteinases . Furthermore, the genes for the C-P4H α subunits are themselves HIF-1 targets and it has been shown that hypoxic induction of C-P4Hs is required to sustain normal collagen production level in hypoxic chondrocytes [29,30]. On the other hand, hypoxic induction of C-P4Hs has been shown to enhance breast cancer metastasis . C-P4H related mechanisms may thus play significant roles in the pathogenesis of human diseases, such as fibrosis and cancer . Furthermore, it is possible that the connection between the products of metabolic pathways and the post-translational modifications in collagen regulates the structure and function of extracellular matrix during physiological processes as well.
Three different genes, namely P3H1, P3H2 and P3H3, encode C-P3Hs, the enzymes responsible for synthesis of 3-Hyp residues on collagen. Their substrate proline residues occur in a prerequisite sequence -Pro-(4-Hyp)-Gly-, where 3-Hyp is formed from the proline residue (Figure 2). The human fibrillar collagen chains α1(I), α1(II) and α2(V) are all known to have one fully occupied 3-Hyp site at Pro986. This motif is conserved in vertebrate species. In addition, partially occupied sites in mammalian fibrillar collagen chains α2(I), α1(II), α1(V), α2(V), α1(XI) and α2(XI) have been found [32,33]. Interestingly, mammalian fibrillar collagen III does not seem to have any 3-Hyp sites . 3-Hyp is more abundant in certain other collagens, including the basement membrane collagen IV. Synthesis of 3-Hyp seems to only have a modest effect on the stability of the triple helix [34,35].
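The sequence prerequisite for prolyl 3-hydroxylation can be expressed as a short scan. This is a minimal Python sketch assuming a one-letter sequence in which O denotes 4-Hyp, and the function name is hypothetical. It lists candidate positions only: as noted above, just a few of the prolines in this context are actually 3-hydroxylated, so the sequence context is necessary but not sufficient.

```python
from __future__ import annotations


def candidate_3hyp_sites(chain: str) -> list[int]:
    """Return 0-based indices of Pro residues that sit in a Pro-(4-Hyp)-Gly context.

    Assumption: 'O' encodes 4-Hyp, so the prerequisite sequence reads 'POG'.
    The returned indices point at the Pro that could become 3-Hyp.
    """
    return [i for i in range(len(chain) - 2) if chain[i:i + 3] == "POG"]
```

For the illustrative peptide `"GPOGFOGERGPOGPP"`, the function reports the two prolines that are followed by 4-Hyp and Gly and ignores the prolines at the C terminus.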
Proline hydroxylation and collagen fibril formation
Experiments utilising recombinant collagen produced in insect cells, yeast and plants have indicated that collagen molecules lacking 4-Hyp cannot form triple helices that are stable at a physiological temperature [36–38]. Furthermore, unlike fully hydroxylated recombinant collagen I, unhydroxylated triple-helical recombinant collagen I cannot self-assemble into banded supramolecular collagen fibrils in physiological ionic strength buffer at 20°C [38,39]. The stabilising effect of the hydroxyl group of the 4-Hyp on the collagen triple helix is mediated via stereoelectronic effects .
Despite the limited number of 3-Hyp residues in the fibril-forming collagens, inactivating mutations in P3H1 can cause a severe, recessive form of osteogenesis imperfecta, a human inherited skeletal disorder primarily caused by mutations in the collagen I genes. A mouse model in which the P3h1 gene has been inactivated demonstrates disturbances in collagen fibre architecture in bone, tendon and skin. Therefore, 3-Hyp in collagen must have important, but as yet largely unknown, biological functions.
Interestingly, in tendon and sclera, but not in skin or bone, the triple-helical C terminus of collagen I contains a 3-Hyp rich motif. Hydroxylation of this site appears to be completely dependent on the P3h2 gene, at least in mouse. In contrast, the widely occurring hydroxylation of Pro986 in the α1 chain of collagen I has been shown to require P3h1. It has been speculated that the tendon- and sclera-specific prolyl 3-hydroxylation sites may contribute to the unique mechanisms of collagen fibril assembly in these tissues [33,43,44]. It should be noted that prolyl 3-hydroxylation of the non-fibrillar basement membrane collagen IV has also been suggested to be mainly P3H2-dependent [43,45]. Mutations in P3H2 have been shown to cause high myopia.
Deletion of P3h3, the third gene encoding a C-P3H, does not seem to have any effect on prolyl 3-hydroxylation. Instead, collagens in the P3h3 null mice have under-hydroxylated lysines and cross-linking deficiencies .
Hydroxyproline containing functional motifs in collagen
Collagens are known to have tens of putative interaction partners, including extracellular matrix glycoproteins, such as fibronectin and decorin, and also cell surface receptors, such as integrins and the DDRs, DDR1 and DDR2 [5–7,12]. In mammals, 24 heterodimers form the large integrin family. These receptors can anchor cells to the extracellular matrix and regulate survival, proliferation, differentiation and migration. Heterodimers α1β1, α2β1, α10β1 and α11β1 form the subgroup of collagen receptor integrins. They are specialised to recognise functional motifs in triple-helical collagens. Most, but not all, of these sequences contain hydroxylated proline residues. The consensus motif can be formulated as triple-helical GXX′GER. GFOGER (O = 4-Hyp) is a widely studied site, while other similar motifs include GROGER, GLOGER, GMOGER, GLSGER, GQRGER, GLOGEN, GAOGER, GVOGEA, GFKGER, GLQGER and GASGER.
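As an illustration of the consensus, a sequence can be scanned for strict GXX′GER matches with a regular expression. This is a minimal Python sketch assuming a one-letter sequence in which O denotes 4-Hyp. The function name is hypothetical, and variants listed above that end in GEN or GEA would not match this strict pattern. A sequence match says nothing about triple-helical conformation, which the receptors also require.

```python
from __future__ import annotations

import re

# A lookahead is used so that overlapping candidates are also reported.
GXXGER = re.compile(r"(?=(G[A-Z][A-Z]GER))")


def find_gxxger(chain: str) -> list[tuple[int, str, bool]]:
    """Return (0-based start, motif, whether X' is 4-Hyp) for each strict GXX'GER match.

    Assumption: 'O' encodes 4-Hyp in the one-letter sequence.
    """
    return [
        (match.start(), match.group(1), match.group(1)[2] == "O")
        for match in GXXGER.finditer(chain)
    ]
```

Applied to the illustrative peptide `"GPOGFOGERGLSGERGPP"`, the scan reports GFOGER as a hydroxylated motif and GLSGER as a motif without 4-Hyp in the X′ position.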
The 4-hydroxylation of the proline residue in these functional domains is not absolutely required, but it increases the avidity of integrin binding. The functional assays have often used recombinant α inserted (αI) domains (also known as αA domains) of the collagen receptor integrins to show differences between different integrins. For example, α2I domain can bind to GFPGER containing peptides, but more weakly than to GFOGER . Similarly, both integrin α1I and integrin α2I domains can bind to GFPGER and GLPGER motifs, as well as to unnatural GFPGEN motif, when they have been inserted into a streptococcal collagen-like protein Scl2 . Still, it has been reported that while integrin α2β1 can bind to unhydroxylated collagen I that has been produced in transgenic plants, another collagen receptor, α1β1, fails to do so . The difference between α1β1 and α2β1 integrin was confirmed in our recent paper . Furthermore, we showed that the lack of prolyl 4-hydroxylation in collagen can affect integrin binding by two different mechanisms, directly and via structural destabilisation of the triple helix .
Importantly, the mass spectrometric analysis of skin collagen derived from transgenic mice indicated that even after concomitant inactivation of both alleles of P4ha2 and one allele of the P4ha1 gene, all the major integrin-binding motifs were completely hydroxylated. At the same time, the overall hydroxylation of proline residues in -X-P-G- sequences dropped significantly. Furthermore, analysis of data from the same study reveals that the relative hydroxylation level of the sequence GVMGFO, which has been determined as the motif required for both DDR1 and DDR2 binding in fibrillar collagens, is close to 100% in both wild-type and P4ha1+/−;P4ha2−/− mice. These observations stress the importance of the direct interaction of collagen with integrins and DDRs, and suggest that the site specificity of C-P4H-I has evolved to favour the receptor recognition motifs over other sites in collagen. This is supported by the fact that both GFOGER, the well-known integrin-binding motif in fibrillar collagens, and the DDR-binding sequence GVMGFO contain a GFO triplet. Another possibility is that only the tropocollagens that have fully hydroxylated receptor-binding sites end up in the skin collagen fibrils.
The high avidity DDR-recognition motif GVMGFO cannot alone trigger receptor signalling, indicating that this site only partially explains the DDR-binding mechanism. Experiments with bacterially produced collagens have indicated that the GVMGFP motif can bind specifically to DDR. Importantly, replacement of 4-Hyp with Pro significantly decreases the binding affinity.
Glycoprotein VI (GPVI), an immunoglobulin receptor family protein, is a platelet receptor that together with α2β1 integrin mediates the binding of platelets to collagens. Two GPVI molecules form a dimer that binds to triple-helical collagen via a motif containing multiple GPO sequences. The minimum number of GPO triplets that can activate GPVI signalling is two. GPVI shows no significant binding to GPP repeats.
Biological functions of the collagen receptors
The synthesis of triple-helical collagens and the expression of integrin type adhesion receptors are common features of all multicellular animals. Still, the αI domain containing collagen receptors α1β1, α2β1, α10β1 and α11β1 are found only in vertebrates, including lampreys [6,61]. Other chordates, such as Ciona intestinalis (sea squirt), have integrins with αI domains, but these receptors lack the structural and functional properties of vertebrate collagen receptors. Thus, the evolution of some vertebrate-specific tissues and cell types has required the co-evolution of integrins and their ligands and the generation of high-affinity receptor-binding sites in collagens. In humans, integrin α1β1 is mainly found on mesenchymal cells, including smooth muscle, cartilage and endothelial cells, fibroblasts and lymphocytes. Integrin α2β1 is abundant on epithelial cells and platelets, but it is also expressed on numerous mesenchymal cell types. Integrin α11β1 is the main collagen receptor on fibroblasts, while α10β1 is mainly found on chondrocytes.
The knockout mice for α1β1, α2β1, α10β1 and α11β1 integrins are viable and fertile and without obvious structural defects [64–67]. The fact that the collagen receptors are found only in vertebrates, and thus evolved quite recently, explains the observations that they only play a minor role in the embryonic development of mammals.
DDR1 and DDR2 (for reviews see: [12,68]) form a special subgroup in the large family of receptor tyrosine kinases. DDR1 is expressed on epithelial cells, but also on chondrocytes and certain inflammatory cells, e.g. activated T cells. DDR2 is found on mesenchymal cells, e.g. fibroblasts, chondrocytes and neutrophils. Both DDRs can interact with fibril-forming collagens and DDR1 also binds to basement membrane collagen IV. DDRs are considered to act as cellular sensors for ECM rather than anchoring receptors. Transgenic mice lacking DDR1 are viable, but they have multiple structural and functional defects in various tissues, such as female breast, kidney  and cartilage . DDR2 regulates proliferation of many cell types. DDR2 null mice are dwarfs and show defects in e.g. limb development .
The exact biological significance of direct integrin–collagen interaction is still under discussion. The fact that fibril-forming collagens have several high avidity-binding sites for the collagen receptors stresses the importance of this phenomenon. It can be speculated that the collagen receptors may participate in functions related to e.g. the formation of collagen fibrils, anchorage of cells to fibrils or degradation of fibrils. Integrin-dependent cell locomotion is one more option.
Based on experimental approaches in which collagen fibril formation is followed in in vitro conditions, it has been suggested that collagen receptor integrins, especially α2β1 and α11β1, can participate in the formation of collagen fibrils, acting for example as nucleation sites [72,73]. The potential role of integrins in collagen fibrillogenesis, however, is not generally supported by the observations based on knockout mouse lines. Studies on integrin α10-deficient mice have revealed a reduced density of the collagen fibrillar network matrix in the growth plates . However, in the case of the other three collagen receptor integrins, no abnormalities in the shape or the size of the collagen fibrils have been reported [64–67]. It is not probable that the receptors could replace each other, since the double knockouts have no additional phenotypes [75,76].
Fully organised collagen I and collagen II containing fibrils are typically covered with a large variety of glycoproteins, including fibronectin, proteoglycans, like decorin, and FACIT collagens, such as types IX and XII . Therefore, it is probable that direct cell adhesion to mature fibrils is mediated by integrins other than the actual collagen receptor integrins or that the collagen receptor integrins interact with FACIT collagens rather than fibril forming collagens. Indeed, in vitro formed cartilage type collagen fibrils have not shown significant binding to the collagen receptor integrins . Still, the reorganisation of collagen I molecules to fibrils does not automatically prevent their interaction with collagen receptors, e.g. α2β1 integrin. It has been shown that at least in vitro such interaction is possible . Furthermore, cell dependent contraction of collagen gels is, at least partially, mediated by α2β1 integrin . A recent paper has indicated that the surface reconstruction of collagen I fibrils may allow cryptic-binding sites to become accessible for collagen receptor integrins . In heterotrimeric collagens, GPO triplets do not occur at the same locus in all α chains. This asymmetrical distribution of GPO triplets may facilitate functional bending of the triple helix and preserve the structural integrity of binding sites for other proteins. Importantly, the site-specific in vivo flexibility, based on exo-endo flipping in the proline ring, may promote the binding of interaction partners, such as integrins, DDR-type receptors and von Willebrand factor, to motifs next to GPO-sites .
Integrins are known to transduce mechanical forces into cells. However, collagen receptors may act differentially in this respect when compared with e.g. fibronectin receptors, since it has been shown that no mechanical stress is needed when α2β1 integrin, unlike α5β1 fibronectin receptor, activates focal adhesion kinase .
Collagen receptors may also participate in the degradation of collagen fibrils. For example, they may work together with membrane-type metalloproteinases and direct the collagenolytic activity during cell migration . One of the main functions of α2β1 integrin is to act as a platelet collagen receptor and recognise collagen fibrils at the site of injury in the vessel wall [65,66]. Accordingly, the deficiency of α2β1 integrin on human platelets causes a mild bleeding disorder . It is possible that on injured vessel wall, the platelet collagen receptors interact with partially degraded collagen fibrils rather than fully organised and intact fibrils.
The main function of the collagen receptor integrins may also be to mediate cellular signals that promote survival or regulate cell division in a collagenous environment. The deficiency of integrin α10 in mouse leads to a small delay in the development of long bones, but not to structural defects in adult animals . Interestingly, in a larger size mammal, i.e. certain dog breeds, the lack of α10 causes significant shortening of the legs . Furthermore, in α1 knockout mouse skin the dermis is hypocellular  and during the healing of bone fractures the number of mesenchymal stem cell-like cells is reduced and the callus size is diminished . Accordingly, α2 integrin deficiency leads to acceleration in epithelial cell migration and increased angiogenesis during wound healing  and the lack of α11 compromises the formation of granulation tissue and the stability of the wound . In these phenomena, the role of high avidity integrin-binding motifs, and integrin–collagen interaction in general, is unknown.
The 4-hydroxylation of proline residues on collagen is a critical factor in the regulation of the stability of the triple helix. Furthermore, both 4-Hyp and 3-Hyp patterns may influence collagen fibril formation. Recent reports have indicated that proline hydroxylation may also have other effects on the biology of collagens.
The co-evolution of vertebrate collagens and the αI domain containing integrins has led to a special subgroup of receptors that has the ability to recognise GXX′GER type triple-helical motifs where X′ is often 4-Hyp. The high avidity-binding site for DDR type collagen receptors contains 4-Hyp too. It is uncertain whether these receptors can participate in collagen fibril formation or whether they can mediate cell adhesion to fully mature collagen fibrils in tissues. Still, these receptors seem to regulate cellular functions in a collagenous environment. Integrins are active during platelet adhesion to injured vessel walls as well as during wound and bone fracture healing and inflammation. There are still many unanswered questions related to the physiological functions of the integrin-type collagen receptors and DDRs. Creation of transgenic mouse lines harbouring mutations that inactivate the 4-Hyp containing high avidity-binding sites in collagens could shed some light on this issue. Hydroxylated proline residues can also have other functions in cell adhesion, since GPO triplets may increase the flexibility of collagen molecules and make functional sites, including DDR-binding motifs, available for cells.
Proper hydroxylation of proline residues in collagens is known to be critical for the homeostasis of tissues. The lack of vitamin C or inherited mutations in the collagen-modifying enzymes may break this balance and lead to serious human diseases. Furthermore, there is a regulatory link between hypoxia, metabolic pathways and post-translational modifications in collagen and it is possible that prolyl hydroxylation contributes to the pathogenesis of fibrosis and cancer. The differences in the substrate specificity of C-P4Hs may add an extra level to the machinery regulating the pattern of collagen prolyl 4-hydroxylation and consequently the interaction of collagen molecules with their binding partners.
The characteristic primary structure of collagen α chains, glycine as every third residue and the presence of hydroxylated proline residues, enables the formation of collagenous triple helix.
Proline residues are hydroxylated co- and post-translationally by three isoenzymes forming the group of C-P4Hs and by C-P3Hs. The differences in the substrate specificity of prolyl hydroxylases may add an extra level to the machinery regulating the pattern of collagen prolyl hydroxylation.
The 4-hydroxylation of proline residues on collagen enhances the stability of the triple helix and influences collagen fibril formation. Proline hydroxylation may regulate the flexibility of collagen molecules and make functional sites available for interacting proteins and receptors.
Collagens have high affinity-binding motifs for cell surface receptors, integrins and DDRs. Most of these sequences contain 4-hydroxylated proline residues. These receptors regulate cellular functions in a collagenous environment.
There is a recently described regulatory link between hypoxia, metabolic pathways and post-translational modifications in collagen and it is possible that prolyl hydroxylation contributes to the pathogenesis of fibrosis and cancer.
This work has been supported by the Sigrid Jusélius Foundation (to J.H., P.R., J.M., A.S.); the Finnish Cancer Foundation (to J.H., P.R.); the Academy of Finland Project (to J.M., A.S.); and the Jane and Aatos Erkko Foundation (to J.M., A.S.).
J.M. owns equity in FibroGen Inc., which develops HIF-P4H inhibitors as potential therapeutics. This company supports HIF-related research in the J.M. group.