Non-linearity of the collagen triple helix in solution and implications for collagen function

Collagen adopts a characteristic supercoiled triple helical conformation which requires a repeating (Xaa-Yaa-Gly)n sequence. Despite the abundance of collagen, a combined experimental and atomistic modelling approach has not so far quantitated the degree of flexibility seen experimentally in the solution structures of collagen triple helices. To address this question, we report an experimental study on the flexibility of varying lengths of collagen triple helical peptides, composed of six, eight, ten and twelve repeats of the most stable Pro-Hyp-Gly (POG) units. In addition, one unblocked peptide, (POG)10unblocked, was compared with the blocked (POG)10 as a control for the significance of end effects. Complementary analytical ultracentrifugation and synchrotron small angle X-ray scattering data showed that the conformations of the longer triple helical peptides were not well explained by a linear structure derived from crystallography. To interpret these data, molecular dynamics simulations were used to generate 50 000 physically realistic collagen structures for each of the helices. These structures were fitted against their respective scattering data to reveal the best fitting structures from this large ensemble of possible helix structures. This curve fitting confirmed a small degree of non-linearity to exist in these best fit triple helices, with the degree of bending approximated as 4–17° from linearity. Our results open the way for further studies of other collagen triple helices with different sequences and stabilities in order to clarify the role of molecular rigidity and flexibility in collagen extracellular and immune function and disease.


Introduction
Collagen is the most abundant protein in the human body, providing required structural and mechanical properties, as well as cell signalling, to tissues such as tendon, skin and cartilage. Collagen is also found as a domain in a range of host-defense proteins including C1q, collectins, and macrophage scavenger receptors. The characteristic molecular structure of collagen is the triple helix, a specialized protein structure motif. The triple helix is composed of three polyproline II-like polypeptide chains staggered by 1 residue with respect to each other [1-3]. The three chains are supercoiled about a common axis in a right handed manner, with backbone NH (Gly)…CO hydrogen bonds formed between the chains. The close packing of the three chains can accommodate only the smallest residue Gly in the center, generating the requirement for Gly as every third residue in the triplet repeats of the collagen amino acid sequence, (Xaa-Yaa-Gly) n . The residues in the Xaa and Yaa positions are largely exposed to solvent, and are frequently occupied by the imino acids Pro (in the Xaa position) or the post-translationally modified hydroxyproline (Hyp or O, in the Yaa position), which further stabilize the triple helix. The unique supercoiled triple helix conformation forms an elongated rod-like domain, which often associates to form fibrils or other supramolecular structures in many biologically important collagens and host defence proteins [4,5].
The rod-like nature of the collagen molecule and its dimensions (∼1.5 nm diameter and 300 nm long) were established in early physical chemical studies of collagen [6]. Diverse experimental approaches, including sedimentation, viscosity, viscoelastic measurements and light scattering studies, as well as direct microscopic visualization and computational approaches suggested that the collagen triple helix is not fully rigid [7][8][9][10][11][12]. A wide range of persistence lengths have been reported for collagen by different techniques, and their values range from those consistent with a random coil to those expected for an almost rigid structure. These studies were much enhanced by studies of synthetic peptides that adopted the collagen triple helical conformation because Gly was present as every third residue, the imino acid content was sufficiently high, and the peptide was of sufficient length [13]. The tripeptide Pro-Hyp-Gly (POG) is the most stabilizing sequence found in collagens and peptides based on this sequence have been extensively studied in terms of stability, structure and dynamics [13]. Thus high resolution X-ray crystallography studies of peptides with repeating sequence (Pro-Hyp-Gly) n have defined the molecular details of a linear structure for the triple helix [14,15]. For example, the high resolution structure of (Pro-Hyp-Gly) 10 , also known as (POG) 10 , has a precise 7/2 superhelical symmetry, and details of the imino acid ring pucker and hydration have been determined [3,15]. NMR relaxation parameters and hydrogen-exchange studies for (Pro-Hyp-Gly) 10 and a peptide with an imino acid poor collagenase cleavage region have been reported [16], but it is not clear how such highly localized dynamic features relate to overall triple helix flexibility.
Determining whether the rod-like triple helix is rigid or flexible by advanced molecular modelling will improve our understanding of structure-function relationships and disease-causing mutations in collagens and immune defense proteins [4,5]. Here, a recently developed atomistic modelling approach to monitor the bending for heparin, heparan sulfate and other linear polymers of increasing length was adapted to define molecular structures for well-characterized collagen triple helices [17]. Firstly, the degree of flexibility as a function of length for (POG) n collagen triple helices in solution was monitored by a combination of analytical ultracentrifugation (AUC) and small angle X-ray scattering (SAXS). These methods provided experimental data for five model collagen helices, namely the blocked peptides (POG) 6 , (POG) 8 , (POG) 10 and (POG) 12 , and the unblocked peptide (POG) 10 . Secondly, molecular dynamics (MD) methods were used to determine 50 000 physically-realistic atomistic models for each of these five triple helices. The filtering of these models against the experimental SAXS data identified a small family of best-fit structures in solution, to follow recent other similar studies [18,19]. Our combined experimental and atomistic modelling evidence demonstrated for the first time the existence of non-linearity in detailed molecular models of collagen triple helices as their lengths increased, and provided representative molecular structures for these (Supplementary Information). These results will be key for future studies on the flexibility within biologically active or mutated collagen sequences and clarify the role of molecular rigidity and bending in collagen function and disease.

Production and preparation of collagen peptides
The peptide (Pro-Hyp-Gly) 10 with both ends unblocked was obtained from Peptides International (Louisville, Kentucky). All other peptides were synthesized by the Tufts University Core Facility (Boston, MA), and were blocked at both ends (acetylated N-terminus; amidated C-terminus). All peptides were purified using a reversed-phase HPLC system on a C-18 column, confirming a purity of >95%. The identity and molecular mass of the peptides were confirmed by laser desorption mass spectrometry. Peptide concentrations were measured by absorbance at 214 nm using ε 214 = 2200 cm −1 M −1 per peptide bond.

Sedimentation velocity data for collagen helices
AUC data were obtained on two Beckman XL-I instruments equipped with An-60Ti analytical rotors and using two-sector cells with column heights of 12 mm at a rotor speed of 60 000 rpm. Sedimentation velocity experiments were performed at 20°C at three concentrations for each peptide, in the ranges 1.6-2.6 mg/ml for (POG) 6 , 0.4-4.9 mg/ml for (POG) 8 , 1.39-3.97 mg/ml for (POG) 10 , 1.20-3.6 mg/ml for (POG) 12 and 1.09-3.5 mg/ml for (POG) 10unblocked , in PBS (137 mM NaCl, 2.7 mM KCl, 1.4 mM KH 2 PO 4 , 4.3 mM NaH 2 PO 4 , pH 7.4). Prior to the experiments, the helices were formed by dissolving the peptides in PBS and allowing the solution to equilibrate for at least two hours. Sedimentation was monitored using absorbance optics at 232 and 225 nm. The sedimentation coefficient distribution c(s) analyses were performed by fitting up to 45 absorbance scans directly to the Lamm equation using SEDFIT software version 14.6e [20,21]. In the c(s) analyses, the meniscus, the bottom of the cell, the baseline, and the average frictional ratio f/f 0 were floated until the overall root mean square deviation and the fits between the observed and calculated sedimentation boundaries were satisfactory. The f/f 0 values used were 1.04 for (POG) 6 , 1.08 for (POG) 8 , 1.17 for (POG) 10 , 1.32 for (POG) 12 and 1.04 for (POG) 10unblocked . The partial specific volume v̄ was calculated to be 0.735 ml/g for all (POG) n peptides [20]. The buffer density was 1.00543 g/ml, measured using an Anton Paar DMA 5000 density meter.

X-ray scattering data for collagen helices
SAXS data were acquired in two beam sessions on the BioSAXS robot at Instrument BM29 at the European Synchrotron Radiation Facility, Grenoble, France [23,24]. Data were recorded using a CMOS hybrid pixel Pilatus 1M detector with a resolution of 981 × 1043 pixels (pixel size of 172 µm × 172 µm). Both sessions were operated with a ring energy of 6.0 GeV in 16-bunch mode. The sample-detector distance was set to 2.864 m, the X-ray wavelength was 0.09919 nm, and the diameter of the flow cell quartz capillary was 1.8 mm in both sessions. Potential radiation damage was averted by the continuous movement of the sample in the flow cell during beam exposure; the use of 10 time frames with an exposure time of 1 s per frame; and on-line checks for the absence of radiation damage. The scattering data were collected for (POG) 6 at 0.25-2.64 mg/ml and for (POG) 8 , (POG) 10 , (POG) 12 and (POG) 10unblocked at 0.25-1.00 mg/ml in 137 mM NaCl PBS buffer at 20°C. Data reduction was performed using ISPyB software [25].
In a given solute-solvent contrast, the radius of gyration R g corresponds to the root mean square distance of scattering elements from their centre of gravity, and is a measure of structural elongation. Guinier analyses at low Q values (where Q = 4π sin θ/λ; 2θ is the scattering angle and λ is the wavelength) give the R g value and the forward scattering at zero angle I(0) from the expression [26]:

$$\ln I(Q) = \ln I(0) - \frac{R_g^2 Q^2}{3}$$

This expression is valid in a Q.R g range up to 1.3. GNOM software was used to calculate a real space estimation of the R g [27].
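As an illustration of the Guinier analysis, the following minimal Python sketch fits the standard Guinier relation ln I(Q) = ln I(0) − R g ² Q²/3 by linear least squares and reports the Q.R g value at the upper fit limit. The function name `guinier_fit` and the synthetic, noise-free test curve are assumptions made for this example; they are not the software or data used in this study.

```python
import math


def guinier_fit(q_values, intensities):
    """Fit ln I(Q) = ln I(0) - (Rg^2 / 3) Q^2 by linear least squares.

    Returns (Rg, I0, q_rg_max), where q_rg_max is Q.Rg at the upper fit limit.
    """
    x = [q * q for q in q_values]               # abscissa is Q^2
    y = [math.log(i) for i in intensities]      # ordinate is ln I(Q)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                           # equals -Rg^2 / 3
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    rg = math.sqrt(-3.0 * slope)
    i0 = math.exp(mean_y - slope * mean_x)
    return rg, i0, max(q_values) * rg


# Synthetic (hypothetical) curve generated from the Guinier law itself,
# using the Q range quoted in the text for (POG)6: 0.26-0.84 nm^-1.
TRUE_RG_NM = 1.49
q_demo = [0.26 + 0.02 * k for k in range(30)]
i_demo = [math.exp(-(TRUE_RG_NM * q) ** 2 / 3.0) for q in q_demo]
rg_fit, i0_fit, q_rg_max = guinier_fit(q_demo, i_demo)
```

With real data, the fit range would be narrowed until `q_rg_max` falls within the validity limit stated above.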

Generation of linear collagen helices for comparisons against AUC and SAXS data
Linear models for the collagen helices were constructed from a linear crystal structure (PDB code 3B0S), which has a 7/2 helical conformation and a (GPO) 9 repeat unit [15]. Linear models of (POG) 10 and (POG) 12 were created through duplication and superimposition of the crystal structure, followed by the removal of excess residues. (POG) 6 and (POG) 8 were created through the removal of excess residues from the (GPO) 9 structure.

MD simulations of collagen helices
Initial triple helices based on all-atom (POG) n triplets, where n = 6, 8, 10 and 12, were generated using the THeBuScr version 1.07 software package [28]. The N- and C-termini were then blocked with acetyl and amide groups respectively. For the (POG) 10unblocked simulation, this step was omitted. The models were energy minimized using the steepest descent algorithm and then placed in a TIP3P-solvated, cubic box with an edge length 0.7 nm larger than the largest axis of the model. Following solvation, the systems were energy minimized again. The systems were then equilibrated by running short, consecutive 1 ns simulations at 50, 100, 150, 250 and 300 K, such that the final structure from each simulation was used as the starting structure for the next. Following temperature equilibration, a production MD run was performed at 300 K with a 2 fs time step for a simulation time of 50 ns for each of the collagen models. Extending the simulations to longer time scales (up to 200 ns) did not qualitatively change the results of the R g analyses. All molecular dynamics simulations were carried out using the GROMACS v4.5.5 software package with the AMBER99sb-ILDNP forcefield [29]. Other simulation details were the same as described by us previously [30].
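The timing of this protocol can be checked with a few lines of arithmetic. The Python sketch below assumes that the 50 000 structures per helix were saved at uniform intervals across the 50 ns production run; the saving interval is not stated in the text, so the derived value is an inference, and all variable names are illustrative.

```python
# Values taken from the text
TIME_STEP_FS = 2                      # integration time step
PRODUCTION_NS = 50                    # production run length
N_SNAPSHOTS = 50_000                  # structures retained per helix
EQUILIBRATION_TEMPS_K = [50, 100, 150, 250, 300]
EQUILIBRATION_NS_PER_STAGE = 1

# 1 ns = 1 000 000 fs
total_steps = PRODUCTION_NS * 1_000_000 // TIME_STEP_FS

# Assumption: snapshots are evenly spaced through the production run
steps_per_snapshot = total_steps // N_SNAPSHOTS
snapshot_interval_ps = steps_per_snapshot * TIME_STEP_FS / 1000

# Total length of the stepwise temperature equilibration
equilibration_ns = len(EQUILIBRATION_TEMPS_K) * EQUILIBRATION_NS_PER_STAGE

print(total_steps, steps_per_snapshot, snapshot_interval_ps, equilibration_ns)
```

Under this assumption, one structure is written every 500 integration steps, i.e. every 1 ps of simulated time.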

SAXS curve calculation using SCT
Scattering curves were calculated from the MD snapshot structures using the open source SCT software [31]. The atomic co-ordinates for each structure were coarse grained into sphere models, using a grid with a box size of 0.53 nm and a cut off of four atoms. The hydration shell bound at the protein surface contributes to the SAXS curves at a similar level to the protein, and this was modelled by the addition of hydration spheres corresponding to 0.3 g of water per gram of protein [22,32]. Scattering curves were calculated using the Debye equation adapted to spheres. The experimental and calculated scattering curves were compared through the calculation of R factors:

$$R = \frac{\sum_i \left|\, \left|I_{\mathrm{expt}}(Q_i)\right| - \eta \left|I_{\mathrm{theor}}(Q_i)\right| \,\right|}{\sum_i \left|I_{\mathrm{expt}}(Q_i)\right|} \times 100$$

where I expt (Q i ) and I theor (Q i ) are the experimental and theoretical intensities at the i-th data point, and η is a scale factor that matches the theoretical curve to the experimental curve. Similar to crystallography, low R factors represent the better fit structures.
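The calculation can be sketched as follows. This minimal Python example evaluates the Debye equation for a set of identical spheres and then an R factor of the form 100 × Σ| |I expt | − η|I theor | | / Σ|I expt |. It is not the SCT implementation. Several choices are assumptions made for illustration: the sphere radius is taken as that of a sphere with the same volume as a 0.53 nm cube, η is chosen by least squares, hydration spheres are omitted, and the 12-bead straight rod is a hypothetical test object.

```python
import math


def sphere_form_factor(q, radius):
    """Squared form factor of a uniform sphere, normalised to 1 at Q = 0."""
    x = q * radius
    if x < 1e-8:
        return 1.0
    return (3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3) ** 2


def debye_curve(centres, radius, q_values):
    """I(Q)/I(0) for n identical spheres using the Debye equation."""
    n = len(centres)
    # All unique pair distances between sphere centres
    dists = [math.dist(a, b)
             for i, a in enumerate(centres) for b in centres[i + 1:]]
    curve = []
    for q in q_values:
        pair_sum = sum(math.sin(q * r) / (q * r) if q * r > 1e-8 else 1.0
                       for r in dists)
        curve.append(sphere_form_factor(q, radius)
                     * (n + 2.0 * pair_sum) / n ** 2)
    return curve


def r_factor(i_expt, i_theor):
    """Goodness-of-fit R factor in percent (eta by least squares: assumption)."""
    eta = (sum(e * t for e, t in zip(i_expt, i_theor))
           / sum(t * t for t in i_theor))
    num = sum(abs(abs(e) - eta * abs(t)) for e, t in zip(i_expt, i_theor))
    return 100.0 * num / sum(abs(e) for e in i_expt)


BOX_NM = 0.53
# Assumption: sphere with the same volume as one cubic grid box
SPHERE_RADIUS_NM = BOX_NM * (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0)

rod = [(BOX_NM * k, 0.0, 0.0) for k in range(12)]     # hypothetical straight rod
q_demo = [0.1 * k for k in range(11)]                  # 0.0 .. 1.0 nm^-1
curve = debye_curve(rod, SPHERE_RADIUS_NM, q_demo)
r_self = r_factor([2.0 * c for c in curve], curve)     # rescaled copy of itself
```

A curve compared with a rescaled copy of itself gives an R factor of essentially zero, which is a convenient sanity check before comparing against experimental data.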

Results
Five standard collagen homotrimeric helices were selected for the present study. These comprised four blocked collagen peptides of increasing lengths: (POG) 6 , (POG) 8 , (POG) 10 , and (POG) 12 . In addition, one unblocked peptide, (POG) 10unblocked , was compared with the blocked (POG) 10 peptide in order to study the significance of end effects (Table 1). All five peptides formed a stable triple helical structure in solution [13,33]. Joint AUC sedimentation velocity (Figure 1) and SAXS (Figure 2) studies were performed to determine the hydrodynamic and X-ray scattering properties of these helices in solution. The sedimentation of an elongated macromolecule under high centrifugal force is dependent on its length and mass [34]. SAXS characterizes the average structure of an elongated molecule in solution in terms of its length; the use of high resolution atomistic structures to model the low resolution scattering curves then provides information on its flexibility [35].

Sedimentation velocity analysis of collagen helices
Sedimentation velocity experiments were conducted on the five collagen model peptides. Absorbance sedimentation data were measured at three concentrations for each, and processed with SEDFIT software to give the size distribution functions c(s). The c(s) analyses resulted in good fits to the sedimentation boundaries and produced single distinct and narrow peaks for each collagen helix, showing that these were structurally monodisperse with no dissociation, as desired (Figure 1). No peaks were seen at the lowest S values that would suggest the presence of single dissociated helices. The experimental s 20,w values were calculated from the peak positions in the c(s) analyses. The averaged s 20,w values were 0.83 ± 0.03 S for (POG) 6 , 0.95 ± 0.08 S for (POG) 8 , 1.07 ± 0.04 S for (POG) 10 , 1.16 ± 0.08 S for (POG) 12 and 1.19 ± 0.01 S for (POG) 10unblocked (Table 1). The averaged s 20,w values increased with the number of (POG) repeats, which reflected the increasing mass and lengths of the helices. The peptide (POG) 10unblocked sedimented only slightly faster (0.12 S more) than the peptide (POG) 10 , suggesting that there was no significant difference in hydrodynamic properties between these two helices. High resolution crystal structures were known for three standard collagen helices (POG) n , where n = 9, 10, and 11, and these revealed linear molecules [15,16]. These structures were used to derive starting structures that corresponded to each of the five peptides of the present study; these were designated as 'crystal-derived structures'. Theoretical sedimentation coefficients s 0 20,w were calculated from these crystal-derived structures using HYDROPRO. The AUC experimental data were compared with these calculated s 0 20,w values to assess their divergence (Figure 2; Table 1). The experimental s 20,w values were consistently larger for the five collagen helices compared with the theoretical values.
The increasing divergence between the experimental and theoretical values of up to 0.27 S with peptide length suggested that the longer helices were less extended and linear than expected and corresponded to bent structures. Because the previously reported magnitude of the differences between HYDROPRO predictions and experimental values for macromolecules of well characterized s 20,w values was typically ± 0.21 S [36], the low differences in the pairs of s 20,w values for the (POG) n peptides meant that additional experiments using SAXS were needed to confirm these solution structures.

X-ray solution scattering of collagen helices
SAXS I(Q) data sets were obtained for the five collagen model helices. The Guinier analyses of these SAXS curves produced the radii of gyration R g , which is a measure of macromolecular extension (Figure 3). The Guinier regions became shorter with increased macromolecular extension. Therefore, the Q range of the Guinier fit was reduced successively from 0.26-0.84 nm −1 for (POG) 6 to 0.17-0.50 nm −1 for (POG) 12 . The experimental R g values for the collagen helices were 1.49 ± 0.02 nm for (POG) 6 , 1.82 ± 0.03 nm for (POG) 8 , 2.29 ± 0.02 nm for (POG) 10 , 2.55 ± 0.04 nm for (POG) 12 and 2.33 ± 0.05 nm for (POG) 10unblocked (Table 1). The increase in R g values with increasing peptide length agreed with the AUC results, which also showed an increase in s 20,w values with collagen size. The similarity of the SAXS R g values for (POG) 10unblocked and (POG) 10 agreed with the AUC data for these two peptides.
The experimental R g values were also calculated by an alternative approach based on pair-distance distribution analyses P(r) of the full I(Q) curves (Figure 3). This alternative set of R g values again increased with collagen size in a similar manner to the Guinier R g values (Table 1). The appearance of the P(r) curves was affected by noise in the I(Q) data sets that was attributed to the comparatively low molecular masses in use. These masses ranged from 4989 Da for (POG) 6 to 9801 Da for (POG) 12 when measured by mass spectrometry during their synthesis (Materials and Methods). Minor perturbations resulted in the form of small extra P(r) peaks for (POG) 12 or an abrupt end to the P(r) curve for (POG) 10 at large r values. Nonetheless, the P(r) analyses provided the maximum length L of each collagen helix, which corresponded to the value of r at which the P(r) curve reached zero. The L values were 6.5 nm for (POG) 6 , 6.4 nm for (POG) 8 , 7.8 nm for (POG) 10 , 8.9 nm for (POG) 12 and 8.0 nm for (POG) 10unblocked . The increases in these L values were consistent with the increasing length of the collagen peptides. The P(r) peak maxima provided values for M, the most common interatomic distance in each collagen molecule, which reflects the width of the collagen triple helix. While the values of M shifted slightly, no correlation was seen with collagen length, as expected.
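The three quantities read from a P(r) curve (R g , M and L) can be sketched in a few lines of Python. The example uses the standard relation R g ² = ∫ r² P(r) dr / (2 ∫ P(r) dr), takes M as the position of the P(r) maximum, and takes L as the largest r with a positive P(r). The function names are illustrative, and the test curve is the analytical P(r) of a uniform sphere, chosen only because its R g is known exactly (R g ² = 3R²/5); it is not collagen data and this is not the GNOM procedure.

```python
import math


def trapezoid(y, x):
    """Trapezoidal integration of y(x) on an arbitrary grid."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def pr_summary(r, p):
    """Return (Rg, M, L) from a tabulated P(r) curve."""
    area = trapezoid(p, r)
    second_moment = trapezoid([ri * ri * pi for ri, pi in zip(r, p)], r)
    rg = math.sqrt(second_moment / (2.0 * area))
    peak_r = r[max(range(len(p)), key=p.__getitem__)]           # M
    max_len = max(ri for ri, pi in zip(r, p) if pi > 0.0)       # L
    return rg, peak_r, max_len


# Hypothetical test object: uniform sphere of radius 2.5 nm (diameter 5 nm),
# for which P(r) is proportional to x^2 (2 - 3x + x^3) with x = r / diameter.
SPHERE_RADIUS_NM = 2.5
DIAMETER_NM = 2.0 * SPHERE_RADIUS_NM
N_POINTS = 2000
r_grid = [DIAMETER_NM * i / N_POINTS for i in range(N_POINTS + 1)]
p_grid = [(ri / DIAMETER_NM) ** 2
          * (2.0 - 3.0 * (ri / DIAMETER_NM) + (ri / DIAMETER_NM) ** 3)
          for ri in r_grid]
rg_pr, m_peak, l_max = pr_summary(r_grid, p_grid)
```

For noisy experimental P(r) curves, such as those described above for (POG) 10 and (POG) 12 , the point at which P(r) reaches zero is less sharply defined than in this idealised case.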
The experimental R g values were compared with the theoretical R g values calculated from the crystal-derived structures. The theoretical scattering curves were calculated from a coarse grained sphere model of our crystal-derived structures. For consistency, the theoretical R g values were calculated from Guinier fits of the I(Q) curves in the same Q ranges that were used with the experimental curves. The resulting R g values for the crystal-derived structures were 1.44 nm for (POG) 6 , 1.91 nm for (POG) 8 , 2.39 nm for (POG) 10 , 2.77 nm for (POG) 12 and 2.42 nm for (POG) 10unblocked (Figure 2; Table 1). The experimental R g values were smaller than the theoretical R g values for the crystal-derived models, except for the smallest peptide (POG) 6 (Figure 2). This increasing divergence with size indicated that, as the collagen peptides became longer, their solution structure became less extended. In addition, the theoretical I(Q) curves from the crystal-derived structures were compared against the experimental I(Q) curves. The goodness-of-fit R factors monitor the degree of deviation between the two curves. The observed R factors became higher with increased peptide length, these being 4.2% for (POG) 6 , 7.4% for (POG) 8 , 6.3% for (POG) 10 , 9.6% for (POG) 12 and 6.4% for (POG) 10unblocked (Table 1). Given that the crystal-derived structures were very close to linear, the higher R factors and poorer curve fits with increasing peptide length suggested the presence of increased non-linearity in the longer helices. This result was consistent with the larger deviations seen between the experimental and theoretical s 20,w values when the helices became longer (Table 1).
The first value was determined from Guinier R g analysis; the second value was determined from GNOM P(r) analysis.
4 The first and second values were obtained from Guinier R g and GNOM P(r) analyses of the modelled I(Q) curves for linear (POG) 6 -(POG) 12 crystal-derived structures. The same Q range was used for both experimental and modelled curves.
5 The R factors were calculated by comparing experimental I(Q) curves against modelled I(Q) curves for the linear crystal-derived structures.

Comparison of experimental data against MD ensembles
The full I(Q) curve out to large Q values contained further information on the solution structure of a macromolecule in addition to that on the overall structure provided by the R g values at low Q values. Comparison of the experimental SAXS curve with the theoretical SAXS curves calculated from stereochemically realistic atomistic conformations permitted the identification of the best fitting structural models. These conformations were determined from a 50 ns MD simulation at 300 K that produced 50 000 snapshots of stereochemically-correct structures for each of the collagen triple helices. These ensembles include a range of triple helical conformations, including non-linear molecules. In all five cases, the MD simulations resulted in an equilibrated system that contained the structures that were allowed by the force-field conformational space. As a check, simulations at 400 K did not change the distribution of the calculated R g values (see below). This indicated that the MD simulations had reached equilibrium and had produced a suitably diverse pool of models. These models resulted from simulations using a physics-based all-atom explicit solvent force field, and thus sampled structurally representative conformations in solution. A theoretical I(Q) curve for every structural snapshot in the five MD simulations was calculated using coarse grained bead models of the atomic co-ordinates. Each of the resulting 50 000 curves was then compared against the corresponding experimental scattering curve. For this, the R g values were calculated for all 50 000 snapshots using the same Q range used for experimental Guinier fits (Figure 3), together with the 50 000 goodness-of-fit R factors between the experimental and theoretical I(Q) curves. For all five collagen peptides, many triple helix structures from the MD snapshots gave R factors that were lower than that seen for the crystal-derived structure (Figure 4). Ten optimal best-fit structures were identified for each peptide.
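The filtering step amounts to ranking the snapshots of each ensemble by R factor and retaining the ten lowest. A minimal Python sketch is given below. The `Snapshot` record, the function names and the randomly generated R g and R factor values are placeholders invented for this example, not results from this study; the ±5% R g band follows the error range used when plotting the ensembles and is applied here only as an optional consistency check.

```python
import heapq
import random
from typing import NamedTuple


class Snapshot(NamedTuple):
    index: int        # position of the structure in the trajectory
    rg_nm: float      # modelled radius of gyration
    r_factor: float   # goodness of fit against the experimental curve (%)


def best_fits(snapshots, n_best=10):
    """Return the n_best snapshots with the lowest R factors, best first."""
    return heapq.nsmallest(n_best, snapshots, key=lambda s: s.r_factor)


def within_rg_band(snapshot, rg_expt_nm, tolerance=0.05):
    """True if the modelled Rg lies within +/- tolerance of the experimental Rg."""
    return abs(snapshot.rg_nm - rg_expt_nm) <= tolerance * rg_expt_nm


# Placeholder ensemble: random numbers standing in for 50 000 real snapshots
rng = random.Random(0)
RG_EXPT_NM = 2.29     # experimental Guinier Rg quoted for (POG)10
ensemble = [Snapshot(i, rng.uniform(2.0, 2.6), rng.uniform(4.0, 15.0))
            for i in range(50_000)]

top_ten = best_fits(ensemble)
n_in_band = sum(within_rg_band(s, RG_EXPT_NM) for s in top_ten)
```

`heapq.nsmallest` avoids sorting the whole ensemble, although at this size a full sort would also be fast.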
The R factors for the 10 optimal structures (orange circles) were consistently lower than seen for the crystal derived triple helix models (yellow circles), indicating that these MD ensemble-derived structures gave a better fit to the experimental I(Q) data (Figure 4). The best-fit models ( Figure 5, left, in orange) showed visually-improved I(Q) curve fits at the lowest Q values when compared with the crystal-derived linear structures ( Figure 5, red). This improvement also reflected the reduced R g values of the longer POG peptides. This improvement in the curve fit based on the best-fit MD triple helix structures was not as clear for (POG) 12 as for the three smaller triple helices, suggesting that a single structure for (POG) 12 might not be a good representation of the solution average structure for this longest peptide. Views of the experimental and modelled P(r) curves ( Figure 5, right) support the outcomes seen in the I(Q) fits. At large r values, where the P(r) curves contributed more strongly to the R g values, the orange best-fit curves were closer to the black curves than the red crystal-derived curves.
The R factors confirmed the above visual interpretation of the scattering fits. For the smallest triple helix (POG) 6 , the best fit MD structures resulted in R factors of 3.7% with R g values close to the experimental R g , indicating an excellent X-ray curve fit ( Figure 5). The best fit triple helix structures were close to linear, which was consistent with the good agreement of the experimental (POG) 6 data with the linear crystal derived scattering curves (R factor of 4.2%). For the three triple helices (POG) 8 , (POG) 10 , and unblocked (POG) 10 , the best-fit R factors were slightly higher at 4.8-5.4%. The best fit models showed R g values similar to the R g values seen experimentally, both of which were lower than expected from a linear crystal-derived structure ( Table 1). The largest triple helix (POG) 12 was the least well modelled by the simulations (Figure 4). The best-fit structures showed higher R factors of about 8.3% compared with the range of 3.6-5.4% for the lowest R factors for the four shorter peptides, suggesting that this peptide showed the most flexibility. Nonetheless, the best-fit structures for (POG) 12 possessed R g values close to the experimental R g value, and, again, these values were lower than observed for the linear crystal-derived structure.
Visual examination and analysis of the best fit triple helix structures for each collagen peptide were carried out. As expected from the lower R g values, the longer peptides showed modest non-linearity. The comparison of the single best fit MD ensemble structure with the crystal-derived structure indicated an increased amount of non-linearity that appeared as a subtle bending of the triple helix (Figure 6). To visualise the degree of conformational variation, the single best-fit MD structure (bold color) was superimposed upon the best ten MD structures (gray) (Figure 7). The smallest peptide (POG) 6 showed the least diversity, with all ten structures clustering around the best-fit structure. The other peptides showed somewhat more conformational variability, with the largest deviations seen at the N- and C-termini.

Discussion
In order to understand collagen flexibility better, we have here reported a multidisciplinary experimental study on varying lengths of collagen triple helices composed of the most stable Pro-Hyp-Gly units, the results of which were compared with atomistic MD simulations. First, complementary AUC and SAXS data indicated that the shortest (POG) 6 triple helix was linear, while the longer triple helices were increasingly less well explained by a linear structure. Second, the MD simulations identified best-fit atomistic models from a large ensemble of physically-realistic triple helices at computational equilibrium. The bent best fit MD models showed notably better agreement with the experimental data than the linear models derived from crystal structures. The decrease in the R g values of these best fit models compared with the R g value of the linear model indicated bending or flexibility that had not been previously characterised at the molecular level. While all the R factors between the best fit MD models and experimental scattering curves were reasonable, the final best-fit R factors increased with triple helix length. This outcome indicated not only that the crystal-derived linear structures became even less appropriate to describe the collagen scattering curves with increase in length, but also that a single ensemble of related MD structures represented the average solution structure less well as the triple helices became longer; this also indicated flexibility. The extrapolation of our best-fit (POG) 12 models to create longer theoretical triple helices indicated that bending and flexibility would be present in a molecule similar in length to type I collagen.
It is important to note that type I collagen, (Xaa-Yaa-Gly) 338 , has a much lower imino acid content than (POG) n , with Pro in the Xaa position of only 1/3 of the collagen triplets; Hyp in the Yaa position of 1/3 of the triplets; and 10% of all triplets being Pro-Hyp-Gly. In addition to the bending reported here, real collagen sequences with a decreased proportion of imino acids will introduce additional conformational flexibility through twist variability, a feature reported in crystal structures of collagen peptides [37]. It is likely that variability in twist will also be linked with bending. Further studies would be needed on real collagen sequences to evaluate the impact of local collagen regions with low imino acid sequences on the flexibility of the collagen triple helix. In addition to variability in these triple helix features, it is likely that flexibility will also occur in non-fibrillar collagens at the numerous sites where the repeating (Xaa-Yaa-Gly) n sequence is interrupted.
Traditionally the collagen molecule is regarded as a prototype of a rod-like protein with little flexibility. In collagen, the φ and ψ angular range of Pro and Hyp residues is extremely limited, in contrast with the large area of the Ramachandran plot available to Gly residues. At each axial level in the three staggered chains in the triple helix, a Gly residue in one chain is positioned at the same level as a Xaa residue from the second chain and a Yaa residue from the third chain. Features thought to confer rigidity to the triple helix include the high imino acid content and the tight supercoiling of the three chains mediated by interchain hydrogen bonding, which makes the triple helix resistant to digestion by most enzymes. In spite of these rigidity factors, data sets from experimental and computational studies have indicated some overall flexibility of the triple helix as well as instances of site-specific bending. For example, rotary shadowing electron microscopy showed a kink corresponding to the imino acid-poor collagenase cleavage site for type I collagen; many flexible sites were seen in type IV collagen, which may correspond to interruptions in the Xaa-Yaa-Gly repeat important for the basement membrane network structure [9]. In addition, a pronounced kink was visualized within the heterotrimeric triple helix of complement protein C1q that creates the bouquet-like structure considered essential for efficient immune interactions [38]. It is important for collagen function to define the sites of non-linearity and flexibility in molecular terms within the triple helix, and establish their relationship to enzyme susceptibility, higher order structure, and other biological features. The MD structures from the present study (Figure 7) now offer a means of characterizing these sites in molecular terms, and testing these against experimental data to validate them.
Figure 4. The scattering curves I(Q) were calculated for each of the 50 000 structures created in the five molecular dynamics ensembles for (POG) 6 -(POG) 12 , following which the R factor and R g values for each structure were plotted. The dashed lines represent the experimental R g values for (POG) 6 -(POG) 12 , and the shaded bands represent a ±5% error range in the R g values. From each set of 50 000 structures, the ten best-fit structures with the lowest R factors are shown in orange. The values for the four linear crystallographic models are shown in yellow.
The experimental results showed that, even for the most imino acid-rich collagen sequence that forms the most stable triple helix, as represented by the POG repeats, a small degree of non-linearity was observed in solution for the longer peptides. Visual inspection of the best fit MD models indicated the presence of some curvature in the triple helix. In order to quantify this bending, the N-terminal and C-terminal halves were taken to be linear segments, and the angle between these two halves was measured (Figure 6). This approximate bending angle in the best fit MD structures varied between 4° and 17° for the five peptides (with 0° being a perfectly linear model). This contrasts with the relatively linear conformations seen for the crystal-derived structures (0°-3°) using the same method (Figure 6). Therefore, our data supported a bending in solution greater than that seen in the crystal structures for these (POG)n peptides. As a control, our results for (POG)10unblocked were compared with the blocked (POG)10 to clarify potential end effects. There was little difference between the blocked and unblocked helices in the AUC or SAXS data, suggesting similar hydrodynamic properties. However, the terminal regions showed the largest variation when the top ten best fit structures for all peptides longer than (POG)6 were superimposed (Figure 7), consistent with the terminal disorder seen in crystal structures and NMR studies [39].
Figure 6. Comparisons between the scattering best-fit and linear crystal-derived models. The three polypeptide chains of the best-fit structures for (POG)6-(POG)12 are shown in orange, while those for the linear models are shown in red. In order to monitor the degree of bend in these structures, two vertices shown in grey were fitted to the N-terminal and C-terminal halves of the collagen triple helix, and the angle between the two vertices is displayed above each structure, with 0° representing a linear structure. The grey half circles for (POG)6 define how the angles were measured.
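
The half-helix angle measurement described above can be illustrated with a minimal sketch. The text does not state how the linear segments were fitted, so the principal-axis (SVD) fit used here is an assumption, as are the function names `principal_axis` and `bend_angle_deg` and the choice of input (any set of points ordered along the helix, for example per-triplet centroids). It is meant only to make the geometry concrete, not to reproduce the procedure used for Figure 6.

```python
import numpy as np

def principal_axis(coords):
    """Unit vector along the direction of greatest spread of a set of 3D points."""
    centered = coords - coords.mean(axis=0)
    # The first right singular vector of the centered points is the best-fit line direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def bend_angle_deg(coords):
    """Angle in degrees between lines fitted to the two halves (0 = linear).

    coords: (N, 3) array of points ordered from the N-terminal to the
    C-terminal end of the helix (hypothetical input, e.g. triplet centroids).
    """
    coords = np.asarray(coords, dtype=float)
    half = len(coords) // 2
    first, second = coords[:half], coords[half:]
    a = principal_axis(first)
    b = principal_axis(second)
    # SVD leaves the sign of each axis arbitrary, so orient both along the chain.
    if np.dot(a, first[-1] - first[0]) < 0:
        a = -a
    if np.dot(b, second[-1] - second[0]) < 0:
        b = -b
    cos_angle = np.clip(np.dot(a, b), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))
```

Splitting at the midpoint mirrors the "two halves" definition in the text; a bend located away from the midpoint would be underestimated by this simple split.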
Interestingly, a small amount of molecular bending, similar to that observed for our (POG)n helices in solution, has been reported in four crystal structures of triple helical peptides containing sequences other than Pro-Hyp-Gly, either alone or in complex with another protein. For instance, the integrin binding peptide (GPO)3GFOGER(GPO)3 alone showed 'junctional kinking', with an ∼8° angle between the N-terminal (GPO)3 and the central GFOGER sequence, and a ∼9° angle between the C-terminal (GPO)3 and the central GFOGER [24]. These angles increased considerably to ∼16° and ∼14.5° when the same peptide was complexed with the integrin I domain [40]. Bends of similar magnitude within the triple helix were reported in the high resolution crystal structures for an imino acid-poor peptide, (POG)3ITGARGLAG(POG)4 [12], a triple helical peptide complexed with MMP, (GPO)3GPQGLAGQRGIVGLOGQRGER(GPO)3 [41], and a peptide complexed with the Staphylococcus aureus collagen binding protein CNA, (GPO)4GPRGRT(GPO)4 [42]. The bending seen in these triple helices is most likely to be related to their amino acid sequence. Taken together with these crystal structures, our results confirm that bending occurs in (POG)n helices and not just in non-POG helices. It will also be important to determine how Xaa-Yaa-Gly sequences that are less stable than POG sequences affect the flexibility and specific binding sites in triple helices.
The biological significance of non-linearity in the collagen triple helix in both standard POG and non-POG sequences will be manifested in the context of longer collagen molecules. The collagen triple helix characterized here is formed from only Pro-Hyp-Gly units and is likely to be the most rigid triple helix formed with natural amino acids. Non-linearity within the collagen molecule can affect its biosynthesis and secretion, its interactions with other matrix proteins and enzymes, the formation of complete collagen fibril structures, and tissue mechanics. Even a low level of bending in a full-length collagen helix could provide sufficient flexibility to allow collagen molecules to be transported through the Golgi or could create fibrils that can withstand forces that might break a more brittle structure. As an example, the low angle X-ray diffraction pattern from rat tail tendon indicated that the collagen molecules cannot be straight within the fibril, and may kink as they pass through the less dense gap region [43]. In addition, all non-fibrillar collagen types and collagen domains in proteins such as C1q and mannose binding lectin contain breaks in the repeating (Xaa-Yaa-Gly)n sequence, which are likely to lead to kinks and flexible sites [9,44]. Flexibility in the C1q and mannose binding lectin collagen helices is likely to facilitate immune function through their binding to flexible multi-domain ligands, as recently exemplified by the MASP solution structure [45]. This will contribute to their function in activating the complement pathways of immune defense. Kinks have also been reported for several collagens with mutations leading to the bone disorder osteogenesis imperfecta [46-48]. These disease-associated mutations are likely to perturb the regular packing of the collagen helices in bone through excessive bending. The present work opens the way for further molecular studies to investigate collagen flexibility at biologically important sites and mutation sites.
Figure 7. Superimposition of the ten best-fit structures for (POG)6-(POG)12. Stick representations of the ten best-fit models from each scattering fit analysis were superimposed upon each other. The best-fit structure in each is opaque, with each polypeptide chain represented in blue, red and orange. The remaining nine structures were rendered semi-transparent. Hydrogen atoms are not shown for reasons of clarity.
Our multidisciplinary methods provide a new approach to investigating the flexibility of the collagen triple helix in solution, namely the combination of AUC and SAXS with MD simulations. This joint approach has been applied to other proteins during the past decade [18,19,45] and is being adopted by other groups [49]. AUC and SAXS provided shape parameters, the s20,w and Rg values respectively, that monitored the overall length of the helices. AUC demonstrated through the single c(s) peaks that only triple helices had formed in solution, while SAXS provided the scattering data for modelling fits. The AUC and SAXS data were fitted by generating physically realistic all-atom structures from MD, starting from known crystal structures [19]. The collagen helices were modelled using all-atom explicit solvent MD simulations with a physics-based force field. Our MD simulations were performed starting from a standard triple helix geometry; at MD equilibrium, these resulted in 50 000 structures for each of the five helices. Following calculation of the SAXS curves [31], each set of structures was compared with experiment to confirm the bending of the collagen helices at a molecular level, quantify the extent of bending, and provide molecular structures for them (Supplementary Information). Utilizing varying lengths of helices enabled the conformation of the helices to be examined as their lengths increased. In conclusion, our joint AUC-SAXS-MD approach complements previous methods used to study collagen flexibility, and is carried out in solution under physiological conditions. Because physically realistic full structures were used, this approach has high potential to answer further questions relating to the collagen triple helix sequence, stability and flexibility.
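
The comparison of each set of calculated scattering curves with experiment can be sketched as a simple ranking step. This section does not specify the goodness-of-fit measure, so the percentage residual used below (an R-factor-like quantity, computed after a least-squares scaling of each model curve) is an assumption, and the names `r_factor` and `rank_models` are hypothetical. The sketch assumes that the experimental and model intensities are sampled at the same scattering vector values.

```python
import numpy as np

def r_factor(i_exp, i_model):
    """Percentage residual between experimental and modelled intensities.

    The model curve is first scaled onto the experimental curve by linear
    least squares, so only the curve shapes are compared (assumed metric).
    """
    i_exp = np.asarray(i_exp, dtype=float)
    i_model = np.asarray(i_model, dtype=float)
    scale = np.dot(i_exp, i_model) / np.dot(i_model, i_model)
    return 100.0 * np.abs(i_exp - scale * i_model).sum() / np.abs(i_exp).sum()

def rank_models(i_exp, model_curves, top=10):
    """Return (indices, scores) of the `top` best-fitting curves, best first."""
    scores = np.array([r_factor(i_exp, curve) for curve in model_curves])
    order = np.argsort(scores)[:top]
    return order, scores[order]
```

With one calculated curve per MD structure, `rank_models(i_exp, curves, top=10)` would return the ten structures whose curves lie closest to the data, which corresponds in spirit to the ten best-fit models superimposed in Figure 7.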