Elucidating the extent of energetic coupling between residues in single-domain proteins, which is a fundamental determinant of allostery, information transfer and folding cooperativity, has remained a grand challenge. While several sequence- and structure-based approaches have been proposed, a self-consistent description that is simultaneously compatible with unfolding thermodynamics is lacking. We recently developed a simple structural perturbation protocol that captures the changes in thermodynamic stabilities induced by point mutations within the protein interior. Here, we show that a fundamental residue-specific component of this perturbation approach, the coupling distance, is uniquely sensitive to the environment of a residue in the protein to a distance of ∼15 Å. With just the protein contact map as an input, we reproduce the extent of percolation of perturbations within the structure as observed in network analysis of intra-protein interactions, molecular dynamics simulations and NMR-observed changes in chemical shifts. Using this rapid protocol that relies on a single structure, we explain the results of statistical coupling analysis (SCA) that requires hundreds of sequences to identify functionally critical sectors, the propagation and dissipation of perturbations within proteins and the higher-order couplings deduced from detailed NMR experiments. Our results thus shed light on the possible mechanistic origins of signaling through the interaction network within proteins, the likely distance dependence of perturbations induced by ligands and post-translational modifications and the origins of folding cooperativity through many-body interactions.
Signal transmission is at the core of chemical reactions that drive biological processes in the cellular milieu, not just through the protein–protein interaction network but also within a single protein domain [1–7]. The network of physically interacting residues within proteins plays two vital roles: they stabilize the protein structure by making numerous short-range van der Waals (vdW) interactions and, importantly, serve as conduits for the long-range transmission of signals. These signals generally involve information transfer or propagation of energy from the binding site of one ligand (protein, DNA or cofactor) to a distal site where another ligand binds or conveys the information of cellular status through post-translational modifications. These so-called perturbations at one site can have positive, negative or no effects (i.e. neutral) at a distal region , thus forming the basis of allostery both in the presence or absence of a conformational change [9–11]. While the precise mechanism of information transfer can be varied and is still an open question, it is well recognized that the energetic coupling of residues within a protein—in other words, the intra-protein interaction network—plays a dominant role in this process [1,12–15].
A well-established and heuristic method to infer energetic coupling is the statistical coupling analysis (SCA) . SCA involves a combination of multiple sequence alignment (MSA) of a particular protein family and a perturbative analysis of the MSA to infer coupled residues through sequence co-variation. Other avenues involve experimentally intensive double-mutant cycles  or a combination of elastic network models of proteins and normal mode analysis [16–18]. While these methods have contributed to the understanding of specific systems, an inclusive answer to the question on the extent and magnitude of energetic or thermodynamic coupling has remained elusive. In particular, modeling the extent of energetic coupling requires an understanding of the strength and distribution of the local and non-local interactions within a protein structure, and this is explicitly available through mutational studies. It is also well known that most mutational effects, particularly those that perturb the interaction network within the protein interior, are destabilizing  (perturbation of solvent-exposed charged residues can result in non-trivial effects [20,21]). Reproducing these thermodynamic signatures is therefore a critical requirement for any model as the same set of interactions that are perturbed upon mutations are the ones that ‘transmit’ signals upon ligand or protein binding or post-translational modifications .
In this regard, employing a multi-model approach involving interaction network analysis using network theory and all-atom molecular dynamics (MD) simulations, we recently showed that the effect of mutations propagate into the structure as far as ∼15−20 Å from the mutated site and dissipate exponentially mimicking experimental observations . Simplifying the distance dependence to two interaction shells around a perturbed residue (see Supplementary Figure S1A, for a simple schematic) and recasting these results into the Wako-Saitô-Muñoz-Eaton (WSME) model [23,24], it was possible to estimate the partitioning of destabilization energy in the first (x1) and second shell (x2) of interactions (radius 6 Å each) . The partitioning is found to be 0.50:0.20 (x1:x2) and is also weighted by the nature of the mutation (Mut). These observations can be concisely represented in the form of an empirical relation below with respect to the perturbed wild-type (WT) residue i:
where Qa,b is the number of heavy atom contacts within a specific cut-off (say, 6 Å) between residues a and b that is obtained from a contact map; ΔQ represents the loss of interactions (or the derived perturbations) between pairs of interactions between neighbors in the first shell of residue i (defined by j), and between the first shell and second shell neighbors (k); and n is the number of atoms corresponding to residue i in the wild-type (nWT) or in the mutated residue (nMut), respectively (Figure 1A). An equivalent version of equation (1) in terms of van der Waals interaction energy is provided in reference . This structural perturbation approach is able to reproduce the changes in stabilities induced by 375 point-mutations in 19 different protein structures, highlighting that it is faithful to protein destabilization thermodynamics . Importantly, it suggests that the effect of any perturbation percolates into the second shell of interactions and also dissipates as x2 < x1.
The structural perturbation approach reveals an exponential dissipation of mutational effects.
Because the interaction network serves as a conduit for signals, the empirical approach discussed above (eq. 1) can potentially shed light on the degree of coupling between protein residues. Does it therefore capture the exponential dissipation of mutational effects observed in network analysis, MD simulations and experiments?  Does the observed propagation obtained from the perturbation of residues in just a single structure explain the allosteric coupling between residues gleaned from the SCA that requires hundreds of sequences? Understanding the extent of energetic coupling would also shed light on the remarkably cooperative folding behaviors of proteins that has its origins in the nature of the interaction network. In fact, it is well established from theory , off-lattice models of proteins [26,27] and detailed NMR experiments [28,29] that packing effects arising from both the first shell of interactions around a residue or even the second shell (i.e. neighbors of first shell residues) contribute to the cooperative folding of proteins. Signaling through the interaction network within proteins and folding cooperativity are therefore two distinct but related aspects of the same problem, and understanding one will necessarily reveal insights into the other. Specifically, can the perturbation approach (eq. 1) predict the degree of residue-level thermodynamic coupling gleaned from detailed NMR experiments that employ temperature as a perturbation?
Predicting the changes in thermodynamic stability induced by mutations
The changes in protein stability upon point mutations in the hydrophobic core of proteins are predicted using the Ising-like Wako-Saitô-Muñoz-Eaton (WSME) model [23,24] with energy terms for packing, solvation and electrostatics, following the protocol prescribed previously [20,30]. The following four parameters are fixed for all the proteins: van der Waals interaction (vdW) energy per native heavy-atom contact from the contact map (ξ = 70.5 J mol−1); heat capacity change per native contact ( = −0.36 J mol−1 K−1); entropic cost for fixing a residue in the native conformation (ΔSconf = −16.5 J mol−1 K−1 per residue); and the m-value per residue for deriving a chemical unfolding curve (mres = 0.1 kJ mol−1 M−1), apart from the pH (7.0) and ionic strength conditions (0.05 M). Since we are interested in predicting only the relative changes in stability upon point mutations, the absolute values of the parameters are not critical. Here, we employ the average numbers from a previous detailed analysis . An all heavy-atom based contact map with a distance cut-off of 6 Å, that excludes nearest neighbors is used for calculating van der Waals interactions. This contact map is provided as an input to calculate total partition function employing the transfer-matrix formalism of Wako and Saitô  from which an unfolding curve is derived. The unfolding curves thus generated are then fitted to a two-state model, thus yielding the stability at zero denaturant concentration for the wild-type protein (ΔGWT). A more detailed description of the model and its energy terms can be obtained from our previous works [20,30,31].
The mutational effects are predicted (i.e. ΔGmut) by using identical parameters to those above and by modulating the van der Waals interactions of the mutant using the fixed relation given below:
where i is the mutated residue, j refers to first shell neighbors and k refers to second shell neighbors of the mutated residue. The pair-wise van der Waals interaction energies (E) are weighted by the nature of the mutation (the number of atoms in mutated residue nmut or wild type nWT) and an empirical factor (x1 or x2) to denote the extent of fractional destabilization (0 < x1, x2 < 1) in the first and second shells (Supplementary Figure S1A). A value of 0.5 and 0.2 was employed for x1 and x2, respectively, and with equal shell radii of 6 Å as prescribed previously . This was arrived at from a detailed analysis of more than 20,625 unfolding curves and different first-/second-shell radii (including 6 and 6 Å, 6 and 5 Å, 6 and 4 Å, 5 and 5 Å, 5 and 4 Å apart from various x1 and x2 combinations), which was capable of reproducing the mutational destabilizations involving 375 mutations from 19 proteins .
Rapid alanine scanning mutagenesis
Equation (2) can also be recast in terms of the pair-wise contact map (Q) and is provided in the main text (eq. 1). This essentially liberates the empirical relation from the WSME model and allows a scan of the environment of every residue in a rapid manner by inputting just the contact map and the identity of the mutation. Specifically, for an alanine scanning mutagenesis, we mutate every residue to alanine (and in the case of alanine, to glycine), except for proline and glycine, on the six proteins studied (see Supplementary Figure S1B for a flowchart). The effective change in the number of interactions expected for a given residue, ΔQ, is estimated by summing up the pair-wise terms in equation (1) (equivalent to the changes in the van der Waals interaction energy between the wild-type and mutant). This is now plotted as a function of distance from the mutated site for every residue. Note that ΔQ is always positive in this method simply because we truncate the residue to alanine as is always done in experimental alanine-scanning mutagenesis. Potential changes to the structure introduced through mutations or local stabilizations will not be captured as the method relies solely on the contact map of the WT.
Understanding the coupling distance
We first perform an in silico alanine scanning mutagenesis following the empirical relation outlined in equation (1) on six proteins with varied secondary structural contents and topologies (bACBP, Im9, SH3, Ubq, FKBP12 and Sso7d). We find that the decay of ΔQ is exponential-like from the mutated site with a mean distance constant or a coupling distance (dC) of 3.8 Å and 3.7 for Ubq and bACBP, respectively (Figure 1B and Supplementary Figure S2). The magnitude of dC is of the same order as observed from the interaction network perturbation analysis and MD simulations of several mutants of ubiquitin  (dC ∼4.1–4.7 Å), despite this dependence not being explicitly included in the perturbation equation. Moreover, the mean dC for each of the proteins is also ∼3.8 ± 0.06 Å indicating that the average extent of coupling is largely independent of protein size or structural class (Supplementary Figure S2). It is also important to note that this structural perturbation is consistent with thermodynamic experiments, as we obtain an average correlation of ∼0.8 between the experimental and predicted destabilization (ΔΔG) for the six proteins when equation (1) is cast into the WSME model (Supplementary Figure S3).
The average dC can be de-convoluted into contributions from individual residues that range from ∼2 Å to ∼5 Å for all proteins contributing to the apparent large scatter in Figure 1B (Figure 2A,B). The extent of percolation into the protein structure (i.e. the distance at which no effect is felt) in these cases corresponds to ∼8 and ∼17 Å, respectively. Figure 2A plots the ΔQ vs. distance plot for a buried residue L56 in ubiquitin that displays a coupling distance of ∼4.8 Å that is much higher than the average. A solvent-exposed residue S20, on the other hand, exhibits a smaller coupling distance (Figure 2B). These observations are intuitively expected, as residues whose side-chains are pointing within the hydrophobic core will be interacting with an array of residues in the crowded protein interior. The situation is the exact opposite for a solvent-exposed residue that is merely coupled to residues on the surface. In other words, buried residues will contribute to large destabilization and hence will be more conserved over evolutionary time (to avoid potential detrimental effects) compared with solvent-exposed residues (where the mutational effect is small). This is exactly what is observed in sequence-structure-based studies of protein evolution that point to a large conservation of buried residues and a weaker conservation of solvent-exposed residues . This enables a rapid exploration of sequence-function space during evolution without significantly affecting the protein stability.
Understanding the coupling distance.
A plot of the magnitude of dC as a function of sequence index reveals no apparent dependence on the protein class or topology (Figure 2C and Supplementary Figure S4). However, we find a weak correlation (r ∼ 0.56 and explains just ∼31% of the variance in the dataset) between relative solvent accessible surface area (rSASA) of residues in the protein structure and the corresponding coupling distances. This holds true irrespective of a global correlation or binning analysis (green in Figure 2D). This is also evident in Figure 2D where we find a large spread in the coupling distances even at low rSASA, indicating that dC is not just a function of rSASA. One reason for only a weak correlation is that the coupling distance is very sensitive to the local packing environment and the size of the amino acid. For example, alanine will be coupled to only a few distant residues even if it is completely buried, since the side-chain is short. The opposite holds true for a large residue that is exposed to the solvent. It can be seen that different secondary-structure elements by themselves exhibit different mean coupling magnitudes. Even within a single secondary structure, a unique pattern is evident where the dC magnitude fluctuates between high and low; for example, adjacent residues in beta strands point to opposite coupling magnitudes while the helices display a similar pattern but with a weaker dependence (Supplementary Figure S5). These differences have their origins in the organization of residues within the corresponding secondary-structure elements: adjacent residues in beta strands have their side-chains pointing in opposite directions and if the residue is buried then it exhibits larger coupling and vice versa (Supplementary Figure S5). In other words, the coupling distance is a robust property of protein residues—since it is sensitive to the local packing environment, secondary structure, the nature of the amino acid and the relative solvent accessibility—and can potentially highlight strongly coupled residues (Figure 2E). Note that ‘strong coupling’ is a relative term and it will depend on the immediate environment of the different residues in the protein.
Comparison with SCA predictions
The structural perturbation method we present here shows that mutating a residue at a particular site can have non-intuitive effects since most residues are coupled to residues even in their second shell, i.e. higher order effects. It therefore serves as a very powerful and a rapid avenue to probe for the extent of energetic or allosteric coupling in proteins that has been proposed to be determine dynamic allostery  (responses at distal sites in the absence of conformational change). It should therefore be possible to recapitulate SCA observations by a mere perturbation of specific residues that can also be functionally important.
The SCA method was first applied to the protein PDZ3 (Figure 3A), which plays a vital role in cell signaling by aiding the assembly of multi-protein complexes . We compare the results of our perturbation approach to the more complex SCA. We perturb just four ligand-binding residues (F325, N326, H372, K380) and obtain a similar exponential dissipation of their effects with a mean coupling distance of 4.3 Å (blue in Figure 3B). Remarkably, this approach in itself captures most of the residues identified as energetically coupled from the SCA  (cyan circles in Figure 3B). The same holds true for the enzyme DHFR (dihydrofolate reductase; Figure 3C), which binds to the substrate dihydrofolate and reduces it to tetrahydrofolate aided by the cofactors NADPH and NADP+. SCA of DHFR has shown nearly 50 residues in the structure to directly affect enzyme activity . To see if these observations can be reproduced from a simple distance dependence of the dissipation of perturbation, we perturb four spatially contacting residues close to the ligands (I5, T46, I50 and I94, of which only T46 is highly conserved) resulting in a dC of 3.6 Å (Figure 3D). This perturbation approach again captures nearly 70% of the residues identified as being energetically coupled from the SCA (cyan circles in Figure 3D).
Identifying allosterically coupled residues via perturbations.
Rationalizing the allosteric modulations of PTMs
A related phenomenon is observed in a post-translational modification (PTM) event that takes place in PDZ3. Phosphorylation of Y92 in PDZ3 has been shown to allosterically modulate the binding affinity of a peptide ligand at a distant site . Since a detailed chemical shift mapping is available from experiments, we plot the difference in chemical shifts of backbone atoms between the WT and the phosphorylated variant as a function of distance from Y92. We find that the backbone chemical shifts of distant residues are also modulated, effectively resulting in an exponential dependence with a coupling distance of 4.9 Å (Figure 3E). Perturbation analysis of PDZ3 Y92 (without considering charge-induced structural changes) reveals a dC of 4.1 Å (Figure 3F). While the experimental backbone chemical shifts cannot be directly compared to the predicted perturbations (ΔQ), the extent of propagation is very similar, suggesting that the experimentally observed allosteric modulation is a result of the propagation and dissipation of the phosphorylation event into the structure. CheY, which plays a vital role in bacterial chemotaxis, also undergoes a phosphorylation event at D57 that in turn determines the direction of flagellar rotation through regulation of binding (Figure 3G) . A perturbation analysis of D57 results in a low coupling distance of 3.3 Å (since it is solvent exposed; Figure 3H), but the sphere of influence of perturbation still encompasses the known allosteric quartet of residues (including Y106, whose rotameric status is modified; residues in cyan sticks in Figure 3G).
Insights into protein folding cooperativity
NMR experiments that monitor changes in the chemical environment of individual atoms in proteins with temperature indicate that higher-order couplings emerge through interactions mediated by the first-shell contacts [28,29]. Since our method explicitly accounts for these effects, we map the changes in the number of contacts (ΔQ) upon perturbation of every residue in the form of a global ΔQ map (bottom right triangle of Figure 4A,C). This is identical to the approach taken for the lower triangular matrix of Figure 1; in this case, however, the ΔQ distribution from every residue is summed up to provide a global ΔQ map. We observe a pattern very similar to the interpretation of NMR experiments wherein additional spots expectedly appear on the ΔQ map for both BBL and gpW (Figure 4A,C). The ΔQ map is more discrete compared with the continuous pattern of experimental coupling indices, as it is derived from a discrete contact map (upper left triangles in Figures 4A and 4C). Residues with large coupling distances are concentrated in the hydrophobic core, as noted before (Figure 4B). For example, in BBL, we observe that L9 and several residues of the second helix mediate an array of non-local contacts exactly as seen in experiments  (orange arrows in Figure 4A). Interestingly, I6 seems to interact strongly with several residues farther along the sequence (yellow arrow in Figure 4A); however, this is not observed in experiments. This possibly arises from the use of a single structure as a representative of possibly a heterogeneous native ensemble, thus over-estimating the coupling.
Higher-order interactions from ΔQ map.
In gpW, we observe several long-range couplings, particularly between the two strands and the rest of the structure, which are similar to experimental observations  (orange circle in Figure 4C). However, the two strands of gpW exhibit low coupling (〈dC〉 of ∼3.5 Å), since they interact only weakly with the hydrophobic core defined by the two helices. This weak coupling potentially contributes to the population of a partially structured state during the folding of gpW in which both the strands are disordered . The second helix has a higher mean coupling distance (more oranges and reds in Figure 4D; 〈dC〉 ∼4.1 Å) compared with the first helix (〈dC〉 ∼3.8 Å) consistent with NMR experiments that indicate the second helix to be more stable. These results highlight that variations in energetic coupling within a single protein domain determine the effective folding cooperativity, as previously noted. Furthermore, the large dispersion in the ΔQ map suggests that a residue can be considered as ‘fully folded’ only when the second-shell of interactions is also formed; this aspect is generally not considered in protein folding models, including the ones employed by us (i.e. the WSME model).
The residue-level structural perturbation approach we propose here simultaneously explains experimental protein destabilization energetics and the results of sequence-based statistical coupling analysis. This self-consistency, we believe, is an important requirement to model dynamic allostery since signal transmission necessarily takes place through the network of interactions within proteins. The propagative effect is not just restricted to mutations, but is also surprisingly consistent with the effects of ligand binding and post-translational modifications on proteins. Our results further explain a recent meta-analysis of the conservation patterns in enzymes ; it was found that residues up until 20–25 Å away from the active site exhibit minimal mutational rate compared with the rest of the structure, suggestive of long-range coupling among residues. We find that such couplings arise merely from a combination of distance proximity, as originally proposed , and the propagative effect of perturbations in an interaction network. This could therefore be a generic feature of proteins or systems with multiple weak non-covalent interactions. Our observations also highlight that there are potentially numerous allosteric sites in enzymes since any residue within ∼15 Å from the active site is expected to influence enzyme activity. Our method therefore opens the door for rapidly identifying specific solvent-exposed sites or ‘hot spots’ that a drug molecule can bind to in order to regulate enzyme function without directly binding to the active site.
Both the SCA and the method proposed here are computational approaches and it is therefore unreasonable to expect a perfect correlation between the two. However, the large overlap between the predictions of the SCA (from numerous sequences) and that gleaned from a simple structural perturbation (Figure 3) suggests that our perturbation approach can be employed as a first step to identify potentially coupled residues that can further be refined by either SCA or more directly through experiments. In elastic network models (ENMs), it is generally assumed that residues separated by up to 10–15 Å are coupled through springs of either a fixed spring constant or that decay with increasing distance. Interestingly, incorporation of a distance-dependent spring constant provides a better agreement with experiments . The results of our perturbation approach points to a simple reason as to why such an assumption provides a better agreement, since the effect of any perturbation is expected to decay the farther a particular residue is from a perturbed site (the coupling is weaker; also see ref. ). It is also important to note that the perturbation through alanine scanning is purely employed as a tool to probe the underlying features of the intra-molecular interaction network; the resulting output is ΔQ as a function of residue index (or distance from the perturbed residue), which is a measure of the extent to which the perturbed residue is coupled to its neighbors, i.e. its immediate environment. Our method therefore does not report on possible local stabilization (negative ΔQ) of the protein structure. Moreover, since the perturbation approach relies on the contact map of the WT as the input, any potential changes to the structure upon ligand binding or mutation (e.g., small to large side-chain substitutions) will not be accounted for.
We also highlight how cooperativity is built into protein systems through weak higher order interactions by explicitly decoupling the environment of a residue into two shells of interactions and destabilization energies. Many-body terms (or the second-shell effects) in protein folding models should therefore provide a more realistic description of the (un)folding processes. This however, requires the folding community to consistently move beyond contact maps and introduce residue-environment-specific energy terms in structure-based models, similar to the amino acid ‘burial’ code pioneered by de Araujo and Onuchic . However, a more detailed understanding of energetic coupling and the associated timescales from multiple systems is required to paint a quantitative picture on the modulations of binding affinity at a distant site (the ‘holy grail’ of allostery) or to predict the degree of folding cooperativity from first principles. The experimentally derived thermodynamically consistent approach we present here that recasts the residue environment into a single number provides a stepping-stone in that direction.
N. R and A. N. N. designed the approach, analyzed the data and wrote the paper.
A. N. N. is a Wellcome Trust/ DBT India Alliance Intermediate Fellow.
The Authors declare that there are no competing interests associated with the manuscript.