Abstract
Background
Longrange communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether longrange interactions exist in lattice models of proteins. Lattice models of proteins have proven to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding.
Results
Using a computational version of doublemutant cycle analysis, we show that longrange interactions emerge in lattice models even though they are not an input feature of them. The coupling energy of both short and longrange pairwise interactions is found to become more positive (destabilizing) in a linear fashion with increasing 'contactfrequency', an entropic term that corresponds to the fraction of states in the conformational ensemble of the sequence in which the pair of residues is in contact. A mathematical derivation of the linear dependence of the coupling energy on 'contactfrequency' is provided.
Conclusion
Our work shows how 'contactfrequency' should be taken into account in attempts to stabilize proteins by introducing (or stabilizing) contacts in the native state and/or through 'negative design' of nonnative contacts.
Background
There is a wealth of information that indicates that distant sites in proteins are often coupled to each other energetically. Evidence for such coupling initially emerged through studies of allosteric regulation of proteins [1] when it became clear that allosteric control is often achieved by ligand bindinginduced conformational changes that are propagated from one ligand binding site to other distant sites. Later, it became possible to identify distant sites in proteins that are coupled to each other energetically by protein engineering through the use of the doublemutant cycle (DMC) method [for review see ref. [2]]. It has become clear from many such DMC studies that distant sites in proteins are often coupled to each other in a weak but significant manner [for review see ref. [3]]. More recently, it has become possible to demonstrate longrange coupling experimentally also by employing NMR methods [4]. Finally, computational methods have also indicated the presence of longrange communication in proteins. One class of computational methods is based on detection of coevolving residues in multiple sequence alignment data. Such methods were originally developed in order to detect residues that are in physical contact [5,6] but, more recently, have been used to reveal longrange pathways of energetic connectivity in proteins [79]. Longrange communication in proteins has also been revealed in computational studies based on normal mode analysis and its coarsegrained versions in which correlations between fluctuations of distant residues are detected [1013].
Despite the wealth of evidence indicating that longrange communication is extremely common in proteins, the physical basis of this phenomenon is still unclear. In addition, there are some uncertainties associated with many of the computational and experimental methods used to detect such longrange interactions. For example, it is not clear whether correlated mutations at distant positions reflect longrange coupling or common ancestry [1417]. In the case of the DMC method, there is always a concern that the calculated coupling energy reflects a reorganization energy in one or more of the mutants in the cycle and not the true pairwise interaction energy [18]. Given these reasons, we decided to explore whether longrange interactions exist in 2D and 3D lattice models of proteins although such interactions are not an input feature of them. Simple lattice models of proteins have proven to capture some of the basic properties of real proteins and, although they ignore many important details, they have been used successfully for elucidating general principles of protein folding and stability [1926]. Here, we show by invoking computational DMC analysis that longrange interactions are also common in lattice models of proteins. Hence, our results indicate that longrange communication in proteins may also occur as a result of interactions in the nonnative states and not just via pathways by which information is transmitted through the native state structure as other computational methods suggest [7,12]. Our analysis also shows that the values of the coupling energies of both short and longrange interactions have a linear dependence on their respective contact frequencies in the conformational ensemble.
Theory
The energy of a sequence in a specific lattice conformation, E(C), is calculated by summing all the pairwise contact energies, e_{ij }(see Table 1), between neighboring lattice points excluding consecutive residues in the sequence, as follows:
Table 1. Pairwise residue interaction energies.
where r_{i } r_{j} is the distance in lattice units between residues i and j that are separated in sequence by at least two residues and . The free energy of folding, ΔG, of the native conformation of a sequence was calculated using [21]:
where P_{N }is the probability that the chain is in its native state. This probability is given by: , where (Z is the ensemble of all possible conformations on the relevant lattice), E(N) is the energy of the native conformation, T is the temperature and k is the Boltzmann constant. Eq. (2) can be written as follows: . It, therefore, follows that:
We designate the sum over all the nonnative conformations by Q' where Q' = Q  e^{E(N)/kT}.
The strength of a pairwise interaction can be estimated from DMC calculations or by computing the perturbation energy, ΔΔG_{per }= ΔG_{wt } ΔG_{m}, where ΔG_{wt }and ΔG_{m }are the respective free energies of the wildtype native conformation before and after a particular pairwise interaction is removed ('turned off') without affecting any other interactions. For simplicity, the derivation that follows is for this measure termed 'perturbation energy' and not for the coupling energy calculated from DMC that involves more algebraic terms (see Methods). It is important to note, however, that the perturbation energy of a pairwise interaction is almost equal to the coupling energy calculated from DMC for that interaction since in the DMC method the effects of the different mutations on other interactions tend to cancel out [18]. We show in the Results that our derivation holds for perturbation energies as well as for coupling energies that, in contrast with the perturbation energies, can be determined in experiments. The perturbation energy can be expressed, as follows:
where E_{c }is the energy of the contact that was removed. It is convenient to partition the sum of all the nonnative conformations of the mutant, Q'_{m}, into the sets of C_{1 }and C_{2 }conformations (C_{1} + C_{2} = N) in which the interaction being targeted is either absent or present, respectively, as follows: Q'_{m }= , where λ is the contact energy of the perturbed interaction (Table 1). The expression for Q'_{m }can be rewritten, as follows:
Eq.(4) can, therefore, be rewritten as:
Taylor series expansion (ln(1+x) ≈ x for x < 1) of Eq. (5) and multiplication of the resulting expression by (= 1) yields:
The Boltzmann weighted contact frequency, BWCF(i, j), is defined as: , where i and j are two positions in the sequence and each occurrence of a contact is multiplied by the Boltzmann weight of the conformation (C) in which it occurs. Hence, inspection of Eq. (6) shows that plots of the perturbation energy (or coupling energy) as a function of BWCF(i, j) are expected to be approximately linear with a slope that is a function of λ.
Results and discussion
DMC have been used extensively to determine experimentally the strengths of various pairwise interactions in proteins [2]. Here, DMC were invoked in order to evaluate, for the first time to the best of our knowledge, coupling energies between all possible pairs of positions in 2D and 3D lattice models of proteins (Figure 1). Evidence for correlations between distant sites in lattice models has been reported before in the context of protein aggregation [27]. The distributions of the values of the coupling energies for all possible pairs of positions in the different native states of 10 sequences with 16 residues on a 2D lattice with full enumeration and 10 sequences with 27 residues on a 3 × 3 × 3 3D lattice are shown in Figure 2a and 2b, respectively. It can be seen that the values of the coupling energies for pairs in contact are mostly negative whereas the values of the coupling energies for pairs that are not in contact are mostly (but not exclusively) positive and smaller in absolute terms. Pairs that are in contact in a given native conformation could, therefore, be identified with high confidence using this procedure.
Figure 1. Scheme of a doublemutant cycle for a 2D lattice model protein. Two residues, i and j, are mutated (the mutations are designated by B on a dark background) either singly or in combination. ΔG(i,j→B,j) and ΔG(i,B→B,B) are the respective free energy changes upon mutation of residue i when residue j is present and when it has also been mutated. If these free energy changes are equal to each other then residues i and j are not coupled. Otherwise, residues i and j are energetically coupled. The same is true for the difference between the free energy changes ΔG(i,j→i,B) and ΔG(B,j→B,B). In this scheme, residues i and j form a direct contact in the native structure of the wildtype sequence. The doublemutant cycle method can be applied, however, also for residues that are distant in space in the native structure as carried out in the paper.
Figure 2. Distributions of the values of the pairwise coupling energies for all possible pairs of positions in sequences with different native states on 2D and 3D lattices. The values of the coupling energies for all possible pairs of positions in 10 sequences of 16 residues with different native states on a 2D lattice with full enumeration (a) and 10 sequences of 27 residues with different native states on a 3 × 3 × 3 3D lattice (b) were calculated. The distributions of the values of the pairwise coupling energies for positions in contact and not in contact in these native conformations are shown by filled and empty bars, respectively.
The fraction of conformations in the ensemble in which residues at two positions in a sequence are in contact is termed the 'contact frequency'. The 'contact frequency' is not defined for pairs of consecutive positions in a sequence since the interaction energy of such pairs is by definition zero (see Eq. (1)). It is also not defined for pairs of even or odd positions in a sequence since they cannot interact on a square or cubic lattice and, thus, have a contactfrequency of zero. Therefore, only pairs of residues with nonzero values of contactfrequency are considered here. A more accurate measure of the frequency of a contact in a conformational ensemble is the 'Boltzmann weighted contact frequency', BWCF, where the occurrence of each contact is multiplied by the Boltzmann weight of the conformation (C) in which it is found (see Theory). In the Theory section it was shown that the strength of a pairwise interaction is expected to have a linear dependence on its BWCF. Such linear plots of different measures of the strength of pairwise interactions as a function of BWCF are depicted in Figure 3 for several representative examples of lattice models.
Figure 3. Plots of different measures of the strength of pairwise interactions as a function of measures of contact frequency for several representative examples of lattice models. In panels a and b, the average coupling energies, <ΔΔG_{int}>, of all the pairs in contact (a) and not in contact (b) are plotted against their respective average BWCF in the case of a set of sequences with 30 residues that have the same native conformation on a lattice of 5 × 6. In panels c and d, the coupling (c) and perturbation (d) energies, ΔΔG_{int }and ΔΔG_{per}, are plotted against the BWCF for all the pairs of positions not in contact in the case of a sequence with L = 20 on a 2D lattice with full enumeration. In panels e and f, the coupling (e) and perturbation (f) energies are plotted against the BWCF for all the pairs of positions not in contact in the case of a sequence with L = 27 on a 3 × 3 × 3 cubic lattice. The data in panels cf corresponding to different values of λ are color coded, as follows: λ = 1.25, red; λ = 1, blue; λ = 0.75, green; λ = 0.25, magenta; λ = 0, black; λ = 1, cyan.
In the first example (Figure 3a and 3b), a set of sequences with a length, L, of 30 residues that have the same native structure was constructed (such structurebased sequence sets are designated SBSS) and the coupling energy was determined for every possible pair of positions in each sequence. The average value of the coupling energy for each pair of positions in the SBSS was then calculated in order to improve the signaltonoise ratio. In this example, only conformations that fit into a 5 × 6 lattice were considered. It may be seen that a strong linear correlation is found between the average coupling energy for each pair of positions in the SBSS and the corresponding average BWCF index. This correlation holds for pairs of residues that form native contacts (Figure 3a, r = 0.78; Pvalue = 5.5 × 10^{5}) and also, surprisingly, for pairs of residues that are not in contact in this particular native conformation (Figure 3b, r = 0.87; Pvalue = 1.3 × 10^{55}). Such linear correlations (with average correlation coefficients of about 0.84 (± 0.05) for the noncontacting pairs and 0.62 (± 0.15) for the pairs in contact) were also found for SBSS that correspond to 8 other native conformations (i.e. 2 SBSS for sequences with L = 30 on a 5 × 6 lattice, 4 SBSS for sequences with L = 25 on a 5 × 5 lattice and 2 SBSS for sequences with L = 25 on a 5 × 6 lattice) when only the conformations that fit into the lattice were considered.
In the second example, the coupling (Figure 3c) and perturbation (Figure 3d) energies for all residue pairs not in contact in the native state of a sequence with L = 20 on a 2D lattice are plotted as a function of their BWCF. Here, values of the BWCF were calculated for the entire conformational ensemble (Z = 41,889,578) and not just for the relatively compact states as in Figure 3a and 3b. The colorcoding designates the different contacts that have a given value of λ (Table 1). It may be seen (Figure 3d) that almost perfect correlations (r ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ as predicted by Eq. (6). The correlations between the coupling energies and the BWCF for each given value of λ (except for λ = 0) are also excellent (Figure 3c, r > 0.97; Pvalue < 10^{6}) but not perfect as those in Figure 3d for the perturbation energies. Plots for residue pairs in contact in the native state are not shown since the number of such pairs is small and the correlations are, thus, not significant.
In the third example, the coupling (Figure 3e) and perturbation (Figure 3f) energies for all residue pairs not in contact in the native state of a sequence with L = 27 on a 3 × 3 × 3 lattice are plotted as a function of their BWCF. Here, too, almost perfect correlations (r ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ (Figure 3f) whereas the correlations for the coupling energies (Figure 3e) are excellent (r = 0.92, 0.85, 0.92, 0.58 and 0.97 for λ values of 1.25, 1, 0.75, 0.25 and 1, respectively, with Pvalues < 2 × 10^{3 }except for λ = 1.25 where the number of data points, n, is small (n = 4)) but not perfect as those in Figure 3f. In summary, therefore, the data depicted in Figure 3 for different types of lattice models (2D or 3D lattices with or without full enumeration of all the conformational states in the ensemble and for single sequences or averaged for a SBSS) support the general result described by Eq. (6) that the free energies of both direct (in contact in the native state) and indirect pairwise interactions are linearly dependent on their Boltzmannweighted contact frequencies. It should be pointed out that only weak or no correlations are observed when pairwise energies taken directly from Table 1 are plotted against the BWCF, thereby providing further justification for the approach in this study that is based on the coupling or perturbation energies. The correlations in Figure 3 indicate that rare native contacts have more negative coupling energies than abundant native contacts. Likewise, rare noncontacting pairs have less positive coupling energies than abundant noncontacting pairs. Therefore, one may infer that native states can be stabilized by stabilizing contacts with low contactfrequency and destabilizing noncontacting pairs with a high contactfrequency.
Given that the interaction energy of a sequence in a specific lattice conformation is calculated by summing over all pairwise interactions between neighboring lattice points, it may seem surprising that nondirect interactions with significant positive coupling energies are found to exist (Figure 3). However, it has been pointed out that the strengths of pairwise interactions in the native state determined by DMC are always relative to the unfolded state [28]. Hence, the positive coupling energies observed here in the case of noncontacting pairs reflect, to a large extent, pairwise interactions in the nonnative conformations in the ensemble. Surprisingly, however, positive coupling energies are also observed in the case of residue pairs such as P, H that have interaction energies of zero (Table 1) and, therefore, should not be coupled even when they are in contact in nonnative conformations. These nonzero coupling energies arise owing to nonadditivity in entropy calculations [29].
The correlations shown in Figure 3 can be understood more intuitively by considering several extreme cases and keeping in mind that the free energy of the native state is a function of both the energy of the native conformation and the energies of all the other nonnative conformations in the ensemble (see Eq. (3)). For simplicity, the Boltzmann weights of the different states will be neglected in the discussion that follows and we will, therefore, refer to the contactfrequency (and not the BWCF) of residue pairs. The following four extreme cases of perturbations will be considered: (i) elimination of a native contact with a contactfrequency of 1/Z; (ii) elimination of a native contact with a contactfrequency that approaches one; (iii) elimination of a nonnative contact with a contact frequency of 1/Z; and (iv) elimination of a nonnative contact with a contactfrequency that approaches one.
In the first case, a contact that exists only in the native state is perturbed and, therefore, only the energy of the native state is affected. Hence, the gap between the energy of the native conformation and the energies of the nonnative conformations is reduced (Figure 4, case (i)). Such a perturbation reduces ΔH by the value of the contact energy, E_{c}, and has no effect on ΔS (which is a function of the sum, Q', over all the nonnative states). The perturbation energy, ΔΔG_{per}, in this case is, therefore, equal to E_{c}.
Figure 4. Effects of different perturbations on the energy spectrum of the native state and the ensemble of nonnative conformations. The effects of four different extreme cases of perturbations are depicted. In case (i), a native contact with a contactfrequency of 1/Z (Z is the ensemble size) is eliminated, thereby causing the energy of the native state, E_{n}, to increase to E'_{n }but not affecting the energies of the nonnative states. The gap between the energy of the native state and the energies of all the nonnative states is, therefore, reduced by E'_{n}E_{n}. In case (ii), a native contact with a contactfrequency value that approaches one is eliminated, thereby causing the energies of the native state and most of the nonnative states to increase by E'_{n}E_{n }without changing the energy gap. In case (iii), a nonnative contact with a contact frequency of 1/Z is eliminated without changing the energy gap as there is no change in the energies of the native state and most of the nonnative states. In case (iv), a nonnative contact with a contactfrequency value that approaches one is eliminated, thereby increasing the energies of most of the nonnative states and also the gap in energy between these states and the native state.
In the second case, a contact that exists in both the native state and in most of the nonnative states is perturbed and, therefore, the gap between the energy of the native conformation and the energies of the nonnative conformations hardly changes (Figure 4, case (ii)). In this case, Q'_{m}/Q'_{wt }< 1 and the perturbation energy, ΔΔG_{per }= E_{c } kTln(Q'_{m}/Q'_{wt}), therefore, increases (note that E_{c }is negative) in accordance with the plot in Figure 3a. Native contacts with a low contactfrequency, therefore, contribute more than those with a large contactfrequency to the gap between the energy of the native state and the energies of the nonnative conformations, thereby explaining why they have more negative coupling energies (Figure 3a).
In the third case of a perturbation of a nonnative contact with a low contact frequency, it is clear that the energies of the native state and most of the nonnative states do not change and, therefore, the energy gap also remains unchanged (Figure 4, case (iii)). In the fourth case of a perturbation of a nonnative contact with a high contactfrequency, most of the nonnative conformations are destabilized but the energy of the native state is not affected and the gap between the energy of the native conformation and the energies of the nonnative conformations, therefore, becomes larger (Figure 4, case (iv)). In cases such as (iii) and (iv), when a pairwise interaction between residues that are not in contact in the native state is removed, there is no effect on ΔH and the perturbation energy is given by: ΔΔG_{per }=  kTln(Q'_{m}/Q'_{wt}). If the contactfrequency of the removed interaction is low (case (iii)), then Q'_{m }≈ Q'_{wt }and the perturbation energy will be equal to approximately zero. If the contactfrequency of the removed interaction is high (case (iv)), then Q'_{m}/Q'_{wt }< 1 and the value of the perturbation energy will increase in accordance with the plots in Figure 3. Nonnative contacts with a high contactfrequency, therefore, contribute more than those with a low contactfrequency to the gap between the energy of the native state and the energies of the nonnative conformations, thereby explaining why they have more positive coupling energies (Figure 3). The effects shown schematically in Figure 4 almost always result in an increase of the energy of either the native state (case (i)), the nonnative states (case (iv)) or both (case (ii)) since nonfavorable pairwise interactions (Table 1) are rare given the amino acid composition we used. It is clear, however, that protein evolution might favor nonfavorable interactions in nonnative conformations that would destabilize them relative to the native state. Such an evolutionary process termed 'negative design' [3032] would be reflected in negative (favorable) coupling energies between residues that are not in contact in the native state.
How important is contactfrequency for protein stability? In order to obtain some insight into this question, we compared the stabilization achieved when optimizing a sequence for a particular native conformation using two different functions: (i) F1 (Eq. (8)) that minimizes the energy of native contacts and maximizes the energy of nonnative contacts ('negative design'); and (ii) F2 (Eq. (9)) in which the contributions of native and nonnative contacts is weighted by their contactfrequency. Both functions have an adjustable parameter, W_{c}, which determines the relative weight of the contributions of the native vs. nonnative interactions to stability. It can be seen (Figure 5) that for sequences with L = 30 on a 5 × 6 lattice, greater stability is achieved when contactfrequency is taken into account across the entire range of W_{c }values. Similar results were obtained in cases of other lattice dimensions and sequence lengths when only the most compact conformations were considered. A more general scoring function will be needed for efficient design when the entire conformational space is considered.
Figure 5. Stabilization of 2Dlattice model proteins by taking into consideration the contact frequency of residue pairs in contact and not in contact in the native state. The average free energy of folding of 100 sequences designed either with (○) or without (△) taking into account the contact frequency is plotted against the value of the contact weight, W_{c }(see Eqs. (8) and (9)), used in the design. The results shown here are for sequences with L = 30 on a 5 × 6 lattice. Similar results were obtained in cases of other lattice dimensions and sequence lengths when only the most compact conformations were considered. For more details, see Methods.
Conclusion
It is shown in this study that longrange pairwise interactions are also present in simple lattice models of proteins despite the fact that the interaction energy of a sequence in a specific conformation is based solely on direct interactions (Eq. (1)). Doublemutant cycle analysis of these lattice models and a mathematical analysis show that the strength of both direct and indirect native interactions increases (i.e. their coupling free energy becomes more negative) in a linear fashion with decreasing contactfrequency that is an entropic term. Hence, proteins can be stabilized by introducing (or stabilizing) contacts in the native state with a low contactfrequency and removing (or destabilizing) contacts in nonnative states with a high contactfrequency, as shown in Figure 5. Although manifestations of the latter strategy of 'negative design' have been recognized before [32] it has not been fully appreciated how the choice of interactions to be introduced (stabilized) or removed (destabilized) affects the extent of stabilization. Our findings are not dependent on sequence length and lattice dimensions that determine the conformational ensemble size and are, thus, likely to be relevant to the selection of folding pathways, folding rates and the design of real proteins. It may be possible to implement our findings using ensembles that are derived computationally (such as with COREX [33]) before experimentally characterized conformational ensembles become available. The new approach described here, that involves combining DMC analysis with lattice models, may also pave the way for a rigorous analysis of other complex aspects of protein behavior. For example, simulation of protein evolution by subjecting lattice models to rounds of mutagenesis followed by selection can be used to assess the contribution of correlated mutations at distant positions to protein folding, stability and allosteric communication. Employing lattice models to address this issue has the distinct advantage that it renders possible separating between correlated mutations due to common ancestry and those due to biophysical factors. Such studies may reveal relationships between contactfrequency, correlated mutations and other properties of proteins such as contactorder [34].
Methods
The lattice model of proteins
2D or 3D lattice models that are similar to the one described by Jacob and Unger [35] were used. In brief, the protein sequence consists of an alphabet of five amino acids: hydrophobic (H), neutral polar (P), positively charged (+), negatively charged () and blank (B) for the use of mutations. The pairwise interaction energies (e_{ij}) are taken from Table 1 and reflect in a qualitative manner the strengths of interactions between different types of amino acids. Similar results were obtained using other contact interaction matrices. The energies of all possible conformations of a given sequence on a particular lattice were calculated and the conformation with the lowest energy, if a single such one exists, was considered as its native conformation. A value of 1 was used for kT. It is important to note that the size of the ensemble, Z, is determined by the lattice dimensions and the same conformation of a given sequence may, therefore, have different values of ΔG due to different lattice dimensions.
Sequences of length (L) 16, 20, 25 and 30 were used for the 2D models and sequences with L = 27 for the 3D models. In the case of sequences with L = 16 or 20, all the respective 802,075 and 41,889,578 nonsymmetric conformations were enumerated. In the case of sequences with L = 25 or L = 30 where the total number of conformations is too large to enumerate, we considered only the conformations that could be fitted into 5 × 5 or 5 × 6 lattices. Likewise, only the conformations that could be fitted into a 3 × 3 × 3 lattice were considered in the case of the 3D lattice models for sequences with L = 27. The numbers of all compact nonsymmetric conformations of sequences with L = 25 on 5 × 5 and 5 × 6 lattices are 1081 and 377,779, respectively. The numbers of all compact nonsymmetric conformations of sequences with L = 30 on a 5 × 6 lattice and L = 27 on a 3 × 3 × 3 lattice are 6431 and 103,346, respectively. The sequences were generated by random rearrangements of L residues with compositions of 44% H, 31% P, 12.5% (+) and 12.5% () in the case of sequences with L = 16, 40% H, 28% P, 16% (+) and 16% () in the case of sequences with L = 25, 42% H, 30% P, 14% (+) and 14% () in the case of sequences with L = 30 and 40% H, 30% P, 15% (+) and 15% () in the case of sequences with L = 20 or 27 (these compositions correspond roughly to those in the PDB).
Generation of structurebased sequence sets (SBSS)
SBSS that contained more than 40 different sequences of the same length and with the same native conformation were generated. These SBSS have a mean sequence identity that is only between 0.29–0.34 since (as described above) the sequences were generated by random rearrangements and, thus, represent a random sample of sequence space. Nine different SBSS corresponding to different native conformations were examined.
Calculation of coupling energies using doublemutant cycles
The strength of a pairwise interaction between residues i and j in the native conformation of a given sequence was evaluated by constructing a DMC that comprises the original wildtype sequence, two single mutants in which either residue i or j are replaced with the blank (B) residue and the corresponding double mutant in which both residues are replaced with this residue. The blank residue corresponds to alanine which is usually chosen as a reference state in experimental DMC since it is assumed that (i) replacement by this residue tends, in general, to reduce structural perturbations upon mutation and that (ii) interactions between alanine at one position and any other type of residue at the second position are minimal. The coupling energy, ΔΔG_{int}, which is a measure of the strength of the pairwise interaction between residues i and j was calculated, as follows:
where ΔG_{i,j}, ΔG_{i,B}, ΔG_{B,j }and ΔG_{B,B }are the respective free energies of folding of the wildtype protein, the two single mutants and the double mutant that are calculated using Eq. (2). The coupling energy is equal to the difference in the free energies of two parallel processes in the cycle, Δ G(i,j→B,j) and ΔG(i,B→B,B), that correspond to the effect of mutating residue i (or j) when the other residue is present or absent, respectively (Figure 1). In these calculations, negative and positive coupling energies reflect interactions that stabilize or destabilize the native state, respectively. We implemented such an experiment for each given pair of positions so that a coupling energy could be calculated for every possible pair of positions in each sequence.
Calculation of perturbation energies
We also calculated a perturbation energy, ΔΔG_{per}, = ΔG_{wt } ΔG_{m}, for every possible pair of positions in each sequence where ΔG_{wt }and ΔG_{m }are the respective free energies of the wildtype native conformation before and after a particular pairwise interaction is 'turned off' but without affecting any other interactions. Under ideal circumstances [18], the coupling energy, which can be determined experimentally or calculated as described above, provides a good estimate of the perturbation energy that can only be determined by computation.
Contact frequencybased protein stabilization
Sequences with a specific native conformation were generated by a Monte Carlo (MC) process that maximizes two design scores, F_{1 }and F_{2}, that either ignore the contact frequency or take it into account, respectively. The expressions for the scores are:
where W_{c }is the contact weight, N_{c }and N_{non }are the total number of contacts and noncontacts in the specific conformation, respectively, and f_{c }is the contactfrequency. The values of W_{c} were varied between 0.05–0.95. For each value of W_{c}, 100 designed sequences were generated in 10,000 MC steps and the average free energy of folding was then calculated.
Authors' contributions
ON carried out all the calculations. AH wrote the paper. All the authors analysed the data, helped draft the paper and read and approved the final manuscript.
Acknowledgements
We thank Profs. Gilad Haran, Michael Levitt and John Moult for useful comments on an earlier draft of this paper and Etai Jacob for providing us with source codes for lattice model calculations. This work was supported by grant 1339/08 of the Israel Science Foundation to R.U. O.N.B. was supported in part by a Fellowship from the Kahn Family Research Center for Systems Biology of the Human Cell and the Kimmelman Center for Macromolecular Assembly.
References

Perutz MF: Mechanisms of cooperativity and allosteric regulation in proteins.
Q Rev Biophys 1989, 22:139236. PubMed Abstract  Publisher Full Text

Horovitz A: Doublemutant cycles: a powerful tool for analysing protein structure and function.
Fold & Des 1996, 1:R121R126. PubMed Abstract  Publisher Full Text

LiCata VJ, Ackers GK: Longrange, small magnitude nonadditivity of mutational effects in proteins.
Biochemistry 1995, 34:31333139. PubMed Abstract  Publisher Full Text

Clarkson MW, Gilmore SA, Edgell MH, Lee AL: Dynamic coupling and allosteric behavior in a nonallosteric protein.
Biochemistry 2006, 45:76937699. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Göbel U, Sander C, Schneider R, Valencia A: Correlated mutations and residue contacts in proteins.
Proteins: Struct Funct Genet 1994, 18:309317. Publisher Full Text

Neher E: How frequent are correlated changes in families of protein sequences?
Proc Natl Acad Sci USA 1994, 91:98102. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Lockless SW, Ranganathan R: Evolutionarily conserved pathways of energetic connectivity in protein families.
Science 1999, 286:295299. PubMed Abstract  Publisher Full Text

Kass I, Horovitz A: Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations.
Proteins: Struct Funct Genet 2002, 48:611617. Publisher Full Text

Dima RI, Thirumalai D: Determination of network of residues that regulate allostery in protein families using sequence analysis.
Protein Sci 2006, 15:258268. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ichiye T, Karplus M: Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations.
Proteins: Struct Funct Genet 1991, 11:205217. Publisher Full Text

Rod TH, Radkiewicz JL, Brooks CL III: Correlated motion and the effect of distal mutations in dihydrofolate reductase.
Proc Natl Acad Sci USA 2003, 100:69806985. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chennubhotla C, Rader AJ, Yang LW, Bahar I: Elastic network models for understanding biomolecular machinery: from enzymes to supramolecular assemblies.
Phys Biol 2005, 2:S173S180. PubMed Abstract  Publisher Full Text

Ma J: Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes.
Structure 2005, 13:373380. PubMed Abstract  Publisher Full Text

Pollock DD, Taylor WR, Goldman N: Coevolving protein residues: maximum likelihood identification and relationship to structure.
J Mol Biol 1999, 287:187198. PubMed Abstract  Publisher Full Text

Wollenberg KR, Atchley WR: Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap.
Proc Natl Acad Sci USA 2000, 97:32883291. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Larson SM, Di Nardo AA, Davidson AR: Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions.
J Mol Biol 2000, 303:433446. PubMed Abstract  Publisher Full Text

Noivirt O, Eisenstein M, Horovitz A: Detection and reduction of evolutionary noise in correlated mutation analysis.
Protein Eng Des Sel 2005, 18:247253. PubMed Abstract  Publisher Full Text

Serrano L, Horovitz A, Avron B, Bycroft M, Fersht AR: Estimating the contribution of engineered surface electrostatic interactions to protein stability by using doublemutant cycles.
Biochemistry 1990, 29:93439352. PubMed Abstract  Publisher Full Text

Sali A, Shakhnovich E, Karplus M: How does a protein fold?
Nature 1994, 369:248251. PubMed Abstract  Publisher Full Text

Hinds DA, Levitt M: Exploring conformational space with a simple lattice model for protein structure.
J Mol Biol 1994, 243:668682. PubMed Abstract  Publisher Full Text

Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS: Principles of protein foldinga perspective from simple exact models.
Protein Sci 1995, 4:561602. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Onuchic JN, Socci ND, LutheySchulten Z, Wolynes PG: Protein folding funnels: the nature of the transition state ensemble.
Fold & Des 1996, 1:441450. PubMed Abstract  Publisher Full Text

Unger R, Moult J: Local interactions dominate folding in a simple protein model.
J Mol Biol 1996, 259:988994. PubMed Abstract  Publisher Full Text

Govindarajan S, Goldstein RA: On the thermodynamic hypothesis of protein folding.
Proc Natl Acad Sci USA 1998, 95:55455549. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mirny L, Shakhnovich E: Protein folding theory: from lattice to allatom models.
Annu Rev Biophys Biomol Struct 2001, 30:361396. PubMed Abstract  Publisher Full Text

Vendruscolo M, Mirny LA, Shakhnovich EI, Domany E: Comparison of two optimization methods to derive energy parameters for protein folding: perceptron and Z score.
Proteins: Struct Funct Genet 2000, 41:192201. Publisher Full Text

Bratko D, Blanch HW: Effect of secondary structure on protein aggregation: A replica exchange simulation study.
J Chem Phys 2003, 118:51855194. Publisher Full Text

Horovitz A, Fersht AR: Strategy for analysing the cooperativity of intramolecular interactions in peptides and proteins.
J Mol Biol 1990, 214:613617. PubMed Abstract  Publisher Full Text

Mark AE, van Gunsteren WF: Decomposition of the free energy of a system in terms of specific interactions. Implications for theoretical and experimental studies.
J Mol Biol 1994, 240:167176. PubMed Abstract  Publisher Full Text

Hecht MH, Richardson JS, Richardson DC, Ogden RC: De novo design, expression, and characterization of Felix: a fourhelix bundle protein of nativelike sequence.
Science 1990, 249:884891. PubMed Abstract  Publisher Full Text

Hellinga HW: Rational protein design: combining theory and experiment.
Proc Natl Acad Sci USA 1997, 94:1001510017. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Berezovsky IN, Zeldovich KB, Shakhnovich EI: Positive and negative design in stability and thermal adaptation of natural proteins.
PLoS Comput Biol 2007, 3:498507. Publisher Full Text

Vertrees J, Barritt P, Whitten S, Hilser VJ: COREX/BEST server: a web browserbased program that calculates regional stability variations within protein structures.
Bioinformatics 2005, 21:33183319. PubMed Abstract  Publisher Full Text

Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins.
J Mol Biol 1998, 277:985994. PubMed Abstract  Publisher Full Text

Jacob E, Unger R: A tale of two tails: why are terminal residues of proteins exposed?
Bioinformatics 2007, 23:e225e230. PubMed Abstract  Publisher Full Text