Skip to main content
  • Research article
  • Open access
  • Published:

Analysing the origin of long-range interactions in proteins using lattice models

Abstract

Background

Long-range communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether long-range interactions exist in lattice models of proteins. Lattice models of proteins have proven to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding.

Results

Using a computational version of double-mutant cycle analysis, we show that long-range interactions emerge in lattice models even though they are not an input feature of them. The coupling energy of both short- and long-range pairwise interactions is found to become more positive (destabilizing) in a linear fashion with increasing 'contact-frequency', an entropic term that corresponds to the fraction of states in the conformational ensemble of the sequence in which the pair of residues is in contact. A mathematical derivation of the linear dependence of the coupling energy on 'contact-frequency' is provided.

Conclusion

Our work shows how 'contact-frequency' should be taken into account in attempts to stabilize proteins by introducing (or stabilizing) contacts in the native state and/or through 'negative design' of non-native contacts.

Background

There is a wealth of information that indicates that distant sites in proteins are often coupled to each other energetically. Evidence for such coupling initially emerged through studies of allosteric regulation of proteins [1] when it became clear that allosteric control is often achieved by ligand binding-induced conformational changes that are propagated from one ligand binding site to other distant sites. Later, it became possible to identify distant sites in proteins that are coupled to each other energetically by protein engineering through the use of the double-mutant cycle (DMC) method [for review see ref. [2]]. It has become clear from many such DMC studies that distant sites in proteins are often coupled to each other in a weak but significant manner [for review see ref. [3]]. More recently, it has become possible to demonstrate long-range coupling experimentally also by employing NMR methods [4]. Finally, computational methods have also indicated the presence of long-range communication in proteins. One class of computational methods is based on detection of co-evolving residues in multiple sequence alignment data. Such methods were originally developed in order to detect residues that are in physical contact [5, 6] but, more recently, have been used to reveal long-range pathways of energetic connectivity in proteins [7–9]. Long-range communication in proteins has also been revealed in computational studies based on normal mode analysis and its coarse-grained versions in which correlations between fluctuations of distant residues are detected [10–13].

Despite the wealth of evidence indicating that long-range communication is extremely common in proteins, the physical basis of this phenomenon is still unclear. In addition, there are some uncertainties associated with many of the computational and experimental methods used to detect such long-range interactions. For example, it is not clear whether correlated mutations at distant positions reflect long-range coupling or common ancestry [14–17]. In the case of the DMC method, there is always a concern that the calculated coupling energy reflects a reorganization energy in one or more of the mutants in the cycle and not the true pairwise interaction energy [18]. Given these reasons, we decided to explore whether long-range interactions exist in 2D and 3D lattice models of proteins although such interactions are not an input feature of them. Simple lattice models of proteins have proven to capture some of the basic properties of real proteins and, although they ignore many important details, they have been used successfully for elucidating general principles of protein folding and stability [19–26]. Here, we show by invoking computational DMC analysis that long-range interactions are also common in lattice models of proteins. Hence, our results indicate that long-range communication in proteins may also occur as a result of interactions in the non-native states and not just via pathways by which information is transmitted through the native state structure as other computational methods suggest [7, 12]. Our analysis also shows that the values of the coupling energies of both short- and long-range interactions have a linear dependence on their respective contact frequencies in the conformational ensemble.

Theory

The energy of a sequence in a specific lattice conformation, E(C), is calculated by summing all the pairwise contact energies, eij (see Table 1), between neighboring lattice points excluding consecutive residues in the sequence, as follows:

Table 1 Pairwise residue interaction energies.
E ( C ) = ∑ j > i + 2 N e ij δ ( | r i − r j | ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeyrauKaeiikaGIaee4qamKaeiykaKIaeyypa0ZaaabCaeaacqqGLbqzdaWgaaWcbaGaeeyAaKMaeeOAaOgabeaaaeaacqqGQbGAcqGH+aGpcqqGPbqAcqGHRaWkcqaIYaGmaeaacqqGobGta0GaeyyeIuoakiabes7aKjabcIcaOmaaemaabaGaeeOCai3aaSbaaSqaaiabbMgaPbqabaGccqGHsislcqqGYbGCdaWgaaWcbaGaeeOAaOgabeaaaOGaay5bSlaawIa7aiabcMcaPaaa@4B7A@
(1)

where |ri - rj| is the distance in lattice units between residues i and j that are separated in sequence by at least two residues and δ ( x ) = { 1 x = 1 0 o t h e r w i s e MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiTdqMaeiikaGIaemiEaGNaeiykaKIaeyypa0ZaaiqaaqaabeqaauaabeqabiaaaeaacqaIXaqmaeaacqWG4baEcqGH9aqpcqaIXaqmaaaabaqbaeqabeGaaaqaaiabicdaWaqaaiabd+gaVjabdsha0jabdIgaOjabdwgaLjabdkhaYjabdEha3jabdMgaPjabdohaZjabdwgaLbaaaaGaay5Eaaaaaa@44B8@ . The free energy of folding, ΔG, of the native conformation of a sequence was calculated using [21]:

Δ G = − k T ln ( P N 1 − P N ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeuiLdqKaem4raCKaeyypa0JaeyOeI0Iaem4AaSMaemivaqLagiiBaWMaeiOBa4MaeiikaGscfa4aaSaaaeaacqWGqbaudaWgaaqaaiabd6eaobqabaaabaGaeGymaeJaeyOeI0Iaemiuaa1aaSbaaeaacqWGobGtaeqaaaaakiabcMcaPaaa@3EFE@
(2)

where PN is the probability that the chain is in its native state. This probability is given by: P N = e − E ( N ) / k T Q MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiabd6eaobqabaGccqGH9aqpjuaGdaWcaaqaaiabdwgaLnaaCaaabeqaaiabgkHiTiabdweafjabcIcaOiabd6eaojabcMcaPiabc+caViabdUgaRjabdsfaubaaaeaacqWGrbquaaaaaa@3AEA@ , where Q = ∑ C ∈ Z e − E ( C ) / k T MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeyuaeLaeyypa0ZaaabuaeaacqqGLbqzdaahaaWcbeqaaiabgkHiTiabbweafjabcIcaOiabboeadjabcMcaPiabc+caViabdUgaRjabdsfaubaaaeaacqqGdbWqcqGHiiIZcqqGAbGwaeqaniabggHiLdaaaa@3D99@ (Z is the ensemble of all possible conformations on the relevant lattice), E(N) is the energy of the native conformation, T is the temperature and k is the Boltzmann constant. Eq. (2) can be written as follows: Δ G = − k T ln ( e − E ( N ) / k T Q − e − E ( N ) / k T ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeuiLdqKaem4raCKaeyypa0JaeyOeI0Iaem4AaSMaemivaqLagiiBaWMaeiOBa42aaeWaaKqbagaadaWcaaqaaiabbwgaLnaaCaaabeqaaiabgkHiTiabbweafjabcIcaOiabb6eaojabcMcaPiabc+caViabdUgaRjabdsfaubaaaeaacqWGrbqucqGHsislcqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrcqGGOaakcqqGobGtcqGGPaqkcqGGVaWlcqWGRbWAcqWGubavaaaaaaGccaGLOaGaayzkaaaaaa@4D5C@ . It, therefore, follows that:ΔG = E(N) + kT ln(Q - e-E(N)/kT)

We designate the sum over all the non-native conformations by Q' where Q' = Q - e-E(N)/kT.

The strength of a pairwise interaction can be estimated from DMC calculations or by computing the perturbation energy, ΔΔGper = ΔGwt - ΔGm, where ΔGwt and ΔGm are the respective free energies of the wild-type native conformation before and after a particular pairwise interaction is removed ('turned off') without affecting any other interactions. For simplicity, the derivation that follows is for this measure termed 'perturbation energy' and not for the coupling energy calculated from DMC that involves more algebraic terms (see Methods). It is important to note, however, that the perturbation energy of a pairwise interaction is almost equal to the coupling energy calculated from DMC for that interaction since in the DMC method the effects of the different mutations on other interactions tend to cancel out [18]. We show in the Results that our derivation holds for perturbation energies as well as for coupling energies that, in contrast with the perturbation energies, can be determined in experiments. The perturbation energy can be expressed, as follows:ΔΔGper = Ec - kT ln(Q'm/Q'wt)

where Ec is the energy of the contact that was removed. It is convenient to partition the sum of all the non-native conformations of the mutant, Q'm, into the sets of C1 and C2 conformations (|C1| + |C2| = N) in which the interaction being targeted is either absent or present, respectively, as follows: Q'm = ∑ C ∈ C 1 e − E(C)/ k T + ∑ C ∈ C 2 e − (E(C) − λ ) / k T MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaabuaeaacqqGLbqzdaahaaWcbeqaaGGaaiab=jHiTiabbweafjabbIcaOiabboeadjabbMcaPiabb+caViabdUgaRjabdsfaubaaaeaacqqGdbWqcqGHiiIZcqqGdbWqdaWgaaadbaGaeeymaedabeaaaSqab0GaeyyeIuoakiabgUcaRmaaqafabaGaeeyzau2aaWbaaSqabeaacqWFsislcqqGOaakcqqGfbqrcqqGOaakcqqGdbWqcqqGPaqkcqGHsislcqaH7oaBcqGGPaqkcqGGVaWlcqWGRbWAcqWGubavaaaabaGaee4qamKaeyicI4Saee4qam0aaSbaaWqaaiabikdaYaqabaaaleqaniabggHiLdaaaa@5223@ , where λ is the contact energy of the perturbed interaction (Table 1). The expression for Q'm can be rewritten, as follows:

Q' m = ∑ C ∈ C 1 e − E(C)/ k T + ∑ C ∈ C 2 e − E(C)/ k T ( e λ / k T + 1 − 1 ) = ∑ i = 1 N e − E i / k T + ∑ C ∈ C 2 e − E ( C ) / k T ( e λ / k T − 1 ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGceaabbeaacqqGrbqucqqGNaWjdaWgaaWcbaGaeeyBa0gabeaakiabg2da9maaqafabaGaeeyzau2aaWbaaSqabeaaiiaacqWFsislcqqGfbqrcqqGOaakcqqGdbWqcqqGPaqkcqqGVaWlcqWGRbWAcqWGubavaaaabaGaee4qamKaeyicI4Saee4qam0aaSbaaWqaaiabbgdaXaqabaaaleqaniabggHiLdGccqGHRaWkdaaeqbqaaiabbwgaLnaaCaaaleqabaGae8NeI0IaeeyrauKaeeikaGIaee4qamKaeeykaKIaee4la8Iaem4AaSMaemivaqfaaaqaaiabboeadjabgIGiolabboeadnaaBaaameaacqaIYaGmaeqaaaWcbeqdcqGHris5aOGaeiikaGIaeeyzau2aaWbaaSqabeaacqaH7oaBcqGGVaWlcqWGRbWAcqWGubavaaGccqGHRaWkcqaIXaqmcqGHsislcqaIXaqmcqGGPaqkaeaacqGH9aqpdaaeWbqaaiabbwgaLnaaCaaaleqabaGaeyOeI0Iaeeyrau0aaSbaaWqaaiabbMgaPbqabaWccqGGVaWlcqWGRbWAcqWGubavaaaabaGaeeyAaKMaeyypa0JaeGymaedabaGaeeOta4eaniabggHiLdGccqGHRaWkdaaeqbqaaiabbwgaLnaaCaaaleqabaGaeyOeI0IaeeyrauKaeiikaGIaee4qamKaeiykaKIaei4la8Iaem4AaSMaemivaqfaaaqaaiabboeadjabgIGiolabboeadnaaBaaameaacqaIYaGmaeqaaaWcbeqdcqGHris5aOGaeiikaGIaeeyzau2aaWbaaSqabeaacqaH7oaBcqGGVaWlcqWGRbWAcqWGubavaaGccqGHsislcqaIXaqmcqGGPaqkaaaa@8A8C@ .

Eq.(4) can, therefore, be rewritten as:

Δ Δ G per = E c − k T ln ( ∑ i = 1 N e − E i / k T + ( e λ / k T − 1 ) ∑ C ∈ C 2 e − E ( C ) / k T ∑ i = 1 N e − E i / k T ) = E c − k T ln ( 1 + ( e λ / k T − 1 ) ∑ C ∈ C 2 e − E ( C ) / k T ∑ i = 1 N e − E i / k T ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGceaabbeaacqqHuoarcqqHuoarcqWGhbWrdaWgaaWcbaGaeeiCaaNaeeyzauMaeeOCaihabeaakiabg2da9iabdweafnaaBaaaleaacqqGJbWyaeqaaOGaeyOeI0Iaem4AaSMaemivaqLagiiBaWMaeiOBa4MaeiikaGscfa4aaSaaaeaadaaeWbqaaiabbwgaLnaaCaaabeqaaiabgkHiTiabbweafnaaBaaabaGaeeyAaKgabeaacqGGVaWlcqWGRbWAcqWGubavaaaabaGaeeyAaKMaeyypa0JaeGymaedabaGaeeOta4eacqGHris5aiabgUcaRiabcIcaOiabbwgaLnaaCaaabeqaaiabeU7aSjabc+caViabdUgaRjabdsfaubaacqGHsislcqaIXaqmcqGGPaqkdaaeqbqaaiabbwgaLnaaCaaabeqaaiabgkHiTiabbweafjabcIcaOiabboeadjabcMcaPiabc+caViabdUgaRjabdsfaubaaaeaacqqGdbWqcqGHiiIZcqqGdbWqdaWgaaqaaiabikdaYaqabaaabeGaeyyeIuoaaeaadaaeWbqaaiabbwgaLnaaCaaabeqaaiabgkHiTiabbweafnaaBaaabaGaeeyAaKgabeaacqGGVaWlcqWGRbWAcqWGubavaaaabaGaeeyAaKMaeyypa0JaeGymaedabaGaeeOta4eacqGHris5aaaakiabcMcaPaqaaiabg2da9iabdweafnaaBaaaleaacqqGJbWyaeqaaOGaeyOeI0Iaem4AaSMaemivaqLagiiBaWMaeiOBa4MaeiikaGIaeGymaeJaey4kaSscfa4aaSaaaeaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaeHbbr2BIvgidf2CS9MBGaLCVbIqVXgzaGabaiaa=HcacqqGLbqzdaahaaqabeaacqaH7oaBcqGGVaWlcqWGRbWAcqWGubavaaGaeyOeI0IaeGymaeJaeiykaKYaaabuaeaacqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrcqGGOaakcqqGdbWqcqGGPaqkcqGGVaWlcqWGRbWAcqWGubavaaaabaGaee4qamKaeyicI4Saee4qam0aaSbaaeaacqaIYaGmaeqaaaqabiabggHiLdaabaWaaabCaeaacqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrdaWgaaqaaiabbMgaPbqabaGaei4la8Iaem4AaSMaemivaqfaaaqaaiabbMgaPjabg2da9iabigdaXaqaaiabb6eaobGaeyyeIuoaaaGccqGGPaqkcqGGUaGlaaaa@C16E@
(5)

Taylor series expansion (ln(1+x) ≈ x for |x| < 1) of Eq. (5) and multiplication of the resulting expression by 1 Q | Z | / 1 Q | Z | MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqaIXaqmaeaacqqGrbqucqGG8baFcqqGAbGwcqGG8baFaaGccqGGVaWljuaGdaWcaaqaaiabigdaXaqaaiabbgfarjabcYha8jabbQfaAjabcYha8baaaaa@3AA9@ (= 1) yields:

Δ Δ G per = E c − k T ( e λ / k T − 1 ) ∑ C ∈ C 2 e − E ( C ) / k T ∑ i = 1 N e − E i / k T = E c − k T ( ( e λ / k T − 1 ) / | Z | ) ∑ C ∈ C 2 e − E ( C ) / k T Q Q’/Q|Z| MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGceaabbeaacqqHuoarcqqHuoarcqWGhbWrdaWgaaWcbaGaeeiCaaNaeeyzauMaeeOCaihabeaakiabg2da9iabdweafnaaBaaaleaacqqGJbWyaeqaaOGaeyOeI0scfa4aaSaaaeaacqWGRbWAcqWGubavcqGGOaakcqqGLbqzdaahaaqabeaacqaH7oaBcqGGVaWlcqWGRbWAcqWGubavaaGaeyOeI0IaeGymaeJaeiykaKYaaabuaeaacqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrcqGGOaakcqqGdbWqcqGGPaqkcqGGVaWlcqWGRbWAcqWGubavaaaabaGaee4qamKaeyicI4Saee4qam0aaSbaaeaacqaIYaGmaeqaaaqabiabggHiLdaabaWaaabCaeaacqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrdaWgaaqaaiabbMgaPbqabaGaei4la8Iaem4AaSMaemivaqfaaaqaaiabbMgaPjabg2da9iabigdaXaqaaiabb6eaobGaeyyeIuoaaaaakeaacqGH9aqpcqWGfbqrdaWgaaWcbaGaee4yamgabeaakiabgkHiTKqbaoaalaaabaGaem4AaSMaemivaqLaeiikaGIaeiikaGIaeeyzau2aaWbaaeqabaGaeq4UdWMaei4la8Iaem4AaSMaemivaqfaaiabgkHiTiabigdaXiabcMcaPiabc+caViabcYha8jabbQfaAjabcYha8jabcMcaPmaaqafabaWaaSaaaeaacqqGLbqzdaahaaqabeaacqGHsislcqqGfbqrcqGGOaakcqqGdbWqcqGGPaqkcqGGVaWlcqWGRbWAcqWGubavaaaabaGaeeyuaefaaaqaaiabboeadjabgIGiolabboeadnaaBaaabaGaeGOmaidabeaaaeqacqGHris5aaqaaiabbgfarjabbMbiskabb+caViabbgfarjabbYha8jabbQfaAjabbYha8baaaaaa@98C8@
(6)

The Boltzmann weighted contact frequency, BWCF(i, j), is defined as: ( ∑ C ∈ Z e − E ( C ) / kT Q δ ( | r i − r j | c ) ) / | Z | MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeiikaGYaaabuaKqbagaadaWcaaqaaiabdwgaLnaaCaaabeqaaiabgkHiTiabbweafjabcIcaOiabboeadjabcMcaPiabc+caViabbUgaRjabbsfaubaaaeaacqqGrbquaaaaleaacqqGdbWqcqGHiiIZcqqGAbGwaeqaniabggHiLdGccqaH0oazcqGGOaakdaabdaqaaiabbkhaYnaaBaaaleaacqqGPbqAaeqaaOGaeyOeI0IaeeOCai3aaSbaaSqaaiabbQgaQbqabaaakiaawEa7caGLiWoadaWgaaWcbaGaee4yamgabeaakiabcMcaPiabcMcaPiabc+caViabcYha8jabbQfaAjabcYha8baa@52EB@ , where i and j are two positions in the sequence and each occurrence of a contact is multiplied by the Boltzmann weight of the conformation (C) in which it occurs. Hence, inspection of Eq. (6) shows that plots of the perturbation energy (or coupling energy) as a function of BWCF(i, j) are expected to be approximately linear with a slope that is a function of λ.

Results and discussion

DMC have been used extensively to determine experimentally the strengths of various pairwise interactions in proteins [2]. Here, DMC were invoked in order to evaluate, for the first time to the best of our knowledge, coupling energies between all possible pairs of positions in 2D and 3D lattice models of proteins (Figure 1). Evidence for correlations between distant sites in lattice models has been reported before in the context of protein aggregation [27]. The distributions of the values of the coupling energies for all possible pairs of positions in the different native states of 10 sequences with 16 residues on a 2D lattice with full enumeration and 10 sequences with 27 residues on a 3 × 3 × 3 3D lattice are shown in Figure 2a and 2b, respectively. It can be seen that the values of the coupling energies for pairs in contact are mostly negative whereas the values of the coupling energies for pairs that are not in contact are mostly (but not exclusively) positive and smaller in absolute terms. Pairs that are in contact in a given native conformation could, therefore, be identified with high confidence using this procedure.

Figure 1
figure 1

Scheme of a double-mutant cycle for a 2D lattice model protein. Two residues, i and j, are mutated (the mutations are designated by B on a dark background) either singly or in combination. ΔG(i,j→B,j) and ΔG(i,B→B,B) are the respective free energy changes upon mutation of residue i when residue j is present and when it has also been mutated. If these free energy changes are equal to each other then residues i and j are not coupled. Otherwise, residues i and j are energetically coupled. The same is true for the difference between the free energy changes ΔG(i,j→i,B) and ΔG(B,j→B,B). In this scheme, residues i and j form a direct contact in the native structure of the wild-type sequence. The double-mutant cycle method can be applied, however, also for residues that are distant in space in the native structure as carried out in the paper.

Figure 2
figure 2

Distributions of the values of the pairwise coupling energies for all possible pairs of positions in sequences with different native states on 2D and 3D lattices. The values of the coupling energies for all possible pairs of positions in 10 sequences of 16 residues with different native states on a 2D lattice with full enumeration (a) and 10 sequences of 27 residues with different native states on a 3 × 3 × 3 3D lattice (b) were calculated. The distributions of the values of the pairwise coupling energies for positions in contact and not in contact in these native conformations are shown by filled and empty bars, respectively.

The fraction of conformations in the ensemble in which residues at two positions in a sequence are in contact is termed the 'contact frequency'. The 'contact frequency' is not defined for pairs of consecutive positions in a sequence since the interaction energy of such pairs is by definition zero (see Eq. (1)). It is also not defined for pairs of even or odd positions in a sequence since they cannot interact on a square or cubic lattice and, thus, have a contact-frequency of zero. Therefore, only pairs of residues with non-zero values of contact-frequency are considered here. A more accurate measure of the frequency of a contact in a conformational ensemble is the 'Boltzmann weighted contact frequency', BWCF, where the occurrence of each contact is multiplied by the Boltzmann weight of the conformation (C) in which it is found (see Theory). In the Theory section it was shown that the strength of a pairwise interaction is expected to have a linear dependence on its BWCF. Such linear plots of different measures of the strength of pairwise interactions as a function of BWCF are depicted in Figure 3 for several representative examples of lattice models.

Figure 3
figure 3

Plots of different measures of the strength of pairwise interactions as a function of measures of contact frequency for several representative examples of lattice models. In panels a and b, the average coupling energies, <ΔΔGint>, of all the pairs in contact (a) and not in contact (b) are plotted against their respective average BWCF in the case of a set of sequences with 30 residues that have the same native conformation on a lattice of 5 × 6. In panels c and d, the coupling (c) and perturbation (d) energies, ΔΔGint and ΔΔGper, are plotted against the BWCF for all the pairs of positions not in contact in the case of a sequence with L = 20 on a 2D lattice with full enumeration. In panels e and f, the coupling (e) and perturbation (f) energies are plotted against the BWCF for all the pairs of positions not in contact in the case of a sequence with L = 27 on a 3 × 3 × 3 cubic lattice. The data in panels c-f corresponding to different values of λ are color coded, as follows: λ = -1.25, red; λ = -1, blue; λ = -0.75, green; λ = -0.25, magenta; λ = 0, black; λ = 1, cyan.

In the first example (Figure 3a and 3b), a set of sequences with a length, L, of 30 residues that have the same native structure was constructed (such structure-based sequence sets are designated SBSS) and the coupling energy was determined for every possible pair of positions in each sequence. The average value of the coupling energy for each pair of positions in the SBSS was then calculated in order to improve the signal-to-noise ratio. In this example, only conformations that fit into a 5 × 6 lattice were considered. It may be seen that a strong linear correlation is found between the average coupling energy for each pair of positions in the SBSS and the corresponding average BWCF index. This correlation holds for pairs of residues that form native contacts (Figure 3a, r = 0.78; P-value = 5.5 × 10-5) and also, surprisingly, for pairs of residues that are not in contact in this particular native conformation (Figure 3b, r = 0.87; P-value = 1.3 × 10-55). Such linear correlations (with average correlation coefficients of about 0.84 (± 0.05) for the non-contacting pairs and 0.62 (± 0.15) for the pairs in contact) were also found for SBSS that correspond to 8 other native conformations (i.e. 2 SBSS for sequences with L = 30 on a 5 × 6 lattice, 4 SBSS for sequences with L = 25 on a 5 × 5 lattice and 2 SBSS for sequences with L = 25 on a 5 × 6 lattice) when only the conformations that fit into the lattice were considered.

In the second example, the coupling (Figure 3c) and perturbation (Figure 3d) energies for all residue pairs not in contact in the native state of a sequence with L = 20 on a 2D lattice are plotted as a function of their BWCF. Here, values of the BWCF were calculated for the entire conformational ensemble (|Z| = 41,889,578) and not just for the relatively compact states as in Figure 3a and 3b. The color-coding designates the different contacts that have a given value of λ (Table 1). It may be seen (Figure 3d) that almost perfect correlations (r ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ as predicted by Eq. (6). The correlations between the coupling energies and the BWCF for each given value of λ (except for λ = 0) are also excellent (Figure 3c, r > 0.97; P-value < 10-6) but not perfect as those in Figure 3d for the perturbation energies. Plots for residue pairs in contact in the native state are not shown since the number of such pairs is small and the correlations are, thus, not significant.

In the third example, the coupling (Figure 3e) and perturbation (Figure 3f) energies for all residue pairs not in contact in the native state of a sequence with L = 27 on a 3 × 3 × 3 lattice are plotted as a function of their BWCF. Here, too, almost perfect correlations (r ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ (Figure 3f) whereas the correlations for the coupling energies (Figure 3e) are excellent (r = 0.92, 0.85, 0.92, 0.58 and 0.97 for λ values of -1.25, -1, -0.75, -0.25 and 1, respectively, with P-values < 2 × 10-3 except for λ = -1.25 where the number of data points, n, is small (n = 4)) but not perfect as those in Figure 3f. In summary, therefore, the data depicted in Figure 3 for different types of lattice models (2D or 3D lattices with or without full enumeration of all the conformational states in the ensemble and for single sequences or averaged for a SBSS) support the general result described by Eq. (6) that the free energies of both direct (in contact in the native state) and indirect pairwise interactions are linearly dependent on their Boltzmann-weighted contact frequencies. It should be pointed out that only weak or no correlations are observed when pairwise energies taken directly from Table 1 are plotted against the BWCF, thereby providing further justification for the approach in this study that is based on the coupling or perturbation energies. The correlations in Figure 3 indicate that rare native contacts have more negative coupling energies than abundant native contacts. Likewise, rare non-contacting pairs have less positive coupling energies than abundant non-contacting pairs. Therefore, one may infer that native states can be stabilized by stabilizing contacts with low contact-frequency and destabilizing non-contacting pairs with a high contact-frequency.

Given that the interaction energy of a sequence in a specific lattice conformation is calculated by summing over all pairwise interactions between neighboring lattice points, it may seem surprising that non-direct interactions with significant positive coupling energies are found to exist (Figure 3). However, it has been pointed out that the strengths of pairwise interactions in the native state determined by DMC are always relative to the unfolded state [28]. Hence, the positive coupling energies observed here in the case of non-contacting pairs reflect, to a large extent, pairwise interactions in the non-native conformations in the ensemble. Surprisingly, however, positive coupling energies are also observed in the case of residue pairs such as P, H that have interaction energies of zero (Table 1) and, therefore, should not be coupled even when they are in contact in non-native conformations. These non-zero coupling energies arise owing to non-additivity in entropy calculations [29].

The correlations shown in Figure 3 can be understood more intuitively by considering several extreme cases and keeping in mind that the free energy of the native state is a function of both the energy of the native conformation and the energies of all the other non-native conformations in the ensemble (see Eq. (3)). For simplicity, the Boltzmann weights of the different states will be neglected in the discussion that follows and we will, therefore, refer to the contact-frequency (and not the BWCF) of residue pairs. The following four extreme cases of perturbations will be considered: (i) elimination of a native contact with a contact-frequency of 1/|Z|; (ii) elimination of a native contact with a contact-frequency that approaches one; (iii) elimination of a non-native contact with a contact frequency of 1/|Z|; and (iv) elimination of a non-native contact with a contact-frequency that approaches one.

In the first case, a contact that exists only in the native state is perturbed and, therefore, only the energy of the native state is affected. Hence, the gap between the energy of the native conformation and the energies of the non-native conformations is reduced (Figure 4, case (i)). Such a perturbation reduces ΔH by the value of the contact energy, Ec, and has no effect on ΔS (which is a function of the sum, Q', over all the non-native states). The perturbation energy, ΔΔGper, in this case is, therefore, equal to Ec.

Figure 4
figure 4

Effects of different perturbations on the energy spectrum of the native state and the ensemble of non-native conformations. The effects of four different extreme cases of perturbations are depicted. In case (i), a native contact with a contact-frequency of 1/|Z| (|Z| is the ensemble size) is eliminated, thereby causing the energy of the native state, En, to increase to E'n but not affecting the energies of the non-native states. The gap between the energy of the native state and the energies of all the non-native states is, therefore, reduced by E'n-En. In case (ii), a native contact with a contact-frequency value that approaches one is eliminated, thereby causing the energies of the native state and most of the non-native states to increase by E'n-En without changing the energy gap. In case (iii), a non-native contact with a contact frequency of 1/|Z| is eliminated without changing the energy gap as there is no change in the energies of the native state and most of the non-native states. In case (iv), a non-native contact with a contact-frequency value that approaches one is eliminated, thereby increasing the energies of most of the non-native states and also the gap in energy between these states and the native state.

In the second case, a contact that exists in both the native state and in most of the non-native states is perturbed and, therefore, the gap between the energy of the native conformation and the energies of the non-native conformations hardly changes (Figure 4, case (ii)). In this case, Q'm/Q'wt < 1 and the perturbation energy, ΔΔGper = Ec - kT ln(Q'm/Q'wt), therefore, increases (note that Ec is negative) in accordance with the plot in Figure 3a. Native contacts with a low contact-frequency, therefore, contribute more than those with a large contact-frequency to the gap between the energy of the native state and the energies of the non-native conformations, thereby explaining why they have more negative coupling energies (Figure 3a).

In the third case of a perturbation of a non-native contact with a low contact frequency, it is clear that the energies of the native state and most of the non-native states do not change and, therefore, the energy gap also remains unchanged (Figure 4, case (iii)). In the fourth case of a perturbation of a non-native contact with a high contact-frequency, most of the non-native conformations are destabilized but the energy of the native state is not affected and the gap between the energy of the native conformation and the energies of the non-native conformations, therefore, becomes larger (Figure 4, case (iv)). In cases such as (iii) and (iv), when a pairwise interaction between residues that are not in contact in the native state is removed, there is no effect on ΔH and the perturbation energy is given by: ΔΔGper = - kT ln(Q'm/Q'wt). If the contact-frequency of the removed interaction is low (case (iii)), then Q'm ≈ Q'wt and the perturbation energy will be equal to approximately zero. If the contact-frequency of the removed interaction is high (case (iv)), then Q'm/Q'wt < 1 and the value of the perturbation energy will increase in accordance with the plots in Figure 3. Non-native contacts with a high contact-frequency, therefore, contribute more than those with a low contact-frequency to the gap between the energy of the native state and the energies of the non-native conformations, thereby explaining why they have more positive coupling energies (Figure 3). The effects shown schematically in Figure 4 almost always result in an increase of the energy of either the native state (case (i)), the non-native states (case (iv)) or both (case (ii)) since non-favorable pairwise interactions (Table 1) are rare given the amino acid composition we used. It is clear, however, that protein evolution might favor non-favorable interactions in non-native conformations that would destabilize them relative to the native state. Such an evolutionary process termed 'negative design' [30–32] would be reflected in negative (favorable) coupling energies between residues that are not in contact in the native state.

How important is contact-frequency for protein stability? In order to obtain some insight into this question, we compared the stabilization achieved when optimizing a sequence for a particular native conformation using two different functions: (i) F1 (Eq. (8) that minimizes the energy of native contacts and maximizes the energy of non-native contacts ('negative design'); and (ii) F2 (Eq. (9) in which the contributions of native and non-native contacts is weighted by their contact-frequency. Both functions have an adjustable parameter, Wc, which determines the relative weight of the contributions of the native vs. non-native interactions to stability. It can be seen (Figure 5) that for sequences with L = 30 on a 5 × 6 lattice, greater stability is achieved when contact-frequency is taken into account across the entire range of Wc values. Similar results were obtained in cases of other lattice dimensions and sequence lengths when only the most compact conformations were considered. A more general scoring function will be needed for efficient design when the entire conformational space is considered.

Figure 5
figure 5

Stabilization of 2D-lattice model proteins by taking into consideration the contact frequency of residue pairs in contact and not in contact in the native state. The average free energy of folding of 100 sequences designed either with (○) or without (△) taking into account the contact frequency is plotted against the value of the contact weight, Wc (see Eqs. (8) and (9), used in the design. The results shown here are for sequences with L = 30 on a 5 × 6 lattice. Similar results were obtained in cases of other lattice dimensions and sequence lengths when only the most compact conformations were considered. For more details, see Methods.

Conclusion

It is shown in this study that long-range pairwise interactions are also present in simple lattice models of proteins despite the fact that the interaction energy of a sequence in a specific conformation is based solely on direct interactions (Eq. (1)). Double-mutant cycle analysis of these lattice models and a mathematical analysis show that the strength of both direct and indirect native interactions increases (i.e. their coupling free energy becomes more negative) in a linear fashion with decreasing contact-frequency that is an entropic term. Hence, proteins can be stabilized by introducing (or stabilizing) contacts in the native state with a low contact-frequency and removing (or destabilizing) contacts in non-native states with a high contact-frequency, as shown in Figure 5. Although manifestations of the latter strategy of 'negative design' have been recognized before [32] it has not been fully appreciated how the choice of interactions to be introduced (stabilized) or removed (destabilized) affects the extent of stabilization. Our findings are not dependent on sequence length and lattice dimensions that determine the conformational ensemble size and are, thus, likely to be relevant to the selection of folding pathways, folding rates and the design of real proteins. It may be possible to implement our findings using ensembles that are derived computationally (such as with COREX [33]) before experimentally characterized conformational ensembles become available. The new approach described here, that involves combining DMC analysis with lattice models, may also pave the way for a rigorous analysis of other complex aspects of protein behavior. For example, simulation of protein evolution by subjecting lattice models to rounds of mutagenesis followed by selection can be used to assess the contribution of correlated mutations at distant positions to protein folding, stability and allosteric communication. Employing lattice models to address this issue has the distinct advantage that it renders possible separating between correlated mutations due to common ancestry and those due to biophysical factors. Such studies may reveal relationships between contact-frequency, correlated mutations and other properties of proteins such as contact-order [34].

Methods

The lattice model of proteins

2D or 3D lattice models that are similar to the one described by Jacob and Unger [35] were used. In brief, the protein sequence consists of an alphabet of five amino acids: hydrophobic (H), neutral polar (P), positively charged (+), negatively charged (-) and blank (B) for the use of mutations. The pairwise interaction energies (eij) are taken from Table 1 and reflect in a qualitative manner the strengths of interactions between different types of amino acids. Similar results were obtained using other contact interaction matrices. The energies of all possible conformations of a given sequence on a particular lattice were calculated and the conformation with the lowest energy, if a single such one exists, was considered as its native conformation. A value of 1 was used for kT. It is important to note that the size of the ensemble, |Z|, is determined by the lattice dimensions and the same conformation of a given sequence may, therefore, have different values of ΔG due to different lattice dimensions.

Sequences of length (L) 16, 20, 25 and 30 were used for the 2D models and sequences with L = 27 for the 3D models. In the case of sequences with L = 16 or 20, all the respective 802,075 and 41,889,578 non-symmetric conformations were enumerated. In the case of sequences with L = 25 or L = 30 where the total number of conformations is too large to enumerate, we considered only the conformations that could be fitted into 5 × 5 or 5 × 6 lattices. Likewise, only the conformations that could be fitted into a 3 × 3 × 3 lattice were considered in the case of the 3D lattice models for sequences with L = 27. The numbers of all compact non-symmetric conformations of sequences with L = 25 on 5 × 5 and 5 × 6 lattices are 1081 and 377,779, respectively. The numbers of all compact non-symmetric conformations of sequences with L = 30 on a 5 × 6 lattice and L = 27 on a 3 × 3 × 3 lattice are 6431 and 103,346, respectively. The sequences were generated by random rearrangements of L residues with compositions of 44% H, 31% P, 12.5% (+) and 12.5% (-) in the case of sequences with L = 16, 40% H, 28% P, 16% (+) and 16% (-) in the case of sequences with L = 25, 42% H, 30% P, 14% (+) and 14% (-) in the case of sequences with L = 30 and 40% H, 30% P, 15% (+) and 15% (-) in the case of sequences with L = 20 or 27 (these compositions correspond roughly to those in the PDB).

Generation of structure-based sequence sets (SBSS)

SBSS that contained more than 40 different sequences of the same length and with the same native conformation were generated. These SBSS have a mean sequence identity that is only between 0.29–0.34 since (as described above) the sequences were generated by random rearrangements and, thus, represent a random sample of sequence space. Nine different SBSS corresponding to different native conformations were examined.

Calculation of coupling energies using double-mutant cycles

The strength of a pairwise interaction between residues i and j in the native conformation of a given sequence was evaluated by constructing a DMC that comprises the original wild-type sequence, two single mutants in which either residue i or j are replaced with the blank (B) residue and the corresponding double mutant in which both residues are replaced with this residue. The blank residue corresponds to alanine which is usually chosen as a reference state in experimental DMC since it is assumed that (i) replacement by this residue tends, in general, to reduce structural perturbations upon mutation and that (ii) interactions between alanine at one position and any other type of residue at the second position are minimal. The coupling energy, ΔΔGint, which is a measure of the strength of the pairwise interaction between residues i and j was calculated, as follows:ΔΔGint = ΔGi,j - ΔGi,B - ΔGB,j + ΔGB,B

where ΔGi,j, ΔGi,B, ΔGB,j and ΔGB,B are the respective free energies of folding of the wild-type protein, the two single mutants and the double mutant that are calculated using Eq. (2). The coupling energy is equal to the difference in the free energies of two parallel processes in the cycle, Δ G(i,j→B,j) and ΔG(i,B→B,B), that correspond to the effect of mutating residue i (or j) when the other residue is present or absent, respectively (Figure 1). In these calculations, negative and positive coupling energies reflect interactions that stabilize or destabilize the native state, respectively. We implemented such an experiment for each given pair of positions so that a coupling energy could be calculated for every possible pair of positions in each sequence.

Calculation of perturbation energies

We also calculated a perturbation energy, ΔΔGper, = ΔGwt - ΔGm, for every possible pair of positions in each sequence where ΔGwt and ΔGm are the respective free energies of the wild-type native conformation before and after a particular pairwise interaction is 'turned off' but without affecting any other interactions. Under ideal circumstances [18], the coupling energy, which can be determined experimentally or calculated as described above, provides a good estimate of the perturbation energy that can only be determined by computation.

Contact frequency-based protein stabilization

Sequences with a specific native conformation were generated by a Monte Carlo (MC) process that maximizes two design scores, F1 and F2, that either ignore the contact frequency or take it into account, respectively. The expressions for the scores are:

F 1 = W c 1 N c ∑ c e − E C + ( 1 − W c ) 1 N non ∑ non e + E non MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeOray0aaSbaaSqaaiabigdaXaqabaGccqGH9aqpcqqGxbWvdaWgaaWcbaGaee4yamgabeaajuaGdaWcaaqaaiabigdaXaqaaiabb6eaonaaBaaabaGaee4yamgabeaaaaGcdaaeqbqaaiabbwgaLnaaCaaaleqabaGaeyOeI0Iaeeyrau0aaSbaaWqaaiabboeadbqabaaaaaWcbaGaee4yamgabeqdcqGHris5aOGaey4kaSIaeiikaGIaeGymaeJaeyOeI0Iaee4vaC1aaSbaaSqaaiabbogaJbqabaGccqGGPaqkjuaGdaWcaaqaaiabigdaXaqaaiabb6eaonaaBaaabaGaeeOBa4Maee4Ba8MaeeOBa4gabeaaaaGcdaaeqbqaaiabbwgaLnaaCaaaleqabaGaey4kaSIaeeyrau0aaSbaaWqaaiabb6gaUjabb+gaVjabb6gaUbqabaaaaaWcbaGaeeOBa4Maee4Ba8MaeeOBa4gabeqdcqGHris5aaaa@5AB0@
(8)
F 2 = W c 1 N c ∑ c ( 1 − f c ) e − E C + ( 1 − W c ) 1 N non ∑ non f c e + E non MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeOray0aaSbaaSqaaiabikdaYaqabaGccqGH9aqpcqqGxbWvdaWgaaWcbaGaee4yamgabeaajuaGdaWcaaqaaiabigdaXaqaaiabb6eaonaaBaaabaGaee4yamgabeaaaaGcdaaeqbqaaiabcIcaOiabigdaXiabgkHiTiabbAgaMnaaBaaaleaacqqGJbWyaeqaaOGaeiykaKIaeeyzau2aaWbaaSqabeaacqGHsislcqqGfbqrdaWgaaadbaGaee4qameabeaaaaaaleaacqqGJbWyaeqaniabggHiLdGccqGHRaWkcqGGOaakcqaIXaqmcqGHsislcqqGxbWvdaWgaaWcbaGaee4yamgabeaakiabcMcaPKqbaoaalaaabaGaeGymaedabaGaeeOta40aaSbaaeaacqqGUbGBcqqGVbWBcqqGUbGBaeqaaaaakmaaqafabaGaeeOzay2aaSbaaSqaaiabbogaJbqabaGccqqGLbqzdaahaaWcbeqaaiabgUcaRiabbweafnaaBaaameaacqqGUbGBcqqGVbWBcqqGUbGBaeqaaaaaaSqaaiabb6gaUjabb+gaVjabb6gaUbqab0GaeyyeIuoaaaa@63ED@
(9)

where Wc is the contact weight, Nc and Nnon are the total number of contacts and non-contacts in the specific conformation, respectively, and fc is the contact-frequency. The values of Wc were varied between 0.05–0.95. For each value of Wc, 100 designed sequences were generated in 10,000 MC steps and the average free energy of folding was then calculated.

References

  1. Perutz MF: Mechanisms of co-operativity and allosteric regulation in proteins. Q Rev Biophys 1989, 22: 139–236. 10.1017/S0033583500003826

    Article  CAS  PubMed  Google Scholar 

  2. Horovitz A: Double-mutant cycles: a powerful tool for analysing protein structure and function. Fold & Des 1996, 1: R121-R126. 10.1016/S1359-0278(96)00056-9

    Article  CAS  Google Scholar 

  3. LiCata VJ, Ackers GK: Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 1995, 34: 3133–3139. 10.1021/bi00010a001

    Article  CAS  PubMed  Google Scholar 

  4. Clarkson MW, Gilmore SA, Edgell MH, Lee AL: Dynamic coupling and allosteric behavior in a nonallosteric protein. Biochemistry 2006, 45: 7693–7699. 10.1021/bi060652l

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  5. Göbel U, Sander C, Schneider R, Valencia A: Correlated mutations and residue contacts in proteins. Proteins: Struct Funct Genet 1994, 18: 309–317. 10.1002/prot.340180402

    Article  Google Scholar 

  6. Neher E: How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci USA 1994, 91: 98–102. 10.1073/pnas.91.1.98

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  7. Lockless SW, Ranganathan R: Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286: 295–299. 10.1126/science.286.5438.295

    Article  CAS  PubMed  Google Scholar 

  8. Kass I, Horovitz A: Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations. Proteins: Struct Funct Genet 2002, 48: 611–617. 10.1002/prot.10180

    Article  CAS  Google Scholar 

  9. Dima RI, Thirumalai D: Determination of network of residues that regulate allostery in protein families using sequence analysis. Protein Sci 2006, 15: 258–268. 10.1110/ps.051767306

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  10. Ichiye T, Karplus M: Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins: Struct Funct Genet 1991, 11: 205–217. 10.1002/prot.340110305

    Article  CAS  Google Scholar 

  11. Rod TH, Radkiewicz JL, Brooks CL III: Correlated motion and the effect of distal mutations in dihydrofolate reductase. Proc Natl Acad Sci USA 2003, 100: 6980–6985. 10.1073/pnas.1230801100

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  12. Chennubhotla C, Rader AJ, Yang LW, Bahar I: Elastic network models for understanding biomolecular machinery: from enzymes to supramolecular assemblies. Phys Biol 2005, 2: S173-S180. 10.1088/1478-3975/2/4/S12

    Article  CAS  PubMed  Google Scholar 

  13. Ma J: Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure 2005, 13: 373–380. 10.1016/j.str.2005.02.002

    Article  CAS  PubMed  Google Scholar 

  14. Pollock DD, Taylor WR, Goldman N: Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol 1999, 287: 187–198. 10.1006/jmbi.1998.2601

    Article  CAS  PubMed  Google Scholar 

  15. Wollenberg KR, Atchley WR: Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap. Proc Natl Acad Sci USA 2000, 97: 3288–3291. 10.1073/pnas.070154797

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  16. Larson SM, Di Nardo AA, Davidson AR: Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J Mol Biol 2000, 303: 433–446. 10.1006/jmbi.2000.4146

    Article  CAS  PubMed  Google Scholar 

  17. Noivirt O, Eisenstein M, Horovitz A: Detection and reduction of evolutionary noise in correlated mutation analysis. Protein Eng Des Sel 2005, 18: 247–253. 10.1093/protein/gzi029

    Article  CAS  PubMed  Google Scholar 

  18. Serrano L, Horovitz A, Avron B, Bycroft M, Fersht AR: Estimating the contribution of engineered surface electrostatic interactions to protein stability by using double-mutant cycles. Biochemistry 1990, 29: 9343–9352. 10.1021/bi00492a006

    Article  CAS  PubMed  Google Scholar 

  19. Sali A, Shakhnovich E, Karplus M: How does a protein fold? Nature 1994, 369: 248–251. 10.1038/369248a0

    Article  CAS  PubMed  Google Scholar 

  20. Hinds DA, Levitt M: Exploring conformational space with a simple lattice model for protein structure. J Mol Biol 1994, 243: 668–682. 10.1016/0022-2836(94)90040-X

    Article  CAS  PubMed  Google Scholar 

  21. Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS: Principles of protein folding-a perspective from simple exact models. Protein Sci 1995, 4: 561–602.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  22. Onuchic JN, Socci ND, Luthey-Schulten Z, Wolynes PG: Protein folding funnels: the nature of the transition state ensemble. Fold & Des 1996, 1: 441–450. 10.1016/S1359-0278(96)00060-0

    Article  CAS  Google Scholar 

  23. Unger R, Moult J: Local interactions dominate folding in a simple protein model. J Mol Biol 1996, 259: 988–994. 10.1006/jmbi.1996.0375

    Article  CAS  PubMed  Google Scholar 

  24. Govindarajan S, Goldstein RA: On the thermodynamic hypothesis of protein folding. Proc Natl Acad Sci USA 1998, 95: 5545–5549. 10.1073/pnas.95.10.5545

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  25. Mirny L, Shakhnovich E: Protein folding theory: from lattice to all-atom models. Annu Rev Biophys Biomol Struct 2001, 30: 361–396. 10.1146/annurev.biophys.30.1.361

    Article  CAS  PubMed  Google Scholar 

  26. Vendruscolo M, Mirny LA, Shakhnovich EI, Domany E: Comparison of two optimization methods to derive energy parameters for protein folding: perceptron and Z score. Proteins: Struct Funct Genet 2000, 41: 192–201. Publisher Full Text 10.1002/1097-0134(20001101)41:2<192::AID-PROT40>3.0.CO;2-3

    Article  CAS  Google Scholar 

  27. Bratko D, Blanch HW: Effect of secondary structure on protein aggregation: A replica exchange simulation study. J Chem Phys 2003, 118: 5185–5194. 10.1063/1.1546429

    Article  CAS  Google Scholar 

  28. Horovitz A, Fersht AR: Strategy for analysing the co-operativity of intramolecular interactions in peptides and proteins. J Mol Biol 1990, 214: 613–617. 10.1016/0022-2836(90)90275-Q

    Article  CAS  PubMed  Google Scholar 

  29. Mark AE, van Gunsteren WF: Decomposition of the free energy of a system in terms of specific interactions. Implications for theoretical and experimental studies. J Mol Biol 1994, 240: 167–176. 10.1006/jmbi.1994.1430

    Article  CAS  PubMed  Google Scholar 

  30. Hecht MH, Richardson JS, Richardson DC, Ogden RC: De novo design, expression, and characterization of Felix: a four-helix bundle protein of native-like sequence. Science 1990, 249: 884–891. 10.1126/science.2392678

    Article  CAS  PubMed  Google Scholar 

  31. Hellinga HW: Rational protein design: combining theory and experiment. Proc Natl Acad Sci USA 1997, 94: 10015–10017. 10.1073/pnas.94.19.10015

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  32. Berezovsky IN, Zeldovich KB, Shakhnovich EI: Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput Biol 2007, 3: 498–507. 10.1371/journal.pcbi.0030052

    Article  CAS  Google Scholar 

  33. Vertrees J, Barritt P, Whitten S, Hilser VJ: COREX/BEST server: a web browser-based program that calculates regional stability variations within protein structures. Bioinformatics 2005, 21: 3318–3319. 10.1093/bioinformatics/bti520

    Article  CAS  PubMed  Google Scholar 

  34. Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 1998, 277: 985–994. 10.1006/jmbi.1998.1645

    Article  CAS  PubMed  Google Scholar 

  35. Jacob E, Unger R: A tale of two tails: why are terminal residues of proteins exposed? Bioinformatics 2007, 23: e225-e230. 10.1093/bioinformatics/btl318

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

We thank Profs. Gilad Haran, Michael Levitt and John Moult for useful comments on an earlier draft of this paper and Etai Jacob for providing us with source codes for lattice model calculations. This work was supported by grant 1339/08 of the Israel Science Foundation to R.U. O.N.-B. was supported in part by a Fellowship from the Kahn Family Research Center for Systems Biology of the Human Cell and the Kimmelman Center for Macromolecular Assembly.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Ron Unger or Amnon Horovitz.

Additional information

Authors' contributions

ON carried out all the calculations. AH wrote the paper. All the authors analysed the data, helped draft the paper and read and approved the final manuscript.

Authors’ original submitted files for images

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Noivirt-Brik, O., Unger, R. & Horovitz, A. Analysing the origin of long-range interactions in proteins using lattice models. BMC Struct Biol 9, 4 (2009). https://doi.org/10.1186/1472-6807-9-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/1472-6807-9-4

Keywords