Abstract
Backgrounds
It is increasingly recognized that protein functions often require intricate conformational dynamics, which involves a network of key amino acid residues that couple spatially separated functional sites. Tremendous efforts have been made to identify these key residues by experimental and computational means.
Results
We have performed a largescale evaluation of the predictions of dynamically important residues by a variety of computational protocols including three based on the perturbation and correlation analysis of a coarsegrained elastic model. This study is performed for two lists of test cases with >500 pairs of protein structures. The dynamically important residues predicted by the perturbation and correlation analysis are found to be strongly or moderately conserved in >67% of test cases. They form a sparse network of residues which are clustered both in 3D space and along protein sequence. Their overall conservation is attributed to their dynamic role rather than ligand binding or high network connectivity.
Conclusion
By modeling how the protein structural fluctuations respond to residuepositionspecific perturbations, our highly efficient perturbation and correlation analysis can be used to dissect the functional conformational changes in various proteins with a residue level of detail. The predictions of dynamically important residues serve as promising targets for mutational and functional studies.
Background
Protein conformational dynamics [1,2] is critically involved in many biochemical processes ranging from catalysis [3] to allostery [49] and signal transduction [10]. Protein dynamics spans a wide range of time scales (from picoseconds to seconds or minutes). Biologically relevant conformational motions of proteins are often collective (for example in the form of hingebending or shearing motions between rigid domains, see [11]). These highly coordinated motions are thought to involve a network of key amino acid residues that couple spatially separated functional sites [9]. The conservation and variation of protein functions are likely underscored by the conservation and coevolution of these dynamically important residues. The existence of a sparse network of allosterically coupled residues in various proteins has been revealed by the statistical coupling (or correlated mutation) analysis based on multiple sequence alignment (see [1214]). The discovery of dynamically important residues will lead to the following important applications: first, facilitate the mechanistic studies of a variety of biomolecular systems whose conformational dynamics plays a key role in function [1]; second, enable drug design that dynamically alters target proteins via small molecule binding [1518]; third, guide the engineering of new molecular devices with novel dynamic properties [19,20].
In complement with experimental efforts for probing protein dynamics at atomic resolution (such as NMR, see [21,22]; and timeresolved Xray crystallography, see [23]), structurebased computer simulations have promised to elucidate the fine details of protein conformational motions. When multiple crystal structures are available for a protein at different states, structural analysis can identify those residues involved in the functional conformational changes between these states (such as local motions in allosteric proteins, see [24]; conformational changes due to binding of small molecules and other proteins, see [25,26]). In one study, the analysis of local structural changes (such as changes in the pseudobond angles and pseudodihedral angles along the C_{α }trace of a protein) was employed to identify key residues involved in the lidclosing conformational change in adenylate kinase [27]. Hinge residues of protein domain motions can be identified by structural analysis tools such as DynDom [28] and Hingefind [29], or graph theory based method [30]. To further yield dynamic information from static crystal structures, atomistic molecular dynamics (MD) [31] and related methods (such as nonequilibrium MD, see [32]; targeted MD, see [33]; biased MD, see [34]) have been employed to identify key residues or structural elements involved in protein conformational fluctuations and transitions (see [32,35,36]). Nevertheless, the applications of atomistic MD simulations are limited by the high cost of simulating conformational dynamics beyond tens of nanoseconds [31].
To overcome the timescale barrier for atomistic MD simulations, a variety of coarsegrained (lowresolution) modeling techniques [37] have been developed to explore protein conformational motions more efficiently. For example, the Go model [38] has been recently used to simulate conformational transitions between known protein structures (see [39]). A structural thermodynamic model (COREX) based on "Golike" sampling of protein ensembles was developed to simulate coupling between local structural fluctuations, ligand binding and global conformational changes [40]. Of particular interest to the present study is the elastic network model (ENM) [4143] and its isotropic variation – Gaussian network model (GNM) [44,45], which represent a protein structure as a network of C_{α }atoms locally connected by springs with a uniform force constant [46]. The normal mode analysis (NMA) of ENM often yields a handful of lowfrequency modes that dominate the largescale domain motions observed between protein crystal structures [43,47,48]. Numerous studies have established ENM as an efficient means to tease out the functionally relevant conformational dynamics from protein structures with no limit in time scale or system size (for reviews, see [4951]). The ENM/GNMbased NMA has formed the basis of several recently developed computational methods for locating ligandbinding sites [52,53], predicting hinge residues using lowfrequency modes [5456], and modeling conformational transition pathways [5761]. Thanks to its high efficiency the ENM/GNMbased NMA has proven to be a powerful tool for bioinformatic analysis of protein structures and motions at a large scale [47,48,62].
In several recent studies, we have proposed and employed an ENMbased perturbation analysis to predict the dynamically important residues involved in the observed protein functional motions [6365]. This method analyzes how the "dominant mode" (the normal mode that dominates the observed protein functional motions) changes in response to residuepositionspecific perturbations to the ENM force constant that mimic the effects of point mutations (see Methods). By combining the perturbation analysis with the linear response theory, a correlation analysis was developed to locate hinge residues that control the structural fluctuations of the entire protein structure or its active site [66,67]. These novel methods are available at an NIHbased webserver http://enm.lobos.nih.gov webcite. In our previous studies of chaperonin GroEL, helicase, myosin and polymerases [6367], we have shown that the key residues involved in the functional motions of these proteins are highly conserved, and many of them were found to be functionally important by mutational studies [6365]. Nevertheless, the applicability of these methods and other coarsegrained methods to a large variety of proteins remains to be established. Additionally, the perturbation analysis was based on a single dominant mode, while the conformational changes in proteins often involve multiple modes [47,48]. So it is necessary to extend our method to study protein functional motions that are not dominated by a single normal mode.
In this study, we will first introduce a new fluctuationbased perturbation protocol (see Methods) that does not assume the existence of a single dominant mode. Then we will perform a comprehensive evaluation of our ENMbased methods together with several alternative ones for predicting dynamically important residues. For the lack of systematic data on protein dynamics and functions, we will use sequence conservation as the primary metric for method evaluation. The evaluation is performed on two lists of protein structure pairs – a short list with 25 pairs of protein structures which is compiled from previous works by us and others [43,68,69]; a long list with >500 pairs gleaned from Protein Data Bank by an automated procedure (see Methods). The above two lists are complementary in the following ways. The short list contains functionally relevant structural changes whose biological significance was established in literature (all of them are involved in the binding/release of a biologically relevant ligand), but the arbitrary selection of a small number of cases may introduce artificial bias and statistical uncertainty; the long list, however, may contain biologically irrelevant structural changes (for example, due to crystal packing), but it provides a less biased data set with smaller statistical errors.
We have found that the dynamically important residues predicted by the perturbation and correlation analysis are strongly or moderately conserved for >67% of the test cases. These key residues, which constitute a small fraction of all residues (~15%), are clustered both in 3D space and along protein sequence, and they dominate the structural dynamics of ENM. Together, they form a sparse network which may couple distant functional sites in a protein complex. The conservation of these key residues is mainly due to their dynamic role instead of ligand binding or high network connectivity. This largescale study, along with previous databasescale studies using NMA [47,62,7072], will pave the way for future bioinformatic analyses of protein structurefunction relationships via highthroughput dynamic modeling. These techniques promise to offer new biophysical insights that will enrich the existing databases of protein structures and motions [71].
Methods
1. Elastic Network Model (ENM)
In an ENM, a protein structure is represented as a network of C_{α }atoms of amino acid residues whose equilibrium coordinates are given by a crystal structure. A harmonic potential with a uniform force constant C accounts for elastic interactions between two C_{α }atoms that lie within a cutoff distance R_{c }(it is usually set within the range of 7 Å–20 Å). The potential energy function of ENM is [46]
where d_{ij }is the distance between the C_{α }atom i and j, is the equilibrium distance between the C_{α }atom i and j in the crystal structure, N is the number of C_{α }atoms, and θ (x) is the Heaviside function.
The above potential energy function is expanded to second order:
where δX = X  X_{0}, X is a 3Ndimensional vector representing the Cartesian coordinates of N C_{α }atoms, C_{α }gives the equilibrium C_{α }coordinates in the crystal structure, is the Hessian matrix, and the matrix element of H^{ij }is given by , where a and b are indices for x, y and z components of the Cartesian coordinates of C_{α }atoms i' and j'.
A normal mode analysis of the Hessian matrix yields 3N6 nonzero normal modes (excluding 6 zero modes corresponding to 3 rotations and 3 translations), which are numbered from 1 to 3N6 in order of ascending eigenvalue. To validate ENM, each normal mode is compared with the observed structural change between two crystal structures, which is represented by a 3Ndimensional vector ΔX_{obs }obtained by superimposing the two structures with minimal Root Mean Squared Deviation (RMSD). An overlap is calculated for mode m, where V_{m }is its eigenvector, ΔX_{obs} and V_{m} represent the amplitudes of ΔX_{obs }and V_{m}. The mode with the highest overlap is named the dominant mode, if its overlap value is high (I_{m} > 0.5).
To assess how collective the observed structural change is, we define the collectivity of ΔX_{obs }as [43], where and ΔX_{obs, i }is the 3D component of ΔX_{obs }at residue i.
2. Perturbation analysis based on a single dominant mode
To simulate the dynamic effect of a point mutation, we have introduced a perturbation to the local elastic interactions involving a given residue i as follows [63]:
where δC represents an arbitrary change to the force constant of those springs connecting residue i to its neighbors (residue j). This perturbation results in the following change in the Hessian matrix:
In the singlemodebased SPM proposed earlier [6365], we analyzed how much the above perturbations change the eigenvalue (or eigenvector) of the mode that dominates the protein functional motions:
where δλ_{m, i }(denoted as δω_{m }in [63]) is the resulting change in the eigenvalue of mode m, which is proportional to the elastic energy stored in the springs that connect residue i to its neighbors following a displacement of the protein structure in the direction of the eigenvector V_{m}. Because , if we set δC = C/2, then (named as normalized δλ_{m, i}) gives the fractional contribution of local interactions at residue i to mode m. was found to be a good indicator for the dynamic importance of residue positions [63].
3. Perturbation analysis based on overall structural fluctuations and structural fluctuations in the direction of observed conformational change
In addition to the perturbation analysis of individual normal modes [6365], we have also evaluated how much the perturbation in Eq. 3 changes the overall structural fluctuations of the entire protein structure [66] or its active site [67].
The change in the overall structural fluctuations caused by a perturbation at residue i (see Eq. 3) is given by the follow score [66]:
where , δH_{i }is given in Eq. 4, Tr represents the trace of a matrix. In practice, H^{1 }can be calculated efficiently by inverting H + εI using a sparse linear equation solver (I is the identity matrix and ε is a small number set to be 0.00001). The purpose of adding a small ε is to change the eigenvalues of six zero modes of H to positive values so that H can be inverted by a linear equation solver. The eigenvectors are not changed by the addition of ε. Because δH_{i }is translationally and rotationally invariant, the inclusion of translational and rotational modes in H^{1 }does not affect the calculations in Eq. 6 except for a small increase in the eigenvalues of all nonzero modes by ε.
Because , if we assume δC = C/2, then δ2, then δf_{i}/⟨(X  X_{0})^{2}⟩ (named as normalized δf_{i}) gives the fractional contribution of local interactions at residue i to the overall structural fluctuations of the protein structure. This score was previously used to identify the hinge residues in myosin motor domain [66].
To focus on the fluctuations in the direction of the crystallographically observed conformational change (ΔX_{obs}), we introduce a new fluctuationbased perturbation score:
Because , if we assume δC = C/2, then (named as normalized ) gives the fractional contribution of local interactions at residue i to the fluctuations in the direction of ΔX_{obs }Therefore, those residue positions with high are significantly involved in the structural fluctuations that sample the new structural state at X_{0 }+ ΔX_{obs }If the observed conformational change is dominated by a single mode m (ΔX_{obs}~V_{m}), is reduced to . Otherwise, can be calculated without assuming the existence of a single dominant mode.
4. Alternative scores for assessing dynamic importance of residue positions
For comparison with the ENMbased scores of dynamic importance defined in Eqs 5–7, we have also examined the following alternative scores based on structural analysis or GNM modes:
a. Two scores based on changes in the pseudobond angles and pseudodihedral angles associated with the observed conformational change (ΔX_{obs}): they are defined as follows
where δθ(i  l, i, i + l) is the change in the pseudobond angle of three consecutive C_{α }atoms i1, i, i+1; δφ (i2, i1, i, i+1) (δφ (i1, i, i+1, i+2)) is the change in the pseudodihedral angle of four consecutive C_{α }atoms i2, i1, i, i+1 (i1, i, i+1, i+2).
b. Strain energy score: it is defined as the elastic energy stored in the springs that connect residue i to its neighbors following the observed conformational change (ΔX_{obs}):
c. GNMbased fluctuation score: it is defined as follows using the lowest two nonzero GNM modes (following [55])
where V_{mi }is the i'th component of the eigenvector of the GNM mode m, and λ_{m }is the corresponding eigenvalue. Note that residues with high values have low mobility and correspond to hinge centers of a protein structure (see [55]).
5. Evaluation of predictions of dynamically important residue positions
Since the dynamically important residues are not completely known for the proteins in our lists, we will evaluate our predictions indirectly by assessing the conservation of the predicted key residues. As we proposed previously [63], the dynamically important residue positions are under functional constraints so they must be either conserved in amino acid type/property, or under coevolution with each other (See [12]). Therefore, they are expected to be more conserved than those residues not under functional constraints. Consequently, the quality of our predictions can be evaluated by the average conservation scores for the predicted key residue positions.
For a given score of dynamic importance (S_{i }= δλ_{m, i}, δf_{i}, , , , , or ), we rank and select the top 15% key residue positions (the choice of a different percentage between 10% and 20% does not qualitatively change our results).
Next we calculate the average conservation score for N_{key }= 15%·N key residue positions (), and for all residue positions (). The conservation scores (CS) are calculated based on the multiple sequence alignments of homologous protein sequences by ConSurf server (http://consurf.tau.ac.il/ webcite, [73]). The lowest (highest) conservation score represents the most (least) conserved residue position.
Then we calculate the following Z score to assess statistically how well the predicted residue positions are conserved than average:
where σ_{rand }is the squared root of the variation of the average conservation score for N_{key }randomly chosen residue positions calculated as follows:
where {X_{1}, ... } is a randomly selected subset of {CS_{1}, ... CS_{N}}.
The more negative Z is, the more significant is the overall conservation of the key residue positions.
The above evaluation is performed on two lists of test cases (see Table S1&S2 in Additional file 1). For each score, the average (⟨Z⟩) and standard deviation (σ_{Z}) of the Z scores are calculated for the test cases in both lists. A more negative ⟨Z⟩ is indicative of a better performance
Additional file 1. Supplementary tables. The data provided offer detailed information for the two lists of test cases (Tables S1&S2) and the predicted key residues in myosin and kinesin (Tables S3&S4).
Format: DOC Size: 90KB Download file
This file can be viewed with: Microsoft Word Viewer
6. Generation of a long list of PDB structure pairs
We follow an automated procedure to generate a long list of PDB structure pairs (represented as (pdb1, pdb2)):
a. Start from an initial list of 2039 highresolution protein structures (with sequence identity <30%, resolution <1.6 Å, and Rfactor < 0.25) obtained from culledpdb web site http://dunbrack.fccc.edu/ webcite;
b. Remove protein structures with <100 residues from the initial list;
c. For each protein structure (denoted as pdb1) from the initial list, generate a list of homologous structures (denoted as pdb2) that satisfy the following conditions: (i). sequence identity with pdb1 ≥ 90%; (ii). number of residues ≥ 100; (iii). RMSD between pdb1 and pdb2 ≥ 1 Å. Note: A larger list of homologous structures (denoted as pdb2') with only condition (i) satisfied is compiled to define ligandbinding residues of pdb1, which are either in heavyatomcontact with a hetero atom in pdb1 or mapped sequentially to residues of a pdb2' which are in heavyatomcontact with a hetero atom in pdb2';
d. Select the pdb2 with the maximal RMSD relative to pdb1 (denoted as pdb2_{max}) and add the structure pair (pdb1, pdb2_{max}) to the final list.
The final list contains 502 PDB structure pairs. Among them, 473 have ConSurf scores calculated for ≥ 50% of residue positions in pdb1, which are used for Z score calculations. For details see Table S2 in Additional file 1.
Results and discussion
To evaluate the performance of various scores for selecting dynamically important residues (δλ_{m, i}, δf_{i}, , , , , and , see Methods), we have assessed the conservation of the predicted residues (using a Z score, see Methods) for both a short and a long list of test cases (see Table S1&S2 in Additional file 1). The two lists contain a broad range of proteins with diverse biochemical functions (including signaling proteins, DNAbinding proteins, motor proteins and enzymes) and structural architectures (singledomain and multidomain proteins). The results are summarized in Table 1. Then we have analyzed factors (including ENM parameter, properties of protein structures and conformational changes, see Table 2) that may affect the performance of the ENMbased scores (δλ_{m, i}, δf_{i}, ), and their relationships with each other and alternative scores based on structural analysis (, ) and GNM (). Next, we have studied how the predicted key residues correlate with ligandbinding residues and hub residues with high network connectivity, and verified that they form a sparse spatially and sequentially clustered network within protein complexes. Finally, we have illustrated the application of perturbation and correlation protocols to two motor proteins (myosin and kinesin).
Table 1. Performance of various scores of dynamic importance assessed by the average (⟨Z⟩) and standard deviation (σ_{Z}) of Z scores for two lists of test cases (short list in row 2 and long list in row 3).
Table 2. Dependence of performance of perturbationbased scores on various properties of protein structures and conformational changes evaluated by the average (⟨Z⟩) and standard deviation (σ_{Z}) of the Z scores (statistics of the long list).
For the simplicity of evaluation, we have chosen the dynamically important residues as the top 15% residues ranked by a given score. The use of a small percentage like 15% is in line with the previous findings of a sparse network of allosterically coupled residues in various proteins (for example, the allosteric network consists of only 14% of all residues in GPCR, see [13]). It is conceivable that the percentage of dynamically important residues varies from protein to protein. A more sophisticated cutoff scheme will be developed in our future studies.
1. Conservation of dynamically important residues
We first examine the distribution of Z scores for the key residues predicted by the perturbationbased scores (δλ_{m, i}, δf_{i}, , see Figure 1). We have divided all test cases in the long list into three classes based on their Z scores: highly conserved (Z ≥ 2), moderately conserved (2<Z ≥ 1) and not conserved (1<Z). Assuming a normal distribution of the average conservation score for N_{key }randomly chosen residue positions (N_{key }is the number of key residues), a Z score ≥ 2 corresponds to 98% confidence level, and a Z score ≥ 1 corresponds to 84% confidence level. Using one of the three scores (δλ_{m, i}, δf_{i}, , see Figure 1), the predicted residues in 41–48% (20–26%) of cases are highly (moderately) conserved. Therefore, in most (>67%) of the cases, the dynamically important residues predicted by our perturbationbased scores are either highly or moderately conserved.
Figure 1. Distribution of Z scores for 473 cases in the long list using the following three perturbationbased scores. (a) : the percentage of highly conserved (Z ≤ 2) and moderately conserved (2<Z ≤ 1) cases is 47% and 20%; (b) δf_{i}: the percentage of highly conserved and moderately conserved cases is 48% and 20%; (c) δλ_{m, i}: the percentage of highly conserved and moderately conserved cases is 41% and 26%.
We have found large variations in the conservation of dynamically important residues (see Figure 1 and Table 1), which may be attributed to the following causes:
First, it is likely that those key residues coevolve with each other, which results in weaker conservation of them individually. A detailed analysis of correlated mutations (see [1214]) is needed to explore this possibility which is beyond the present study.
Second, the accuracy of ENM (with its simplified force field) in predicting the key residues may vary from case to case. We will further explore how the properties of protein structures and conformational changes affect the accuracy in Subsection 3.
Third, some crystallographically observed conformational changes in the long list may not be functionally relevant, therefore the residues involved in these changes are not under functional constraints. The relevance of this factor is supported by the observation that the average Z scores are significantly more negative for the short list (see Table 1) which includes manually selected protein conformational changes whose functional relevance is established in literature.
Judging from the average Z scores (⟨Z⟩), and δf_{i }perform slightly better than δλ_{m, i }(with more negative ⟨Z⟩, see Table 1), which is attributed to the use of more modes than the dominant one in the perturbation and correlation analysis. To show the statistical significance of the above improvement, we perform the following Z score estimation. The standard error of ⟨Z⟩ for the long list is estimated to be (σ_{Z }is given in Table 1, 473 is the number of protein structure pairs used for Z score calculations, see Methods). The significance of the difference in ⟨Z⟩ between and δλ_{m, i }is assessed by the following Z score:(1.8 + 1.60)/σ_{⟨Z⟩ }≈ 2.6 (⟨Z⟩ values are given in Table 1). This large negative Z score indicates that the improvement from δλ_{m, i }to is statistically significant. Similarly, we can show that the improvement from δλ_{m, i }to δf_{i }is also statistically significant.
2. Optimal performance of perturbation and correlation analysis at a low cutoff distance R_{c }= 7 Å
To explore how the performance of perturbationbased scores (δλ_{m, i}, δf_{i}, ) depends on ENM parameter (cutoff distance R_{c}), the Z score calculations are done for a range of R_{c }values (7 Å ≤ R_{c }≤ 15 Å) (we do not consider R_{c}<7 Å because it causes many zero modes to arise due to insufficient connectivity in ENM). For both the short and long list, we have found that the best performance (the most negative ⟨Z⟩) for δλ_{m, i}, δf_{i }and is attained at R_{c }= 7 Å (see Table 1). In addition, if the C_{α}C_{α }distance cutoff at 7 Å is combined with a heavyatomcontact criterion (namely, two residues are in contact if the minimal distance between their heavy atoms is <4 Å), the performance is further improved (see Table 1).
The above results suggest that the accuracy of ENMbased perturbation and correlation analysis is optimal when the range of C_{α}C_{α }interactions in ENM matches the center of the distribution of C_{α}C_{α }distances between residues in atomic contacts (4.4 Å ~12.8 Å, see [74]). This is contrary to previous studies that found better ENM performance for fitting crystallographic B factors at relatively high R_{c }(15 Å ≤ R_{c }≤ 24 Å) [62]. This is also at odds with our finding that the maximal overlap for the dominant mode is lower at R_{c }= 7 Å (0.35 ± 0.14 for the long list) than at 8 Å ≤ R_{c }≤ 11 Å (0.42 ± 0.16 for the long list). Interestingly, the finding of optimality at low R_{c }is roughly consistent with a recent study that found the optimal descriptions of structural fluctuations and conformational changes by a generalized anisotropic network model at R_{c }= 8 Å [75].
The above discrepancy may be explained as follows. Intuitively, the use of high R_{c }tends to suppress local motions (for example, in a dangling loop) that often arise as extra zero modes at low R_{c}. Therefore, high R_{c }helps to remove the tip effects (overly flexible pieces of proteins that protrude out of the main globular body, see [76]) and improve the description of collective domain motions observed crystallographically [11]. However, the introduction of additional elastic interactions beyond the range of physical interactions between contacting residues may compromise the accuracy of modeling local perturbations by point mutations, which explains the lower performance as R_{c }increases from 7 Å to 15 Å (Table 1). Therefore, the optimal choice of ENM parameter is applicationdependent. To simulate the dynamic effects of local perturbations to protein structures, it is preferable to use a relatively low R_{c}, which allows the structural fluctuations and lowfrequency modes of ENM to respond more sensitively to residuepositionspecific perturbations to local interactions (see Eq. 3 of Methods). Indeed, we have found that the percentage of structural fluctuations contributed by the top 15% key residues (ranked by δf_{i}) decreases significantly (from 59 ± 10% to 31 ± 18%, for cases in the long list) as R_{c }increases from 7 Å to 15 Å. Therefore, despite the findings that lowfrequency modes are robust to ENM parameters (see [77]), the use of perturbation methods to probe ENM dynamics [63,64,53,78] remains feasible (especially at low R_{c}). In this study, we will use ENM constructed with R_{c }= 7 Å and the heavyatomcontact criterion.
3. Dependence of performance on properties of protein structures and conformational changes
We now study how the performance of three perturbationbased scores (, δf_{i}, δλ_{m, i}) depends on four properties of protein structures and conformational changes in the long list. To reduce the statistical noise due to large variations in Z scores (large width in distribution of Z scores, see Figure 1), we have performed the following threetier averaging: for each property, all test cases are sorted by the property value and then divided equally into three tiers (the bottom 1/3, middle 1/3 and top 1/3 go into the low, medium and high tier, respectively); the average (σ_{Z}) and the standard deviation (⟨Z⟩) of the Z scores are calculated separately for each tier (see Table 2). The statistical significance of the observed differences in ⟨Z⟩ between tiers can be demonstrated following the same Z score estimation as given at the end of subsection 1.
a. Protein size (number of residues in a protein structure)
For all three scores (, δf_{i}, δλ_{m, i}), ⟨Z⟩ decreases significantly from lowsize, mediumsize to highsize tier. Thus a better performance is expected for larger proteins despite large variations in Z scores. This result agrees with the general understanding that ENM modeling is more suitable for large proteins whose lowfrequency dynamics is more dependent on the global shape, and less sensitive to inaccuracy in local interactions.
b. RMSD between two PDB structures
The dependence on RMSD differs between three scores – δλ_{m, i }has the weakest RMSDdependence; δf_{i }performs best in mediumRMSD tier; performs best in highRMSD tier. Therefore, δf_{i }and may be used selectively depending on the magnitude of conformational change. It is encouraging that is applicable to largescale conformational changes beyond thermal fluctuations.
c. Overlap of the dominant mode for the observed conformational change
A negative correlation (⟨Z⟩ decreases from low, medium to highoverlap tier) is found for all three scores, which is strongest for δλ_{m, i }and relatively weak for δf_{i }and . This is expected because δλ_{m, i }relies on the assumption of a single dominant mode while δf_{i }and do not. Therefore ENM remains valid for describing structural fluctuations and residues involved even if it fails to capture the observed conformational changes by a single dominant mode.
d. Collective nature of the observed conformational change
Previous studies have found that ENM performs better in describing protein conformational changes (see Yang 2007) with high collectivity (for definition, see Methods). We have explored the dependence of Z scores on the collectivity of the observed conformational change. Surprisingly, it is found that high/medium collectivity only improves the performance of δλ_{m, i }but not or δf_{i}. Therefore, unlike other ENM assessing metrics (such as the overlap of the dominant mode, see [48]), the accuracy of ENMbased predictions for dynamically important residues (using or δf_{i}) does not strongly depend on the collectivity of conformational changes.
In complement to the collectivity calculations, we have also determined how the performance depends on whether the observed conformational change can be approximated by rigidbody domain motions. To this end, we have applied DynDom [28] to the observed conformational changes in the long list, which are then divided into two subsets (success/failure) depending on whether DynDom succeeds/fails in dynamic domain partition. Domain partitions were made successfully by DynDom for 135 out of 502 cases. We have found that all three scores perform better if the conformational change involves domain motions. For those cases where domain partition succeeds, performs the best; for those cases where domain partition fails, δf_{i }performs the best.
In sum, the performance of the three perturbationbased scores depends more on the protein size than the properties of conformational changes (see Table 2). In particular, δf_{i }and perform well even when the observed conformational change is not collective or not dominated by a single mode. In fact, they outperform δλ_{m, i }under those conditions (see Table 2). Therefore the fluctuationbased perturbation and correlation analysis may be applied more broadly than the singlemodebased protocol to cases where many modes are involved in the functional motions.
4. Relationship between various scores of dynamic importance
To evaluate how much the key residues predicted by two scores of dynamic importance (denoted as S1 and S2) overlap with each other, we have calculated a correlation factor – it is defined as the enrichment in the probability of finding a key residue predicted by S2 within the set of key residues predicted by S1 relative to that expected if randomly selecting a residue from all residues in a protein. In practice, the correlation factor is calculated as , where N_{key12 }is the number of key residues predicted by both S1 and S2, N_{key1}(N_{key2}) is the number of key residues predicted by S1 (S2), and N is the total number of residues. A correlation factor >>1 indicates strong correlation between S1 and S2. Based on the statistics of the long list (same below in this subsection), the correlation factor between the key residues predicted by and δf_{i }(δλ_{m, i}) is 3.57 ± 1.13 (3.18 ± 1.39). In addition, 55 ± 17% (49 ± 21%) of the key residues predicted by and δf_{i}(δλ_{m, i}) overlap with each other. Namely, 55 ± 17% (49 ± 21%) of the key residues predicted by are also predicted by δf_{i}(δλ_{m, i}). Therefore, there are strong overlaps between the key residues predicted by the three perturbationbased scores (, δf_{i }and δλ_{m, i}). Many common residues are involved in both the equilibrium fluctuations (in all directions as probed by δf_{i}) and the large conformational change (in a particular direction as probed by ). Therefore, it may be possible to predict those residues involved in the slow conformational change from a known structural state toward an unknown structural state, by simulating the fast equilibrium fluctuations near the known state [27]. This useful strategy avoids the need for costly longtime MD simulations.
Compared with the above perturbationbased scores, the residues involved in the observed conformational changes as identified by alternative scores based on energetic or structural analysis (, , , see Methods) are much less conserved (see Table 1). Therefore, we infer that not all residues involved in the observed conformational changes are equally constrained by functions. From the point of view of structural transition, not all changes in local structures/interactions are equally important to the transition – those earlyoccurring changes are more likely to affect the transition rate than the lateoccurring ones (see [60]). It is conceivable that the early structural changes involve intrinsic motions described by the lowfrequency normal modes. Therefore, by capturing these early structural changes, the NMAbased perturbation and correlation analysis can predict dynamically important residues more accurately than standard structural and energetic analysis.
Following [55] we have also calculated a GNMbased score for predicting hinge residues with minimal mobility. This score () is based on the calculation of meansquared fluctuations due to the lowest two GNM modes (see Methods). It is found that the hinge residues (defined as the top 15% residues ranked by ) are significantly conserved (see Table 1), which supports their functional importance [55]. Nevertheless, they do not overlap significantly with the key residues predicted by the perturbation and correlation analysis. Indeed, the correlation factor between the key residues predicted by and is only 1.6 ± 0.95, and only 25 ± 14% of key residues predicted by and overlap with each other. Such a lack of correlation is not unexpected, because these methods are based on different principles: the GNM score depends on the distribution of structural fluctuations, while and δf_{i }probe how the overall fluctuations are coupled to local perturbations. Therefore, the two methods may complement each other in probing different aspects of protein conformational dynamics.
5. Correlation between dynamically important residues and ligandbinding residues
The correlation between conformational dynamics and ligand binding in enzymes has been explored computationally in several recent studies. In one study, Yang and Bahar found that catalytic residues are often colocalized with hinge residues identified by GNM [55]. Another study found that the ability to trigger large changes in conformational distribution makes a good score function for predicting ligandbinding site [52,53]. One can argue that evolution favors ligand binding to dynamically important regions of proteins to effectively trigger or regulate protein dynamics. Thus these key residues may be constrained by both conformational dynamics and ligand binding affinity.
To explore the correlation between ligandbinding residues and the dynamically important residues predicted by our perturbation and correlation analysis, we have evaluated if these two sets of residues colocalize in protein structures. To enable analysis of a large dataset, the identification of ligandbinding residues is automated using two criteria: first, they are highly conserved (with grade = 9 based on the ConSurf conservation scores, [73]); second, they are in heavyatomcontact with at least one hetero atom (from the HETATM records of PDB files, excluding waters and hydrogen atoms) in the PDB file of the given structure or a homologous protein structure with ≥ 90% sequence identity. The first criterion allows us to automatically filter out those unconserved residues that bind biologically irrelevant ligands or reagents used for crystallization (see [79]). For each test case of the long list, we have calculated a correlation factor (defined as the enrichment in the probability of finding a ligandbinding residue from the set of key residues relative to that expected if randomly selecting a residue from all residues in a protein). A high correlation factor (>>1) indicates a high level of colocalization between the two residue sets. The average correlation factors for the three perturbationbased scores (, δf_{i }and δλ_{m, i}) are 1.58 ± 1.32, 1.45 ± 1.38 and 1.47 ± 1.30, respectively. This result indicates a weak tendency of colocalization – there is ~1.5 times higher probability for ligands to bind with dynamically important residues than random. The ligandbinding residues only comprise a small fraction of the predicted key residues (the average percentage of key residues which are also ligandbinding is only ~6–7%). Therefore, we infer that the conservation of the predicted key residues is not primarily due to the conservation of ligandbinding residues. Indeed, we have reevaluated the performance of the perturbationbased scores after removing all highly conserved ligandbinding residues, the average Z scores are only slightly reduced (see Table 1). Therefore, most of the predicted key residues are evolutionarily constrained by their roles in conformational dynamics rather than specific interactions with ligands.
6. Correlation between dynamically important residues and hub residues
In an ENM, the hub residues with high connectivity (defined as the number of neighboring residues in heavyatomcontact with a given residue) are likely involved in both structural stability and dynamics [80]. Indeed, we have found a high level of conservation for the hub residues with top 15% connectivity (their Z scores are 3.35 ± 1.42 for cases in the long list). Because the residuepositionspecific perturbation (see Eq. 3 of Methods) is applied to the interactions between a given residue and its neighbors, it is natural to expect such perturbation to be larger for residues with more neighbors than those with fewer neighbors. To evaluate the connectivitydependence of the perturbationbased scores, we have assessed how much the predicted key residues overlap with the hub residues.
The correlation factors for the three perturbationbased scores (, δf_{i }and δλ_{m, i}) are 1.65 ± 0.45, 1.69 ± 0.48 and 1.65 ± 0.50, respectively (statistics of long list). This result indicates a relatively weak overlap between the two sets of residues – there is ~1.7 times higher probability to find dynamically important residues to be hub residues than random. The hub residues only comprise a minor fraction of the predicted key residues (the average percentage of key residues predicted by the three scores which are also hub residues is 34–35% for cases in the long list). Thus the majority (>65%) of the dynamically important residues are not hub residues, whose dynamic importance is not due to their high connectivity in ENM.
7. A sparse network formed by dynamically important residues
The three perturbationbased scores (after normalization, see Methods) give the fractional contribution of each residue position to either structural fluctuations (, δf_{i}) or elastic energy (δλ_{m, i}). For cases in the long list, on average 59–70% of the structural fluctuations or elastic energy are contributed by the top 15% key residues predicted by one of the three scores. Therefore the protein dynamics is dominated by a sparse set of key residues, which agrees with similar findings by sequencebased analysis (see [12,13]).
To further explore how the key residues are distributed in 3D space and along protein sequence, we have calculated a spatial (or sequential) correlation factor defined as the enrichment in the probability of finding two key residues in heavyatomcontact (or sequentially separated by <10 residues) relative to that expected if these residues are distributed randomly in a protein structure (or sequence). In practice, the spatial correlation factor is calculated as , where N_{key, par}(N_{par}) is the number of key residue pairs (all residue pairs) that are in heavyatomcontact. For cases in the long list, the spatial correlation factors for the three scores (, δf_{i }and δλ_{m, i}) are 3.69 ± 0.63, 3.61 ± 0.59 and 3.56 ± 0.69, respectively; the sequential correlation factors for the three scores are 2.14 ± 0.67, 1.99 ± 0.58 and 2.08 ± 0.66, respectively. Therefore, the dynamically important residues are clustered both in 3D space and along protein sequence (though to a less extent), which form a strongly coupled network. This finding complements a recent finding that the residues involved in local motions of allosteric proteins are correlated in 3D space and along sequence [24]. The high degree of spatial and sequential connectivity allows these key residues to mediate signal transmissions over long distances between spatially separated functional sites.
8. Case studies of dynamically important residues in myosin and kinesin
Finally, we will apply the above methods to the functional conformational changes in myosin and kinesin – two filamentbased motor proteins driven by ATP binding and hydrolysis. Myosin has been previously studied by the singlemodebased perturbation analysis [64] and correlation analysis [66]. Here we will focus on a comparison of the predictions of dynamically important residues made by various protocols (, δf_{i}, δλ_{m, i}, , connectivity and DynDom).
a. myosin
We have examined the large conformational change (see Figure 2d) from a prepowerstroke transitionstate structure (PDB: 1VOM) to a postpowerstroke nucleotidefree structure (PDB: 2AKA), which pertains to the force generation and hydrolysis product release in myosin [81]. This conformational change involves large rotations of the converter, the upper/lower 50 kDa subdomains (U50/L50) relative to the Nterminal subdomain (see Figure 2d). This collective conformational change is captured by the lowest two ENM modes calculated for 1VOM[66]. The key residues predicted by , δf_{i }and δλ_{m, i }are highly conserved (with Z scores of 8.35, 7.80, 7.70, respectively). The key residues predicted by overlap strongly with those predicted by δf_{i}(66% identical, see Figure 2a) and δλ_{m, i }(74% identical, see Figure 2b). These residues are clustered in 3D space, forming a sparse network that connects the active site to the forcegenerating component (converter) via two flexible connectors – relay helix and SH1 helix (see Figure 2a, b). Interestingly, a few key residues predicted by δf_{i }are located near the actinbinding cleft (see Figure 2a), which may allow actin binding to regulate the structural fluctuations of myosin [66].
Figure 2. Comparison of dynamically important residues in myosin predicted by various protocols. (a). key residues predicted by and δf_{i }are shown as red and green spheres, the common residues are colored in yellow, key components and sites in myosin are also labeled; (b). key residues predicted by and δλ_{m, i }are shown as red and blue spheres, the common residues are colored in purple; (c). key residues predicted by GNM score and high connectivity are shown as green and blue spheres, the overlapping residues are colored in cyan; (d). hinge residues identified by DynDom are shown as red spheres. Also shown in panel (d) is the observed conformational change from a prepowerstroke myosin structure (PDB: 1VOM, colored in silver) to a postpowerstroke myosin structure (PDB: 2AKA, colored in cyan) superimposed along the Nterminal subdomain (residues 80–186), including rotations of U50, L50 and converter subdomains (shown as arrows). For details of the predicted key residues, see Table S3 in Additional file 1.
For comparison, we have also shown the hinge residues predicted by the GNM score [55], the hub residues with high connectivity (see Figure 2c) and the hingebending residues identified by DynDom (see Figure 2d). They overlap insignificantly with the key residues predicted by (18% identical for the GNM score, 27% identical for the hub residues, 11% identical for DynDom). Unlike the key residues predicted by the perturbationbased scores, the hub residues are extensively distributed rather than clustered in space, while the hinge residues predicted by the GNM score are concentrated near the active site [55]. Therefore, they do not provide a signaling path that connects the active site to converter.
b. kinesin
We have studied the observed conformational change (see Figure 3d) from an ADPbound KIF1A structure (PDB: 1I5S) to an ATPanalogbound KIF1A structure (PDB: 1VFW) which pertains to ADP release followed by ATPbindinginduced force generation in kinesin [82,83]. This conformational change involves an en block translation and rotation of the switch II cluster (including α4, α5 helices and switch II at the active site, see Figure 3a, d) and local changes in the switch I region (see Figure 3a, d) [82,83]. The key residues predicted by , δf_{i }and δλ_{m, i }are highly conserved (with Z scores of 4.48, 2.09, 5.21, respectively). The key residues predicted by overlap moderately with those predicted by δf_{i}(44% identical, see Figure 3a) and δλ_{m, i }(30% identical, see Figure 3b). The divergence among the three perturbationbased scores may be due to the involvement of many modes in the structural fluctuations and the observed conformational changes. However, despite the lack of a dominant mode (the maximal overlap is only 0.23), the perturbationbased scores still predict a highly conserved network of residues that connect the active site to the forcegenerating neck linker via α4 helix (see Figure 3a, b, d).
Figure 3. Comparison of dynamically important residues in kinesin predicted by various protocols. (a). key residues predicted by and δf_{i }are shown as red and green spheres, the common residues are colored in yellow, key components and sites in kinesin are also labeled; (b). key residues predicted by and δλ_{m, i }are shown as red and blue spheres, the common residues are colored in purple; (c). key residues predicted by GNM score and high connectivity are shown as green and blue spheres, the overlapping residues are colored in cyan. Shown in panel (d) is the observed conformational change from an ADPbound KIF1A structure (PDB: 1I5S, colored in silver) to an ATPanalogbound KIF1A structure (PDB: 1VFW, colored in cyan), including rotations and translations of α4, α5 helices (shown as arrows). For details of the predicted key residues, see Table S4 in Additional file 1.
For comparison, we have also shown the hinge residues predicted by the GNM score, and the hub residues with high connectivity (see Figure 3c). They overlap moderately with the key residues predicted by (30% identical for the GNM score, 22% identical for the hub residues). Similar to myosin, the hub residues are extensively distributed while the hinge residues predicted by the GNM score are clustered spatially (see Figure 3c).
Conclusion
In this study, we have performed a largescale evaluation of the predictions of dynamically important residues by a variety of computational protocols including three based on the perturbation and correlation analysis of ENM, and alternative ones based on structural analysis or GNM. We have found that the two fluctuationbased scores ( and δf_{i}), by accounting for contributions from many ENM modes, outperform the previously proposed singlemodebased score (δλ_{m, i}, see [6365]) especially in cases where many modes are involved in the conformational changes. The dynamically important residues predicted by the perturbationbased scores (, δf_{i}, δλ_{m, i}) are strongly or moderately conserved, and they form a sparse network of residues which are clustered both in 3D space and along protein sequence. Their overall conservation is attributed to their dynamic role rather than ligand binding or high connectivity in ENM. Future functional studies are needed to dissect the detailed roles of these predicted residues.
As shown by numerous studies (see reviews [4951]), the coarsegrained ENM captures the essence of collective protein dynamics, which is largely determined by the global shape of and local packing in protein structures. Therefore, ENM provides a simple and adequate framework for exploring, with a residue level of details, how much a perturbation to local packing affects protein structural dynamics. Nevertheless, our perturbation methods are only useful for locating the residue positions of dynamic importance, but not for predicting the functional effects of a particular perturbation, which may require more accurate modeling of energetics and dynamics beyond coarsegrained modeling.
Compared with standard NMA, the major advantage of the fluctuationbased scores ( and δf_{i}) is their low computational cost (the CPU time for analyzing the long list of >500 pairs of structures is < 25 minutes on a quadcore Xeon workstation). The calculation does not require the solution of all ENM modes which is computationally expensive and memorydemanding for large proteins. Instead, it only requires the inversion of a sparse Hessian matrix (see Eq. 6 and 7 in Methods), which can be done much faster than NMA, thanks to the availability of highly efficient sparse linear equation solver [84]. The sparseness of the Hessian matrix is particularly high for low R_{c }where the perturbation and correlation analysis is optimal (see Table 1).
The perturbationbased protocols are different from and complementary to the GNMbased technique (see [55]) and various others (see [56] and references therein) that find hinge regions of protein domain motions. The dynamically important residues defined by our perturbation analysis are different from the hinge residues defined based on other criteria (see [56]). Our approach can be applied to protein conformational changes that are not rigidbody domain motions, where interdomain hinges are not well defined.
The finding of conservation of the key residues involved in the fluctuations toward the direction of a structural transition supports the functional importance of such fluctuations. For a biochemical transition from an apo to a ligandbound state, our finding supports the proposal that sampling of the ligandboundstate conformation in the absence of a ligand is essential for the transition (see [4]). The key residues predicted by are likely involved in coordinating such preexisting sampling, which call for future mutational and functional studies of these residues.
It is increasingly recognized that in many proteins the allosteric effects on function involve changes in dynamics in the absence of detectable structural changes [4,5,85]. The ENMbased perturbation and correlation analysis involves local perturbations that do not change the equilibrium structure, so they are well suited for simulating the allosteric effects via dynamic regulation and entropic changes [85]. These methods complement alternative computational techniques that simulate changes in equilibrium structure during biomolecular transitions, including several transition pathway modeling methods [5761]. Our perturbation methods also complement alternative allosterymodeling techniques, including those based on Markov propagation of information across the protein structure [8688].
The perturbation methods described here are similar in spirit to the dynamics perturbation analysis algorithm proposed by Ming and Wall [89]. Both are based on the first order perturbation theory, although the mathematic form of the perturbation to Hessian matrix differs.
Besides predicting dynamically important residues, our perturbation methods can also be used to parameterize ENM to fit crystallographic temperature factors, which depend on both the intrinsic dynamics of a protein structure and its crystalline environment (see [90]). We have also explored the use of perturbation and correlation analysis to probe allosteric couplings in F_{1 }ATPase [91] and the coupling between normal modes in myosin motor [92].
Authors' contributions
WZ conceived of the study, implemented and tested the methods, and drafted the paper. MT provided assistance with webserver applications and references preparation. Both authors read and approved the final manuscript.
Acknowledgements
This study is supported by the funding from SUNY Buffalo and a grant from AHA (0835292N).
References

HenzlerWildman K, Kern D: Dynamic personalities ofproteins.
Nature 2007, 450(2):964972. PubMed Abstract  Publisher Full Text

Gerstein M, Echols N: Exploring the range of protein flexibility, from a structural proteomics perspective.
Curr Opin Chem Biol 2004, 8(2):1419. PubMed Abstract  Publisher Full Text

Yon JM, Perahia D, Ghelis C: Conformational dynamics and enzyme activity.
Biochimie 1998, 80(2):3342. PubMed Abstract  Publisher Full Text

Kern D, Zuiderweg ERP: The role of dynamics in allosteric regulation.
Curr Opin Struct Biol 2003, 13(2):748757. PubMed Abstract  Publisher Full Text

Gunasekaran K, Ma BY, Nussinov R: Is allostery anintrinsic property of all dynamic proteins?
Proteins 2004, 57(2):433443. PubMed Abstract  Publisher Full Text

Changeux JP, Edelstein SJ: Allosteric mechanisms of signaltransduction.
Science 2005, 308(2):14241428. PubMed Abstract  Publisher Full Text

Swain JF, Gierasch LM: The changing landscape of protein allostery.
Curr Opin Struct Biol 2006, 16(2):102108. PubMed Abstract  Publisher Full Text

Cui Q, Karplus M: Allostery and cooperativity revisited.
Protein Sci 2008, 17(2):12951307. PubMed Abstract  Publisher Full Text

Goodey NM, Benkovic SJ: Allosteric regulation andcatalysis emerge via a common route.
Nat Chem Biol 2008, 4(2):474482. PubMed Abstract  Publisher Full Text

Rousseau F, Schymkowitz J: A systems biology perspective on protein structural dynamics and signal transduction.
Curr Opin Struct Biol 2005, 15(2):2330. PubMed Abstract  Publisher Full Text

Gerstein M, Lesk AM, Chothia C: Structural Mechanisms for Domain Movements in Proteins.
Biochemistry 1994, 33(2):67396749. PubMed Abstract

Lockless SW, Ranganathan R: Evolutionarily conserved pathways of energetic connectivity in protein families.
Science 1999, 286(2):295299. PubMed Abstract  Publisher Full Text

Suel GM, Lockless SW, Wall MA, Ranganathan R: Evolutionarily conserved networks of residues mediate allosteric communication in proteins (vol 10, pg 59, 2003).

Kass I, Horovitz A: Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations.
Proteins 2002, 48(2):611617. PubMed Abstract  Publisher Full Text

Bertrand D, Gopalakrishnan M: Allosteric modulation of nicotinic acetylcholine receptors.
Biochem Pharmacol 2007, 74(2):11551163. PubMed Abstract  Publisher Full Text

Raddatz R, Schaffhauser H, Marino MJ: Allosteric approaches to the targeting of Gproteincoupled receptors for novel drug discovery: A critical assessment.
Biochem Pharmacol 2007, 74(2):383391. PubMed Abstract  Publisher Full Text

Ross RA: Allosterism and cannabinoid CB1 receptors: the shape of things to come.
Trends Pharmacol Sci 2007, 28(2):567572. PubMed Abstract  Publisher Full Text

Shi ZS, Resing KA, Ahn NG: Networks for the allosteric control of protein kinases.
Curr Opin Struct Biol 2006, 16(2):686692. PubMed Abstract  Publisher Full Text

Dueber JE, Mirsky EA, Lim WA: Engineering synthetic signaling proteins with ultrasensitive input/output control.
Nat Biotechnol 2007, 25(2):660662. PubMed Abstract  Publisher Full Text

Dueber JE, Yeh BJ, Chak K, Lim WA: Reprogramming control of an allosteric signaling switch through modular recombination.
Science 2003, 301(2):19041908. PubMed Abstract  Publisher Full Text

Kay LE: NMR studies of protein structure and dynamics.
J Magn Reson 2005, 173(2):193207. PubMed Abstract  Publisher Full Text

Boehr DD, Dyson HJ, Wright PE: An NMR perspective on enzyme dynamics.
Chem Rev 2006, 106(2):30553079. PubMed Abstract  Publisher Full Text

Schotte F, Lim MH, Jackson TA, Smirnov AV, Soman J, Olson JS, Phillips GN, Wulff M, Anfinrud PA: Watching a protein as it functions with 150ps timeresolved Xray crystallography.
Science 2003, 300(2):19441947. PubMed Abstract  Publisher Full Text

Daily MD, Gray JJ: Local motions in a benchmark of allosteric proteins.
Proteins 2007, 67(2):385399. PubMed Abstract  Publisher Full Text

Najmanovich R, Kuttner J, Sobolev V, Edelman M: Sidechain flexibility in proteins upon ligand binding.
Proteins 2000, 39(2):261268. PubMed Abstract  Publisher Full Text

Betts MJ, Sternberg MJ: An analysis of conformational changes on proteinprotein association: implications for predictive docking.
Protein Eng 1999, 12(2):271283. PubMed Abstract  Publisher Full Text

HenzlerWildman KA, Lei M, Thai V, Kerns SJ, Karplus M, Kern D: A hierarchy of timescales in protein dynamics is linked to enzyme catalysis.
Nature 2007, 450(2):913U927. PubMed Abstract  Publisher Full Text

Hayward S, Lee RA: Improvements in the analysis of domain motions in proteins from conformational change: DynDom version 1.50.
J Mol Graph Model 2002, 21(2):181183. PubMed Abstract  Publisher Full Text

Wriggers W, Schulten K: Protein domain movements: Detection of rigid domains and visualization of hinges in comparisons of atomic coordinates.
Proteins 1997, 29(2):114. PubMed Abstract  Publisher Full Text

Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF: Protein flexibility predictions using graph theory.
Proteins 2001, 44:150165. PubMed Abstract  Publisher Full Text

Karplus M, McCammon JA: Molecular dynamics simulations of biomolecules (vol 9, pg 646, 2002).

Ota N, Agard DA: Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion.
J Mol Biol 2005, 351(2):345354. PubMed Abstract  Publisher Full Text

Schlitter J, Engels M, Kruger P: Targeted MolecularDynamics – a New Approach for Searching Pathways of Conformational Transitions.
J Mol Graph 1994, 12(2):8489. PubMed Abstract

Paci E, Karplus M: Forced unfolding of fibronectin type 3 modules: An analysis by biased molecular dynamics simulations.
J Mol Biol 1999, 288(2):441459. PubMed Abstract  Publisher Full Text

Ma JP, Karplus M: Molecular switch in signal transduction: Reaction paths of the conformational changes in ras p21.
Proc Natl Acad Sci USA 1997, 94(2):1190511910. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kong YF, Karplus M: Signaling pathways of PDZ2 domain: A molecular dynamics interaction correlation analysis.
Proteins 2009, 74(2):145154. PubMed Abstract  Publisher Full Text

Tozzini V: Coarsegrained models for proteins.
Curr Opin Struct Biol 2005, 15(2):144150. PubMed Abstract  Publisher Full Text

Ueda Y, Taketomi H, Go N: Studies on Protein Folding, Unfolding, and Fluctuations by ComputerSimulation .2. 3Dimensional Lattice Model of Lysozyme.

Koga N, Takada S: Foldingbased molecular simulations reveal mechanisms of the rotary motor F1ATPase.
Proc Natl Acad Sci USA 2006, 103(2):53675372. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hilser VJ, GarciaMoreno B, Oas TG, Kapp G, Whitten ST: A statistical thermodynamic model of the protein ensemble.
Chem Rev 2006, 106(2):15451558. PubMed Abstract  Publisher Full Text

Hinsen K: Analysis of domain motions by approximate normal mode calculations.
Proteins 1998, 33(2):417429. PubMed Abstract  Publisher Full Text

Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I: Anisotropy of fluctuation dynamics of proteins with an elastic network model.
Biophys J 2001, 80(2):505515. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tama F, Sanejouand YH: Conformational change of proteins arising from normal mode calculations.
Protein Eng 2001, 14(2):16. PubMed Abstract  Publisher Full Text

Haliloglu T, Bahar I, Erman B: Gaussian dynamics of folded proteins.

Bahar I, Atilgan AR, Erman B: Direct evaluation of thermal fluctuations in proteins using a singleparameter harmonic potential.
Folding & Design 1997, 2(2):173181. PubMed Abstract

Tirion MM: Large Amplitude Elastic Motions in Proteins from a SingleParameter, Atomic Analysis.
Phys Rev Lett 1996, 77(2):19051908. PubMed Abstract  Publisher Full Text

Krebs WG, Alexandrov V, Wilson CA, Echols N, Yu HY, Gerstein M: Normal mode analysis of macromolecular motions in a database framework: Developing mode concentration as a useful classifying statistic.
Proteins 2002, 48(2):682695. PubMed Abstract  Publisher Full Text

Yang L, Song G, Jernigan RL: How well can we understand largescale protein motions using normal modes of elastic network models?
Biophys J 2007, 93(2):920929. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ma JP: Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes.
Structure 2005, 13(2):373380. PubMed Abstract  Publisher Full Text

Bahar I, Rader AJ: Coarsegrained normal mode analysis in structural biology.
Curr Opin Struct Biol 2005, 15(2):586592. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tama F, Brooks CL: Symmetry, form, and shape: Guiding principles for robustness in macromolecular machines.
Annu Rev Biophys Biomol Struct 2006, 35:115133. PubMed Abstract  Publisher Full Text

Ming DM, Wall ME: Quantifying allosteric effects in proteins.
Proteins 2005, 59(2):697707. PubMed Abstract  Publisher Full Text

Ming DM, Wall ME: Interactions in native binding sites cause a large change in protein dynamics.
J Mol Biol 2006, 358(2):213223. PubMed Abstract  Publisher Full Text

Xu CY, Tobi D, Bahar I: Allosteric changes in protein structure computed by a simple mechanical model: Hemoglobin T <> R2 transition.
J Mol Biol 2003, 333(2):153168. PubMed Abstract  Publisher Full Text

Yang LW, Bahar I: Coupling between catalytic site and collective dynamics: A requirement for mechanochemical activity of enzymes.
Structure 2005, 13(2):893904. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Flores SC, Keating KS, Painter J, Morcos F, Nguyen K, Merritt EA, Kuhn LA, Gerstein MB: HingeMaster: normal mode hinge prediction approach and integration of complementary predictors.
Proteins 2008, 73(2):299319. PubMed Abstract  Publisher Full Text

Kim MK, Chirikjian GS, Jernigan RL: Elastic models of conformational transitions in macromolecules.
J Mol Graph Model 2002, 21(2):151160. PubMed Abstract  Publisher Full Text

Miyashita O, Onuchic JN, Wolynes PG: Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins.
Proc Natl Acad Sci USA 2003, 100(2):1257012575. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Maragakis P, Karplus M: Large amplitude conformational change in proteins explored with a plastic network model: Adenylate kinase.
J Mol Biol 2005, 352(2):807822. PubMed Abstract  Publisher Full Text

Zheng WJ, Brooks BR, Hummer G: Protein conformational transitions explored by mixed elastic network models.
Proteins 2007, 69(2):4357. PubMed Abstract  Publisher Full Text

Franklin J, Koehl P, Doniach S, Delarue M: MinActionPath: maximum likelihood trajectory for largescale structural transitions in a coarsegrained locally harmonic energy landscape.
Nucleic Acids Res 2007, 35:W477482. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Eyal E, Yang LW, Bahar I: Anisotropic network model: systematic evaluation and a new web interface.
Bioinformatics 2006, 22(2):26192627. PubMed Abstract  Publisher Full Text

Zheng WJ, Brooks BR, Doniach S, Thirumalai D: Network of dynamically important residues in the open/closed transition in polymerases is strongly conserved.
Structure 2005, 13(2):565577. PubMed Abstract  Publisher Full Text

Zheng WJ, Brooks BR, Thirumalai D: Lowfrequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations.
Proc Natl Acad Sci USA 2006, 103(2):76647669. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zheng W, Brooks BR, Thirumalai D: Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations.
Biophys J 2007, 93(2):22892299. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zheng W, Brooks B: Identification of dynamical correlations within the myosin motor domain by the normal mode analysis of an elastic network model.
J Mol Biol 2005, 346(2):745759. PubMed Abstract  Publisher Full Text

Zheng W, Liao JC, Brooks BR, Doniach S: Toward the mechanism of dynamical couplings and translocation in hepatitis C virus NS3 helicase using elastic network model.
Proteins 2007, 67(2):886896. PubMed Abstract  Publisher Full Text

Zheng WJ, Brooks BR: Normalmodesbased prediction of protein conformational changes guided by distance constraints.
Biophys J 2005, 88(2):31093117. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zheng WJ, Brooks BR: Modeling protein conformational changes by iterative fitting of distance constraints using reoriented normal modes.
Biophys J 2006, 90(2):43274336. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Wako H, Kato M, Endo S: ProMode: a database of normal mode analyses on protein molecules with a fullatom model.
Bioinformatics 2004, 20(2):20352043. PubMed Abstract  Publisher Full Text

Echols N, Milburn D, Gerstein M: MolMovDB analysis and visualization of conformational change and structural flexibility.
Nucleic Acids Res 2003, 31(2):47882. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Yang LW, Liu X, Jursa CJ, Holliman M, Rader A, Karimi HA, Bahar I: iGNM: a database of protein functional motions based on Gaussian Network Model.
Bioinformatics 2005, 21(2):29782987. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, BenTal N: ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures.
Nucleic Acids Res 2005, 33:W299W302. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Cieplak M, Hoang TX: Universality classes in folding times of proteins.
Biophys J 2003, 84(2):475488. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zheng WJ: A unification of the elastic network model and the Gaussian network model for optimal description of protein conformational motions and fluctuations.
Biophys J 2008, 94(2):38533857. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Lu MY, Poon B, Ma JP: A new method for coarsegrained elastic normalmode analysis.

Kondrashov DA, Van Wynsberghe AW, Bannen RM, Cui Q, Phillips GN: Protein structural variation in computational models and crystallographic data.
Structure 2007, 15(2):169177. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Taly A, Corringer PJ, Grutter T, de Carvalho LP, Karplus M, Changeux JP: Implications of the quaternary twist allosteric model for the physiology and pathology of nicotinic acetylcholine receptors.
Proc Natl Acad Sci USA 2006, 103(2):1696516970. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hu LG, Benson ML, Smith RD, Lerner MG, Carlson HA: Binding MOAD (Mother of All Databases).
Proteins 2005, 60(2):333340. PubMed Abstract  Publisher Full Text

Bode C, Kovacs IA, Szalay MS, Palotai R, Korcsmaros T, Csermely P: Network analysis of protein dynamics.
FEBS Lett 2007, 581(2):27762782. PubMed Abstract  Publisher Full Text

Reubold TF, Eschenburg S, Becker A, Kull FJ, Manstein DJ: A structural model for actininduced nucleotide release in myosin.
Nat Struct Biol 2003, 10(2):826830. PubMed Abstract  Publisher Full Text

Kikkawa M, Sablin EP, Okada Y, Yajima H, Fletterick RJ, Hirokawa N: Switchbased mechanism of kinesin motors.
Nature 2001, 411(2):439445. PubMed Abstract  Publisher Full Text

Nitta R, Kikkawa M, Okada Y, Hirokawa N: KIF1A alternately uses two loops to bind microtubules.
Science 2004, 305(2):678683. PubMed Abstract  Publisher Full Text

Chen Y, Davis TA, Hager WW, Rajamanickam S: Algorithm 8xx CHOLMOD, supernodal sparse Cholesky factorization and update/downdate.

Tsai CJ, del Sol A, Nussinov R: Allostery: Absence of a change in shape does not imply that allostery is not at play.
J Mol Biol 2008, 378(2):111. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chennubhotla C, Bahar I: Markov propagation of allosteric effects in biomolecular systems: application to GroELGroES.
Mol Syst Biol 2006, 2:36. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chennubhotla C, Bahar I: Signal propagation in proteins and relation to equilibrium fluctuations.
PLoS Comput Biol 2007, 3(2):171626. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Lu H, Liang J: Perturbationbased Markovian TransmissionModel for Macromolecular Machinery in Cell.
Engineering in Medicine and Biology Society, 29th Annual International Conference of the IEEE 2007, 50295034.

Ming D, Cohn JD, Wall ME: Fast dynamics perturbation analysis for prediction of protein functional sites.
BMC Struct Biol 2008, 8:5. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Riccardi D, Cui Q, Phillips GN Jr: Application of elastic network models to proteins in the crystalline state.
Biophys J 2009, 96(2):46475. PubMed Abstract

Zheng W: Normal mode based modeling of allosteric couplings that underlie cyclic conformational transition in F_{1 }ATPase.
Proteins 2009, 76(2):747762. PubMed Abstract  Publisher Full Text

Zheng W, Thirumalai D: Coupling between normal modes drives protein conformational dynamics: illustrations using allosteric transitions in myosin II.
Biophys J 2009, 96(2):21282137. PubMed Abstract  Publisher Full Text