Skip to main content

Modeling protein conformational transitions by a combination of coarse-grained normal mode analysis and robotics-inspired methods

Abstract

Background

Obtaining atomic-scale information about large-amplitude conformational transitions in proteins is a challenging problem for both experimental and computational methods. Such information is, however, important for understanding the mechanisms of interaction of many proteins.

Methods

This paper presents a computationally efficient approach, combining methods originating from robotics and computational biophysics, to model protein conformational transitions. The ability of normal mode analysis to predict directions of collective, large-amplitude motions is applied to bias the conformational exploration performed by a motion planning algorithm. To reduce the dimension of the problem, normal modes are computed for a coarse-grained elastic network model built on short fragments of three residues. Nevertheless, the validity of intermediate conformations is checked using the all-atom model, which is accurately reconstructed from the coarse-grained one using closed-form inverse kinematics.

Results

Tests on a set of ten proteins demonstrate the ability of the method to model conformational transitions of proteins within a few hours of computing time on a single processor. These results also show that the computing time scales linearly with the protein size, independently of the protein topology. Further experiments on adenylate kinase show that main features of the transition between the open and closed conformations of this protein are well captured in the computed path.

Conclusions

The proposed method enables the simulation of large-amplitude conformational transitions in proteins using very few computational resources. The resulting paths are a first approximation that can directly provide important information on the molecular mechanisms involved in the conformational transition. This approximation can be subsequently refined and analyzed using state-of-the-art energy models and molecular modeling methods.

Background

Conformational transitions in proteins are generally related to their capacity to interact with other molecules. Their study is therefore essential for the understanding of protein functions. Unfortunately, it is very difficult to obtain this type of dynamic information at the atomic scale using experimental techniques. Modeling protein conformational transitions with conventional computational methods is also challenging because, in many cases, these transitions are rare, slow events. Standard molecular dynamics (MD) simulations with current computational resources cannot be applied in practice to model large-amplitude (slow time-scale) conformational transitions. Such simulations require variants of MD methods that enhance sampling of rare events or that bias the exploration in a given direction (e.g. [15]), or, alternatively, to have access to outstanding computational power [6].

Modeling conformational transitions in proteins has motivated the development of specific methods, computationally more efficient than MD simulations. Many of these methods (e.g. [79]) are based on the deformation of a trivial initial path between the two given conformations toward the minimum energy path connecting them. Consequently, the performance of these methods is strongly conditioned by the suitability of the initial path. In recent years, methods to model conformational transitions have also been developed on the basis of robot motion planning algorithms [1013]. Most of these robotics-inspired methods are aimed at providing qualitative information about the conformational transition using few computational resources. For this, they exploit the efficiency of sampling-based exploration algorithms applied to simplified molecular models.

The high dimensionality of the space to be explored is the main difficulty that all computational methods to model protein conformational transitions have to face. Therefore, several approaches have been developed to reduce the dimensionality of the problem (e.g. [1416]). Normal mode analysis (NMA) [17] is a particularly interesting tool in this regard, since a small number of low-frequency normal modes provide a good hint of the direction of large-amplitude conformational changes [1821]. Several recent works apply this property of NMA to improve the performance of conformational exploration methods.

The approach presented in this paper was originally introduced in [22]. The basic principle is to use NMA to bias the conformational exploration performed by a Rapidly-exploring Random Tree (RRT) algorithm [23], aiming to efficiently compute conformational transition paths. The main novelty presented in the present work is the introduction of a multi-scale model for the protein. In this model, an elastic network is defined considering only a single node (called a particle) per tripeptide. Motion directions provided by NMA of such a coarse-grained elastic network are then applied to the all-atom model for a more accurate conformational exploration. The introduction of this multi-scale model has important outcomes. First, the number of normal modes is largely reduced thanks to the use of the coarse-grained model, which significantly reduces the time required to compute them. In addition, generating the all-atom model from the coarse-grained model can be accurately and efficiently achieved using methods from robot kinematics [24], avoiding the need of artifacts such as the RTB approach (rotations-translations of blocks) [25].

Next section presents the overall method, and explains each of its elementary components: elastic network normal mode analysis, tripeptide-based multi-scale protein modeling, and motion-planning-based conformational exploration. Then, several types of results aimed to validate the approach and to show its good computational performance are presented for a set of proteins with different sizes and topologies. A more detailed analysis of results is presented for adenylate kinase (ADK). Finally, together with the conclusions, we discuss possible directions for future work. Note that a preliminary version of this work was presented in [26]. Compared to this previous version, this paper includes more detailed explanations of the method, a more exhaustive presentation of results, with additional figures and tables, as well as additional results for the ADK protein. In addition, some movies that illustrate results obtained with the proposed method are included as supplementary material.

Methods

This section presents a new method to model protein conformation transitions. It builds on the combination of several components inside an iterative algorithm. One of these components is NMA performed on a coarse-grained elastic network model of the protein, which enables very fast computation of normal modes. Indeed, a single particle of the elastic network is considered for each group of three consecutive amino-acid residues (i.e. one particle per tripeptide). The all-atom model, which is used to accept or reject sampled states during the conformational exploration, is accurately reconstructed from the coarse-grained one using closed-form inverse kinematics. The RRT algorithm is applied to explore linear combinations of normal modes computed from intermediate conformations along the path. All these elementary components of the method are further explained below.

Elastic networks and normal mode analysis

Based on a harmonic approximation of the potential energy, normal mode analysis provides information about the directions and frequencies of vibration of a molecule from a minimum-energy conformation. Each mode represents a motion pattern, in which all the atoms move with the same frequency and phase. Low-frequency normal modes correspond to collective motions (e.g. domain motions), whereas high-frequency normal modes correspond to local fluctuations [19, 27].

Normal modes are calculated by diagonalizing the Hessian matrix of the potential energy of the molecule. For reducing the computational cost of this operation, several works propose to use simplified potentials and coarse-grained models. An extensively used simplified potential is based on the elastic network model (ENM) [28], which represents the molecule as a set of particles connected by virtual springs. All the protein atoms can be considered as particles in the elastic network. However, a coarse-grained representation that only considers C α atoms (i.e. a single particle per amino-acid residue) is often applied [19, 20]. Moreover, particles are connected by virtual springs only if they are closer than a user-defined cut-off distance d cut .

The potential energy function of such an elastic network takes the following form:

E = d i j 0 < d c u t C 2 ( d i j - d i j 0 ) 2

where d ij is the distance between particle i and particle j, d i j 0 is the distance between the two particles at the equilibrium state and C is the elastic constant. This type of simplified potential has been used in many works and for very different applications [2932].

In this work, we investigate a further simplification of the ENM. Indeed, the ENM is built using a coarser model based on tripeptides, instead of using C α atoms. Figure 1 illustrates the approach. Note that coarse-grained NMA approaches considering more than one residue per particle have already been proposed [25, 33, 34]. However, these approaches, which are mainly devised to analyze motions of very large systems made of protein assemblies, consider rigid-body motions of groups of residues. In contrast, the approach presented here preserves full flexibility of the protein, which leads to a more accurate simulation of conformational transitions.

Figure 1
figure 1

Illustration of the different models on the ADK protein. (a) Representation of the all-atom model, (b) the particles of the coarse-grained tripeptide-based model, (c) representation of the elastic network model.

Several works show that using a simplified ENM does not necessarily imply a loss of accuracy in the prediction of large-amplitude motion directions [20, 25]. However, it certainly leads to a computational performance gain. This issue is further discussed in the results section, where the performance of NMA using tripeptide-based models and Cα-based models is compared.

The anisotropic network model (ANM) approach, as described in [27, 35], is adopted in this work to construct the Hessian matrix from the positions of the particles of the tripeptide-based model. Each 3 × 3 sub-matrix corresponding to the interaction between two particles is computed as follows:

H i j = - C d i j 2 ( x j - x i ) ( x j - x i ) ( x j - x i ) ( y j - y i ) ( x j - x i ) ( z j - z i ) ( y j - y i ) ( x j - x i ) ( y j - y i ) ( y j - y i ) ( y j - y i ) ( z j - z i ) ( z j - z i ) ( x j - x i ) ( z j - z i ) ( y j - y i ) ( z j - z i ) ( z j - z i )
H i j = - j | j i H i j

If the distance between particles i and j is more than the cut-off distance d cut , then the whole 3 × 3 matrix is replaced by zeros. The Hessian matrix is then diagonalized to compute the eigenvalues and eigenvectors. Each eigenvalue and eigenvector pair corresponds to one normal mode, where the eigenvalue defines the mode frequency and the eigenvector defines the motion direction for each particle in the elastic network.

Multi-scale model

Tripeptide-based model

The multi-scale modeling approach applied in this work is based on a decomposition of the protein chain into fragments of three amino acid residues, which we refer to as tripeptides. The reason for choosing such a subdivision is that, assuming fixed bond lengths, bond angles and peptide bond torsions, the backbone of a tripeptide involves 6 degrees of freedom (three pairs of angles φ, ψ), and thus, an analogy can be made with a 6R mechanism like a robotic manipulator [24]. Two Cartesian reference frames attached to the N atom in the backbone of the first residue and to the C atom in the last residue define respectively the base-frame and the end-frame of the 6R mechanism. Since tripeptides are linked through rigid peptide bonds, the location of the end-frame of tripeptide i can be determined from the base-frame of tripeptide i + 1 by a constant 3D transformation. Given the location of the base-frame and the end-frame, the conformation of a tripeptide backbone can be obtained by inverse kinematics. Consequently, the conformation of the whole protein backbone can be determined from the pose of a single reference frame attached to each tripeptide (this is true for all the protein backbone except two short fragments at the N-terminal and C-terminal ends of the chain, which require a particular treatment). In the following, we will refer to these reference frames as (oriented) particles. They are the particles in the coarse-grained ENM.

Reconstructing the all-atom model

The interest of the decomposition of the protein into tripeptides explained above is that closed-form inverse kinematics (IK) can be applied to reconstruct the all-atom protein model from the coordinates of the particles. The IK solver applied in this work has been adapted from the method developed by Renaud [36]. This solver is based on algebraic elimination theory, and develops an ad-hoc resultant formulation inspired by the work of Lie and Liang [37]. Starting from a system of equations representing the IK problem, the elimination procedure leads to an 8-by-8 quadratic polynomial matrix in one variable. The problem can then be treated as a generalized eigenvalue problem, as proposed in [38], for which efficient and robust methods such as the Schur factorization can be applied. Note however that our approach is not dependent on this solver, so that other IK methods (e.g. [38, 39]) could be applied.

In general, the IK problem for a 6R serial kinematic chain has a finite number of solutions (up to 16 in the most general case). All the solutions correspond to geometrically valid conformations of the tripeptide backbone with fixed ends defined by the pose of the particles. However, when the goal is to simulate continuous motions, the closest conformation to the previous one (i.e. the one before a perturbation applied to the particles) has to be selected in order to avoid jumps in the conformational space. All IK solutions are rejected if none of them remains within a distance threshold that depends on the perturbation step-size.

The explanations above concern only the reconstruction of the all-atom model of the protein backbone from the coarse-grained tripeptide-based model. Side-chains are treated separately, using a simple method based on energy minimization as explained below.

Path finding algorithm

The path finding method works by iteratively generating short portions of the transition between two given conformations of a protein, which we will refer to as q init and q goal . Algorithm 1 presents the pseudo-code with the main steps of the method. At each iteration, normal modes are computed for a root conformation q root . Note that q root = q init for the first iteration. Then, the RRT algorithm is applied to explore motions corresponding to linear combinations of normal modes. RRT is run until the protein moves a predefined distance toward the target conformation q goal . The conformational exploration performed by the RRT algorithm is further explained below. Once the RRT exploration is stopped, the closest node q close in the tree to q goal is searched. The path between q root and q close is then extracted and saved. All the conformations in this path are guaranteed to have a collision-free backbone (including C β atoms) which generally implies getting acceptable energy values after a short minimization to rearrange side-chain conformations. Such an energy minimization procedure is performed on q close , which will be the root conformation in the next iteration. The algorithm keeps iterating until a predefined distance d target to q goal is reached. The resulting path is defined by the sequence of minimized conformations q close at each iteration. If a finer-grained path is required, other intermediate conformation can be extracted from the sub-paths computed at each iteration. These conformations may require energy minimization to rearrange side-chains, as it is done for q close .

Algorithm 1: COMPUTE _PATHWAY

input : Initial conformation q init , final conformation q goal , minimum distance to target d target

output : The transition path p

begin

    q root ← q init ;

    while RMSD(q root , q goal ) >d target do

        a ← COMPUTE _NORMAL _MODES(q root );

        t ← BUILD _RRT(q root , q goal , a);

        q close CLOSEST _TO _TARGET(t, q goal );

        q root ← MINIMIZE(q close );

        p ← CONCATENATE(p, q root );

end

Algorithm 2: BUILD _RRT

input : Initial conformation q root , final conformation q goal , normal modes a

output : The tree t

begin

    t ← INIT _TREE(q root );

    while not STOP _CONDITION(t, q goal ) do

        q rand SAMPLE(t, a);

        q near BEST _NEIGHBOR(t, q rand );

        q new EXPAND _TREE(q near , q rand );

        if IS VALID (q new ) then

            ADD _NEW _NODE(t, q new );

            ADD _NEW _EDGE(t, q near , q new );

end

Implementation details

The RRT algorithm, iteratively applied in Algorithm 1, performs the same steps as the basic RRT [23]. The steps are sketched in Algorithm 2. At each iteration, a conformation q rand is randomly sampled. Note that q rand is not required to be a feasible conformation. Then, the tree is searched for the closest conformation to q rand , called q near . A new conformation, q new , is generated by moving from q near towards q rand with a predefined short step size. The new conformation is added to the tree if it does not violate feasibility constraints, which in the present work are limited to geometric constrains related to no atom overlapping and no bond breaking. The difference with respect to the basic RRT algorithm concerns the implementation of the methods for sampling conformations, searching the nearest neighbor, and expanding the tree. These methods, which are further explained below, are specific to the present framework because of the multi-scale protein model and the application of NMA to bias the exploration.

Sampling random conformations

The idea is to randomly sample conformations q rand using information given by the normal modes. The coarse-grained tripeptide-based model is used at this level. Hence, q rand is not an all-atom conformation, but an array of particle positions. Random particle positions are generated by moving them from their initial positions, defined by q root , using a linear combination of normal modes with randomly sampled weights. More precisely:

  • A sequence of 3n random weights w j are sampled in the range [-1, 1], where n is the number of particles, being 3n the number of normal modes (actually, the number of normal modes is 3n − 6, since 6 degrees of freedom correspond to rigid-body motions of the whole set of particles).

  • The new positions of the n particles are computed by a linear combination of all the randomly weighted modes as follows:

    q r a n d = q r o o t + 3 n f * w j * a j

where a j refers to each normal mode, and f is an amplification factor used to push the sampled conformation away from q root (this factor is the same for all the normal modes). Note that, since the normal modes are not normalized, low frequency modes have larger norm. Thus, they contribute more significantly in the sum.

Finding nearest neighbors

Nearest neighbor search is also performed using the coarse-grained model. Indeed, the computed distance is based on the root mean squared deviation (RMSD) of the particle positions. In the current implementation, the distance is biased to pull the exploration towards the target conformation as follows:

d ( q , q r a n d ) = RMSD ( q , q r a n d ) RMSD ( q , q g o a l ) RMSD ( q i n i t , q g o a l ) .

In this work, we have implemented a simple brute-force algorithm to find q near . More sophisticated nearest neighbor search algorithms could be used to reduce the number of performed distance computations. Note, however, that currently used algorithms based on space partitioning techniques (e.g. kd-trees) do not perform well in high-dimensional spaces [40]. A computationally efficient solution would require the implementation of an approximate nearest neighbor search algorithm.

Generating new conformations

For generating q new , all particle positions in q near are linearly interpolated towards q rand with a predefined step size k. Given these new particle positions, the all-atom model corresponding to q new is obtained by solving an IK problem for every tripeptide. The implemented method proceeds iteratively. If no IK solution is found for a tripeptide t i (the tripeptide between particles p i and pi+1) or if the solution involves atom collisions, the pose (position and orientation) of particle pi+1is slightly perturbed and the IK problem is solved again. This process is repeated until a collision-free IK solution is found or a maximum number of trials is reached. If this process fails to find a collision-free IK solution for any tripeptide, failure is reported and the RRT algorithm goes back to the random sampling step.

Once the treatment of all tripeptides has been completed, the conformation of the two terminal fragments is generated. For this, the pose of these fragments is updated with respect to the new poses of the first and last tripeptides. Random perturbations can be applied to these end fragments in order to remove possible collisions with the rest of the protein.

Protein conformations q new generated using the aforementioned process are guaranteed to satisfy geometric constraints: correct bond geometry and no overlap betweew backbone atoms. However, in order to speed-up computations, side-chains are not treated at this stage (only C β atoms are considered for collision avoidance). This is because side-chains are known to be very flexible, and resolving possible collisions along the conformational transition path can be done in a post-processing stage. Indeed, side-chain collisions are resolved during the minimization step at the end of each short RRT execution.

Results and discussion

This section discusses several experiments aimed to validate the proposed method and to evaluate its performance. First, the question concerning the accuracy of the tripeptide-based elastic network model is addressed. Then, results are presented on conformational transitions computed for a set of ten proteins with different sizes and topologies. Finally, further results on adenylate kinase are presented and compared to available data on the transition between the open and closed forms of this protein.

Validating the coarse-grained ENM

Previous works (e.g. [19, 20]) have shown that simple ENMs built using C α atoms perform as well as ENMs built using the all-atom model when studying the dynamic properties of proteins with NMA. Here, we compare the performance of the proposed tripeptide-based model with the C α -based model for predicting directions of conformational transitions. A set of seven proteins listed in Table 1 was used for this comparison. These proteins were also used in related work [20] for the validation of the C α -based ENM.

Table 1 Proteins used in the overlap experiments

For evaluating the capability of normal modes to predict directions of conformational transitions, we use the notion of overlap as proposed in related work [20]. The overlap I j between a normal mode j and an experimentally observed conformational change between two conformations (open and closed) qoand qcis defined as a measure of similarity between the conformational change and the direction given by the normal mode j. It can be computed as follows:

I j = 3 n a i j Δ q i 3 n a i j 2 3 n Δ q i 2 1 / 2

where Δ q i = q i o - q i c measures the difference between the particle coordinates in conformations qoand qc, a ij corresponds to the ith coordinate of the normal mode j, and n is the number of particles. A value of 1 for the overlap means that the direction given by the normal mode matches exactly the conformational change, whereas a value around 0.2 or less means that the normal mode is unable to provide any meaningful prediction.

Before conducting the comparative analysis, we need to determine an optimal cutoff distance for the tripeptide-based ENM. A good cutoff distance should create an elastic network that correctly captures the topology of the protein. For C α -based models, 8 Å is generally used, since this cutoff distance has been empirically shown to provide the best results in most cases. It can be intuitively inferred that the same cutoff distance may not be the optimal choice in our case, because distances between particles of the tripeptide-based model are larger than distances between C α atoms. Moreover, defining the optimal cutoff value theoretically is not straightforward. Therefore, we have measured and compared the overlap values for the seven proteins with cutoff distances between 8 and 34 Å in order to empirically determine the most suitable range of cutoff values. Figure 2 shows the overlap value for each cutoff distance averaged over the seven proteins. Note that, for each protein, overlap values were computed for all the normal modes, and the best value was considered for the average. As clearly shown in the figure, the best overlap values are for cutoff distances of 15, 16 and 17 Å.

Figure 2
figure 2

Average overlap over the seven proteins of Table 1. Lines are drawn between the 25th and the 75th percentiles of the overlap values. Average overlap values are indicated with dots.

The tripeptide-based ENMs for four of the proteins in Table 1, using a cutoff distance of 16 Å, are represented in Figure 3. The figure shows that the main topological features of the proteins appear in the coarse-grained model.

Figure 3
figure 3

Tripeptide-based elastic network models. Representation of the all-atom models and the tripeptide-based ENMs for four different proteins.

Table 2 compares overlap values of tripeptide-based ENMs using a cutoff distance of 16 Å with those presented in [20] for C α -based ENM using a cutoff distance of 8 Å. In the table, columns labeled "Open" correspond to the open-to-closed conformation and columns labeled "Closed" are for the opposite case. The similar overlap values show that the coarse-garined, tripeptide-based ENM is also able to capture the topological information required to compute normal modes that correctly predict directions of large-amplitude motions. Importantly, such a similar performance in terms of overlap is obtained with much less computational cost. Since the computational complexity of the Hessian matrix diagonalization is O ( n 3 ) , the reduction of n by a factor 3 (a tripeptide involves 3 C α atoms) provides a theoretical gain of more than one order of magnitude. This theoretical gain has been confirmed with some experiments. In summary, the time required to compute the normal modes with the tripetide-based model ranges from 0.05 seconds to 0.9 seconds, whereas several minutes may be necessary using the C α model.

Table 2 Comparison between overlap values for Cα-based ENMs and tripeptide-based ENMs

Finding conformational transitions

Experimental setup

The proposed method was applied to compute conformational transition paths for the ten proteins listed in Table 3, and represented in Figure 4. For each protein, at least two experimental structures corresponding to different conformations are available in the Protein Data Bank (PDB) [41]. The difference between these conformations involves large-amplitude domain motions. The ten proteins are varied in size and topology, as well as in the type of domain motions they undergo. This heterogeneity is important to analyze the reliability and scalability of the method.

Table 3 Proteins used in the experiments
Figure 4
figure 4

The ten proteins used in the experiment. Representation of open and closed forms of these proteins available in the PDB (IDs are provided in Table 3).

Each iteration of the algorithm that computes the transition path performs a short RRT exploration, as mentioned in the previous section. In the current implementation, such a local exploration runs until the protein moves 0.3 Å C α -RMSD towards the goal. This distance is gradually reduced to 0.15 Å as the distance to the target conformation decreases. The reason is that the speed of convergence tends to decrease when approaching the target conformation, and recomputing normal modes more frequently provides better results in this situation. If the distance stopping condition is not reached first, the exploration stops after a pre-defined number of iterations (4000 in our case). This additional stopping condition prevents too long runs of RRT in case of blocking situations.

At the end of the RRT exploration, the closest conformation to the goal is identified and submitted to an energy minimization procedure aimed at generating better side-chain conformations. In this work, we have used the AMBER software package [42] for energy minimization.

Results

Table 4 summarizes the results achieved by the proposed method for the set of ten proteins. In this table, Cα-RMSD end is the distance between the goal conformation and the conformation obtained at the end of the iterative path finding process. Time total is overall computing time, which includes the RRT running time (TimeRRT ) and the time for computing the normal modes and running minimizations at the end of each iteration. The number of iterations of the main algorithm (i.e. the number of NMA calculations) is also indicated in the table. Note that, in all the experiments, the RRT exploration takes more than 90% of the total computing time, which corresponds to runs on a single core of an AMD Opteron 148 processor at 2.6 GHz.

Table 4 Performance of the method on ten proteins (cf. Table 3)

In all cases, the method was able to compute the conformational transition, reaching conformations very close to the given goal conformations. Figure 5 shows superimposed structures (structure superimpositions and images have been done using PyMOL [43]) of open and closed forms of the proteins (q init and q goal ), and of the closed form and the last conformation of the computed transition path (q goal and q final ). The distances between the final and goal conformations are below 2 Å (measured using C α -RMSD) for all the tested proteins with the exception of DDT and GroEL. Note that 2 Å RMSD corresponds to the current accuracy of experimental methods for high-resolution protein structure determination. As can be seen in Figure 5 the superimpositions of the final and goal conformations is very good, even for DDT and GroEL. Note that the method could have reached closer conformations to the goal with a higher number of iterations. Nevertheless, the strategy applied in these experiments was to stop iterating when the distance to the goal reached a very slow rate of convergence.

Figure 5
figure 5

Superimposed structures and final conformations of the computed transition path. For each protein, the left image shows the open form (in red) and the closed form (in black), and the right image shows the closed form (in black) and the final conformations of the computed path (in red).

We also conducted experiments to analyze the relationship between the computing time and the size of the protein. Since the lengths of the transition paths for the different test systems is variable, we measured the computing time to move 1Å along these paths. The results of these experiments, presented in Table 5 and Figure 6, show a linear relationship between the computing time and the protein size. This scalability is an interesting property of the method. Note that the performance of the method seems not be (or only slightly) affected by the topology of the protein. This is an important advantage over the method presented in [22], which experienced some difficulties in dealing with relative motions of domains connected by several linkers, mainly because of the internal-coordinate representation of proteins used in this previous work.

Table 5 Relationship between the size of the protein and the computing time
Figure 6
figure 6

Plot of the results in Table 5. The plot shows a linear relationship between the size of the protein and the time required to compute the conformational transition path.

Finally, we did a profiling of the algorithm to identify possible bottlenecks and points to be improved to enhance computational efficiency. Table 6 gives values of the percentage of the time spent in the most time-consuming operations within the RRT exploration: nearest neighbor search (NN), collision checking (CC), inverse kinematics (IK) and random sampling (RS). Surprisingly, nearest neighbor search takes around 60% of the overall computing time. This is due to the brute-force algorithm applied in the current implementation. As mentioned before, a more sophisticated nearest neighbor algorithm should be implemented. The performance of the method could also be enhanced by applying simplified distance metrics (e.g. [16, 44]). The use of an appropriate simplified distance metric could reduce computing time while preserving good exploration properties of the algorithm.

Table 6 Percentage of the time spent performing the main operations in RRT

A closer look at adenylate kinase

Adenylate kinase (ADK) [45] is a widely studied protein involved in signal transduction. The structure of ADK is composed of three domains known as: LID, CORE and NMPbind. Several works tend to show that the LID and NMPbind domains undergo large-amplitude conformational changes with respect to the CORE domain, which remains stable [46, 47]. Some of these works (e.g. [47]) also suggest that the conformational transition between open and closed states of ADK proceeds in two steps: (1) the LID domain moves more clearly than the NMPbind domain at the beginning of the open-to-close transition; (2) then NMPbind domain moves at a faster pace towards the end of the transition path.

The open conformation of ADK (PDB ID 4AKE), the closed conformations (PDB ID 1AKE) of ADK, and several intermediate conformations obtained with our method are represented in Figure 7 The figure shows significant conformational changes of the LID and NMPbind domains, as expected. The motion of these two regions is also illustrated in Figure 8, which represents the displacement of the residues along the conformational transition. Two darker regions, involving residues 20-60 and 130-160, indicate the parts of the protein that undergo larger displacements. These regions correspond to the NMPbind domain and LID domain, approximately. Figure 8 also shows that residues 20-60, corresponding to the NMPbind domain, start moving more significantly near the end of the transition path, whereas residues 130-160, corresponding to the LID domain, start moving at an earlier stage. This reflects the two-step nature of the conformational transition discussed earlier, and shows that our method provides results that are qualitatively comparable with those presented in previous work on ADK.

The open conformation of ADK (PDB IDs 4AKE), the closed conformations (PDB IDs 1AKE) of ADK, and several intermediate conformations obtained with our method are represented in Figure 7 The figure shows significant conformational changes of the LID and NMPbind domains, as expected. The motion of these two regions is also illustrated in Figure 8, which represents the displacement of the residues along the conformational transition. Two darker regions, involving residues 20-60 and 130-160, indicate the parts of the protein that undergo larger displacements. These regions correspond to the NMPbind domain and LID domain, approximately. Figure 8 also shows that residues 20-60, corresponding to the NMPbind domain, start moving more significantly near the end of the transition path, whereas residues 130-160, corresponding to the LID domain, start moving at an earlier stage. This reflects the two-step nature of the conformational transition discussed earlier, and shows that our method provides results that are qualitatively comparable with those presented in previous work on ADK.

Figure 7
figure 7

Different conformations of ADK along the studied conformational transition. The LID domain is shown in blue and the NMPbind domain is shown in red. Images (a) and (f) represent the start and goal conformations respectively. Images (b) to (e) show intermediate conformations generated by our method.

Figure 8
figure 8

Displacement of the residues along the conformational transition of ADK. The plot shows, using a gray-scale, the displacement of each residue at each iteration relative to the previous iterations. Darker regions represent larger displacements.

We have also compared intermediate conformations in the computed transition path of the ADK to a small number of other experimentally solved structures of this protein. These structures correspond to homolog proteins or mutants with very high sequence identity, and some of them are known to be intermediate structures between open and closed forms of the protein. Interestingly, four of these structures are very close to conformations along the transition path. Table 7 shows the distance between each of these structures and the closest conformation in the transition path. The table also shows the position of this conformation in the path. More precisely, the table shows the corresponding iteration number and the percentage of the path length. 2RH5 (A) is very close to the conformation generated by the first iteration, whereas 1E4Y (A) is close to the conformation generated by iteration 27 (near the closed structure). 1DVR (A) is also very close to a conformation toward the beginning of the path (near the open structure), whereas 2RH5 (B) is a slightly less open structure. These results are comparable to those provided by previous studies [12, 48], which further validates the proposed method.

Table 7 Known intermediate structures and their distances to the closest conformation in the computed transition path.

Conclusions

This paper has presented an efficient approach for computing large-amplitude conformational transitions in proteins. It exploits the ability of normal modes to predict directions of collective, large-amplitude motions and the efficiency of the RRT algorithm to explore large spaces. The proposed approach also relies on a multi-scale representation of the protein, based on a decomposition into tripeptides, which significantly contributes to the good performance of the method.

Interestingly, first results presented in the paper show that using an ENM based on the coarse-grained tripeptide-based model instead of a Cα-based model preserves the ability of NMA to predict directions of large-amplitude motions, while significantly reducing computing time.

The proposed method was applied to simulate large-amplitude conformational transitions in proteins of different sizes and topologies. Results show a good performance of the method in all the cases. Computing time scales linearly with the number of residues. It ranges from a few hours for medium-size proteins to a few days for very large ones. This computational performance could be significantly improved by the implementation of more sophisticated methods to perform the most time-consuming operations within the RRT algorithm, in particular, nearest neighbor search.

A deeper analysis of the conformational transition between open and closed forms of ADK shows that results provided by the proposed method are qualitatively consistent with results obtained with other computational methods and with experimental data. Nevertheless, it is important to note that the resulting paths are a first approximation, which cannot be used directly for an accurate evaluation of energy variations along conformational transitions. This would require a subsequent refinement and analysis using state-of-the-art energy models and molecular modeling methods. It could also be possible to integrate energy evaluations within the RRT exploration with the aim of obtaining better-quality solutions, at the expense of additional computational cost. An interesting extension that could be investigated is to use T-RRT [49, 50], instead of RRT, to compute paths that follow more accurately the valleys of the conformational energy landscape.

In this work, we have shown the ability of the proposed method to compute transition paths between two given conformations of a protein. Nevertheless, the approach could also be applied to more challenging problems, such as the prediction of other (meta-)stable states reachable from a given protein conformation, or the discrimination between probable and improbable transitions. This would require some extensions, mainly in the definition of energy/scoring functions to identify interesting intermediate and meta-stable states, as well as high-energy barriers, during the conformational exploration.

Additional files

Abbreviations

MD:

molecular dynamics

NMA:

normal mode analysis

RRT:

rapidly-exploring random tree

RTB:

rotations-translations of blocks

ENM:

elastic network model

ANM:

anisotropic network model

IK:

inverse kinematics

RMSD:

root mean squared deviation

PDB:

Protein Data Bank.

References

  1. Voter AF: A method for accelerating the molecular dynamics simulation of infrequent events. J Chem Phys 1997, 106(11):4665–4677. 10.1063/1.473503

    Article  CAS  Google Scholar 

  2. Izrailev S, Stepaniants S, Isralewitz B, Kosztin D, Lu H, Molnar F, Wriggers W, Schulten K: Steered molecular dynamics In Computational Molecular Dynamics: Challenges, Methods, Ideas. Springer-Verlag; 1998:39–65.

    Google Scholar 

  3. Sørensen RR, Voter AF: Temperature-accelerated dynamics for simulation of infrequent events. J Comput Phys 2000, 112: 9599–9606.

    Google Scholar 

  4. Laio A, Parrinello M: Escaping free-energy minima. Proc Natl Acad Sci USA 2002, 99(20):12562–12566. 10.1073/pnas.202427399

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  5. Hamelberg D, Morgan J, McCammon JA: Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J Chem Phys 2004, 120: 11919–11929. 10.1063/1.1755656

    Article  CAS  PubMed  Google Scholar 

  6. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 2010, 330(6002):341–346. 10.1126/science.1187409

    Article  CAS  PubMed  Google Scholar 

  7. Mills G, Jónsson H: Quantum and thermal effects in H2 dissociative adsorption: Evaluation of free energy barriers in multidimensional quantum systems. Phys Rev Lett 1994, 72: 1124–1127. 10.1103/PhysRevLett.72.1124

    Article  CAS  PubMed  Google Scholar 

  8. E W, Ren W, Vanden-Eijnden E: String method for the study of rare events. Phys Rev B. 2002, 66: 052301.

    Article  Google Scholar 

  9. Bolhuis PG, Chandler D, Dellago C, Geissler PL: Transition path sampling and the calculation of rate constants. Annu Rev Phys Chem. 2002, 53: 291–318. 10.1146/annurev.physchem.53.082301.113146

    Article  CAS  PubMed  Google Scholar 

  10. Cortés J, Siméon T, Ruiz de Angulo V, Guieysse D, Remaud-Siméon M, Tran V: A path planning approach for computing large-amplitude motions of flexible molecules. Bioinformatics 2005, 21(suppl 1):i116-i125. 10.1093/bioinformatics/bti1017

    Article  PubMed  Google Scholar 

  11. Raveh B, Enosh A, Schueler-Furman O, Halperin D: Rapid sampling of molecular motions with prior information constraints. PLoS Comput Biol. 2009, 5(2):e1000295. 10.1371/journal.pcbi.1000295

    Article  PubMed Central  PubMed  Google Scholar 

  12. Haspel N, Moll M, Baker M, Chiu W, Kavraki LE: Tracing conformational changes in proteins. BMC Struct Biol. 2010, 10(Suppl 1):S1. 10.1186/1472-6807-10-S1-S1

    Article  PubMed Central  PubMed  Google Scholar 

  13. Al-Bluwi I, Siméon T, Cortés J: Motion planning algorithms for molecular simulations: a survey. Comput Sci Rev. 2012, 6(4):125–143. 10.1016/j.cosrev.2012.07.002

    Article  Google Scholar 

  14. Kim MK, Jernigan RL, Chirikjian GS: Rigid-cluster models of conformational transitions in macro-molecular machines and assemblies. Biophys J 2005, 89: 43–55. 10.1529/biophysj.104.044347

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  15. Thomas S, Tang X, Tapia L, Amato NM: Simulating protein motions with rigidity analysis. J Comput Biol. 2007, 14(6):839–855. 10.1089/cmb.2007.R019

    Article  CAS  PubMed  Google Scholar 

  16. Plaku E, Stamati H, Clementi C, Kavraki L: Fast and reliable analysis of molecular motion using proximity relations and dimensionality reduction. Proteins 2007, 67(4):897–907. 10.1002/prot.21337

    Article  CAS  PubMed  Google Scholar 

  17. Cui Q, Bahar I: Normal mode analysis: theory and applications to biological and chemical systems. Chapman and Hall/CRC mathematical and computational biology series, Chapman & Hall/CRC; 2006.

    Google Scholar 

  18. Brooks B, Karplus M: Normal modes for specific motions of macromolecules: application to the hinge-bending mode of lysozyme. Proc Natl Acad Sci USA 1985, 82(15):4995–4999. 10.1073/pnas.82.15.4995

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  19. Hinsen K: Analysis of domain motions by approximate normal mode calculations. Proteins 1998, 33(3):417–429. 10.1002/(SICI)1097-0134(19981115)33:3<417::AID-PROT10>3.0.CO;2-8

    Article  CAS  PubMed  Google Scholar 

  20. Tama F, Sanejouand YH: Conformational change of proteins arising from normal mode calculations. Protein Eng 2001, 14: 1–6. 10.1093/protein/14.1.1

    Article  CAS  PubMed  Google Scholar 

  21. Alexandrov V, Lehnert U, Echols N, Milburn D, Engelman D, Gerstein M: Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool. Protein Sci 2005, 14(3):633–643. 10.1110/ps.04882105

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  22. Kirillova S, Cortés J, Stefaniu A, Siméon T: An NMA-guided path planning approach for computing large-amplitude conformational changes in proteins. Proteins 2008, 70: 131–143.

    Article  CAS  PubMed  Google Scholar 

  23. LaValle SM, Kuffner JJ: Rapidly-exploring random trees: progress and prospects. In Algorithmic and Computational Robotics: New Directions. Edited by: Donald B, Lynch K, Rus D, Boston. A.K Peters; 2001:293–308.

    Google Scholar 

  24. Siciliano B, Khatib O: Springer Handbook of Robotics. Springer; 2008.

    Book  Google Scholar 

  25. Tama F, Gadea FX, Marques O, Sanejouand YH: Building-block approach for determining low-frequency normal modes of macromolecules. Proteins 2000, 41: 1–7. 10.1002/1097-0134(2000)41:4+<1::AID-PROT10>3.0.CO;2-2

    Article  CAS  PubMed  Google Scholar 

  26. Al-Bluwi I, Vaisset M, Siméon T, Cortés J: Coarse-grained elastic networks, normal mode analysis and robotics-inspired methods for modeling protein conformational transitions. Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on: 4–7 October 2012 2012, 40–47. 10.1109/BIBMW.2012.6470359

    Chapter  Google Scholar 

  27. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I: Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J. 2001, 80: 505–515. 10.1016/S0006-3495(01)76033-X

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  28. Tirion MM: Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 1996, 77(9):1905–1908. 10.1103/PhysRevLett.77.1905

    Article  CAS  PubMed  Google Scholar 

  29. Kim MK, Jernigan RL, Chirikjian GS: Efficient generation of feasible pathways for protein conformational transitions. Biophys J 2002, 83(3):1620–1630. 10.1016/S0006-3495(02)73931-3

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  30. Tama F, Miyashita O, Brooks CL III: Normal mode based flexible fitting of high-resolution structure into low-resolution experimental data from cryo-EM. J Struc Biol 2004, 147(3):315–326. 10.1016/j.jsb.2004.03.002

    Article  CAS  Google Scholar 

  31. Cavasotto CN, Kovacs JA, Abagyan RA: Representing receptor flexibility in ligand docking through relevant normal modes. J Am Chem Soc 2005, 127(26):9632–9640. 10.1021/ja042260c

    Article  CAS  PubMed  Google Scholar 

  32. Mouawad L, Perahia D: Motions in hemoglobin studied by normal mode analysis and energy minimization: evidence for the existence of tertiary T-like, quaternary R-like intermediate structures. J Mol Biol 1996, 258(2):393–410. 10.1006/jmbi.1996.0257

    Article  CAS  PubMed  Google Scholar 

  33. Schuyler AD, Chirikjian GS: Efficient determination of low-frequency normal modes of large protein structures by cluster-NMA. J Mol Graph Model 2005, 24: 46–58. 10.1016/j.jmgm.2005.05.002

    Article  CAS  PubMed  Google Scholar 

  34. Demerdash ONA, Mitchell JC: Density-cluster NMA: a new protein decomposition technique for coarse-grained normal mode analysis. Proteins 2012, 80(7):1766–1779.

    CAS  PubMed  Google Scholar 

  35. Eyal E, Yang LW, Bahar I: Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics 2006, 22(21):2619–2627. 10.1093/bioinformatics/btl448

    Article  CAS  PubMed  Google Scholar 

  36. Renaud M: A simplified inverse kinematic model calculation method for all 6R type manipulators. In Current Advances in Mechanical Design and Production VII. Edited by: Hassan MF, Megahed SM. New York: Pergamon; 2000:57–66.

    Google Scholar 

  37. Lee HY, Liang CG: A new vector theory for the analysis of spatial mechanisms. Mech Mach Theory 1988, 23(3):209–217. 10.1016/0094-114X(88)90106-1

    Article  Google Scholar 

  38. Manocha D, Canny JF: Efficient inverse kinematics for general 6R manipulators. IEEE Trans Robot Autom. 1994, 10(5):648–657. 10.1109/70.326569

    Article  Google Scholar 

  39. Coutsias EA, Seok C, Jacobson MP, Dill KA: A kinematic view of loop closure. J Comput Chem. 2004, 25(4):510–528. 10.1002/jcc.10416

    Article  CAS  PubMed  Google Scholar 

  40. Plaku E, Kavraki LE: Quantitative analysis of nearest-neighbors search in high-dimensional sampling-based motion planning. In Algorithmic Foundations of Robotics VII. Edited by: Akella S, Amato NM, Huang WH, Mishra B,. Berlin: Springer-Verlag; 2008:3–18.

    Chapter  Google Scholar 

  41. Research Collaboratory for Structural Bioinformatics PDB[http://www.rcsb.org/pdb/]

  42. Case DA, Darden TA, Cheatham TE III, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Pearlman DA, Crowley M, et al.: AMBER 9. San Francisco: University of California; 2006.

    Google Scholar 

  43. The PyMOL Molecular Graphics System, Version 1.5, Schrödinger, LLC

  44. Shehu A, Olson B: Guiding the search for native-like protein conformations with an ab-initio tree-based exploration. Int J Robot Res 2010, 29(8):1106–1127. 10.1177/0278364910371527

    Article  Google Scholar 

  45. Müller CW, Schulz GE: Structure of the complex between adenylate kinase from Escherichia coli and the inhibitor Ap5A refined at 1.9 Å resolution. A model for a catalytic transition state. J Mol Biol 1992, 224: 159–177. 10.1016/0022-2836(92)90582-5

    Article  PubMed  Google Scholar 

  46. Müller CW, Schlauderer GJ, Reinstein J, Schulz GE: Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding. Structure 1996, 4(2):147–156. 10.1016/S0969-2126(96)00018-4

    Article  PubMed  Google Scholar 

  47. Maragakis P, Karplus M: Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. J Mol Biol 2005, 352: 807–822. 10.1016/j.jmb.2005.07.031

    Article  CAS  PubMed  Google Scholar 

  48. Feng Y, Yang L, Kloczkowski A, Jernigan RL: The energy profiles of atomic conformational transition intermediates of adenylate kinase. Proteins 2009, 77(3):551–558. 10.1002/prot.22467

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  49. Jaillet L, Cortés J, Siméon T: Sampling-based path planning on configuration-space costmaps. IEEE Trans Robot. 2010, 26(4):635–646.

    Article  Google Scholar 

  50. Jaillet L, Corcho FJ, Pérez JJ, Cortés J: Randomized tree construction algorithm to explore energy landscapes. J Comput Chem. 2011, 32(16):3464–3474. 10.1002/jcc.21931

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

This work has been partially supported by the French National Research Agency (ANR) under project ProtiCAD (project number ANR-12-MONU-0015-01).

Declarations

The publication costs for this article were funded by the French National Research Agency (ANR) underproject ProtiCAD (project number ANR-12-MONU-0015-01).

This article has been published as part of BMC Structural Biology Volume 13 Supplement 1, 2013: Selected articles from the Computational Structural Bioinformatics Workshop 2012. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcstructbiol/supplements/13/S1.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Juan Cortés.

Additional information

Competing interests

The authors declare that there are no competing interests.

Authors' contributions

TS and JC designed this research and supervised the work. IA implemented the method and carried out experiments. MV participated in the software design and implementation. IA and JC wrote the manuscript. All authors have read and approved the manuscript.

Electronic supplementary material

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( https://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Al-Bluwi, I., Vaisset, M., Siméon, T. et al. Modeling protein conformational transitions by a combination of coarse-grained normal mode analysis and robotics-inspired methods. BMC Struct Biol 13 (Suppl 1), S2 (2013). https://doi.org/10.1186/1472-6807-13-S1-S2

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/1472-6807-13-S1-S2

Keywords