Abstract
Background
The correlated mutations concept is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Approaches based on this concept have been widely used for protein contact prediction since the 1990s. Previously, we have shown that water-mediated interactions play an important role in protein interfaces. We have observed that current "dry" correlated mutations approaches might not properly predict certain interactions in protein interfaces because they are water-mediated.
Results
The goal of this study has been to analyze the impact of including solvent into the concept of correlated mutations. For this purpose we use linear combinations of the predictions obtained by the application of two different similarity matrices: a standard "dry" similarity matrix (DRY) and a "wet" similarity matrix (WET) derived from all water-mediated protein interfacial interactions in the PDB. We analyze two datasets containing 50 domains and 10 domain pairs from PFAM and compare the results obtained by using a combination of both matrices. We find that for both intra- and interdomain contact predictions, combining a "wet" and a "dry" similarity matrix improves the predictions compared with the "dry" matrix alone.
Conclusion
Our analysis, despite the complexity of its possible general applicability, suggests that taking water into account may improve the contact predictions obtained by correlated mutations approaches.
Background
The correlated mutations concept was introduced in the 1990s [1-4] and has been widely used for protein contact prediction [5]. The method is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Therefore, it is possible to introduce an exchange matrix or another measure of similarity for each sequence position in a multiple sequence alignment, and to use the covariance (correlation coefficient) between two positions to predict whether the residues at these positions may establish physical contact in 3D space and to develop contact maps. Several different similarity measures and algorithms have been implemented in the concept of correlated mutations [5-7]. Most exchange matrices are based either on physicochemical properties of amino acids or on statistical data on the substitutions obtained from multiple sequence alignments [8]. Statistically, it is clear that the distribution of distances between the residues at highly correlated positions is shifted towards lower values compared to the distance distribution of all residues. This has been demonstrated in the study of correlated mutations for residues within one protein domain (intradomain), for residues from different domains in multidomain proteins (interdomain intraprotein) [9,10] and in transmembrane proteins [11]. At the same time, attempts to use the concept of correlated mutations to predict thermodynamically coupled residues have suggested that the method is successful only for residues in evolutionarily constrained positions [12].
The concept of correlated mutations has been intensively developed recently. The implementation of neural networks in contact prediction algorithms has substantially improved the accuracy of the methods in a number of studies [13-16]. In addition, filtering procedures based on the similarity of sequences in a dataset and the number of sequences in multiple sequence alignments, the introduction of weights for physicochemical properties of the residue pairs, and the creation of sub-multiple sequence alignments have been used successfully to increase the true positive ratio of contact predictions [17]. Nowadays, different correlated mutations-based approaches yield prediction accuracies in the range of 0.1–0.4 [17], but they are still of little use in the ab initio prediction of protein structure [7].
Previously, we have shown that water-mediated interactions play an important role in protein interfaces [18,19]. In particular, we observed that interfacial residues interacting only through one water molecule (wet spots) are more similar in terms of dynamic and energetic properties to residues in the core of proteins than to residues on the protein surface. Moreover, in our studies interfacial water molecules show significantly longer residence times than water molecules on the protein surface or in bulk solvent, and have been shown to make an indispensable energetic contribution to complex formation [19]. Other studies have demonstrated that including a solvent term in the Hamiltonian of protein systems improves folding predictions compared to in vacuo folding models [20]. Explicit consideration of solvent in protein docking approaches has also recently shown promising results [21]. In addition, we have observed that water molecules in protein interfaces may contribute to the conservation of interactions by allowing more sequence variability in the interacting partners. In particular, we have observed water-mediated interactions in protein complex interfaces that are not predicted by "dry" correlated mutations approaches [19]. Interestingly, in one of the recent studies on correlated mutations, protein contact prediction has been shown to be more accurate for protein cores than for the whole protein [22]. This could be partly explained by a higher conservation of residue contacts in protein cores, especially the hydrophobic ones [23], and probably also by the fact that the participation of solvent in protein contacts is being ignored.
The goal of this study has been to analyze the impact of including solvent into the concept of correlated mutations. For this purpose, we use a linear combination of predictions obtained by the use of two similarity matrices: a standard and widely used "dry" similarity matrix (DRY) [24] and a "wet" similarity matrix (WET) derived from data on all water-mediated protein-protein interfacial interactions in the PDB [25]. We compare the predictive results obtained with different combinations of these two similarity matrices in terms of the number of correctly predicted contacts, accuracy and improvement ratio over random prediction for intradomain contacts, and in terms of the distributions of distances between residues for interdomain pairs.
Our results show that, despite a partial interdependence of the WET and DRY matrices, there is a clear trend indicating that a combination of these two matrices yields improved predictions over the use of the DRY matrix alone for both intra- and interdomain contacts. The results obtained in this work underline the importance of water-mediated interactions in the description of protein-protein interactions, and suggest that implementing combinations of "dry" and "wet" matrices could improve the results obtained by correlated mutations-based approaches.
Results and discussion
Residue-solvent relations in proteins
Independently of residue types, we calculated the average ratios between the number of residues found to be in contact with water and all residues in X-ray PDB structures. A negligible difference was found between these ratios for interfaces and the whole protein (0.33 and 0.35, respectively). The ratios by residue type for interfaces and for the whole protein (Figure 1; see additional file 1) correlate with an adjusted squared correlation coefficient R^{2} = 0.90 (p-value ~10^{-10}), and there is also a clear trend in the distribution of residue ratios in interfaces, which relates to their hydrophilic properties. This agrees with observations obtained from other datasets not including the whole PDB [26]. The better correlation between the ratios and the hydrophilicity index for interfaces compared to the whole protein (R^{2} = 0.62, p-value ~10^{-5} and R^{2} = 0.44, p-value ~10^{-3}, respectively) could be explained by the fact that the whole protein includes many residues in the core that are not accessible to water. This further supports the evidence that residue-solvent relations in protein interfaces are different from those in proteins as a whole [18,19].
Figure 1. Water contacts of residues in PDB. Fractions of residues found to be in contact with water in protein interfaces (white) and in whole proteins (grey) in the PDB.
Additional file 1. Probabilities for residues to be in contact with water in protein interfaces. The probabilities are derived from SCOWLP data for protein interfaces.
Relations between the DRY and WET similarity matrices
Both the DRY and WET similarity matrices are constructed so that each column or row is a vector whose coordinates correspond to the similarity between a given amino acid residue type and the other residue types. Linear regression analysis makes it possible to determine whether these vectors are interdependent between the two matrices. The data obtained and averaged for all types of residues are presented in Table 1. A high degree of correlation is observed for some vectors, which correspond to hydrophilic residues (excluding Thr and Tyr) and to Ile, Leu, Met and Val, suggesting that these vectors in the two matrices are close to collinear in 20-dimensional space. This can be explained by the properties of these residues. In particular, hydrophilic residues interact by electrostatic forces through their polar atoms, and water mediation in this case can only change the electrostatic forces by introducing water dipoles oriented so as to weaken the initial electric field. For hydrophilic residues there is a correlation between the hydrophilicity indexes and the collinearity of the corresponding vectors in the DRY and WET matrices, which also explains the relatively low collinearity for Tyr and Thr residues in comparison to other hydrophilic residues (additional file 2). Direct and water-mediated interactions formed by the main chains of Ile, Leu, Met and Val in interfaces have previously been shown to be especially important, whereas other residues that present no correlation have been shown to participate predominantly in side-chain interactions in interfaces [18]. We conclude that the DRY and WET similarity matrices contain partially interdependent information for some amino acid residues, and that the similarities found can be explained by the physicochemical properties of these residues.
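The per-residue comparison of the two matrices can be illustrated with a minimal sketch. It assumes that each matrix is available as a mapping from residue type to its row vector and that the squared Pearson coefficient of a simple linear regression is the quantity of interest; the function names and the three-residue toy values are hypothetical and are not taken from the DRY or WET matrices.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def row_r_squared(dry_rows, wet_rows):
    """Squared correlation between the DRY and WET row of each residue type."""
    return {res: pearson(dry_rows[res], wet_rows[res]) ** 2 for res in dry_rows}

# Toy rows for three residue types (made-up numbers, illustration only).
toy_dry = {"I": [3, 2, 0], "L": [2, 3, 0], "D": [0, 0, 3]}
toy_wet = {"I": [2.5, 2.0, 1.0], "L": [2.0, 2.5, 1.0], "D": [0.9, 0.2, 0.9]}
toy_r2 = row_r_squared(toy_dry, toy_wet)
```

In this toy example the rows for I and L are exact linear transforms of each other, so their vectors are collinear, while the row for D is only weakly related.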
Table 1. Correlation between vectors per residue type in the DRY and WET matrices.
Additional file 2. Hydrophilicity index vs correlation for the DRY and WET matrices per residue type. The grey shading highlights two areas resulting from the different trends.
Intradomain contacts prediction
Our dataset for intradomain contact prediction consisted of domains of 50 PFAM protein families (Table 2). The lengths of the reference sequences varied from 30 to 195 residues. Initially we analyzed the L, L/2, L/3, L/5 and L/10 best correlated contacts for each family (L is the length of the reference sequence). The number of sequences considered for the multiple sequence alignments was in the range of 20 to 295. Previous studies have shown that accuracy (the ratio between the number of correctly predicted contacts and the total number of predicted contacts) and improvement ratio over random prediction (the ratio between accuracy and the probability of predicting a contact by chance) decrease as the number of analyzed contacts increases [4-6]. Table 3 shows accuracy and improvement ratio over random prediction for α = 0.5 (α is the weight of the WET matrix prediction when the weight of the DRY matrix prediction is 1), which corresponds to the best average accuracy obtained for the different numbers of analyzed predicted contacts. The results obtained for other α values followed the same trend (data not shown). Independently of the number of analyzed contacts, the best predictions on average did not correspond to α = 0. The obtained values for accuracy and improvement ratio over random prediction are within the ranges obtained by other correlated mutations approaches [17,22]. However, a direct quantitative comparison of these methods is not appropriate because of substantial differences in their residue contact definitions. In particular, some of these approaches define contacts (see contact definition in the Methods section) by a chosen distance cutoff of 6–8 Å between atoms [4,16,17], whereas we use physicochemical properties of protein residues, which results in a ≤ 4 Å cutoff [27].
Table 2. Dataset used for intradomain contact predictions.
Table 3. Prediction parameters dependence on the number of analyzed contacts.
We compared the dependence on α of: i) accuracy, ii) improvement ratio over random prediction and iii) number of correctly predicted contacts (C_{corr}). Since our dataset is heterogeneous (see the high standard deviations in Table 3), we normalized these parameters by the corresponding values at α = 0 (wet prediction ratio). To compare the wet prediction ratio at different values of α, we found L/2 to be the most appropriate number of contacts. With a smaller number of analyzed contacts (C_{total}), changes in the prediction results caused by varying α become hardly detectable, since they are limited by the low values of C_{total} and, consequently, of C_{corr}. On the other hand, increasing C_{total} generally decreases prediction accuracy and leads to negligible differences between the prediction results for different α values. In only 2 of the 50 families in our dataset do the best predictions correspond to α = 0 (Table 2). Maximum values of the wet prediction ratio and of the relative X_{d} (harmonic weighted difference statistic) averaged over the whole dataset are obtained at α = 0.5 and α = 1 (1.19 and 1.29, respectively; Figure 2A, B). This means that, for these values of α, introducing the WET similarity matrix improves prediction by 20–30% on average. Notably, even high values of α ∈ {10, 20} give predictions that are on average better than those obtained with the DRY matrix alone. For the optimal value α = 0.5, the absolute values of accuracy and improvement ratio over random prediction averaged over all 50 families increase by 1.4% and 0.19, respectively, compared with the DRY similarity matrix alone. For individual families in the dataset, the increase in accuracy and improvement ratio over random prediction can be substantially higher than the average.
In some families, the wet prediction ratio more than doubles (reference structures 1AF7, 1PDA, 8PAZ, 1DMR, 1AS0) or even increases 4.5-fold (reference structure 1WVH) when α > 0. Our results show a significant improvement (a 20–30% increase in wet prediction ratio) in predictions when the WET similarity matrix is introduced, compared with the use of the DRY matrix alone within a correlated mutations approach. We observe that for sequence separations |i − j| > 6, 12 and 24 our results follow the same trend. The results obtained for α = 0.5 for different numbers of contacts (L, L/2, L/3, L/5, L/10) are shown in Table 4. We observe that the best predictions correspond to α = 0.2 and 0.5 for most sequence separation values and numbers of contacts. Wet prediction ratios for the whole range of analyzed α are presented in a figure in the supplementary material (additional file 3). In all cases, independently of sequence separation and number of contacts, the best predictions correspond to α > 0.
Figure 2. Dependence on α of relative prediction characteristics for the intradomain dataset. A) Wet prediction ratio. B) Relative harmonic weighted difference statistic (X_{d}).
Table 4. Accuracy, improvement ratio over random prediction and wet prediction ratio for different sequence separations.
Additional file 3. Dependence on α of wet prediction ratio for the intradomain dataset with sequence separation. Sequence separation: A) 6. B) 12. C) 24.
Interdomain contacts prediction
The interdomain dataset used for our studies consisted of 10 different pairs of interacting domains (Table 5). From the analysis of the (L_{1}+L_{2})/2 predicted interdomain residue contacts (L_{1} and L_{2} are the lengths of the sequences in each of the two domains) we observed that in 9 out of 10 cases the best predictions in terms of X_{d} were obtained when both the WET and DRY matrices were used. Relative X_{d} averaged over the whole dataset reaches a maximum value of 1.32 at α = 0.2 and then decreases as α increases further (Figure 3). In one of the examples (SH2-SH3 domain interaction) the differences between the distance distributions for different α values are dramatic (Figure 4). In this case the X_{d} values for the contacts predicted at α = 0 and α = 0.2 differ by almost a factor of two (Table 5). These results indicate that the use of the WET similarity matrix might improve the X_{d} statistic in comparison to the use of the DRY similarity matrix alone.
Figure 3. Predictions for interdomain dataset. Relative harmonic weighted difference statistic (X_{d}) dependence on α.
Figure 4. Proportion of residue pairs at distance bins for the SH2-SH3 interaction. All residue pairs are shown in black, correlated pairs with α = 0 in white, and correlated pairs with α = 0.2 in grey. Reference structure used is PDB ID 2SRC.
Table 5. Dataset used for interdomain contact predictions.
The dependence of the relative average X_{d} on α for interdomain contact prediction (Figure 3) resembles the one obtained for intradomain prediction (Figure 2B), but the two differ in the optimal α and in the X_{d} corresponding to the higher α values. While in predictions of intradomain contacts all values of α > 0 improve the contact predictions, in the case of interdomain contact prediction the use of the WET similarity matrix yields a higher X_{d} than the DRY matrix alone only when α ∈ {0.1, 0.2}. This might be due to the differences in distance distributions between the analyzed pairs of residues, which are closer to each other in the case of intradomain contacts. Nevertheless, introducing the WET similarity matrix improves contact prediction compared to the use of the DRY similarity matrix alone for both intra- and interdomain contacts. Although there are still significant limitations for the practical use of the correlated mutations approach for interdomain contact prediction, as also mentioned by other authors [5,9], we believe that considering water through the use of "wet" similarity matrices could improve the results obtained by correlated mutations approaches.
Conclusion
This study is the first to investigate the impact of including solvent in the concept of correlated mutations. With this work we further support our previous observations that the relations between solvent and protein residues in protein interfaces differ from those in the whole protein. Recent work on bond preferences in inter- versus intraprotein interactions highlights the different architecture of protein interfaces and their unique bond preferences [28].
Two similarity matrices have been used in this work: the McLachlan matrix as the DRY similarity matrix and a WET similarity matrix derived by statistical analysis of the frequency of water contacts by residue type in protein interfaces in the whole PDB. Analysis of the DRY and WET similarity matrices shows that they are interdependent for some residue types, which could be explained by the physicochemical properties of individual amino acid residues. We analyze two datasets containing 50 domains and 10 domain pairs belonging to PFAM families. We sum the predictions obtained with both matrices using different weight coefficients and find optimal combinations for the best predictions. Our datasets are too heterogeneous to propose a single best weight value that would allow the optimized method to be applied to all domain families; however, the prediction of contacts obtained by introducing the WET similarity matrix is improved for most of the families in the datasets (for both intra- and interdomain contacts) as well as on average (by 20–30%). Our analysis of the impact of solvent on contact prediction in proteins suggests that further development of the correlated mutations concept would benefit from taking solvent into account as an active participant in protein-protein interactions, which is usually overlooked in these studies.
Methods
Dataset and multiple sequence alignments
We based the generation of our dataset on previous similar studies [4,9,22]. Our dataset includes 50 domains and 10 domain pairs extracted from the PFAM database [29]. Successively increasing the size of our dataset for intradomain contacts did not significantly change our results.
For most of the families, only seed sequences were used, except for the cases when the number of seed sequences was less than 20. Datasets with a smaller number of sequences are not expected to be useful in correlated mutations analysis [22]. The reference sequence (corresponding to the structure used for evaluating the predictions) was added to the set of sequences if the set did not already contain it, following the same procedure that Eyal and coworkers used for obtaining a substitution matrix for protein structure prediction purposes [22]. Multiple sequence alignments were obtained with CLUSTALW [30]. Sequences with more than 95% identity were not taken into account.
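The redundancy filter can be illustrated with a minimal sketch. The text does not specify how identity was measured or in which order sequences were removed, so the greedy pass, the gap handling and the names `identity` and `filter_redundant` below are assumptions made purely for illustration.

```python
def identity(seq_a, seq_b):
    """Fraction of aligned columns with identical residues.

    Columns where both sequences have a gap are ignored (an assumption).
    """
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)

def filter_redundant(aligned, max_identity=0.95):
    """Greedily keep sequences that are at most max_identity to all kept ones.

    Putting the reference sequence first guarantees that it is retained.
    """
    kept = []
    for seq in aligned:
        if all(identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept

# Toy aligned sequences (made up): the second is an exact duplicate of the first.
toy_set = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIAA"]
toy_kept = filter_redundant(toy_set)
```

After filtering, the size of the retained set can be compared with the threshold of 20 sequences mentioned above.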
For the interdomain dataset the sequences from the two domain families were aligned independently. Except for the case of immunoglobulins, where light and heavy chains were used as the two interacting domains, all interdomain entries in the dataset contained pairs of two different PFAM domains. Reference structures had a resolution ≤ 2.0 Å except for five of them (1BU1 and 1A19, taken from the dataset of Eyal et al. [22], and 2HB2, 1WMG and 1ZWW, included to enrich the dataset with bigger domains and highly represented families).
Source and analysis of atomic data on protein structures
An in-house relational database of protein structures (XMLRPDB) and the SCOWLP database [25,27] were used to obtain interaction information, including solvent, from X-ray structures in the PDB.
Contact definition
Residue contacts in a reference structure were defined by following the physicochemical criteria from SCOWLP [27]. We considered a 3.2 Å donor-acceptor distance for hydrogen bonds, 4 Å for salt bridges, and van der Waals radii for van der Waals interactions.
Similarity matrices
We used the McLachlan similarity matrix (based on structural and genetic similarities of amino acids) as a "dry" matrix (DRY) [24]. To build a "wet" matrix (WET) we extracted information on protein interfacial residues and solvent from all available X-ray PDB structures using the SCOWLP database [25,27]. In this database, three classes of interacting residues are defined based on their interactions: dry (direct interaction), dual (direct and water-mediated interactions), and wet spots (residues interacting only through one water molecule). For each type of amino acid residue the probability of participation in water-mediated interactions (by establishing a hydrogen bond through the main chain or side chain) in protein interfaces was calculated as:
p_{i} = N_{i,w}/N_{i,total} (Figure 1),
where i corresponds to any of the 20 amino acids; N_{i,w} is the number of residues of this type forming wet spots or dual interactions; and N_{i,total} is the total number of residues of this type participating in interfaces in all PDB structures. Each element of the WET similarity matrix was then defined as:
WET_{ij} = 1 − |p_{i} − p_{j}|,
where i and j correspond to any of the 20 amino acids.
Because the whole PDB is considered, low-resolution structures containing few or no water molecules are also taken into account when creating the WET matrix. This does not bias the WET matrix because it affects each probability proportionally.
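The construction of the WET matrix can be sketched in a few lines. The sketch assumes that each matrix element is read as 1 − |p_i − p_j|, so that residue types with similar water-contact probabilities score close to 1. The residue counts are invented for illustration and are not SCOWLP data, and the function names are hypothetical.

```python
def water_contact_probabilities(n_wet, n_total):
    """p_i = N_{i,w} / N_{i,total} for every residue type i."""
    return {res: n_wet[res] / n_total[res] for res in n_total}

def wet_matrix(p):
    """WET_ij = 1 - |p_i - p_j| (assumed reading of the matrix definition)."""
    return {(a, b): 1.0 - abs(p[a] - p[b]) for a in p for b in p}

# Invented counts for three residue types (illustration only, not SCOWLP data).
toy_n_wet = {"D": 60, "L": 20, "S": 50}
toy_n_total = {"D": 100, "L": 100, "S": 100}

toy_p = water_contact_probabilities(toy_n_wet, toy_n_total)
toy_wet = wet_matrix(toy_p)
```

By construction the matrix is symmetric and has 1 on its diagonal, as expected for a similarity measure.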
Correlation coefficient calculations
For both the DRY and WET similarity matrices the corresponding covariance matrices were calculated as previously described (Göbel et al. 1994) [4] using the formula:
r_{ij} = (1/N^{2}) Σ_{k,l} W_{kl} (S_{ikl} − ⟨S_{i}⟩)(S_{jkl} − ⟨S_{j}⟩) / (σ_{i} σ_{j}),
where N is the number of sequences; i and j are sequence position numbers; S_{ikl} is the value from the similarity matrix (DRY or WET) for the residues found at position i in sequences k and l; ⟨S_{i}⟩ is the mean of S_{ikl}; σ_{i} is the standard deviation of S_{ikl}; and W_{kl} is a weight matrix defined as:
W_{kl} = 1 − (1/L) Σ_{i=1}^{L} δ(R_{ik}, R_{il}),
where L is the sequence length; R_{ik} and R_{il} are the residue types at position i in the sequences k and l, respectively; and δ is the Kronecker delta [31].
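The correlation coefficient can be illustrated with a minimal sketch. It assumes the Göbel-style form r_ij = (1/N²) Σ_kl W_kl (S_ikl − ⟨S_i⟩)(S_jkl − ⟨S_j⟩)/(σ_i σ_j), with W_kl taken as one minus the fraction of identical positions between sequences k and l. The function names, the identity-only similarity matrix and the toy alignment are hypothetical.

```python
from math import sqrt

def pair_weights(alignment):
    """W_kl = 1 - fraction of identical positions between sequences k and l."""
    n, length = len(alignment), len(alignment[0])
    weights = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            same = sum(a == b for a, b in zip(alignment[k], alignment[l]))
            weights[k][l] = 1.0 - same / length
    return weights

def correlation(alignment, i, j, sim, weights):
    """Correlation coefficient between alignment columns i and j."""
    n = len(alignment)
    pairs = [(k, l) for k in range(n) for l in range(n)]
    s_i = [sim[(alignment[k][i], alignment[l][i])] for k, l in pairs]
    s_j = [sim[(alignment[k][j], alignment[l][j])] for k, l in pairs]
    mean_i, mean_j = sum(s_i) / len(s_i), sum(s_j) / len(s_j)
    sd_i = sqrt(sum((s - mean_i) ** 2 for s in s_i) / len(s_i))
    sd_j = sqrt(sum((s - mean_j) ** 2 for s in s_j) / len(s_j))
    if sd_i == 0.0 or sd_j == 0.0:
        return 0.0  # no variation at one of the positions: assign zero
    total = sum(weights[k][l] * (a - mean_i) * (b - mean_j)
                for (k, l), a, b in zip(pairs, s_i, s_j))
    return total / (n * n * sd_i * sd_j)

# Toy similarity matrix (1 for identical residues, 0 otherwise) and alignment.
toy_sim = {(a, b): 1.0 if a == b else 0.0 for a in "AC" for b in "AC"}
toy_alignment = ["AA", "CC", "AA", "CC"]
toy_weights = pair_weights(toy_alignment)
toy_r = correlation(toy_alignment, 0, 1, toy_sim, toy_weights)
```

In practice `sim` would be the DRY or WET matrix, and the function would be evaluated for every pair of alignment columns.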
For the interdomain dataset the weight matrix W_{kl} was calculated as an average for the domains and weighted by sequence length. Positions with more than 10% gaps, as well as completely conserved positions, were not included in the calculations (zero was assigned to the corresponding correlation coefficient). After calculating the covariance matrices based on the DRY and WET similarity matrices, we built their linear combinations:
r_{ij} = r_{ij,DRY} + α·r_{ij,WET}, where α takes values from {0, 0.1, 0.2, 0.5, 1, 2, 4, 10, 20}, so that the weight ratio between the impact of DRY and WET covers the range from completely dry (α = 0) to extremely WET-biased covariance (α = 20).
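The linear combination and the selection of the best correlated pairs can be sketched as follows. The sketch assumes that the DRY and WET coefficients are stored in dictionaries keyed by position pairs (i, j) with i < j; the function names and the toy coefficients are hypothetical.

```python
ALPHAS = (0, 0.1, 0.2, 0.5, 1, 2, 4, 10, 20)

def combine(r_dry, r_wet, alpha):
    """r_ij = r_ij(DRY) + alpha * r_ij(WET) for every position pair."""
    return {pair: r_dry[pair] + alpha * r_wet[pair] for pair in r_dry}

def top_pairs(r, n_top, min_separation=0):
    """Return the n_top best correlated pairs with |i - j| > min_separation."""
    eligible = [pair for pair in r if abs(pair[0] - pair[1]) > min_separation]
    return sorted(eligible, key=lambda pair: r[pair], reverse=True)[:n_top]

# Toy coefficients for three position pairs (made-up numbers).
toy_r_dry = {(0, 5): 0.9, (1, 8): 0.2, (2, 3): 0.5}
toy_r_wet = {(0, 5): 0.1, (1, 8): 0.9, (2, 3): 0.1}

# Ranking of the two best pairs for every alpha value used in the study.
toy_rankings = {a: top_pairs(combine(toy_r_dry, toy_r_wet, a), 2) for a in ALPHAS}
```

With these toy numbers the ranking changes between α = 0 and α = 1, which is the kind of effect the α scan is designed to detect.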
Evaluation of intradomain predictions
For the evaluation of intradomain contact predictions we used a previously described methodology [4]. Sequence separations of 0, 6, 12 and 24 were used. Prediction accuracy was defined as the ratio between the number of correctly predicted contacts (C_{corr}) and the total number of predicted contacts (C_{total}). Random accuracy corresponds to the probability of correctly predicting a contact by chance and is equal to the ratio between the number of experimentally observed contacts (C_{obs}) and the maximum number of possible contacts. The ratio between accuracy and random accuracy was introduced as the improvement ratio over random prediction. The wet prediction ratio is equal to the accuracy normalized by the accuracy obtained by using only the DRY matrix (α = 0). For the reference structures C_{corr} was taken as the number of contacts defined by SCOWLP criteria (see the Contact definition section in Methods).
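These evaluation measures can be expressed compactly. The sketch below assumes that contacts are given as pairs (i, j) with i < j and that the maximum number of possible contacts for a sequence of length L is L(L − 1)/2, which is an assumption for sequence separation 0; the function names and toy data are hypothetical.

```python
def evaluate(predicted, observed, seq_length):
    """Return (accuracy, random accuracy, improvement ratio over random)."""
    c_corr = len(set(predicted) & set(observed))    # correctly predicted contacts
    accuracy = c_corr / len(predicted)              # C_corr / C_total
    max_contacts = seq_length * (seq_length - 1) // 2
    random_accuracy = len(observed) / max_contacts  # C_obs / maximum possible
    return accuracy, random_accuracy, accuracy / random_accuracy

def wet_prediction_ratio(accuracy_alpha, accuracy_dry):
    """Accuracy at a given alpha normalized by the DRY-only accuracy (alpha = 0)."""
    return accuracy_alpha / accuracy_dry

# Toy example: 4 predicted pairs, 3 observed contacts, sequence of length 5.
toy_predicted = [(0, 1), (0, 2), (1, 3), (2, 3)]
toy_observed = {(0, 1), (1, 3), (0, 3)}
toy_accuracy, toy_random, toy_improvement = evaluate(toy_predicted, toy_observed, 5)
```

Two of the four toy predictions are observed contacts, so the accuracy is 0.5 against a random expectation of 0.3.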
Distance calculation and harmonic average (X_{d})
In the analysis of interdomain contacts the accuracy, calculated in the same way as for intradomain contacts (typical value C_{obs} ~10^{2}), is expected to be at least one order of magnitude lower (typical value C_{obs} ~10^{1}). That is why a comparison of accuracy, improvement ratio over random prediction and C_{corr} as functions of α is not appropriate in this case. It has been shown that the distribution of distances between the correlated pairs is shifted to lower values compared to the distribution of distances for all residue pairs in two domains [9]. In our study we use the harmonic weighted difference statistic X_{d} described before [9]:
X_{d} = Σ_{i=1}^{n} (P_{ic} − P_{ia}) / (d_{i}·n),
where n is the number of distance bins; d_{i} is the upper limit of each bin normalized to the maximum value of the distributed distances; P_{ic} is the percentage of the analyzed correlated pairs at distances between d_{i} and d_{i−1}; and P_{ia} is the same percentage for all pairs of residues. The bin width was 4 Å. The higher the X_{d} value, the more successful a prediction is.
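The statistic can be illustrated with a minimal sketch, assuming the form X_d = Σ_i (P_ic − P_ia)/(d_i·n) with bins of 4 Å and upper bin limits normalized by the largest distance among all pairs. The function name and the toy distances are hypothetical.

```python
from math import ceil

def harmonic_difference(corr_distances, all_distances, bin_width=4.0):
    """X_d for correlated-pair distances against all-pair distances (in Å)."""
    d_max = max(all_distances)
    n_bins = max(1, ceil(d_max / bin_width))

    def percentages(distances):
        counts = [0] * n_bins
        for d in distances:
            # Distances on the last bin edge are clamped into the last bin.
            counts[min(int(d // bin_width), n_bins - 1)] += 1
        return [100.0 * c / len(distances) for c in counts]

    p_c = percentages(corr_distances)
    p_a = percentages(all_distances)
    x_d = 0.0
    for i in range(n_bins):
        upper = (i + 1) * bin_width / d_max  # normalized upper limit of bin i
        x_d += (p_c[i] - p_a[i]) / (upper * n_bins)
    return x_d

# Toy distances: the correlated pairs sit in the two shortest bins.
toy_all = [2.0, 6.0, 10.0, 14.0]
toy_corr = [2.0, 2.0, 6.0, 6.0]
toy_xd = harmonic_difference(toy_corr, toy_all)
```

A positive value indicates that the correlated pairs are enriched at short distances, and identical distributions give exactly zero.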
Different definitions of the distance between residues resulted in the same trends in all cases and affected the X_{d} values only slightly in quantitative terms. For interdomain pairs we used distances between the centers of mass of residues in order not to be biased towards either main-chain or side-chain contacts.
For X_{d} calculations we took the best L/2 contacts for intradomain and the best (L_{1}+L_{2})/2 contacts for interdomain contact predictions, where L_{1} and L_{2} are the lengths of the reference sequences of the two interacting domains.
Although both the wet prediction ratio and X_{d} characterize the predictive power of the method, it is not meaningful to compare the results obtained for these two parameters with each other. The same applies to the α values corresponding to the best predictions.
Statistical analysis
Statistical analysis of the data was carried out with R [32].
Authors' contributions
SAS developed and implemented the WET similarity matrix and performed all the analysis. JT obtained the data from SCOWLP used for this work. GA obtained the data from XMLRPDB used for this work. SAS and MTP wrote the manuscript. MTP designed and supervised the project. All authors have read and approved the final manuscript.
Acknowledgements
Our group is funded by the Klaus Tschira Stiftung (KTS).
References

1. Gregoret L, Sauer R: Additivity of Mutant Effects Assessed by Binomial Mutagenesis.
PNAS 1993, 90(9):4246-4250.
2. Lee C, Levitt M: Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core.
Nature 1991, 352(6334):448-451.
3. Wells JA: Additivity of mutational effects in proteins.
Biochemistry 1990, 29(37):8509-8517.
4. Göbel U, Sander C, Schneider R, Valencia A: Correlated mutations and residue contacts in proteins.
Proteins 1994, 18(4):309-317.
5. Halperin I, Wolfson H, Nussinov R: Correlated mutations: Advances and limitations. A study on fusion proteins and on the Cohesin-Dockerin families.
Proteins 2006, 60(2):832-845.
6. Fodor AA, Aldrich RW: Influence of conservation on calculations of amino acid covariance in multiple sequence alignments.
Proteins 2004, 56(2):211-221.
7. Horner D, Pirovano W, Pesole G: Correlated substitution analysis and the prediction of amino acid structural contacts.
Brief Bioinform 2007, 9(1):46-56.
8. Pokarowski P, Kloczkowski A, Nowakowski S, Pokarowska M, Jernigan R, Kolinski A: Ideal amino acid exchange forms for approximating substitution matrices.
Proteins: Structure, Function, and Bioinformatics 2007, 69(2):379-393.
9. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A: Correlated mutations contain information about protein-protein interaction.
J Mol Biol 1997, 271(4):511-523.
10. Perez-Jimenez R, Godoy-Ruiz R, Parody-Morreale A, Ibarra-Molero B, Sanchez-Ruiz JM: A simple tool to explore the distance distribution of correlated mutations in proteins.
Biophys Chem 2006, 119(3):240-246.
11. Fuchs A, Martin-Galiano A, Kalman M, Fleishman S, Ben-Tal N, Frishman D: Coevolving residues in membrane proteins.
Bioinformatics 2007, 23(24):3312-3319.
12. Fodor AA, Aldrich RW: On evolutionary conservation of thermodynamic coupling in proteins.
J Biol Chem 2004, 279(18):19046-19050.
13. Nagl S: Can correlated mutations in protein domain families be used for protein design?
Brief Bioinform 2001, 2(3):279-288.
14. Fariselli P, Olmea O, Valencia A, Casadio R: Progress in predicting inter-residue contacts of proteins with neural networks and correlated mutations.
Proteins: Structure, Function, and Genetics 2001, 45(S5):157-162.
15. Shackelford G, Karplus K: Contact prediction using mutual information and neural nets.
Proteins: Structure, Function, and Bioinformatics 2007, 68(8):159-164.
16. Xue B, Faraggi E, Zhou Y: Predicting residue-residue contact maps by a two-layer, integrated neural-network method.
17. Kundrotas P, Alexov E: Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives.
BMC Bioinformatics 2006, 7:503.
18. Teyra J, Pisabarro MT: Characterization of interfacial solvent in protein complexes and contribution of wet spots to the interface description.
Proteins: Structure, Function, and Bioinformatics 2007, 67(4):1087-1095.
19. Samsonov S, Teyra J, Pisabarro T: A molecular dynamics approach to study the importance of solvent in protein interactions.
Proteins: Structure, Function, and Bioinformatics 2008, 73(2):515-525.
20. Papoian GA, Ulander J, Eastwood MP, Luthey-Schulten Z, Wolynes PG: Water in protein structure prediction.
Proc Natl Acad Sci USA 2004, 101(10):3352-3357.
21. van Dijk ADJ, Bonvin AMJJ: Solvated docking: introducing water into the modelling of biomolecular complexes.
Bioinformatics 2006, 22(9):2340-2347.
22. Eyal E, Frenkel-Morgenstern M, Sobolev V, Pietrokovski S: A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction.
Proteins 2007, 67(1):142-153.
23. Schueler-Furman O, Baker D: Conserved residue clustering and protein structure prediction.
Proteins 2003, 52(2):225-235.
24. McLachlan AD: Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551.
Journal of Molecular Biology 1971, 61(2):409-424.
25. Teyra J, Paszkowski-Rogacz M, Anders G, Pisabarro T: SCOWLP classification: Structural comparison and analysis of protein binding regions.
BMC Bioinformatics 2008, 9:9.
26. Yan C, Wu F, Jernigan RL, Dobbs D, Honavar V: Characterization of protein-protein interfaces.
The Protein Journal 2008, 27(1):59-70.
27. Teyra J, Doms A, Schroeder M, Pisabarro MT: SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces.
BMC Bioinformatics 2006, 7(1).
28. Cohen M, Reichmann D, Neuvirth H, Schreiber G: Similar chemistry, but different bond preferences in inter versus intra-protein interactions.
Proteins 2008, 72(2):741-753.
29. Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al.: Pfam: clans, web tools and services.
Nucleic Acids Res 2006, 34(Database issue):D247-D251.
30. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res 1994, 22(22):4673-4680.
31. Sander C, Schneider R: Database of homology-derived protein structures and the structural meaning of sequence alignment.
Proteins 1991, 9(1):56-68.
32. R Development Core Team: R: a language and environment for statistical computing. Vienna, Austria; 2006.