Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure
1 Departments of Biology and Computer Science, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy NY, USA
2 Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA, USA
BMC Bioinformatics 2008, 9:429. doi:10.1186/1471-2105-9-429. Published: 10 October 2008
Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume that each position in the sequence is statistically independent, but the energetics of protein folding is better captured by a scoring function based on pairwise interactions, such as a force field.
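The contrast between the two kinds of model can be illustrated with a toy sketch. It assumes a profile stored as one log-odds dictionary per position and pairwise terms keyed by `(i, j, aa_i, aa_j)`; the function names, parameter layout, and all numbers are hypothetical and are not the I-sites scoring scheme. Because a profile scores each position independently, two sequences built from the same per-position residues receive the same profile score, whereas a pairwise term can tell them apart.

```python
from typing import Dict, List, Tuple

# Hypothetical data layouts, for illustration only.
Profile = List[Dict[str, float]]             # per-position log-odds scores
PairTerms = Dict[Tuple[int, int, str, str], float]  # (i, j, aa_i, aa_j) -> score

def profile_score(seq: str, profile: Profile) -> float:
    """Independent-position model: sum of per-position log-odds."""
    return sum(profile[i].get(aa, 0.0) for i, aa in enumerate(seq))

def pairwise_score(seq: str, profile: Profile, pairs: PairTerms) -> float:
    """Profile score plus a coupling term for every position pair i < j."""
    score = profile_score(seq, profile)
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            score += pairs.get((i, j, seq[i], seq[j]), 0.0)
    return score

# Toy example: positions 1 and 2 accept E and K equally under the profile,
# but a hypothetical coupling favors E-K over K-E.
profile = [{"A": 1.0}, {"E": 0.5, "K": 0.5}, {"E": 0.5, "K": 0.5}]
pairs = {(1, 2, "E", "K"): 1.0, (1, 2, "K", "E"): -1.0}
print(profile_score("AEK", profile), profile_score("AKE", profile))  # 2.0 2.0
print(pairwise_score("AEK", profile, pairs), pairwise_score("AKE", profile, pairs))  # 3.0 1.0
```

The pairwise model strictly contains the profile model: with every coupling set to zero it reduces to the profile score.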
I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariate, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are also present, in which the covariances disfavor non-native structures.
Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins and a web server for local structure prediction are available at http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php.
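One way to picture why a reduced alphabet helps is that a position pair has 20 × 20 = 400 possible residue combinations but only 9 × 9 = 81 after reduction, so each combination is observed more often in a finite set of motif instances. The sketch below measures covariance between two positions of aligned motif segments as mutual information over a nine-class alphabet. Both the grouping in `GROUPS` and the use of mutual information are assumptions chosen for illustration; they are not necessarily the classes or the covariance statistic used for the I-sites library.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical 9-class reduced alphabet (NOT necessarily the I-sites grouping).
GROUPS = ["AST", "G", "P", "C", "DE", "KR", "NQH", "FWY", "ILMV"]
REDUCE: Dict[str, int] = {aa: k for k, group in enumerate(GROUPS) for aa in group}

def reduce_seq(seq: str) -> List[int]:
    """Map each residue to the index of its reduced class."""
    return [REDUCE[aa] for aa in seq]

def column_mutual_information(segments: List[str], i: int, j: int) -> float:
    """Mutual information (nats) between reduced classes at positions i and j."""
    reduced = [reduce_seq(s) for s in segments]
    n = len(reduced)
    pair_counts = Counter((r[i], r[j]) for r in reduced)
    counts_i = Counter(r[i] for r in reduced)
    counts_j = Counter(r[j] for r in reduced)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi

# Toy aligned segments: positions 1 and 2 co-vary (acidic/basic swap together),
# while position 0 varies independently of position 1.
segments = ["AEK", "AKE", "GEK", "GKE"]
print(column_mutual_information(segments, 1, 2))  # log(2), about 0.693
print(column_mutual_information(segments, 0, 1))  # 0.0
```

In a real analysis such estimates would be checked on held-out segments, in the spirit of the cross-validation tests described above, since mutual information computed from few counts is biased upward.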