Open Access Methodology article

Protein structure similarity from principle component correlation analysis

Xiaobo Zhou1,2, James Chou3 and Stephen TC Wong1,2

Harvard Center for Neurodegeneration and Repair – Center for Bioinformatics, Harvard Medical School, 1249 Boylston Street, Boston, MA 02215, USA

Functional and Molecular Imaging Center, Radiology Department, Brigham and Women's Hospital, One Brigham Circle, 1620 Tremont Street, Boston, MA 02121, USA

Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 240 Longwood Avenue, Boston, MA 02115, USA


BMC Bioinformatics 2006, 7:40 doi:10.1186/1471-2105-7-40

Published: 25 January 2006

Abstract

Background

Owing to the rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on the functional properties of proteins and their roles in evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square deviation (RMSD) of their best-superimposed atomic coordinates. RMSD is the gold standard for measuring structural similarity when the structures are nearly identical; however, it fails to detect higher-order topological similarities in proteins that have evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be used to identify homologous protein structures or topologies, in order to quantify both close and remote structural similarities.
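To make the RMSD baseline concrete, the following is a minimal sketch of RMSD after optimal superposition, using the standard Kabsch procedure with NumPy. It assumes the two structures have already been reduced to equal-length, residue-matched coordinate arrays; the function name `superimposed_rmsd` is hypothetical and is not taken from the article.

```python
import numpy as np

def superimposed_rmsd(coords_a, coords_b):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    Assumption: row i of coords_a corresponds to row i of coords_b.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two matching (N, 3) coordinate arrays")

    # Remove translation by centering both sets on their centroids.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)

    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    a_rot = a_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((a_rot - b_c) ** 2, axis=1))))
```

A rigidly rotated and translated copy of a structure yields an RMSD near zero under this sketch, whereas any change of shape raises it, which is why RMSD alone struggles with proteins that share a topology but differ in shape.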

Results

We measure structural similarity between proteins by correlating the principal components of their secondary structure interaction matrices. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed from relationship parameters between secondary structure elements; these parameters can take the form of distance, orientation, or other relevant structural invariants. When a distance-based construction is used, with or without encoding the N- to C-terminal sense, there are strong correlations between the principal components of the interaction matrices of structurally or topologically similar proteins.
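The idea can be illustrated with a short NumPy sketch. This is not the authors' implementation: it assumes each secondary structure element is represented by a single centroid, that both proteins have the same number of elements listed in N- to C-terminal order, and that "principal components" means the eigenvectors with the largest absolute eigenvalues. The names `interaction_matrix`, `principal_components`, and `pcc_score` are hypothetical.

```python
import numpy as np

def interaction_matrix(centroids):
    """Symmetric matrix of pairwise Euclidean distances between element centroids."""
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def principal_components(matrix, k=2):
    """Return, as columns, the k eigenvectors with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(np.asarray(matrix, dtype=float))
    order = np.argsort(-np.abs(vals))[:k]
    return vecs[:, order]

def pcc_score(matrix_a, matrix_b, k=2):
    """Mean absolute Pearson correlation between corresponding components.

    The absolute value is taken because the sign of an eigenvector is arbitrary.
    Assumption: both matrices have the same size.
    """
    if np.shape(matrix_a) != np.shape(matrix_b):
        raise ValueError("this sketch requires equally sized interaction matrices")
    pc_a = principal_components(matrix_a, k)
    pc_b = principal_components(matrix_b, k)
    corrs = [abs(np.corrcoef(pc_a[:, i], pc_b[:, i])[0, 1]) for i in range(k)]
    return float(np.mean(corrs))
```

Because pairwise distances are unchanged by rotation and translation, two copies of the same fold in different orientations score close to 1 under this sketch without any superposition step.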

Conclusion

The PCC method is extensively tested on protein structures that belong to the same topological class but differ significantly by the RMSD measure. The PCC analysis can also differentiate proteins that have similar shapes but different topological arrangements. Additionally, we demonstrate that when two independently defined interaction matrices are used, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of the interaction matrix is highly flexible in accommodating various structural parameters for protein structure comparison.
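The maximum-eigenvalue comparison might be sketched as follows. The abstract does not say which two interaction matrices are paired, so this example assumes one distance-based and one orientation-based matrix per protein and treats the pair of largest eigenvalues as a two-dimensional signature for clustering. All function names here are hypothetical.

```python
import numpy as np

def max_eigenvalue(matrix):
    """Largest eigenvalue of a symmetric matrix (eigvalsh returns ascending order)."""
    return float(np.linalg.eigvalsh(np.asarray(matrix, dtype=float))[-1])

def eigenvalue_signature(distance_matrix, orientation_matrix):
    """Two-component signature: the largest eigenvalue of each interaction matrix."""
    return np.array([max_eigenvalue(distance_matrix),
                     max_eigenvalue(orientation_matrix)])

def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures; smaller means more similar."""
    return float(np.linalg.norm(np.asarray(sig_a) - np.asarray(sig_b)))
```

Under these assumptions, proteins whose signatures lie close together in the plane would be grouped into the same cluster, for example by feeding the pairwise `signature_distance` values to any standard clustering routine.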

