Comparative mapping of sequence-based and structure-based protein domains
1 Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
2 Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
3 School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
BMC Bioinformatics 2005, 6:77 doi:10.1186/1471-2105-6-77
Published: 25 March 2005
Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions are widely used, but whether either type of model alone can capture all essential features of domains remains an open question.
Here we provide insight into domain definitions through comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of each mapping, and the properties of the resulting mapping matrices are studied.
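As a rough illustration of what a mapping matrix between the two classifications could look like, the following Python sketch counts how often a sequence-based family and a structure-based domain cover the same region of a protein chain. Everything specific here is an assumption made for illustration: the abstract does not define the mapping score, so the overlap criterion (`min_fraction`), the function names, and the placeholder identifiers (`PF_A`, `SF_X`, and so on) are hypothetical and are not the authors' method or real database entries.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive residue ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def build_mapping_matrix(pfam_hits, scop_hits, min_fraction=0.5):
    """Count co-occurring (sequence family, structural domain) pairs.

    Each hit is a tuple (chain_id, label, start, end) with inclusive
    residue positions. A pair is counted when the shared region covers
    at least `min_fraction` of the shorter of the two ranges. This
    threshold is an illustrative assumption, not the paper's score.
    """
    # Group structural domains by chain for quick lookup.
    scop_by_chain = defaultdict(list)
    for chain, domain, start, end in scop_hits:
        scop_by_chain[chain].append((domain, start, end))

    matrix = defaultdict(int)
    for chain, family, p_start, p_end in pfam_hits:
        for domain, s_start, s_end in scop_by_chain.get(chain, []):
            shared = overlap(p_start, p_end, s_start, s_end)
            shorter = min(p_end - p_start + 1, s_end - s_start + 1)
            if shared / shorter >= min_fraction:
                matrix[(family, domain)] += 1
    return dict(matrix)


# Made-up placeholder assignments, not real Pfam or SCOP data.
pfam_hits = [
    ("chain1", "PF_A", 1, 100),
    ("chain1", "PF_B", 120, 200),
    ("chain2", "PF_A", 5, 95),
]
scop_hits = [
    ("chain1", "SF_X", 1, 110),
    ("chain1", "SF_Y", 115, 210),
    ("chain2", "SF_X", 1, 100),
]

mapping = build_mapping_matrix(pfam_hits, scop_hits)
```

In this toy input each family pairs with a single structural domain, which corresponds to the "agreement" case; a family whose counts spread across several structural domains (or the reverse) would be a candidate area of disagreement.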
The mapping results show general agreement between the two databases, as well as many interesting areas of disagreement. In cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.