Open Access Research article

Motif kernel generated by genetic programming improves remote homology and fold detection

Tony Håndstad¹, Arne JH Hestnes¹ and Pål Sætrom¹,²*

Author Affiliations

1 Department of Computer and Information Science, Norwegian University of Science and Technology, NO-7052, Trondheim, Norway

2 Interagon AS, Laboratoriesenteret, NO-7006 Trondheim, Norway

BMC Bioinformatics 2007, 8:23  doi:10.1186/1471-2105-8-23

Published: 25 January 2007

Abstract

Background

Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences, and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for lack of a suitable set of motifs for these sequences.

Results

We introduce the GPkernel, a motif kernel based on discrete sequence motifs that are evolved using genetic programming. All proteins can be grouped according to evolutionary relations and structure, and the method uses this inherent structure to create groups of motifs that discriminate between different families of evolutionary origin. When tested on two SCOP benchmarks, the superfamily and fold recognition problems, the GPkernel gives significantly better results than related methods of remote homology detection.
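
To make the idea of a kernel based on shared motif occurrences concrete, the following is a minimal sketch and not the authors' implementation. It assumes motifs can be approximated by regular expressions, that each sequence is mapped to a binary vector recording which motifs it contains, and that the kernel value is the number of motifs two sequences share. The function names and the example motifs are hypothetical; the genetic programming step that evolves the motifs is not shown.

```python
import re


def motif_features(sequence, motifs):
    """Binary vector: 1 if the motif occurs anywhere in the sequence, else 0.

    Assumption: motifs are given as regular expressions, used here only as a
    stand-in for whatever motif language the method actually evolves.
    """
    return [1 if re.search(motif, sequence) else 0 for motif in motifs]


def motif_kernel(seq_a, seq_b, motifs):
    """Count the motifs that occur in both sequences (a dot product of the
    two binary feature vectors, hence a valid kernel)."""
    features_a = motif_features(seq_a, motifs)
    features_b = motif_features(seq_b, motifs)
    return sum(a * b for a, b in zip(features_a, features_b))


# Hypothetical motifs and toy sequences, for illustration only.
example_motifs = ["A.C", "GG", "K[RK]"]
shared = motif_kernel("MAKCGG", "AACKR", example_motifs)
```

A matrix of such kernel values over all training sequences could then be passed to a support vector machine that accepts a precomputed kernel.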

Conclusion

Compared to the other methods, the GPkernel gives particularly good results on the more difficult fold recognition problem. This is mainly because the method creates motif sets that describe similarities among subgroups of both the related and unrelated proteins. This rich set of motifs gives a better description of the similarities and differences between different folds than do previous motif-based methods.