Open Access Methodology article

MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains

Hugo YK Lam(1), Philip M Kim(2,8)*, Janine Mok(3,9), Raffi Tonikian(4,5), Sachdev S Sidhu(4,5), Benjamin E Turk(6), Michael Snyder(10,3) and Mark B Gerstein(1,2,7)*

Author Affiliations

1 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA

2 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA

3 Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA

4 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, M5S 1A8, Canada

5 Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, M5G 1L6, Canada

6 Department of Pharmacology, Yale University, New Haven, CT 06520, USA

7 Department of Computer Science, Yale University, New Haven, CT 06520, USA

8 Current Address: Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, M5S 3E1, Canada

9 Current Address: Stanford Genome Technology Center, Department of Biochemistry, Stanford University, Palo Alto, CA 94304, USA

10 Current Address: Department of Genetics, Stanford University, Palo Alto, CA 94305, USA

BMC Bioinformatics 2010, 11:243  doi:10.1186/1471-2105-11-243

Published: 11 May 2010

Abstract

Background

Many protein interactions, especially those involved in signaling, are mediated by short linear motifs of 5-10 amino acid residues that interact with modular protein domains such as SH3 binding domains and kinase catalytic domains. One straightforward way of identifying these interactions is to scan all the sequences in a target proteome for matches to the motif. However, predicting domain targets from motif sequence alone, without considering other genomic and structural information, has been shown to lack accuracy.
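The sequence-only scan described above can be illustrated with a minimal sketch. It is not the MOTIPS implementation: the position weight matrix `TOY_PWM`, the uniform background frequency, the pseudo-frequency, the score threshold, and the two toy sequences are all hypothetical values chosen for illustration. The motif is a four-residue PxxP-like pattern, shorter than the 5-10 residues typical of real motifs, to keep the example small.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)   # assumed uniform background frequency
PSEUDO = 0.01                         # assumed frequency for residues absent from a column

# Hypothetical toy matrix: proline strongly preferred at positions 0 and 3,
# no preference at positions 1 and 2.
UNIFORM = {aa: BACKGROUND for aa in AMINO_ACIDS}
TOY_PWM = [{"P": 0.9}, UNIFORM, UNIFORM, {"P": 0.9}]

# Hypothetical two-protein "proteome".
TOY_PROTEOME = {
    "protA": "MKAPLLPSG",
    "protB": "MKTAYIAKQR",
}


def score_window(window, pwm):
    """Sum of per-position log-odds scores against the background."""
    score = 0.0
    for column, residue in zip(pwm, window):
        freq = column.get(residue, PSEUDO)
        score += math.log(freq / BACKGROUND)
    return score


def scan_proteome(proteome, pwm, threshold):
    """Slide the matrix along every sequence and keep windows at or above threshold."""
    width = len(pwm)
    hits = []
    for name, seq in proteome.items():
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = score_window(window, pwm)
            if score >= threshold:
                hits.append((name, start, window, score))
    return hits


for name, start, window, score in scan_proteome(TOY_PROTEOME, TOY_PWM, threshold=5.0):
    print(name, start, window, round(score, 2))
```

A scan of this kind uses only the motif sequence, so every window that resembles the motif is reported regardless of whether it is conserved, surface-exposed, or disordered.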

Results

We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments.
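To make the integration step concrete, the following is a minimal sketch of a naïve Bayes classifier over three features per hit. It assumes, purely for illustration, that conservation, surface propensity, and disorder have each been binarized (1 = favorable, 0 = not); the training examples, the Laplace smoothing, and the function names are hypothetical and are not taken from the MOTIPS code.

```python
import math


def train_naive_bayes(examples, labels):
    """Estimate the class prior and per-feature likelihoods with Laplace smoothing.

    examples: list of tuples of 0/1 feature values
    labels:   list of booleans (True = validated target)
    """
    n_features = len(examples[0])
    counts = {True: [0] * n_features, False: [0] * n_features}
    totals = {True: 0, False: 0}
    for feats, label in zip(examples, labels):
        totals[label] += 1
        for i, present in enumerate(feats):
            if present:
                counts[label][i] += 1
    likelihoods = {
        label: [(counts[label][i] + 1) / (totals[label] + 2) for i in range(n_features)]
        for label in (True, False)
    }
    prior = (totals[True] + 1) / (totals[True] + totals[False] + 2)
    return prior, likelihoods


def posterior(feats, prior, likelihoods):
    """Posterior probability that a hit is a true target, assuming independent features."""
    log_pos = math.log(prior)
    log_neg = math.log(1.0 - prior)
    for i, present in enumerate(feats):
        p_pos = likelihoods[True][i]
        p_neg = likelihoods[False][i]
        log_pos += math.log(p_pos if present else 1.0 - p_pos)
        log_neg += math.log(p_neg if present else 1.0 - p_neg)
    # Normalize in log space to avoid underflow.
    top = max(log_pos, log_neg)
    pos = math.exp(log_pos - top)
    neg = math.exp(log_neg - top)
    return pos / (pos + neg)


# Hypothetical training set: (conserved, surface-exposed, disordered).
TRAIN_X = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
TRAIN_Y = [True, True, True, False, False, False]

prior, likelihoods = train_naive_bayes(TRAIN_X, TRAIN_Y)
print(posterior((1, 1, 1), prior, likelihoods))
print(posterior((0, 0, 0), prior, likelihoods))
```

Under this sketch, a motif hit supported by all three features receives a higher posterior than a hit supported by none, which is the sense in which additional features can re-rank sequence-only matches.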

Conclusions

By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.