HoxPred: automated classification of Hox proteins using combinations of generalised profiles
1 Belgian EMBnet Node, Université Libre de Bruxelles – CP 257, Bd du Triomphe, B-1050 Brussels, Belgium
2 Laboratory for Cell Genetics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
BMC Bioinformatics 2007, 8:247 doi:10.1186/1471-2105-8-247
Published: 12 July 2007
Correct identification of individual Hox proteins is an essential basis for their study in diverse research fields. Common methods to classify Hox proteins focus on the homeodomain that characterises homeobox transcription factors. Classification is hampered by the high conservation of this short domain. Phylogenetic tree reconstruction is a widely used but time-consuming classification method.
We have developed an automated procedure, HoxPred, that classifies Hox proteins into their homology groups. The method relies on a discriminant analysis that classifies Hox proteins according to their scores for a combination of protein generalised profiles. Fifty-four generalised profiles dedicated to each Hox homology group were produced de novo from a curated dataset of vertebrate Hox proteins. Several classification methods were investigated to select the most accurate discriminant functions. These functions were then incorporated into the HoxPred program.
HoxPred shows a mean accuracy of 97%. Predictions on the recently sequenced stickleback fish proteome identified 44 Hox proteins, including HoxC1a, which has so far been found only in zebrafish. Using the Uniprot databank, we demonstrate that HoxPred can efficiently contribute to large-scale automatic annotation of Hox proteins into their paralogous groups. As orthologous group predictions show a higher risk of misclassification, they should be corroborated by additional supporting evidence. HoxPred is accessible via SOAP and a Web interface at http://cege.vub.ac.be/hoxpred/. Complete datasets, results and source code are available at the same site.