
This article is part of the supplement: Selected Proceedings of Machine Learning in Systems Biology: MLSB 2007.

Open Access Proceedings

Towards a semi-automatic functional annotation tool based on decision-tree techniques

Jérôme Azé1*, Lucie Gentils1*, Claire Toffano-Nioche1, Valentin Loux2, Jean-François Gibrat2, Philippe Bessières2, Céline Rouveirol3, Anne Poupon4 and Christine Froidevaux1

1 LRI – CNRS UMR 8623 – University Paris-Sud 11, F-91405 Orsay Cedex, France

2 INRA, Unité Mathématique, Informatique et Génome UR1077, F-78352 Jouy-en-Josas, France

3 LIPN – UMR CNRS 7030 – Institut Galilée – University Paris-Nord, F-93430 Villetaneuse, France

4 IBBMC – CNRS UMR 8619 – University Paris-Sud 11, F-91405 Orsay Cedex, France

* Contributed equally

BMC Proceedings 2008, 2(Suppl 4):S3

Published: 17 December 2008

Abstract

Background

Continuous improvements in high-throughput technologies and experimental procedures are causing the number of sequenced genomes to grow exponentially. Annotating these data ultimately relies on the expertise of biologists, and this need for supervision by human experts is the rate-limiting step of data analysis. To cope with the deluge of new genomic data, it is becoming critical to automate the annotation process as much as possible.

Results

We consider the annotation of proteins with terms from the functional hierarchy that has been used to annotate Bacillus subtilis, and we propose a set of rules that predict classes expressed as elements of this hierarchy, i.e., a class is an internal node or a leaf of the hierarchy tree. The rules are obtained with two decision-tree techniques, first-order decision trees and multilabel attribute-value decision trees, using as training data the proteins of two lactic acid bacteria: Lactobacillus sakei and Lactobacillus bulgaricus. We tested the two methods first independently and then in a combined approach, and evaluated the results using hierarchical evaluation measures. The results obtained with the two approaches on both genomes are comparable and show good precision together with a high prediction rate. Combining the approaches increases the recall and the prediction rate.
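The abstract does not state which hierarchical evaluation measures were used. As a rough illustration only, the sketch below assumes one common definition, in which the predicted and true class sets are each extended with all of their ancestors before precision and recall are computed. The toy hierarchy in `PARENT` and the function names are hypothetical and are not taken from the Bacillus subtilis functional hierarchy.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical toy hierarchy: class code -> parent code (None for top-level classes).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "1.1": "1",
    "1.2": "1",
    "1.2.1": "1.2",
    "2": None,
    "2.1": "2",
}


def with_ancestors(labels: Iterable[str]) -> Set[str]:
    """Return the given classes together with all of their ancestors."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up to the root; stop early if this branch was already visited.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT[node]
    return closed


def hierarchical_precision_recall(
    predicted: Iterable[str], true: Iterable[str]
) -> Tuple[float, float]:
    """Precision and recall computed on ancestor-extended class sets (assumed definition)."""
    pred = with_ancestors(predicted)
    gold = with_ancestors(true)
    common = len(pred & gold)
    precision = common / len(pred) if pred else 0.0
    recall = common / len(gold) if gold else 0.0
    return precision, recall


# Predicting the parent of the true class: fully precise, but only partly complete.
p_parent, r_parent = hierarchical_precision_recall({"1.2"}, {"1.2.1"})

# Predicting a sibling branch: only the shared top-level class counts as correct.
p_sibling, r_sibling = hierarchical_precision_recall({"1.1"}, {"1.2.1"})
```

Under this assumed definition, a prediction that stops at an internal node above the true leaf is not penalised in precision, only in recall, which is one reason such measures suit predictions that may be either nodes or leaves of the hierarchy.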

Conclusion

The combination of the two approaches is very encouraging, and we will further refine it to obtain rules that are even more useful to annotators. This first study is a crucial step towards designing a semi-automatic functional annotation tool.

