Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Research article

Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

Heather A Amthauer1* and Costas Tsatsoulis2

Author Affiliations

1 Department of Computer Science, Frostburg State University, Frostburg, Maryland, USA

2 Department of Computer Science and Engineering, University of North Texas, Denton, Texas, USA

For all author emails, please log on.

BMC Genomics 2010, 11:340  doi:10.1186/1471-2164-11-340

Published: 28 May 2010

Abstract

Background

There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning.

Results

We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced.

Conclusions

We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.