Methodology article (Open Access)

Multiclass classification of microarray data samples with a reduced number of genes

Elizabeth Tapia1,2*, Leonardo Ornella1, Pilar Bulacio1,2 and Laura Angelone1,2

Author Affiliations

1 CIFASIS-Conicet Institute, Bv. 27 de Febrero 210 Bis, Rosario, Argentina

2 Facultad de Cs. Exactas e Ingeniería, Riobamba 245 Bis, National University of Rosario, Argentina

BMC Bioinformatics 2011, 12:59 doi:10.1186/1471-2105-12-59

Published: 22 February 2011

Abstract

Background

Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes increases. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates of the maximum number of genes that a classification algorithm can handle. Without such estimates, one is left with either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, the result may be biased classification models unable to support statistically significant findings.

Results

A novel bound is presented on the maximum number of genes that can be handled by the binary classifiers in binary-mediated multiclass classification algorithms for microarray data samples. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary-mediated multiclass classifiers for microarray data samples.
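To make the idea of a binary output domain concrete, the sketch below shows one generic way a multiclass problem can be mediated by binary classifiers: each class is assigned a binary codeword, each codeword position defines one binary problem, and a sample is assigned to the class whose codeword is nearest in Hamming distance to the vector of binary predictions. This is only an illustration of the general output-coding idea under assumed choices (an exhaustive code matrix and Hamming decoding). It is not the authors' algorithm and it does not compute the bound. The function names `exhaustive_code_matrix`, `hamming`, and `decode` are hypothetical.

```python
from itertools import product


def exhaustive_code_matrix(n_classes):
    """Return one binary codeword (tuple of 0/1) per class.

    Assumption for illustration: an exhaustive code. Every way of splitting
    the classes into two non-empty groups becomes one binary problem (one
    column). Fixing the first class to 1 discards complementary duplicates,
    which leaves 2**(n_classes - 1) - 1 columns.
    """
    columns = []
    for bits in product((0, 1), repeat=n_classes):
        if bits[0] != 1:
            continue  # complement of a column that is already covered
        if all(b == 1 for b in bits):
            continue  # constant column: no binary problem to learn
        columns.append(bits)
    # Transpose so that each row is the codeword of one class.
    return [tuple(col[c] for col in columns) for c in range(n_classes)]


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(binary_outputs, code_matrix):
    """Return the index of the class whose codeword is closest to the outputs."""
    distances = [hamming(binary_outputs, codeword) for codeword in code_matrix]
    return distances.index(min(distances))


# Four classes give a 7-dimensional binary output domain.
code_matrix = exhaustive_code_matrix(4)

# Simulate the binary classifiers' predictions for a class-2 sample,
# with one of the seven binary classifiers answering wrongly.
received = list(code_matrix[2])
received[0] ^= 1
predicted = decode(received, code_matrix)  # still decodes to class 2
```

In this toy setting, a longer codeword spreads the class codewords further apart, so individual binary errors can be absorbed at decoding time. That is one intuition for why a higher-dimensional binary output domain might tolerate simpler binary classifiers.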

Conclusions

Comprehensive experimental work shows that the bound is useful for inducing accurate and sparse multiclass classifiers for microarray data samples.