This article is part of the supplement: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)

Open Access Research

A multi-filter enhanced genetic ensemble system for gene selection and sample classification of microarray data

Pengyi Yang1,2*, Bing B Zhou1, Zili Zhang3,4 and Albert Y Zomaya1,5,6,7

Author Affiliations

1 School of Information Technologies (J12), The University of Sydney, NSW 2006, Australia

2 NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia

3 Faculty of Computer and Information Science, Southwest University, CQ 400715, PR China

4 School of Information Technology, Deakin University, VIC 3217, Australia

5 Sydney Bioinformatics, The University of Sydney, NSW 2006, Australia

6 Centre for Mathematical Biology, The University of Sydney, NSW 2006, Australia

7 Centre for Distributed and High Performance Computing, The University of Sydney, NSW 2006, Australia

BMC Bioinformatics 2010, 11(Suppl 1):S5  doi:10.1186/1471-2105-11-S1-S5

Published: 18 January 2010

Abstract

Background

Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true of gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. Beyond the essential objectives of reducing data noise and redundancy and of improving sample classification accuracy and model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.

Results

In this paper we describe an improved hybrid system for gene selection, based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy that fuses the goodness information of each gene provided by multiple filtering algorithms. This fused information is then used in the initialization and mutation operations of the genetic ensemble system.
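The idea of fusing filter scores and using them to bias a genetic algorithm can be illustrated with a minimal sketch. The abstract does not specify the mapping strategy, so everything below is an assumption made for illustration: the min-max scaling and averaging used for fusion, the weighted sampling used for initialization and mutation, and the hypothetical function names `fuse_filter_scores`, `init_chromosome`, and `mutate`.

```python
import random


def fuse_filter_scores(score_lists):
    """Fuse per-gene scores from several filters into one weight per gene.

    Assumption: each filter's scores are min-max scaled to [0, 1] and then
    averaged across filters. The paper's actual mapping may differ.
    """
    n_genes = len(score_lists[0])
    fused = [0.0] * n_genes
    for scores in score_lists:
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against a filter with constant scores
        for g, s in enumerate(scores):
            fused[g] += (s - lo) / span
    return [f / len(score_lists) for f in fused]


def weighted_pick(candidates, weights, rng):
    """Pick one gene index from candidates, favouring high fused weight."""
    # A small floor keeps zero-weight genes reachable.
    w = [weights[g] + 1e-9 for g in candidates]
    return rng.choices(candidates, weights=w, k=1)[0]


def init_chromosome(weights, size, rng):
    """Draw `size` distinct genes, biased towards highly weighted genes."""
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        pick = weighted_pick(pool, weights, rng)
        chosen.append(pick)
        pool.remove(pick)
    return chosen


def mutate(chromosome, weights, rate, rng):
    """With probability `rate` per position, swap in a weighted draw
    from the genes not currently in the chromosome."""
    result = list(chromosome)
    for i in range(len(result)):
        if rng.random() < rate:
            unused = [g for g in range(len(weights)) if g not in result]
            if not unused:
                break
            result[i] = weighted_pick(unused, weights, rng)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical filters scoring five genes (made-up numbers).
    weights = fuse_filter_scores([[0.1, 2.0, 0.5, 1.5, 0.0],
                                  [3.0, 9.0, 1.0, 7.0, 2.0]])
    parent = init_chromosome(weights, size=3, rng=rng)
    child = mutate(parent, weights, rate=0.3, rng=rng)
    print(parent, child)
```

In this sketch, genes that several filters rate highly are more likely to appear in the initial population and to be introduced by mutation, while the small weight floor keeps poorly rated genes reachable so the search is not restricted to the filters' top picks.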

Conclusion

We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user's preferences.