This article is part of the supplement: The 2007 International Conference on Bioinformatics & Computational Biology (BIOCOMP'07)

Open Access Research

Improving prediction accuracy of tumor classification by reusing genes discarded during gene selection

Jack Y Yang1, Guo-Zheng Li2,3*, Hao-Hua Meng2, Mary Qu Yang4 and Youping Deng5

Author affiliations

1 Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140-0888 USA

2 School of Computer Engineering & Science, Shanghai University, Shanghai 200072, China

3 Institute of Systems Biology, Shanghai University, Shanghai 200072, China

4 National Human Genome Research Institute, National Institutes of Health, U.S. Department of Health and Human Services, Bethesda, MD 20892, USA

5 Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA

Citation and License

BMC Genomics 2008, 9(Suppl 1):S3  doi:10.1186/1471-2164-9-S1-S3

Published: 20 March 2008

Abstract

Background

Since the high dimensionality of gene expression microarray data sets degrades the generalization performance of classifiers, feature selection, which selects relevant features and discards irrelevant and redundant features, has been widely used in the bioinformatics field. Multi-task learning is a novel technique to improve prediction accuracy of tumor classification by using information contained in such discarded redundant features, but which features should be discarded or used as input or output remains an open issue.

Results

We demonstrate a framework for automatically selecting features to be input, output, and discarded by using a genetic algorithm, and propose two algorithms: GA-MTL (Genetic algorithm based multi-task learning) and e-GA-MTL (an enhanced version of GA-MTL). Experimental results demonstrate that this framework is effective at selecting features for multi-task learning, and that GA-MTL and e-GA-MTL perform better than other heuristic methods.
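To make the idea of assigning each feature one of three roles concrete, the following is a minimal sketch of a genetic algorithm over a ternary encoding (discard, input, output). It is an illustration under stated assumptions, not the GA-MTL or e-GA-MTL procedure itself: the role codes, the operators (tournament selection, one-point crossover, per-gene mutation), the parameter values, and the names `evolve_roles`, `split_roles`, and `toy_fitness` are all hypothetical. In particular, `toy_fitness` is a stand-in that scores agreement with a made-up target assignment; in a real setting it would be replaced by the validation accuracy of a multi-task learner trained with the chosen input and output features.

```python
import random

# Hypothetical role codes: each feature (gene) gets exactly one role.
DISCARD, INPUT, OUTPUT = 0, 1, 2
ROLES = (DISCARD, INPUT, OUTPUT)


def crossover(a, b, rng):
    """One-point crossover between two parent chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chrom, rate, rng):
    """Resample each position's role with probability `rate`."""
    return [rng.choice(ROLES) if rng.random() < rate else g for g in chrom]


def tournament(pop, scores, k, rng):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def evolve_roles(n_features, fitness, pop_size=30, generations=40,
                 mutation_rate=0.05, seed=0):
    """Search for a role assignment that maximizes `fitness` (assumed API)."""
    rng = random.Random(seed)
    pop = [[rng.choice(ROLES) for _ in range(n_features)]
           for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        # Remember the best individual evaluated so far.
        for chrom, score in zip(pop, scores):
            if score > best_score:
                best, best_score = list(chrom), score
        # Build the next generation from selected parents.
        pop = [
            mutate(crossover(tournament(pop, scores, 3, rng),
                             tournament(pop, scores, 3, rng), rng),
                   mutation_rate, rng)
            for _ in range(pop_size)
        ]
    return best, best_score


def split_roles(chrom):
    """Translate a chromosome into input and extra-output feature indices."""
    inputs = [i for i, g in enumerate(chrom) if g == INPUT]
    outputs = [i for i, g in enumerate(chrom) if g == OUTPUT]
    return inputs, outputs


def toy_fitness(chrom):
    """Stand-in fitness: agreement with an invented target assignment."""
    target = [INPUT] * 4 + [OUTPUT] * 3 + [DISCARD] * 5
    return sum(g == t for g, t in zip(chrom, target))


best, score = evolve_roles(12, toy_fitness)
inputs, outputs = split_roles(best)
```

Here `inputs` would index the features fed to the classifier and `outputs` the features predicted as auxiliary tasks alongside the class label; features in neither list are discarded.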

Conclusions

Genetic algorithms are a powerful technique for automatically selecting features for multi-task learning; GA-MTL and e-GA-MTL are shown to improve the generalization performance of classifiers on microarray data sets.