This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB07)

Open Access Research

Dimension reduction with redundant gene elimination for tumor classification

Xue-Qiang Zeng1, Guo-Zheng Li1,2*, Jack Y Yang3, Mary Qu Yang4 and Geng-Feng Wu1

Author Affiliations

1 School of Computer Engineering & Science, Shanghai University, Shanghai 200072, China

2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China

3 Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140, USA

4 National Human Genome Research Institute, National Institutes of Health (NIH), U.S. Department of Health and Human Services, Bethesda, MD 20852, USA

BMC Bioinformatics 2008, 9(Suppl 6):S8  doi:10.1186/1471-2105-9-S6-S8

Published: 28 May 2008

Abstract

Background

Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, gene expression data from DNA microarray experiments are hard to analyse with commonly used classifiers, because a data set contains only a few observations but thousands of measured genes. Dimension reduction is often used to handle such high-dimensional problems, but it is hampered by the large number of redundant features in microarray data sets.

Results

Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel redundancy metric based on DIScriminative Contribution (DISC) is proposed, which estimates feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates how redundant the discriminative abilities of two given features are. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm effectively and reliably improves the generalization performance of dimension reduction and hence of the classifier used.
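The pipeline described above (a per-gene linear classifier, a label-aware redundancy score, elimination, then feature extraction) can be illustrated with a minimal sketch. The abstract does not give the DISC formula, so the score below is an assumption: it simply measures how often two single-gene classifiers are right or wrong on the same samples. The function names, the 0.9 threshold, the nearest-centroid classifier, and the use of PCA as the feature extractor are all hypothetical choices for illustration, not the authors' REDISC implementation.

```python
import numpy as np


def single_gene_correctness(X, y):
    """Fit a one-dimensional nearest-centroid (linear) classifier per gene.

    X: (n_samples, n_genes) expression matrix; y: labels in {0, 1}.
    Returns a boolean (n_samples, n_genes) matrix that is True where the
    gene's classifier labels the sample correctly.
    """
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Predict class 1 when the value is closer to the class-1 mean.
    pred = (np.abs(X - mu1) < np.abs(X - mu0)).astype(int)
    return pred == y[:, None]


def redundancy(correct_a, correct_b):
    """Assumed stand-in for a label-aware redundancy score (not the paper's
    DISC formula): fraction of samples on which two single-gene classifiers
    are both right or both wrong."""
    return float(np.mean(correct_a == correct_b))


def eliminate_redundant_genes(X, y, threshold=0.9):
    """Greedy elimination: visit genes from most to least accurate and keep
    a gene only if its score against every kept gene is below threshold."""
    correct = single_gene_correctness(X, y)
    accuracy = correct.mean(axis=0)
    order = np.argsort(-accuracy, kind="stable")
    kept = []
    for g in order:
        if all(redundancy(correct[:, g], correct[:, k]) < threshold
               for k in kept):
            kept.append(int(g))
    return kept


def pca_extract(X, n_components):
    """Feature extraction step; PCA via SVD is used here as an example."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

A typical call would be `kept = eliminate_redundant_genes(X, y)` followed by `Z = pca_extract(X[:, kept], n_components)`, so that the extractor only sees genes that add distinct discriminative information. Unlike a plain correlation filter, the score here depends on `y`, which is the property the abstract attributes to DISC.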

Conclusion

For tumor classification, dimension reduction that eliminates redundant genes before feature extraction performs better than feature extraction alone, and supervised redundant gene elimination is superior to commonly used unsupervised methods such as linear correlation coefficients.