Open Access Methodology article

Using the information embedded in the testing sample to break the limits caused by the small sample size in microarray-based classification

Manli Zhu1 and Aleix M Martinez1,2

Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA

Department of Biomedical Engineering, The Ohio State University, Columbus, OH 43210, USA


BMC Bioinformatics 2008, 9:280 doi:10.1186/1471-2105-9-280

Published: 14 June 2008

Abstract

Background

Microarray-based tumor classification is characterized by a very large number of features (genes) and a small number of samples. In such cases, statistical techniques cannot reliably determine which genes are correlated with each tumor type. A popular solution is to use a subset of pre-specified genes. However, molecular variations are generally correlated with a large number of genes. A gene that is not individually correlated with a disease may still express its influence in combination with other genes.

Results

In this paper, we propose a new classification strategy that can reduce the effect of over-fitting without the need to pre-select a small subset of genes. Our solution works by taking advantage of the information embedded in the testing samples. We note that a well-defined classification algorithm works best when the data is properly labeled. Hence, our classification algorithm will discriminate all samples best when the testing sample is assumed to belong to the correct class. We compare our solution with several well-known alternatives for tumor classification on a variety of publicly available data-sets. Our approach consistently leads to better classification results.
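
The idea of assuming each possible class for the testing sample and keeping the one that discriminates best can be illustrated with a minimal sketch. The abstract does not state which discrimination criterion is used, so the ratio of between-class to within-class scatter below is an assumption chosen for illustration, not the authors' method. The function names (`separability`, `classify_by_hypothesis`) and the synthetic "expression" data are hypothetical.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def separability(samples, labels):
    """Between-class scatter divided by within-class scatter.

    ASSUMPTION: this criterion stands in for whatever measure of
    discrimination the paper actually uses.
    """
    overall = mean_vector(samples)
    between = 0.0
    within = 0.0
    for label in set(labels):
        members = [s for s, l in zip(samples, labels) if l == label]
        center = mean_vector(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(s, center) for s in members)
    return between / within


def classify_by_hypothesis(train_samples, train_labels, test_sample):
    """Try every class label for the test sample and keep the label
    under which all samples (training plus test) separate best."""
    scores = {}
    for label in sorted(set(train_labels)):
        augmented_samples = train_samples + [test_sample]
        augmented_labels = train_labels + [label]
        scores[label] = separability(augmented_samples, augmented_labels)
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Synthetic toy data (hypothetical): 50 "genes", 5 samples per class.
random.seed(0)


def draw(center, n_genes=50):
    return [random.gauss(center, 1.0) for _ in range(n_genes)]


train_samples = [draw(0.0) for _ in range(5)] + [draw(2.0) for _ in range(5)]
train_labels = [0] * 5 + [1] * 5
test_sample = draw(2.0)  # generated from the class-1 distribution

predicted, scores = classify_by_hypothesis(train_samples, train_labels, test_sample)
print(predicted, scores)
```

In this sketch, labeling the testing sample incorrectly pulls a class mean toward the other class and inflates the within-class scatter, so the correct hypothesis tends to yield the higher score. No subset of genes is pre-selected; all features enter the criterion.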

Conclusion

Studies indicate that thousands of samples may be required to extract useful statistical information from microarray data. Herein, it is shown that this problem can be circumvented by using the information embedded in the testing samples.

