This article is part of the supplement: The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Genomics

A gene selection method for GeneChip array data with small sample sizes

Zhongxue Chen1*, Qingzhong Liu2, Monnie McGee3*, Megan Kong4, Xudong Huang5, Youping Deng6 and Richard H Scheuermann4

Author Affiliations

1 Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA

2 Department of Computer Science, Sam Houston State University, Huntsville, Texas 77341, USA

3 Statistical Science Department, Southern Methodist University, Dallas, TX 75275, USA

4 Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA

5 Conjugate and Medicinal Chemistry Laboratory, Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA

6 Rush University Cancer Center, Rush University Medical Center, Chicago, IL 60612, USA

BMC Genomics 2011, 12(Suppl 5):S7  doi:10.1186/1471-2164-12-S5-S7

Published: 23 December 2011



In microarray experiments with small sample sizes, it is challenging to estimate p-values accurately and to choose appropriate cutoff p-values for gene selection. Although permutation-based methods have been shown to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete because very small samples allow only a limited number of permutations. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, which makes it difficult to apply current false discovery rate (FDR)-controlling methods.
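To make the discreteness concrete, the following minimal sketch enumerates every relabeling for a single gene measured on three arrays per condition. The function name `permutation_pvalue`, the choice of the absolute difference in group means as the test statistic, and the example intensities are all illustrative assumptions and are not taken from the article.

```python
from itertools import combinations
from math import comb


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only; the statistic used here
    (absolute mean difference) is an assumption, not the article's choice.
    """
    pooled = list(x) + list(y)
    n, total = len(x), len(pooled)
    observed = abs(sum(x) / n - sum(y) / len(y))
    hits = 0
    # Enumerate every way of assigning n of the pooled values to group one.
    for idx in combinations(range(total), n):
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(total) if i not in idx]
        diff = abs(sum(gx) / len(gx) - sum(gy) / len(gy))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits / comb(total, n)


# With 3 arrays per condition there are only C(6, 3) = 20 relabelings.
n_relabelings = comb(6, 3)

# Made-up, perfectly separated intensities for one gene.
p = permutation_pvalue([5.1, 5.3, 5.2], [1.0, 1.2, 0.9])
print(n_relabelings, p)  # 20 0.1
```

Even for this perfectly separated gene, the two-sided p-value cannot fall below 2/20 = 0.1, because the observed labeling and its mirror image are the only two of the 20 relabelings that reach the observed difference.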


We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, uses information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions, and we estimate the parameters of this model from all available data. Based on this model, we calculate p-values that are uniformly distributed for true nulls. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate.
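The workflow described above might be sketched roughly as follows. This is not the authors' implementation: the abstract does not say how the null parameters are estimated, so the sketch assumes a robust fit (median and scaled median absolute deviation across all genes), assumes the data have already been transformed, and estimates the FDR conservatively as the expected number of null genes below the cutoff divided by the number of selected genes. The function names and the simulated mean differences are hypothetical.

```python
import random
from statistics import NormalDist, median


def null_model_pvalues(mean_diffs):
    """Two-sided p-values under a normal null fitted to all genes.

    Assumption: median and 1.4826 * MAD stand in for the null mean and
    standard deviation; the article's actual estimator may differ.
    """
    mu = median(mean_diffs)
    mad = median([abs(d - mu) for d in mean_diffs])
    sigma = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    null = NormalDist(mu, sigma)
    return [2.0 * (1.0 - null.cdf(mu + abs(d - mu))) for d in mean_diffs]


def select_and_estimate_fdr(pvalues, cutoff):
    """Select genes with p <= cutoff and estimate the FDR of that list.

    Assumption: every gene is treated as a potential null, so the
    expected number of false positives is cutoff * (number of genes).
    """
    selected = [i for i, p in enumerate(pvalues) if p <= cutoff]
    if not selected:
        return selected, 0.0
    expected_false = cutoff * len(pvalues)
    return selected, min(1.0, expected_false / len(selected))


# Simulated mean differences: 950 null genes and 50 shifted genes.
random.seed(0)
diffs = [random.gauss(0.0, 0.2) for _ in range(950)]
diffs += [random.gauss(2.0, 0.2) for _ in range(50)]

pvals = null_model_pvalues(diffs)
selected, fdr = select_and_estimate_fdr(pvals, cutoff=0.01)
print(len(selected), round(fdr, 3))
```

Because the null model is fitted to all genes at once, each gene's p-value borrows information from the others and can take any value in [0, 1], unlike a per-gene permutation p-value.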


Simulation studies and analysis of real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It is applicable to a wide variety of situations.