Open Access Highly Accessed Methodology article

Ranking analysis of F-statistics for microarray data

Yuan-De Tan1,2, Myriam Fornage2 and Hongyan Xu3*

Author Affiliations

1 College of Life Sciences, Hunan Normal University, Changsha, 410081, China

2 Institute of Molecular Medicine, the University of Texas-Houston, Houston, Texas 77030, USA

3 Department of Biostatistics, Medical College of Georgia, Augusta, Georgia, 30912, USA

BMC Bioinformatics 2008, 9:142  doi:10.1186/1471-2105-9-142

Published: 6 March 2008

Abstract

Background

Microarray technology provides an efficient means of globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identifying differentially expressed genes in microarray experiments is challenging because such experiments carry a potentially high type I error rate. Methods for large-scale statistical analysis have been developed, but most of them apply only to two-sample or two-condition data.

Results

We developed a large-scale, multiple-group, F-test-based method named ranking analysis of F-statistics (RAF), which extends ranking analysis of microarray data (RAM), a method for the two-sample t-test. In RAF, we propose a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate (FDR). Simulation results suggested that RAF finds differentially expressed genes among multiple classes more efficiently, and at a lower false discovery rate, than some commonly used methods. Applying our method to the experimental data, we found 107 genes with significantly different expression among 4 treatments at an FDR below 0.7%. Of these, 31 are expressed sequence tags (ESTs) and 76 are unique genes that have known functions in the brain or central nervous system and belong to six major functional groups.
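To make the starting point of an F-test-based ranking concrete, the following is a minimal sketch of computing a standard one-way ANOVA F-statistic per gene across several groups and ordering genes by it. It is not the RAF procedure itself: the random splitting null distribution and the two-simulation FDR estimate are not reproduced here, and the function names `one_way_f` and `rank_by_f`, as well as the toy expression values, are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def one_way_f(values, labels):
    """Standard one-way ANOVA F-statistic for one gene.

    values: expression measurements for a single gene, one per sample.
    labels: group (treatment) label for each sample, same length as values.
    """
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n = len(values)   # total number of samples
    k = len(groups)   # number of groups
    grand = fmean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - fmean(vs)) ** 2 for vs in groups.values() for v in vs)
    if ss_within == 0:
        # Degenerate case: no within-group variation (illustrative handling).
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_f(expr, labels):
    """Return (F, gene_index) pairs sorted from largest to smallest F."""
    scored = [(one_way_f(row, labels), i) for i, row in enumerate(expr)]
    return sorted(scored, reverse=True)


# Hypothetical toy data: 2 genes, 3 groups, 3 replicates per group.
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
expr = [
    [1, 2, 3, 2, 3, 4, 3, 4, 5],  # group means differ
    [1, 2, 3, 1, 2, 3, 1, 2, 3],  # group means identical
]
ranking = rank_by_f(expr, labels)
```

In a ranking-based analysis, the ordered observed statistics would then be compared against statistics expected under a null distribution. How that null is generated is exactly where RAF departs from permutation-based approaches, so the sketch stops at the ranking step.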

Conclusion

Our method is suitable for identifying differentially expressed genes among multiple groups, particularly when the sample size is small.