
This article is part of the supplement: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics.

Open Access Proceedings

The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Leming Shi1, Wendell D Jones2, Roderick V Jensen3, Stephen C Harris1, Roger G Perkins4, Federico M Goodsaid5, Lei Guo1, Lisa J Croner6, Cecilie Boysen7, Hong Fang4, Feng Qian4, Shashi Amur5, Wenjun Bao8, Catalin C Barbacioru9, Vincent Bertholet10, Xiaoxi Megan Cao4, Tzu-Ming Chu8, Patrick J Collins11, Xiao-hui Fan1,12, Felix W Frueh5, James C Fuscoe1, Xu Guo13, Jing Han14, Damir Herman15, Huixiao Hong4, Ernest S Kawasaki16, Quan-Zhen Li17, Yuling Luo18, Yunqing Ma18, Nan Mei1, Ron L Peterson19, Raj K Puri14, Richard Shippy20, Zhenqiang Su1, Yongming Andrew Sun9, Hongmei Sun4, Brett Thorn4, Yaron Turpaz12, Charles Wang21, Sue Jane Wang5, Janet A Warrington13, James C Willey22, Jie Wu4, Qian Xie4, Liang Zhang23, Lu Zhang24, Sheng Zhong25, Russell D Wolfinger8 and Weida Tong1

1  National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA

2  Expression Analysis Inc., 2605 Meridian Parkway, Durham, NC 27713, USA

3  University of Massachusetts Boston, Department of Physics, 100 Morrissey Boulevard, Boston, MA 02125, USA

4  Z-Tech Corporation, an ICF International Company at NCTR/FDA, 3900 NCTR Road, Jefferson, AR 72079, USA

5  Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA

6  Biogen Idec Inc., 5200 Research Place, San Diego, CA 92122, USA

7  ViaLogy Inc., 2400 Lincoln Avenue, Altadena, CA 91001, USA

8  SAS Institute Inc., SAS Campus Drive, Cary, NC 27513, USA

9  Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA

10  Eppendorf Array Technologies, rue du Séminaire 20a, 5000 Namur, Belgium

11  Agilent Technologies Inc., 5301 Stevens Creek Boulevard, Santa Clara, CA 95051, USA

12  Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China

13  Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA

14  Center for Biologics Evaluation and Research, US Food and Drug Administration, 8800 Rockville Pike, Bethesda, MD 20892, USA

15  National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

16  National Cancer Institute Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877, USA

17  University of Texas Southwestern Medical Center, 6000 Harry Hines Boulevard, Dallas, TX 75390, USA

18  Panomics Inc., 6519 Dumbarton Circle, Fremont, CA 94555, USA

19  Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA

20  GE Healthcare, 7700 S River Parkway, Tempe, AZ 85284, USA

21  UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, CA 90048, USA

22  Ohio Medical University, 3000 Arlington Avenue, Toledo, OH 43614, USA

23  CapitalBio Corporation, 18 Life Science Parkway, Changping District, Beijing 102206, China

24  Solexa Inc., 25861 Industrial Boulevard, Hayward, CA 94545, USA

25  University of Illinois at Urbana-Champaign, Department of Bioengineering, 1304 W. Springfield Avenue, Urbana, IL 61801, USA


BMC Bioinformatics 2008, 9(Suppl 9):S10 doi:10.1186/1471-2105-9-S9-S10

Published: 12 August 2008

Abstract

Background

Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.

Results

Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated how a few widely used gene selection procedures affect the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, widely regarded as the "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion together with a non-stringent P-value cutoff as a filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
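As a rough illustration of what "reproducibility of DEG lists" can mean in practice, the sketch below scores the agreement between two DEG lists (for example, from two test sites) as a percentage of shared genes. The abstract does not define the metric, so the function name `percent_overlap`, the normalization by mean list size, and the gene identifiers are all assumptions made for illustration only.

```python
def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists (illustrative metric).

    Assumption: overlap is normalized by the mean size of the two lists,
    which makes the score symmetric in its arguments.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    return 100.0 * 2 * shared / (len(set_a) + len(set_b))


# Hypothetical DEG lists from two sites running the same platform.
site1 = ["g1", "g2", "g3", "g4"]
site2 = ["g2", "g3", "g5", "g6"]

score = percent_overlap(site1, site2)
print(score)  # 2 shared genes out of a mean list size of 4: 50.0
```

A score near 100 would indicate that the two sites select nearly the same genes; a low score corresponds to the discordance between lists discussed above.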

Conclusion

We recommend FC-ranking plus a non-stringent P cutoff as a straightforward baseline practice for generating more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance for choosing appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
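The recommended selection procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the record type `GeneStat`, the function names, the default cutoff of 0.05, and the example values are all hypothetical, and per-gene log2 fold changes and P-values are assumed to have been computed beforehand.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene_id: str
    log2_fc: float  # log2 fold change between the two sample groups
    p_value: float  # P from a per-gene test such as a simple t-test


def select_degs_fc_ranked(stats, n_genes, p_cutoff=0.05):
    """Filter by a non-stringent P cutoff, then rank by |log2 FC|."""
    passing = [g for g in stats if g.p_value < p_cutoff]
    ranked = sorted(passing, key=lambda g: abs(g.log2_fc), reverse=True)
    return [g.gene_id for g in ranked[:n_genes]]


def select_degs_p_ranked(stats, n_genes):
    """Rank solely by P-value (the practice the abstract cautions against)."""
    ranked = sorted(stats, key=lambda g: g.p_value)
    return [g.gene_id for g in ranked[:n_genes]]


# Made-up statistics for five genes, for illustration only.
example = [
    GeneStat("geneA", 2.1, 0.030),
    GeneStat("geneB", -1.8, 0.010),
    GeneStat("geneC", 0.2, 0.0001),  # tiny P, but very small fold change
    GeneStat("geneD", 3.0, 0.200),   # large fold change, fails the P filter
    GeneStat("geneE", 1.1, 0.040),
]

fc_list = select_degs_fc_ranked(example, n_genes=2)
p_list = select_degs_p_ranked(example, n_genes=2)
print(fc_list)  # ['geneA', 'geneB']
print(p_list)   # ['geneC', 'geneB']
```

In this toy input the two rankings disagree on one of two genes: P-ranking promotes a gene with a very small fold change, while FC-ranking with a P filter keeps genes that are both large in effect and nominally significant.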

