This article is part of the supplement: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics

Open Access Proceedings

The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

Leming Shi1*, Wendell D Jones2, Roderick V Jensen3, Stephen C Harris1, Roger G Perkins4, Federico M Goodsaid5, Lei Guo1, Lisa J Croner6, Cecilie Boysen7, Hong Fang4, Feng Qian4, Shashi Amur5, Wenjun Bao8, Catalin C Barbacioru9, Vincent Bertholet10, Xiaoxi Megan Cao4, Tzu-Ming Chu8, Patrick J Collins11, Xiao-hui Fan1,12, Felix W Frueh5, James C Fuscoe1, Xu Guo13, Jing Han14, Damir Herman15, Huixiao Hong4, Ernest S Kawasaki16, Quan-Zhen Li17, Yuling Luo18, Yunqing Ma18, Nan Mei1, Ron L Peterson19, Raj K Puri14, Richard Shippy20, Zhenqiang Su1, Yongming Andrew Sun9, Hongmei Sun4, Brett Thorn4, Yaron Turpaz12, Charles Wang21, Sue Jane Wang5, Janet A Warrington13, James C Willey22, Jie Wu4, Qian Xie4, Liang Zhang23, Lu Zhang24, Sheng Zhong25, Russell D Wolfinger8 and Weida Tong1

Author Affiliations

1 National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA

2 Expression Analysis Inc., 2605 Meridian Parkway, Durham, NC 27713, USA

3 University of Massachusetts Boston, Department of Physics, 100 Morrissey Boulevard, Boston, MA 02125, USA

4 Z-Tech Corporation, an ICF International Company at NCTR/FDA, 3900 NCTR Road, Jefferson, AR 72079, USA

5 Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA

6 Biogen Idec Inc., 5200 Research Place, San Diego, CA 92122, USA

7 ViaLogy Inc., 2400 Lincoln Avenue, Altadena, CA 91001, USA

8 SAS Institute Inc., SAS Campus Drive, Cary, NC 27513, USA

9 Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA

10 Eppendorf Array Technologies, rue du Séminaire 20a, 5000 Namur, Belgium

11 Agilent Technologies Inc., 5301 Stevens Creek Boulevard, Santa Clara, CA 95051, USA

12 Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China

13 Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA

14 Center for Biologics Evaluation and Research, US Food and Drug Administration, 8800 Rockville Pike, Bethesda, MD 20892, USA

15 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

16 National Cancer Institute Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877, USA

17 University of Texas Southwestern Medical Center, 6000 Harry Hines Boulevard, Dallas, TX 75390, USA

18 Panomics Inc., 6519 Dumbarton Circle, Fremont, CA 94555, USA

19 Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA

20 GE Healthcare, 7700 S River Parkway, Tempe, AZ 85284, USA

21 UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, CA 90048, USA

22 Ohio Medical University, 3000 Arlington Avenue, Toledo, OH 43614, USA

23 CapitalBio Corporation, 18 Life Science Parkway, Changping District, Beijing 102206, China

24 Solexa Inc., 25861 Industrial Boulevard, Hayward, CA 94545, USA

25 University of Illinois at Urbana-Champaign, Department of Bioengineering, 1304 W. Springfield Avenue, Urbana, IL 61801, USA

BMC Bioinformatics 2008, 9(Suppl 9):S10  doi:10.1186/1471-2105-9-S9-S10

Published: 12 August 2008

Abstract

Background

Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.

Results

Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated how a few widely used gene selection procedures affect the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion together with a non-stringent P-value cutoff as a filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
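As a rough illustration of what "reproducibility of DEG lists" can mean in practice, the sketch below scores two gene lists by the percentage of genes they share. The function name `percent_overlap`, the choice to normalize by the shorter list, and the gene identifiers are assumptions made for this example; the abstract does not specify the exact concordance measure used.

```python
def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists (hypothetical helper)."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    # Assumption: normalize by the shorter list, so identical lists score 100.
    return 100.0 * shared / min(len(set_a), len(set_b))


# Made-up DEG lists from two test sites running the same platform.
site1 = ["G1", "G2", "G3", "G4"]
site2 = ["G2", "G3", "G4", "G5"]
print(percent_overlap(site1, site2))
```

Comparing such a score for lists produced by different selection procedures, at the same list length, is one simple way to see which procedure yields more stable lists.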

Conclusion

We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward baseline practice for generating more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance for choosing appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
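The recommended procedure can be sketched in a few lines: filter genes with a non-stringent P-value cutoff, then rank the survivors by the magnitude of fold change and keep the top N. This is a minimal sketch, not the authors' implementation; the function name `select_degs`, the default cutoff of 0.05, the use of log2 fold changes, and the example values are all assumptions chosen for illustration.

```python
def select_degs(genes, log2_fc, p_values, n_genes, p_cutoff=0.05):
    """Pick DEGs by FC ranking after a non-stringent P filter (illustrative)."""
    # Step 1: keep only genes that pass the (assumed) non-stringent P cutoff.
    passing = [
        (gene, fc)
        for gene, fc, p in zip(genes, log2_fc, p_values)
        if p < p_cutoff
    ]
    # Step 2: rank the remaining genes by absolute log2 fold change, largest first.
    passing.sort(key=lambda item: abs(item[1]), reverse=True)
    # Step 3: report the requested number of top-ranked genes.
    return [gene for gene, _ in passing[:n_genes]]


# Made-up values: gene C has the smallest P but a negligible fold change,
# and gene D has a large fold change but fails the P filter.
genes = ["A", "B", "C", "D", "E"]
log2_fc = [2.5, -3.0, 0.2, 1.8, -0.9]
p_values = [0.01, 0.04, 0.0001, 0.20, 0.03]

top = select_degs(genes, log2_fc, p_values, n_genes=2)
print(top)
```

In this toy input, ranking by P alone would place gene C first despite its small fold change, whereas the FC-ranked list keeps B and A, the genes with the largest expression differences that still pass the P filter.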