This article is part of the supplement: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics.

The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies

1 National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
2 Expression Analysis Inc., 2605 Meridian Parkway, Durham, NC 27713, USA
3 University of Massachusetts Boston, Department of Physics, 100 Morrissey Boulevard, Boston, MA 02125, USA
4 Z-Tech Corporation, an ICF International Company at NCTR/FDA, 3900 NCTR Road, Jefferson, AR 72079, USA
5 Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
6 Biogen Idec Inc., 5200 Research Place, San Diego, CA 92122, USA
7 ViaLogy Inc., 2400 Lincoln Avenue, Altadena, CA 91001, USA
8 SAS Institute Inc., SAS Campus Drive, Cary, NC 27513, USA
9 Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA
10 Eppendorf Array Technologies, rue du Séminaire 20a, 5000 Namur, Belgium
11 Agilent Technologies Inc., 5301 Stevens Creek Boulevard, Santa Clara, CA 95051, USA
12 Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China
13 Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA
14 Center for Biologics Evaluation and Research, US Food and Drug Administration, 8800 Rockville Pike, Bethesda, MD 20892, USA
15 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
16 National Cancer Institute Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877, USA
17 University of Texas Southwestern Medical Center, 6000 Harry Hines Boulevard, Dallas, TX 75390, USA
18 Panomics Inc., 6519 Dumbarton Circle, Fremont, CA 94555, USA
19 Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, MA 02139, USA
20 GE Healthcare, 7700 S River Parkway, Tempe, AZ 85284, USA
21 UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, CA 90048, USA
22 Ohio Medical University, 3000 Arlington Avenue, Toledo, OH 43614, USA
23 CapitalBio Corporation, 18 Life Science Parkway, Changping District, Beijing 102206, China
24 Solexa Inc., 25861 Industrial Boulevard, Hayward, CA 94545, USA
25 University of Illinois at Urbana-Champaign, Department of Bioengineering, 1304 W. Springfield Avenue, Urbana, IL 61801, USA
BMC Bioinformatics 2008, 9(Suppl 9):S10 doi:10.1186/1471-2105-9-S9-S10
Abstract

Background

Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resulting variety of existing and emerging methods adds to the confusion and continuing debate in the microarray community about which methods are appropriate for identifying reliable DEG lists.

Results

Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated how a few widely used gene selection procedures affect the reproducibility of DEG lists. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion together with a non-stringent P-value cutoff as a filter, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists based solely on P-value ranking is an expected mathematical consequence of the high variability of the t-values: the more stringent the P-value threshold, the less reproducible the DEG list. These observations are also consistent with results from extensive simulation calculations.
Conclusion

We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
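
The selection procedure recommended above can be illustrated with a minimal sketch. It assumes that a log2 fold change and a P-value have already been computed for each gene; the names `GeneResult`, `select_degs`, `select_by_p`, and `overlap_fraction` are hypothetical and do not come from the article, the default cutoff of 0.05 is only a placeholder for a "non-stringent" threshold, and the toy values are invented purely to show the mechanics. The overlap measure is one simple way to compare two DEG lists and is not necessarily the metric used in the study.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fc: float   # log2 fold change between the two sample groups
    p_value: float   # P-value from a per-gene test such as a t-test


def select_degs(results, p_cutoff=0.05, n_genes=100):
    """FC ranking with a non-stringent P filter (p_cutoff is an assumed default)."""
    # Step 1: keep only genes that pass the P-value filter.
    passing = [r for r in results if r.p_value < p_cutoff]
    # Step 2: rank the survivors by absolute fold change, largest first.
    ranked = sorted(passing, key=lambda r: abs(r.log2_fc), reverse=True)
    return [r.gene for r in ranked[:n_genes]]


def select_by_p(results, n_genes=100):
    """Ranking by P-value alone, shown only for comparison."""
    ranked = sorted(results, key=lambda r: r.p_value)
    return [r.gene for r in ranked[:n_genes]]


def overlap_fraction(list_a, list_b):
    """Share of genes common to two DEG lists, relative to the shorter list."""
    if not list_a or not list_b:
        return 0.0
    shared = set(list_a) & set(list_b)
    return len(shared) / min(len(list_a), len(list_b))


# Invented toy values, for illustration only (not data from the study).
toy = [
    GeneResult("g1", 2.0, 0.01),
    GeneResult("g2", -1.5, 0.03),
    GeneResult("g3", 0.2, 0.0001),  # tiny P-value but negligible fold change
    GeneResult("g4", 1.0, 0.20),    # fails the P filter
]

fc_list = select_degs(toy, p_cutoff=0.05, n_genes=2)
p_list = select_by_p(toy, n_genes=2)
```

In this toy case the FC-ranked list contains `g1` and `g2`, whereas the P-ranked list picks up `g3` despite its small fold change. Applying `overlap_fraction` to lists selected independently at two sites or on two platforms gives a rough indication of how reproducible each selection rule is.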


