Open Access Research article

How to get the most from microarray data: advice from reverse genomics

Ivan P Gorlov1*, Ji-Yeon Yang2, Jinyoung Byun3, Christopher Logothetis1, Olga Y Gorlova4, Kim-Anh Do5 and Christopher Amos3

Author Affiliations

1 Department of Genitourinary Medical Oncology, Unit 1374, The University of Texas MD Anderson Cancer Center, 1155 Pressler Street, Houston, TX 77030-3721, USA

2 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

3 Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA

4 Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

5 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

BMC Genomics 2014, 15:223 doi:10.1186/1471-2164-15-223

Published: 21 March 2014

Abstract

Background

Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes that are differentially expressed between normal and tumorous tissues are usually considered to be cancer associated. We recently demonstrated that analyzing interindividual variation in gene expression can also be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray-derived predictor of known cancer-associated genes.

Results

We found that the traditional approach to identifying cancer genes, namely detecting genes that are differentially expressed between normal and tumor tissues, is not very efficient. Analyzing the interindividual variation of gene expression in tumor samples identifies cancer-associated genes more effectively. The results were consistent across 4 major types of cancer: breast, colorectal, lung, and prostate. We used recently reported cancer-associated genes (2011–2012) for validation and found that novel cancer-associated genes are best identified by elevated variance of gene expression in tumor samples.

Conclusions

The finding that high interindividual variation of gene expression in tumor tissues is the best predictor of cancer-associated genes likely reflects tumor heterogeneity at the gene level. Computer simulation demonstrates that, when tumors are heterogeneous, assessing variance in tumors identifies cancer genes better than comparing expression between normal and tumor tissues. Our results thus challenge the current paradigm that comparing mean expression between normal and tumorous tissues is the best approach to identifying cancer-associated genes: high interindividual variation in expression is a better indicator, and using it would improve the chances of identifying cancer-associated genes.
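
The simulation argument can be illustrated with a toy model. The sketch below is not the authors' simulation; it is a minimal illustration under assumptions chosen here: expression values are standard normal, 100 of 1000 hypothetical genes are "cancer genes", and each cancer gene is shifted by 4 standard deviations in only 5 of 50 tumor samples (the heterogeneity). All function names and parameter values are hypothetical. Each gene is scored twice, by the absolute difference in mean expression between tumor and normal samples and by the variance of expression in tumor samples, and the two rankings are compared by the area under the ROC curve (AUC).

```python
import random
import statistics


def simulate_gene(rng, n_normal, n_tumor, n_altered, shift):
    """Return (normal, tumor) expression values for one hypothetical gene.

    Only the first n_altered tumor samples carry the expression shift,
    which models heterogeneity: the gene is altered in a subset of tumors.
    """
    normal = [rng.gauss(0.0, 1.0) for _ in range(n_normal)]
    tumor = [rng.gauss(0.0, 1.0) for _ in range(n_tumor)]
    for i in range(n_altered):
        tumor[i] += shift
    return normal, tumor


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def run_simulation(seed=0, n_genes=1000, n_cancer=100,
                   n_normal=50, n_tumor=50, n_altered=5, shift=4.0):
    """Return (AUC of mean-difference score, AUC of tumor-variance score)."""
    rng = random.Random(seed)
    labels, mean_scores, var_scores = [], [], []
    for g in range(n_genes):
        is_cancer = g < n_cancer
        normal, tumor = simulate_gene(
            rng, n_normal, n_tumor, n_altered if is_cancer else 0, shift)
        labels.append(is_cancer)
        # Traditional predictor: difference in mean expression.
        mean_scores.append(
            abs(statistics.fmean(tumor) - statistics.fmean(normal)))
        # Alternative predictor: interindividual variance in tumors.
        var_scores.append(statistics.variance(tumor))
    return auc(mean_scores, labels), auc(var_scores, labels)


if __name__ == "__main__":
    auc_mean, auc_var = run_simulation()
    print(f"AUC, mean difference: {auc_mean:.3f}")
    print(f"AUC, tumor variance:  {auc_var:.3f}")
```

Under these assumptions, a shift confined to a small subset of tumors moves the tumor mean only slightly but inflates the tumor variance considerably, so the variance-based ranking separates the simulated cancer genes more cleanly. If the shift is instead applied to all tumor samples (`n_altered=50`), the variance is unchanged and the mean difference becomes the informative score, which is why the advantage of variance depends on heterogeneity.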

Keywords:
Gene expression; Cancer genes; Interindividual variation in gene expression