Methodology article

A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments

Hyungwon Choi1, Ronglai Shen1, Arul M Chinnaiyan2 and Debashis Ghosh3*

Author Affiliations

1 Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

2 Departments of Pathology and Urology, University of Michigan, Ann Arbor, MI, USA

3 Department of Statistics and Huck Institute for Life Sciences, Penn State University, University Park, PA, USA

BMC Bioinformatics 2007, 8:364  doi:10.1186/1471-2105-8-364

Published: 27 September 2007



With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies.


In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimating an index termed the probability of expression (POE). The first, reported in previous work by the authors, uses Markov chain Monte Carlo (MCMC) techniques. The second is a faster procedure based on the expectation-maximization (EM) algorithm. The methods are illustrated with an application to a meta-analysis of datasets for metastatic cancer.
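To make the idea of an EM-estimated POE concrete, the following is a minimal sketch for a single gene, not the metaArray implementation. It assumes a three-component mixture (a uniform component for under-expression, a normal component for baseline expression, and a uniform component for over-expression) and takes the POE of a sample to be its posterior probability of over-expression minus its posterior probability of under-expression. The function names, the starting values, the choice to tie the uniform widths to the data range, and the approximate update of the normal parameters are all assumptions made for illustration.

```python
import numpy as np


def _responsibilities(x, pi, mu, sigma, pad=1e-6):
    """E-step: posterior probability of (under, baseline, over) per sample."""
    # Assumption: uniform widths are tied to the observed data range.
    k_minus = max(mu - x.min(), 0.0) + pad
    k_plus = max(x.max() - mu, 0.0) + pad
    d_under = np.where(x < mu, 1.0 / k_minus, 0.0)   # U(mu - k_minus, mu)
    d_over = np.where(x > mu, 1.0 / k_plus, 0.0)     # U(mu, mu + k_plus)
    z = (x - mu) / sigma
    d_base = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))  # N(mu, sigma^2)
    w = pi * np.column_stack([d_under, d_base, d_over])
    return w / np.maximum(w.sum(axis=1, keepdims=True), 1e-300)


def poe_em(x, n_iter=200, min_sigma=1e-3):
    """EM-style fit for one gene; returns (poe, pi, mu, sigma).

    Hypothetical helper for illustration. The update of mu and sigma uses
    only the baseline (normal) component, so it is an approximate M-step.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))
    sigma = max(float(np.std(x)), min_sigma)
    pi = np.array([0.1, 0.8, 0.1])  # mixing weights: under, baseline, over

    for _ in range(n_iter):
        resp = _responsibilities(x, pi, mu, sigma)
        # M-step: mixing weights are the average responsibilities.
        pi = resp.mean(axis=0)
        r0 = resp[:, 1]
        total = max(float(r0.sum()), 1e-300)
        mu = float(np.sum(r0 * x) / total)
        sigma = max(float(np.sqrt(np.sum(r0 * (x - mu) ** 2) / total)), min_sigma)

    # Final E-step so the returned POE matches the returned parameters.
    resp = _responsibilities(x, pi, mu, sigma)
    poe = resp[:, 2] - resp[:, 0]  # value in [-1, 1] for each sample
    return poe, pi, mu, sigma


# Simulated gene: 80 baseline samples and 20 over-expressed samples.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 80), rng.uniform(4.0, 8.0, 20)])
poe, pi, mu, sigma = poe_em(x)
```

Because each POE value lies between -1 and 1 regardless of the platform's measurement scale, values computed separately for each study can be placed on a common scale before combining, which is the role the abstract assigns to the latent variables.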


The statistical methods described in the paper are available as an R package, metaArray 1.8.1, from Bioconductor.