Bayesian meta-analysis models for microarray data: a comparative study
1 Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts, USA
2 Department of Mathematics, University of Arkansas, Fayetteville, Arkansas, USA
BMC Bioinformatics 2007, 8:80 doi:10.1186/1471-2105-8-80Published: 7 March 2007
With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches for meta-analysis of microarrays include either combining gene expression measures across studies or combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods.
Two Bayesian meta-analysis models for microarray data have recently been introduced. The first model combines standardized gene expression measures across studies into an overall mean, accounting for inter-study variability, while the second combines probabilities of differential expression without combining expression values. Both models produce the gene-specific posterior probability of differential expression, which is the basis for inference. Since the standardized expression integration model includes inter-study variability, it may improve accuracy of results versus the probability integration model. However, due to the small number of studies typical in microarray meta-analyses, the variability between studies is challenging to estimate. The probability integration model eliminates the need to model variability between studies, and thus its implementation is more straightforward. We found in simulations of two and five studies that combining probabilities outperformed combining standardized gene expression measures for three comparison values: the percent of true discovered genes in meta-analysis versus individual studies; the percent of true genes omitted in meta-analysis versus separate studies, and the number of true discovered genes for fixed levels of Bayesian false discovery. We identified similar results when pooling two independent studies of Bacillus subtilis. We assumed that each study was produced from the same microarray platform with only two conditions: a treatment and control, and that the data sets were pre-scaled.
The Bayesian meta-analysis model that combines probabilities across studies does not aggregate gene expression measures, thus an inter-study variability parameter is not included in the model. This results in a simpler modeling approach than aggregating expression measures, which accounts for variability across studies. The probability integration model identified more true discovered genes and fewer true omitted genes than combining expression measures, for our data sets.