Open Access Research article

Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models

Pingzhao Hu1, Celia MT Greenwood1,2 and Joseph Beyene1,2*

Author Affiliations

1 The Hospital for Sick Children Research Institute, 555 University Ave., Toronto, ON, M5G 1X8, Canada

2 Department of Public Health Sciences, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada

BMC Bioinformatics 2005, 6:128  doi:10.1186/1471-2105-6-128

Published: 27 May 2005

Abstract

Background

With the explosion of microarray studies, an enormous amount of data is being produced. Systematic integration of gene expression data from different sources increases the statistical power to detect differentially expressed genes and allows heterogeneity between studies to be assessed. The challenge, however, lies in designing and implementing efficient analytic methodologies for combining data generated by different research groups.

Results

We extended traditional effect size models to combine information from different microarray datasets by incorporating a quality measure for each gene in each study into the effect size estimation. We illustrated our method by integrating two datasets generated using different Affymetrix oligonucleotide types. Our results indicate that the proposed quality-adjusted weighting strategy for modelling inter-study variation of gene expression profiles not only increases consistency and reduces heterogeneity between the results from these two datasets, but also identifies many more differentially expressed genes than previously proposed methods.
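As a rough illustration of the idea, and not the model used in this article, the sketch below combines per-study effect sizes for a single gene with weights equal to a quality score divided by the effect size variance. The function names `effect_size` and `combine`, the use of Cohen's d, the fixed-effect form, and the assumption that quality scores lie in [0, 1] are all choices made for this example; the abstract does not specify them.

```python
import math


def effect_size(treated, control):
    """Standardized mean difference (Cohen's d) and its approximate variance.

    Assumption for illustration: one gene, two groups, pooled standard deviation.
    """
    n1, n2 = len(treated), len(control)
    m1 = sum(treated) / n1
    m2 = sum(control) / n2
    ss1 = sum((x - m1) ** 2 for x in treated)
    ss2 = sum((x - m2) ** 2 for x in control)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Common large-sample approximation to the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def combine(effects, variances, qualities):
    """Quality-weighted fixed-effect combination across studies.

    Hypothetical weighting: w_i = q_i / var_i, so a study with a low
    quality score for this gene contributes less to the pooled estimate.
    """
    weights = [q / v for q, v in zip(qualities, variances)]
    total = sum(weights)
    mu = sum(w * d for w, d in zip(weights, effects)) / total
    # Variance of a weighted mean whose weights are not pure inverse variances.
    var_mu = sum(w ** 2 * v for w, v in zip(weights, variances)) / total ** 2
    z = mu / math.sqrt(var_mu)
    return mu, var_mu, z


if __name__ == "__main__":
    # Made-up expression values for one gene in two studies.
    d1, v1 = effect_size([8.1, 8.4, 8.9, 9.0], [7.2, 7.5, 7.9, 8.0])
    d2, v2 = effect_size([6.0, 6.8, 7.1], [5.9, 6.1, 6.6])
    print(combine([d1, d2], [v1, v2], qualities=[0.9, 0.4]))
```

With all quality scores set to 1, `combine` reduces to the usual inverse-variance fixed-effect estimate, which makes the effect of the quality adjustment easy to compare against an unadjusted analysis.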

Conclusion

Data integration and synthesis are becoming increasingly important. We live in a high-throughput era where technologies constantly change, leaving behind a trail of data in different forms, shapes and sizes. Statistical and computational methodologies are therefore critical for extracting the most from these related but not identical sources of data.