Bayesian integrated modeling of expression data: a case study on RhoG
1 Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68, FIN-00014, Helsinki, Finland
2 Institute of Biotechnology, University of Helsinki, P.O. Box 56, FIN-00014, Helsinki, Finland
3 National Institute for Health and Welfare (THL), Mannerheimintie 166, 00300 Helsinki, Finland
BMC Bioinformatics 2010, 11:295 doi:10.1186/1471-2105-11-295
Published: 1 June 2010
DNA microarrays provide an efficient method for measuring the activity of many genes in parallel, and a single array can even cover all the known transcripts of an organism. This advantage has to be balanced against the fact that analyzing microarray data involves several consecutive steps, each of which is a potential source of error. Because the steps are sequential, errors tend to accumulate when moving from lower-level towards higher-level analyses. Eliminating such errors does not seem feasible without completely changing the technology, but one should nevertheless aim to assess realistically the degree of uncertainty involved when drawing final conclusions from such analyses.
We present a Bayesian hierarchical model for finding differentially expressed genes between two experimental conditions. We propose an integrated statistical approach in which the correction of signal saturation, systematic array effects, and dye effects, and the detection of differentially expressed genes, are all modeled jointly. The integration allows all these components, and the associated errors, to be considered simultaneously. Inference is based on the full posterior distribution of gene expression indices and on quantities derived from them, rather than on point estimates. The model was applied and tested on two different datasets.
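To illustrate what inference from a full posterior, rather than from a point estimate, can look like, the following is a minimal sketch and not the hierarchical model of the paper. It assumes a simple normal-normal model for one gene's log-ratios with a known noise variance; the function names, the prior settings, and the threshold of 1.0 on the log-ratio scale are all hypothetical choices made for illustration.

```python
import numpy as np


def posterior_delta_samples(log_ratios, prior_mean=0.0, prior_var=1.0,
                            noise_var=0.25, n_samples=5000, seed=0):
    """Draw posterior samples of one gene's mean log-ratio.

    Assumed toy model (not the paper's):
        delta ~ Normal(prior_mean, prior_var)
        y_i | delta ~ Normal(delta, noise_var), noise_var known.
    """
    y = np.asarray(log_ratios, dtype=float)
    n = y.size
    # Conjugate normal-normal update for the mean.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + y.sum() / noise_var)
    rng = np.random.default_rng(seed)
    return rng.normal(post_mean, np.sqrt(post_var), size=n_samples)


def prob_differential(samples, threshold=1.0):
    """Posterior probability that |delta| exceeds a hypothetical threshold."""
    return float(np.mean(np.abs(samples) > threshold))


# Illustrative, made-up replicate log-ratios for two genes.
changed = posterior_delta_samples([2.1, 1.9, 2.3, 2.0])
flat = posterior_delta_samples([0.1, -0.1, 0.05, -0.05])

p_changed = prob_differential(changed)
p_flat = prob_differential(flat)
print(p_changed, p_flat)
```

Summarizing each gene by a posterior probability of this kind keeps the uncertainty in the estimate visible in the final call, which is the motivation stated above. In a joint model the samples would come from a single fit in which saturation, array, and dye effects are estimated together with the expression indices.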
The method offers a way of integrating the various steps of microarray analysis into a single joint analysis, and thereby enables information on differential expression to be extracted in a manner that properly accounts for the various sources of potential error in the process.