Research article

A Beta-mixture model for dimensionality reduction, sample classification and analysis

Kirsti Laurila (1,2), Bodil Oster (3), Claus L Andersen (3), Philippe Lamy (2,3), Torben Orntoft (3), Olli Yli-Harja (1) and Carsten Wiuf (2,*)

Author Affiliations

1 Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FI-33101 Tampere, Finland

2 Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Århus C, Denmark

3 Department for Molecular Medicine, Aarhus University Hospital/Skejby, Brendstrupgårdsvej 100, DK-8200 Århus N, Denmark


BMC Bioinformatics 2011, 12:215  doi:10.1186/1471-2105-12-215

Published: 27 May 2011



Patterns of genome-wide methylation vary between tissue types. For example, cancer tissue shows markedly different patterns from those of normal tissue. In this paper we propose a beta-mixture model to describe genome-wide methylation patterns based on probe data from methylation microarrays. The model takes dependencies between neighbouring probe pairs into account and assumes three broad categories of methylation: low, medium and high. The model is described by 37 parameters, which substantially reduces the dimensionality of a typical methylation microarray. We used methylation microarray data from 42 colon cancer samples to assess the model.


Based on data from colon cancer samples, we show that our model captures genome-wide characteristics of methylation patterns. We estimate the parameters of the model and show that they vary between tissue types. Further, for each methylation probe we calculate the posterior probability of each methylation state (low, medium or high) and assess the probability that the state is correctly predicted. We demonstrate that the model can be applied to classify cancer tissue types accurately and that it provides accessible and easily interpretable data summaries.
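The posterior calculation mentioned above can be illustrated with a minimal sketch. It assumes a simplified three-component beta mixture in which probes are treated independently, so it leaves out the neighbour-probe dependencies that the full 37-parameter model includes. The mixture weights and shape parameters, and the names `beta_pdf`, `state_posteriors` and `most_likely_state`, are hypothetical and chosen only for illustration; they are not estimates or code from this study.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


# Hypothetical mixture parameters (illustration only, not fitted values).
COMPONENTS = {
    "low": {"weight": 0.5, "a": 2.0, "b": 12.0},
    "medium": {"weight": 0.2, "a": 6.0, "b": 6.0},
    "high": {"weight": 0.3, "a": 12.0, "b": 2.0},
}


def state_posteriors(x, components=COMPONENTS):
    """Posterior probability of each methylation state for one probe value x.

    Applies Bayes' rule: weight * density, normalised over the three states.
    """
    joint = {
        state: p["weight"] * beta_pdf(x, p["a"], p["b"])
        for state, p in components.items()
    }
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


def most_likely_state(x, components=COMPONENTS):
    """Return the state with the highest posterior probability."""
    posteriors = state_posteriors(x, components)
    return max(posteriors, key=posteriors.get)
```

Under these assumed parameters, a probe value near 0 is assigned to the low state and a value near 1 to the high state. The largest posterior for a probe also serves as a simple per-probe indication of how confidently its state is predicted.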


We have developed a beta-mixture model for methylation microarray data. The model substantially reduces the dimensionality of the data. It can be used for further analysis, such as classifying samples or detecting changes in methylation status between different samples and tissues.