Software (Open Access)

R/BHC: fast Bayesian hierarchical clustering for microarray data

Richard S Savage1, Katherine Heller3, Yang Xu3, Zoubin Ghahramani3, William M Truman4, Murray Grant4, Katherine J Denby1,2 and David L Wild1*

Author Affiliations

1 Systems Biology Centre, University of Warwick, Coventry House, Coventry, CV4 7AL, UK

2 Warwick HRI, University of Warwick, Wellesbourne, CV35 9EF, UK

3 Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK

4 School of Biosciences, University of Exeter, Exeter, EX4 4QD, UK


BMC Bioinformatics 2009, 10:242  doi:10.1186/1471-2105-10-242

Published: 6 August 2009



Although clustering has rapidly become one of the standard computational approaches for analysing microarray gene expression data, little attention has been paid to the uncertainty in the results it produces.


We present an R/Bioconductor port of a fast, novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) model to capture uncertainty in the data and Bayesian model selection to decide, at each step, which clusters to merge.
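The merge step described above can be illustrated with a small sketch of the standard Bayesian hierarchical clustering (BHC) recursion. For a candidate merge of trees i and j into k, it computes a Dirichlet Process prior weight pi_k from the concentration parameter alpha, compares the hypothesis that all points in k share one cluster against the product of the two subtree likelihoods, and merges the pair with the highest posterior merge probability r_k. This is a minimal Python sketch, not the R/BHC code: the function names `bhc` and `log_marginal` are hypothetical, and the data model (binary features with independent Beta(a, b)-Bernoulli priors) is an assumption chosen only because its marginal likelihood has a simple closed form; the package's own data model may differ.

```python
import math
from itertools import combinations


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_marginal(rows, a=1.0, b=1.0):
    """log p(D | H1): all rows share one cluster.

    ASSUMED model for illustration: binary features, each with an
    independent Beta(a, b) prior on its Bernoulli parameter.
    """
    n = len(rows)
    total = 0.0
    for j in range(len(rows[0])):
        m = sum(r[j] for r in rows)  # number of ones in feature j
        total += (math.lgamma(a + m) + math.lgamma(b + n - m)
                  - math.lgamma(a + b + n)
                  - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return total


def bhc(data, alpha=1.0):
    """Greedy bottom-up BHC sketch.

    Returns the merges in order as (members_i, members_j, log r_k).
    Each node stores (members, log d_k, log p(D_k | T_k)).
    """
    # Leaves: d_i = alpha, and p(D_i | T_i) = p(D_i | H1).
    nodes = [([i], math.log(alpha), log_marginal([row]))
             for i, row in enumerate(data)]
    merges = []
    while len(nodes) > 1:
        best = None
        for x, y in combinations(range(len(nodes)), 2):
            mi, log_di, log_pi_tree = nodes[x]
            mj, log_dj, log_pj_tree = nodes[y]
            members = mi + mj
            n_k = len(members)
            # d_k = alpha * Gamma(n_k) + d_i * d_j ; pi_k = alpha * Gamma(n_k) / d_k
            log_a_gamma = math.log(alpha) + math.lgamma(n_k)
            log_d = log_add(log_a_gamma, log_di + log_dj)
            log_pi = log_a_gamma - log_d
            # p(D_k | T_k) = pi_k p(D_k | H1) + (1 - pi_k) p(D_i | T_i) p(D_j | T_j)
            log_h1 = log_pi + log_marginal([data[m] for m in members])
            log_h2 = math.log1p(-math.exp(log_pi)) + log_pi_tree + log_pj_tree
            log_p = log_add(log_h1, log_h2)
            log_r = log_h1 - log_p  # posterior probability that k is one cluster
            if best is None or log_r > best[0]:
                best = (log_r, x, y, (members, log_d, log_p))
        log_r, x, y, new_node = best
        merges.append((nodes[x][0], nodes[y][0], log_r))
        nodes = [nd for k, nd in enumerate(nodes) if k not in (x, y)]
        nodes.append(new_node)
    return merges
```

For example, `bhc([[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 1]])` first merges the two mostly-ones rows, then the two mostly-zeros rows, and records a much lower r_k for the final merge joining the two groups. Working in log space avoids underflow, since marginal likelihoods shrink quickly as clusters grow.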


Biologically plausible results are presented for a well-studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, such as having to decide how many clusters there should be and having to choose a principled distance metric.
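Because each merge in a BHC tree carries a posterior merge probability r_k, the number of clusters need not be fixed in advance: a common rule from the original BHC work is to cut the tree wherever r_k falls below 0.5, keeping the subtrees above that threshold as clusters. The sketch below illustrates that rule in Python. The nested-tuple tree format `(log_r, left, right)` with integer leaves, the function names `cut_tree` and `leaves`, and the r_k values in the example are all hypothetical and chosen only for illustration.

```python
import math


def leaves(node):
    """Collect leaf ids under a node; leaves are ints, internal nodes are (log_r, left, right)."""
    if isinstance(node, int):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def cut_tree(node, threshold=0.5):
    """Split the tree top-down wherever the merge probability r_k < threshold.

    A node whose r_k is at least the threshold is kept whole as one cluster;
    otherwise its two children are examined recursively.
    """
    if isinstance(node, int):
        return [[node]]
    log_r, left, right = node
    if log_r >= math.log(threshold):
        return [leaves(node)]
    return cut_tree(left, threshold) + cut_tree(right, threshold)


if __name__ == "__main__":
    # Illustrative tree: two confident merges joined by an unlikely one.
    root = (math.log(0.07), (math.log(0.61), 0, 1), (math.log(0.61), 2, 3))
    print(cut_tree(root))  # [[0, 1], [2, 3]]
```

The cluster count therefore emerges from the data through r_k rather than being specified up front; lowering the threshold yields fewer, larger clusters.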