A formal concept analysis approach to consensus clustering of multi-experiment expression data
1 Department of Information Technology, Ghent University - iMinds, Gaston Crommenlaan 8 (201), 9050 Ghent, Belgium
2 Department of Computer Systems and Technology, Technical University of Sofia-branch Plovdiv, Tsanko Dyustabanov 25, 4000 Plovdiv, Bulgaria
3 ICT & Software Engineering Group, Sirris, The Collective Center for the Belgian Technological Industry, Reyerslaan 80, 1030 Brussels, Belgium
BMC Bioinformatics 2014, 15:151. doi:10.1186/1471-2105-15-151. Published: 19 May 2014
With the increasing number and complexity of available gene expression datasets, combining data from multiple microarray studies that address a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results, since they are based on a larger number of samples and the effects of individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. One approach to combining data from different experiments is to aggregate their clusterings into a consensus or representative clustering solution, which increases confidence in the features common to all the datasets and reveals the important differences among them.
We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) to the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in one clustering solution per group.
These solutions are pooled together and further analysed by employing FCA, which allows valuable insights to be extracted from the data and generates a gene partition over all the experiments. To validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals.
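The pooling and FCA step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each group's consensus solution is given as a mapping from gene to cluster label, that the formal context takes genes as objects and (group index, cluster label) pairs as attributes, and that the gene partition over all experiments is obtained by grouping genes with identical attribute sets. The function names and the toy data are hypothetical.

```python
def build_context(solutions):
    """Build a formal context from pooled clustering solutions.

    solutions: list of dicts (gene -> cluster label), one per group of experiments.
    Returns a dict mapping each gene to the frozenset of (group index, label)
    attributes it has. Only genes present in every solution are kept (assumption).
    """
    genes = set(solutions[0]).intersection(*solutions[1:])
    return {
        gene: frozenset((i, sol[gene]) for i, sol in enumerate(solutions))
        for gene in genes
    }


def partition_by_intent(context):
    """Group genes that share exactly the same set of cluster memberships."""
    groups = {}
    for gene, attrs in context.items():
        groups.setdefault(attrs, set()).add(gene)
    return groups


def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Intents are obtained by closing the object intents under intersection,
    starting from the full attribute set; each extent is the set of genes
    whose attributes include the intent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {attrs & other for other in intents}
    return [
        (frozenset(g for g, a in context.items() if intent <= a), intent)
        for intent in intents
    ]


# Toy data (hypothetical): two groups of experiments, four genes.
solutions = [
    {"g1": "A", "g2": "A", "g3": "B", "g4": "A"},
    {"g1": "X", "g2": "Y", "g3": "Y", "g4": "X"},
]
context = build_context(solutions)
partition = partition_by_intent(context)
blocks = sorted(sorted(genes) for genes in partition.values())
concepts = formal_concepts(context)
```

In this toy example the partition separates g1 and g4 (clustered together in both groups) from g2 and g3, while the concept lattice additionally records partial agreements, such as the genes that share cluster A in the first group only. The exhaustive closure used here is adequate for small contexts; larger ones would call for a dedicated FCA algorithm.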
The proposed FCA-enhanced consensus clustering technique is a general approach to combining clustering algorithms with FCA in order to derive clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices.