This article is part of the supplement: Proceedings of the Fourth Annual MCBIOS Conference. Computational Frontiers in Biomedicine

Open Access Proceedings

Robust clustering in high dimensional data using statistical depths

Yuanyuan Ding1, Xin Dang2, Hanxiang Peng2* and Dawn Wilkins1

Author Affiliations

1 Computer & Information Science Department, The University of Mississippi, University, MS, USA

2 Department of Mathematics, The University of Mississippi, University, MS, USA

BMC Bioinformatics 2007, 8(Suppl 7):S8. doi:10.1186/1471-2105-8-S7-S8

Published: 1 November 2007

Abstract

Background

Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although the componentwise median is a more robust alternative, it can be a poor representative of the center for high dimensional data. A new algorithm is needed that is robust and works well on high dimensional data sets such as gene expression data.
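The robustness contrast between the mean and the componentwise median can be illustrated with a small sketch. The toy data and the helper name `componentwise_center` are assumptions made for illustration only and do not come from the article; the sketch shows only the sensitivity to a single outlier, not the high dimensional behavior discussed above.

```python
from statistics import mean, median


def componentwise_center(points, stat):
    """Apply a univariate statistic to each coordinate independently."""
    return [stat(coord) for coord in zip(*points)]


# Hypothetical toy data: four points on a unit square plus one gross outlier.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]

# The mean is dragged far away from the bulk of the data by the outlier.
mean_center = componentwise_center(points, mean)

# The componentwise median stays inside the range of the four clean points.
median_center = componentwise_center(points, median)

print("mean center:  ", mean_center)
print("median center:", median_center)
```

In this toy case the single outlier moves the mean well outside the unit square, while the componentwise median remains at one of its corners.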

Results

Here we propose a new robust divisive clustering algorithm, the bisecting k-spatialMedian, based on the statistical spatial depth. We also introduce a new subcluster selection rule, Relative Average Depth. We demonstrate that, for high dimension and low sample size (HDLSS) data, the proposed clustering algorithm outperforms the componentwise-median-based bisecting k-median algorithm when both are applied to two real HDLSS gene expression data sets. When further applied to noisy real data sets, the proposed algorithm compares favorably with the componentwise-median-based bisecting k-median algorithm in terms of robustness.
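For readers unfamiliar with the terms, the following is a minimal sketch of the sample spatial depth and the spatial median as they are commonly defined: the depth of a point is one minus the norm of the average unit vector pointing from the data to that point, and the spatial median is the point minimizing the sum of Euclidean distances to the data. This is not the authors' implementation. The function names, the toy data, and the use of Weiszfeld iterations to approximate the median are assumptions, and the sketch does not cover the bisecting procedure or the Relative Average Depth rule.

```python
import math


def spatial_depth(x, points):
    """Sample spatial depth: 1 - norm of the mean unit vector from the data to x."""
    dim = len(x)
    total = [0.0] * dim
    for p in points:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm > 0.0:  # a data point equal to x contributes the zero vector
            for j in range(dim):
                total[j] += diff[j] / norm
    return 1.0 - math.sqrt(sum(c * c for c in total)) / len(points)


def spatial_median(points, iters=200, tol=1e-10):
    """Approximate the spatial median with Weiszfeld iterations (an assumed solver)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iters):
        weight_sum = 0.0
        numer = [0.0] * dim
        for p in points:
            dist = math.sqrt(sum((c - q) ** 2 for c, q in zip(current, p)))
            if dist < tol:
                continue  # simplification: skip a point coinciding with the estimate
            w = 1.0 / dist
            weight_sum += w
            for j in range(dim):
                numer[j] += w * p[j]
        if weight_sum == 0.0:
            break
        new = [v / weight_sum for v in numer]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, current)))
        current = new
        if shift < tol:
            break
    return current


# Hypothetical toy data: a unit square, then the same square plus one outlier.
square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
noisy = square + [[100.0, 100.0]]

center_depth = spatial_depth([0.5, 0.5], square)   # deepest point of the square
far_depth = spatial_depth([100.0, 100.0], square)  # shallow point far from the data
noisy_median = spatial_median(noisy)

print("depth at center:", center_depth)
print("depth far away: ", far_depth)
print("spatial median with outlier:", noisy_median)
```

In this toy example the point at the center of the square has the highest depth, a distant point has depth close to zero, and the spatial median of the contaminated sample stays inside the square. Because the depth is built from unit vectors, a distant outlier influences it only through its direction, not its magnitude.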

Conclusion

Statistical data depths provide an alternative way to find the "center" of multivariate data sets and are useful and robust for clustering.