This article is part of the supplement: Symposium of Computations in Bioinformatics and Bioscience (SCBB07)

Open Access Research

A practical comparison of two K-Means clustering algorithms

Gregory A Wilkin¹ and Xiuzhen Huang²*

Author Affiliations

1 601 North 12th Street, Paragould, Arkansas 72450, USA

2 Department of Computer Science, Arkansas State University, State University, Arkansas 72467, USA

BMC Bioinformatics 2008, 9(Suppl 6):S19  doi:10.1186/1471-2105-9-S6-S19

Published: 28 May 2008

Abstract

Background

Data clustering is a powerful technique for identifying data with similar characteristics, such as genes with similar expression patterns. However, not all implementations of clustering algorithms yield the same performance or the same clusters.

Results

In this paper, we study two implementations of a general data-clustering method, K-means clustering. Our experiments compare the running times and distance efficiency of Lloyd's K-means Clustering and Progressive Greedy K-means Clustering.
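The abstract does not spell out either algorithm. As a rough illustration only, the sketch below implements the textbook form of Lloyd's K-means iteration on points of any dimension (three-dimensional in the example). It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The random initialization, the "no assignment changed" stopping rule, the handling of empty clusters, and all function names (`lloyd_kmeans`, `nearest_centroid`, `squared_distance`) are assumptions of this sketch, not details of the authors' implementation. Progressive Greedy K-means is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(p: Sequence[float], centroids: List[Point]) -> int:
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))


def lloyd_kmeans(points: List[Point], k: int, max_iters: int = 100,
                 seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Textbook Lloyd iteration (hypothetical helper; init and stopping rule assumed)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(points, k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [nearest_centroid(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    # Two well-separated groups of three-dimensional points.
    pts = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
    centers, labels = lloyd_kmeans(pts, k=2)
    print(centers, labels)
```

Each iteration costs one distance computation per point per centroid, which is one reason running time is a natural axis for comparing K-means variants.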

Conclusion

In our implementations, Lloyd's K-means Clustering algorithm is more efficient than Progressive Greedy K-means Clustering, not only in processing time but also in mean squared difference (MSD). The analysis was performed on a sample of gene expression levels and on randomly generated datasets in three-dimensional space. However, other circumstances may call for a different choice.
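The abstract reports clustering quality as MSD but does not give its formula here. A minimal sketch, assuming MSD is the average squared Euclidean distance from each point to the centroid of its assigned cluster (lower is better), is shown below alongside a hypothetical generator of random three-dimensional test points. The function names, the uniform distribution, and the unit-cube bounds are assumptions for illustration, not the authors' experimental setup; the paper may define or normalize MSD differently.

```python
import random
from typing import List, Sequence, Tuple


def mean_squared_difference(points: Sequence[Sequence[float]],
                            centroids: Sequence[Sequence[float]],
                            assignment: Sequence[int]) -> float:
    """Average squared Euclidean distance from each point to its assigned centroid.

    Assumed reading of "MSD"; not taken from the paper.
    """
    if not points:
        raise ValueError("need at least one point")
    if len(points) != len(assignment):
        raise ValueError("every point needs exactly one cluster label")
    total = 0.0
    for p, label in zip(points, assignment):
        c = centroids[label]
        total += sum((x - y) ** 2 for x, y in zip(p, c))
    return total / len(points)


def random_points_3d(n: int, low: float = 0.0, high: float = 1.0,
                     seed: int = 0) -> List[Tuple[float, float, float]]:
    """n points drawn uniformly from the cube [low, high]^3 (hypothetical generator)."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high), rng.uniform(low, high))
            for _ in range(n)]


if __name__ == "__main__":
    # Two points, one shared centroid midway between them: each squared distance is 1.
    print(mean_squared_difference([(0, 0, 0), (2, 0, 0)], [(1, 0, 0)], [0, 0]))  # → 1.0
    data = random_points_3d(1000, seed=42)
    print(len(data))
```

Because the score depends only on the points, the centroids, and the labels, the same function can rate the output of either clustering algorithm, which keeps an MSD comparison like-for-like.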