Open Access Highly Accessed Methodology article

Data reduction for spectral clustering to analyze high throughput flow cytometry data

Habil Zare (1,2), Parisa Shooshtari (1,2), Arvind Gupta (3) and Ryan R Brinkman (2,4)*

* Corresponding author

Author Affiliations

1 Department of Computing Science, University of British Columbia, Vancouver, BC, Canada

2 Terry Fox Laboratory, BC Cancer Agency, 675 W 10th Ave., Vancouver, BC, Canada

3 Faculty of Science, University of British Columbia, Vancouver, BC, Canada

4 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada

BMC Bioinformatics 2010, 11:403. doi:10.1186/1471-2105-11-403

Published: 28 July 2010



Recent biological discoveries have shown that clustering large datasets is essential to understanding many areas of biology. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be applied directly to large datasets because of time and memory limitations: it needs an n × n similarity matrix and an eigendecomposition of that matrix, so its cost grows at least quadratically with the number of data points n. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and a post-processing stage. We call the entire algorithm SamSPECTRAL.
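The sketch below shows the general idea of sampling followed by spectral clustering. It picks representatives, clusters only those representatives spectrally, and then maps each representative's label back to the points it stands for. This is a minimal illustration under our own assumptions. It is not the SamSPECTRAL implementation. The greedy radius-`h` sampling, the Gaussian kernel with bandwidth `sigma`, the normalized (Ng-Jordan-Weiss style) embedding and the small k-means routine were all chosen for illustration. Every function name is hypothetical. The paper's post-processing stage is omitted.

```python
import numpy as np


def sample_representatives(X, h, rng):
    """Hypothetical greedy sampler: pick a random unassigned point as a
    representative and assign every unassigned point within distance h to it."""
    community = np.full(X.shape[0], -1)
    reps = []
    unassigned = np.arange(X.shape[0])
    while unassigned.size > 0:
        r = rng.choice(unassigned)
        dist = np.linalg.norm(X[unassigned] - X[r], axis=1)
        members = unassigned[dist <= h]          # always includes r (distance 0)
        community[members] = len(reps)
        reps.append(r)
        unassigned = unassigned[dist > h]
    return np.array(reps), community


def kmeans(Y, k, n_iter=100):
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_on_representatives(R, k, sigma):
    # Dense m x m Gaussian affinity, affordable because m << n.
    sq = np.sum((R[:, None, :] - R[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)                         # ascending eigenvalues
    U = vecs[:, -k:]                                    # k leading eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


def sampled_spectral_clustering(X, k, h, sigma, seed=0):
    rng = np.random.default_rng(seed)
    reps, community = sample_representatives(X, h, rng)
    rep_labels = spectral_on_representatives(X[reps], k, sigma)
    # Each point inherits the label of its community's representative.
    return rep_labels[community]


# Usage: two well-separated synthetic 2-D blobs (illustrative data only).
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.5, (500, 2)),
               data_rng.normal(6.0, 0.5, (500, 2))])
labels = sampled_spectral_clustering(X, k=2, h=0.5, sigma=1.0)
```

The eigendecomposition runs on the m × m representative matrix rather than the n × n matrix over all points, and this is where the savings come from. How well the result reflects the full data depends on how faithfully the sampling preserves the data's structure.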


We tested our algorithm on flow cytometry data, an example of large, multidimensional data that can contain hundreds of thousands of data points (i.e., "events", which in flow cytometry typically correspond to cells). We compared it with two state-of-the-art model-based flow cytometry clustering methods. SamSPECTRAL showed significant advantages in correctly identifying populations with non-elliptical shapes, low-density populations close to dense ones, minor subpopulations of a major population, and rare populations.
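The following is our own rough arithmetic, not figures from this paper, to put "hundreds of thousands of events" in perspective. It assumes a dense similarity matrix stored as 64-bit floats. The sizes n = 300,000 events and m = 3,000 sampled representatives are hypothetical.

```latex
\begin{aligned}
n = 3\times10^{5}: &\quad n^{2}\cdot 8\ \text{bytes} = 9\times10^{10}\cdot 8\ \text{bytes} = 7.2\times10^{11}\ \text{bytes} \approx 720\ \text{GB},\\
m = 3\times10^{3}: &\quad m^{2}\cdot 8\ \text{bytes} = 9\times10^{6}\cdot 8\ \text{bytes} = 7.2\times10^{7}\ \text{bytes} \approx 72\ \text{MB}.
\end{aligned}
```

A dense eigendecomposition scales as O(n^3), so under the same assumptions, reducing n to m also cuts that step's time by a factor of roughly (n/m)^3 = 10^6.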


This work is the first successful attempt to apply spectral methodology to flow cytometry data. An implementation of our algorithm is freely available as an R package through Bioconductor.