
This article is part of the supplement: 2006 International Workshop on Multiscale Biological Imaging, Data Mining and Informatics

Open Access Research

Phenotype clustering of breast epithelial cells in confocal images based on nuclear protein distribution analysis

Fuhui Long1,4, Hanchuan Peng2,4, Damir Sudar1, Sophie A Lelièvre3 and David W Knowles1*

Author Affiliations

1 Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA

2 Genomics Division West, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA

3 Department of Basic Medical Science, Purdue University, West Lafayette, IN 47907 USA

4 Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147 USA


BMC Cell Biology 2007, 8(Suppl 1):S3  doi:10.1186/1471-2121-8-S1-S3

Published: 10 July 2007



The distribution of chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that builds a hierarchical tree of the given cell phenotypes and calculates the statistical significance of the differences between them, based on cluster analysis of nuclear protein distributions.


Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined by cell type (S1 or T4-2) and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how the cells of any one phenotype were distributed across the consensus clusters. Grouping the phenotypes allowed us to build phenotype trees and calculate the statistical significance of the differences between groups. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19% accuracy and that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86% accuracy. They showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes.
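
The cluster-histogram and phenotype-tree steps can be illustrated with a minimal Python sketch. It is not the implementation used in this work. It assumes that each cell has already been assigned a consensus cluster label, compares phenotypes by the total variation distance between their cluster histograms, and merges the closest pair greedily. The distance measure, the merge rule, the function names and the toy labels are all assumptions made for illustration, and the sketch does not compute statistical significance.

```python
from collections import Counter
from itertools import combinations


def cluster_histogram(labels, n_clusters):
    """Fraction of a phenotype's cells that falls in each consensus cluster."""
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(n_clusters)]


def histogram_distance(h1, h2):
    """Total variation distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def build_phenotype_tree(labels_by_phenotype, n_clusters):
    """Greedily merge the two closest phenotype groups until one tree remains.

    Returns the tree as nested tuples of phenotype names.
    """
    # Each node maps to (cluster histogram, number of cells in the group).
    nodes = {
        name: (cluster_histogram(labels, n_clusters), len(labels))
        for name, labels in labels_by_phenotype.items()
    }
    while len(nodes) > 1:
        a, b = min(
            combinations(nodes, 2),
            key=lambda pair: histogram_distance(nodes[pair[0]][0], nodes[pair[1]][0]),
        )
        (hist_a, n_a), (hist_b, n_b) = nodes.pop(a), nodes.pop(b)
        # The merged histogram is the cell-count-weighted average of its children.
        merged = [(x * n_a + y * n_b) / (n_a + n_b) for x, y in zip(hist_a, hist_b)]
        nodes[(a, b)] = (merged, n_a + n_b)
    return next(iter(nodes))


# Invented toy labels (consensus cluster index per cell), not data from this study.
toy_labels = {
    "S1_proliferating": [0, 0, 0, 1, 0, 0],
    "S1_differentiated": [1, 1, 1, 0, 1, 1],
    "T4-2_small": [2, 2, 2, 2, 1, 2],
    "T4-2_large": [2, 2, 1, 2, 2, 2],
}
tree = build_phenotype_tree(toy_labels, n_clusters=3)
print(tree)
```

With these toy labels the two T4-2 groups have identical histograms and merge first, followed by the two S1 groups. A significance test between groups, as described above, would be applied at each split of such a tree.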


This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.