A novel noise handling method to improve clustering of gene expression patterns

Bhattacharya, Anindya; De, Rajat K

doi:10.1186/1471-2105-12-S7-A3

Volume 12 Supplement 7

UT-ORNL-KBRIN Bioinformatics Summit 2011

Meeting abstract
Open access
Published: 05 August 2011

Anindya Bhattacharya¹ &
Rajat K De²

BMC Bioinformatics volume 12, Article number: A3 (2011)

Background

Cluster analysis of gene expression data is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. The performance of a clustering algorithm depends largely on the selected similarity measure, and efficient handling of outliers is a major contributor to the success of a similarity measure. In gene expression data, there may be pairs of genes that have completely different expression values over a few samples under certain experimental condition(s), although they exhibit similar behavior over the other samples. Depending on the algorithm, such outliers are placed in single-element clusters (hierarchical clustering), assigned to the cluster they are most similar to (partitioning clustering), or discarded from grouping altogether (density-based, grid-based and graph-based clustering). In all these cases, outliers affect the clustering result. Measurement errors or changes in conditions during microarray experiments may cause one or more samples to differ greatly in expression level from the other samples, and the expression values of a single outlier sample, or of very few, may cause a gene to be an outlier. We formulate a new weighted-function-based method to reduce the effect of outliers on similarity measures. The better a similarity measure captures the similarity between genes in the presence of outliers, the better the clustering algorithm performs in forming biologically relevant groups of genes.
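The abstract does not give the form of the weighted function, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Pearson-type similarity, robust standardization by median and MAD, and Tukey bisquare weights with a cutoff of 3.0; the function names and the two toy profiles are hypothetical. Samples where two genes disagree sharply receive low or zero weight, so a single aberrant sample no longer dominates the similarity.

```python
import math
from statistics import median


def robust_z(values):
    """Centre on the median and scale by the MAD (median absolute deviation)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad if mad > 0 else 1.0  # fall back to 1.0 for flat profiles
    return [(v - med) / scale for v in values]


def sample_weights(x, y, cutoff=3.0):
    """Tukey bisquare weights (an assumed choice): samples where the two
    standardized profiles disagree by `cutoff` or more get weight 0."""
    weights = []
    for a, b in zip(robust_z(x), robust_z(y)):
        d = abs(a - b)
        weights.append((1.0 - (d / cutoff) ** 2) ** 2 if d < cutoff else 0.0)
    return weights


def weighted_pearson(x, y, w):
    """Pearson correlation in which sample i contributes with weight w[i].
    Assumes the weights do not sum to zero and neither profile is constant."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


# Two toy profiles that agree on every sample except the seventh.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, -9.0, 8.1]

plain = weighted_pearson(gene_a, gene_b, [1.0] * len(gene_a))  # ordinary Pearson
weights = sample_weights(gene_a, gene_b)
robust = weighted_pearson(gene_a, gene_b, weights)

print(f"unweighted similarity: {plain:.3f}")
print(f"weighted similarity:   {robust:.3f}")
```

On these toy profiles the unweighted correlation is close to zero, while the weighted version recovers the strong agreement seen on the remaining samples. The cutoff controls how aggressively disagreeing samples are discounted and would need tuning on real data.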

Results

The effectiveness of the weighted-function-based method has been demonstrated with the clustering algorithms K-means [1], Minimization of Disagreement (MIND) [2], Divisive Correlation Clustering Algorithm (DCCA) [3], Average Correlation Clustering Algorithm (ACCA) [4] and Bi-Correlation Clustering Algorithm (BCCA) [5] on a yeast gene expression dataset (Yeast Cheng and Church dataset from Yeast Functional Genomics Database [http://yfgdb.princeton.edu/]). The results were assessed using P-values on functional annotations; functional categories with P-values less than 5.0 × 10^-7 are reported as enriched. Figure 1 shows the number of functionally enriched attributes in the most enriched clusters obtained by each of the clustering and biclustering algorithms on the yeast gene expression dataset. The results suggest that the new weighted-function-based method significantly improves performance in all cases, in terms of finding biologically relevant groups of genes.
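The abstract does not state which statistical test produced the P-values. One common choice for functional annotation enrichment is the upper tail of the hypergeometric distribution, sketched below under that assumption. The function name and the gene counts are hypothetical and chosen only for illustration; only the 5.0 × 10^-7 threshold comes from the text.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric probability of seeing at least `hits`
    annotated genes in a cluster of `cluster_size` genes drawn from
    `population` genes, of which `annotated` carry the annotation."""
    denom = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    return sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i) / denom
        for i in range(hits, upper + 1)
    )


THRESHOLD = 5.0e-7  # cutoff quoted in the text

# Hypothetical counts: 6000 genes in total, 100 with the annotation,
# a cluster of 50 genes containing 15 annotated ones.
p = enrichment_p_value(population=6000, annotated=100, cluster_size=50, hits=15)
is_enriched = p < THRESHOLD

print(f"P-value: {p:.3e}, enriched: {is_enriched}")
```

A category would then be counted as enriched for a cluster when its P-value falls below the threshold, and such counts could be compared across algorithms.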

References

1. Jain AK, Dubes RC: Algorithms for Clustering Data. New Jersey: Prentice Hall; 1988.
2. Bansal N, Blum A, Chawla S: Correlation clustering. Machine Learning 2004, 56: 89–113. doi:10.1023/B:MACH.0000033116.57574.95
3. Bhattacharya A, De RK: Divisive correlation clustering algorithm (DCCA) for grouping of genes: Detecting varying patterns in expression profiles. Bioinformatics 2008, 24: 1359–1366. doi:10.1093/bioinformatics/btn133
4. Bhattacharya A, De RK: Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. Journal of Biomedical Informatics 2010, 43: 560–568. doi:10.1016/j.jbi.2010.02.001
5. Bhattacharya A, De RK: Bi-correlation clustering algorithm for determining a set of co-regulated genes. Bioinformatics 2009, 25: 2795–2801. doi:10.1093/bioinformatics/btp526

Author information

Authors and Affiliations

Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
Anindya Bhattacharya
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Rajat K De

Corresponding author

Correspondence to Anindya Bhattacharya.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bhattacharya, A., De, R.K. A novel noise handling method to improve clustering of gene expression patterns. BMC Bioinformatics 12 (Suppl 7), A3 (2011). https://doi.org/10.1186/1471-2105-12-S7-A3
