Open Access Highly Accessed Methodology article

Filtering Genes for Cluster and Network Analysis

David Tritchler1,3,4*, Elena Parkhomenko2 and Joseph Beyene1,2

Author affiliations

1 Department of Biostatistics, University of Toronto, Toronto, Ontario, Canada

2 Department of Child Health Evaluative Sciences, Hospital for Sick Children Research Institute, Toronto, Ontario, Canada

3 Department of Biostatistics, State University of New York at Buffalo, Buffalo, New York, USA

4 Ontario Cancer Institute, Toronto, Ontario, Canada

Citation and License

BMC Bioinformatics 2009, 10:193  doi:10.1186/1471-2105-10-193

Published: 23 June 2009

Abstract

Background

Prior to cluster analysis or genetic network analysis, it is customary to filter the data, that is, to remove genes considered irrelevant from the set of genes to be analyzed. Often, genes whose variation across samples falls below an arbitrary threshold value are deleted. This can improve interpretability and reduce bias.
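The threshold-based filtering described above can be illustrated with a minimal sketch. It assumes expression data stored as one list of values per gene (one value per sample) and uses the sample variance as the measure of variation. The function name `variance_filter`, the toy data, and the threshold of 0.5 are hypothetical choices made for illustration, not taken from the paper.

```python
from statistics import variance


def variance_filter(expr, threshold):
    """Keep genes whose sample variance across samples is >= threshold.

    expr: list of rows, one row per gene, one value per sample (assumed layout).
    Returns the retained rows and a boolean mask over the original genes.
    """
    # statistics.variance computes the sample variance (n - 1 denominator).
    keep = [variance(row) >= threshold for row in expr]
    filtered = [row for row, kept in zip(expr, keep) if kept]
    return filtered, keep


# Hypothetical toy data: three genes measured in four samples.
expr = [
    [1.0, 1.0, 1.0, 1.0],  # constant gene, variance 0
    [0.0, 2.0, 0.0, 2.0],  # strongly varying gene
    [5.0, 5.1, 4.9, 5.0],  # nearly constant gene
]

filtered, keep = variance_filter(expr, threshold=0.5)
print(keep)
print(filtered)
```

With this toy input only the second gene passes the filter. The choice of threshold is arbitrary here, which is the property of such filters that the Background paragraph points out.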

Results

This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks.

Conclusion

The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.