Open Access Short Report

Comparison of threshold selection methods for microarray gene co-expression matrices

Bhavesh R Borate¹, Elissa J Chesler³, Michael A Langston², Arnold M Saxton⁴* and Brynn H Voy³

Author Affiliations

1 Genome Science and Technology Program, University of Tennessee, Knoxville, Tennessee, USA

2 Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA

3 Oak Ridge National Laboratory, Systems Genetics Group, Biosciences Division, Oak Ridge, Tennessee, USA

4 Department of Animal Science, University of Tennessee, Knoxville, Tennessee, USA

BMC Research Notes 2009, 2:240 doi:10.1186/1756-0500-2-240

Published: 2 December 2009

Abstract

Background

Network and clustering analyses of microarray co-expression correlation data often require applying a threshold that discards small correlations, which reduces computational demands and the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data.
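As a minimal illustration of the thresholding step described above, the Python sketch below computes pairwise correlations between gene expression profiles and keeps only the pairs whose absolute correlation reaches a cut-off. The use of Pearson correlation, the choice to threshold |r| (so strong negative correlations are kept), the gene names, the profiles, and the 0.7 cut-off are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_edges(profiles, r_min):
    """Return {(gene1, gene2): r} for pairs with |r| >= r_min; smaller correlations are discarded."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= r_min:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression profiles over five time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],
}
edges = threshold_edges(profiles, r_min=0.7)
print(sorted(edges))
# -> [('geneA', 'geneB'), ('geneA', 'geneC'), ('geneB', 'geneC')]
```

Here geneD correlates weakly with every other gene, so all of its pairs are dropped; the resulting edge set is the kind of sparse graph on which combinatorial network analyses operate.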

Findings

Six conceptually diverse methods were used to estimate a correlation threshold for three time-series microarray datasets. They were based on the number of maximal cliques, the correlation of control spots with expressed genes, the top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power. The validity of each threshold was tested by comparison with thresholds derived from Gene Ontology information. The stability and reliability of the best-performing methods were evaluated with block bootstrapping.
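Two of the purely statistical methods listed above, and the resampling used to check stability, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' procedure: the Bonferroni threshold here relies on the Fisher z approximation (atanh(r)·sqrt(n − 3) is approximately standard normal when the true correlation is zero) instead of an exact t-test, the two-sided test and alpha = 0.05 are assumptions, and the moving block bootstrap with a fixed block length is one common block-bootstrap variant chosen for illustration, since the abstract does not say which variant or block length was used. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def bonferroni_r_threshold(n_samples, n_genes, alpha=0.05):
    """Smallest |r| significant at family-wise level alpha, Bonferroni-corrected over all gene pairs.

    Assumes the Fisher z approximation: atanh(r) * sqrt(n - 3) ~ N(0, 1) when rho = 0.
    """
    m = n_genes * (n_genes - 1) // 2                      # number of pairwise tests
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))    # two-sided test at per-test level alpha / m
    return math.tanh(z_crit / math.sqrt(n_samples - 3))   # map the critical z back to the r scale


def top_fraction_threshold(abs_correlations, fraction=0.01):
    """|r| value such that roughly the top `fraction` of correlations lie at or above it."""
    ranked = sorted(abs_correlations, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))         # keep at least one pair
    return ranked[k - 1]


def moving_block_bootstrap_indices(n_times, block_len, rng):
    """Resample time-point indices as contiguous blocks, preserving short-range temporal dependence."""
    starts = range(n_times - block_len + 1)
    idx = []
    while len(idx) < n_times:
        s = rng.choice(starts)
        idx.extend(range(s, s + block_len))
    return idx[:n_times]                                  # trim the last block to the series length


# Hypothetical study size: 20 arrays, 1000 genes.
print(bonferroni_r_threshold(n_samples=20, n_genes=1000))
print(top_fraction_threshold([i / 1000 for i in range(1000)]))   # -> 0.99
print(moving_block_bootstrap_indices(12, 3, random.Random(0)))
```

The Bonferroni threshold rises as more genes are tested and falls as more arrays are available, which is why purely pairwise statistical cut-offs depend strongly on study size. A stability check would recompute a threshold on many bootstrap resamples of the time points and examine its spread.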

Two methods, number of maximal cliques and spectral graph clustering, used information in the structure of the correlation matrix and performed well in terms of stability. Comparison with Gene Ontology showed that thresholds based on the number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested.
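The maximal-clique method uses the structure of the thresholded graph itself. The Python sketch below computes the ingredient such a method relies on: the number of maximal cliques in the co-expression graph at each candidate threshold, counted with the standard Bron-Kerbosch algorithm with pivoting. The abstract does not state how a threshold is chosen from this curve or which clique-enumeration software was used, so the sketch stops at the count. Excluding genes with no retained edges (so isolated genes are not counted as singleton cliques), the function names, and the example correlations are assumptions made for illustration.

```python
def graph_at(corr, t):
    """Adjacency sets for the graph whose edges are gene pairs with |r| >= t.

    Only genes with at least one retained edge become vertices.
    """
    adj = {}
    for (g1, g2), r in corr.items():
        if abs(r) >= t:
            adj.setdefault(g1, set()).add(g2)
            adj.setdefault(g2, set()).add(g1)
    return adj


def count_maximal_cliques(adj):
    """Count maximal cliques with Bron-Kerbosch using Tomita-style pivoting."""
    if not adj:
        return 0
    count = 0

    def expand(p, x):
        nonlocal count
        if not p and not x:
            count += 1  # the current clique cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(adj), set())
    return count


def clique_counts(corr, thresholds):
    """Number of maximal cliques in the thresholded graph for each candidate threshold."""
    return {t: count_maximal_cliques(graph_at(corr, t)) for t in thresholds}


# Hypothetical correlations among five genes.
corr = {("a", "b"): 0.90, ("a", "c"): 0.85, ("b", "c"): 0.80,
        ("c", "d"): 0.60, ("d", "e"): 0.75}
print(clique_counts(corr, [0.5, 0.7, 0.82, 0.95]))
# -> {0.5: 3, 0.7: 2, 0.82: 2, 0.95: 0}
```

At 0.5 the graph contains the cliques {a, b, c}, {c, d} and {d, e}; raising the threshold removes edges and changes the clique structure, and it is this dependence on the matrix structure, not on any single pairwise test, that distinguishes the method. For genome-scale matrices, dedicated clique-enumeration tools are far faster than this pure-Python recursion.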

Conclusion

Threshold selection approaches based on the network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on pairwise statistical relationships.