BMC Bioinformatics
Methodology article
A robust measure of correlation between two genes on a microarray
Johanna Hardin1, Aya Mitani2, Leanne Hicks3 and Brian VanKoten4
1 Department of Mathematics, Pomona College, Claremont, CA 91711, USA
2 Department of Mathematics, Pitzer College, Claremont, CA 91711, USA
3 Department of Statistics, University of Nebraska, Lincoln, NE 68588, USA
4 Department of Mathematics, Lewis and Clark College, Portland, OR 97219, USA
BMC Bioinformatics 2007, 8:220 doi:10.1186/1471-2105-8-220
Abstract
Background
The underlying goal of microarray experiments is to identify gene expression patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions could be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses, we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before implementing a clustering algorithm. However, Pearson correlation is quite susceptible to outliers, an unfortunate characteristic when dealing with microarray data, which are well known to be quite noisy.
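The sensitivity of Pearson correlation to outliers can be illustrated with a minimal sketch. The expression values below are hypothetical and are not taken from the article; the helper `pearson` is written here only for illustration. Corrupting a single measurement out of eight is enough to turn a strong positive correlation into a negative one.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values for two co-expressed genes on 8 arrays.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2]

r_clean = pearson(gene_a, gene_b)

# Simulate one corrupted measurement (an assumed outlier on the last array).
gene_b_noisy = list(gene_b)
gene_b_noisy[7] = -20.0

r_noisy = pearson(gene_a, gene_b_noisy)

print(f"Pearson, clean data:       {r_clean:.3f}")
print(f"Pearson, one outlier:      {r_noisy:.3f}")
```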
Results
We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate scale and location. The resistant metric is simply the correlation obtained from a resistant covariance matrix of scale. We give results demonstrating that our correlation metric is much more resistant than the Pearson correlation while being more efficient than other nonparametric measures of correlation (e.g., Spearman correlation). Additionally, our method gives a systematic gene flagging procedure, which is useful when dealing with large amounts of noisy data.
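To give a feel for how biweight weighting limits the influence of outliers, the sketch below computes the biweight midcorrelation, a simpler one-step relative of the approach described above. It is not the authors' estimator: the abstract describes an estimate of multivariate scale and location, whereas this sketch weights each gene separately around its median. The tuning constant `c = 9.0`, the function names, and the data are all assumptions made for illustration.

```python
import math
import statistics

def biweight_terms(values, c=9.0):
    """Median-centered values multiplied by Tukey biweight weights.

    Points farther than c * MAD from the median receive weight zero.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (c * mad)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    return terms

def bicor(x, y, c=9.0):
    """Biweight midcorrelation (illustrative, not the article's metric)."""
    a = biweight_terms(x, c)
    b = biweight_terms(y, c)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

# Hypothetical expression values; the last value of gene_b_noisy is an outlier.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2]
gene_b_noisy = gene_b[:7] + [-20.0]

r_bicor_clean = bicor(gene_a, gene_b)
r_bicor_noisy = bicor(gene_a, gene_b_noisy)

print(f"bicor, clean data:  {r_bicor_clean:.3f}")
print(f"bicor, one outlier: {r_bicor_noisy:.3f}")
```

On these made-up data the outlying measurement receives weight zero, so the estimate stays clearly positive, although it is lower than on the clean data because the paired point no longer contributes to the numerator.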
Conclusion
When dealing with microarray data, which are known to be quite noisy, robust methods should be used. Specifically, robust distances, including the biweight correlation, should be used in clustering and gene network analysis.