Open Access Highly Accessed Research article

Comparison of co-expression measures: mutual information, correlation, and model based indices

Lin Song1,2, Peter Langfelder1 and Steve Horvath1,2*

Author Affiliations

1 Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, USA

2 Biostatistics, School of Public Health, University of California, Los Angeles, California, USA

BMC Bioinformatics 2012, 13:328  doi:10.1186/1471-2105-13-328

Published: 9 December 2012

Additional files

Additional file 1:

Detailed methods descriptions. This document provides detailed information on entropy, mutual information, likelihood ratio test statistics, and the p-value calculation for correlation coefficients.

Format: PDF Size: 88KB

This file can be viewed with: Adobe Acrobat Reader

Open Data

Additional file 2:

Empirical analysis using a large number of genes in the mouse adipose and ND data sets. Page one is an empirical analysis of the mouse adipose data set using all 23568 genes rather than restricting to 3000 genes. (A) Absolute value of bicor versus A^{MI,UniversalVersion2}. One million randomly sampled gene pairs are plotted to reduce the computational burden. The two measures show a good monotonic relationship. The red curve predicts A^{MI,UniversalVersion2} from bicor. The blue circle highlights the probe pair with the highest A^{MI,UniversalVersion2} z-score among those with insignificant bicor z-scores (less than 1.9); the red circle highlights the probe pair with the highest bicor z-score among those with insignificant A^{MI,UniversalVersion2} z-scores (less than 1.9). The red and blue circles are selected from all gene pairs, not only the sampled ones. (B) The prediction from bicor based on Eq. 18 and the observed A^{MI,UniversalVersion2} are highly correlated. As in (A), one million randomly sampled gene pairs are plotted. The line y=x is added. (C) Gene expression of the probe pair highlighted by the blue circle. (D) Gene expression of the probe pair highlighted by the red circle.

Page two is the same analysis for the ND data set, using 10000 randomly selected genes rather than the 3000 genes with the highest variance.

Format: PDF Size: 448KB

Additional file 3:

Comparison of MIC and correlation based co-expression measures. MIC and correlation are compared in our empirical gene expression data sets except SAFHS. This is an extension of Figure 6. The 5 best GO enrichment p-values from all modules identified using MIC and TOM are log transformed, pooled together and shown as barplots. Error bars represent 95% confidence intervals. The p-value at the top of each panel is based on a multi-group comparison test. TOM outperforms MIC in all data sets except the mouse brain data.

Format: PDF Size: 6KB

Additional file 4:

Comparison of polynomial and spline regression models with correlation or mutual information based co-expression measures in simulation. Each point corresponds to a pair of numeric vectors x and y of length m = 1000. Data are simulated as in Figure 1. (A) Square root of R^2 from polynomial regression, symmetrized by Eq. 5, versus absolute Pearson correlation values. The two measures are indistinguishable because the data are simulated to exhibit linear relationships. (B) R^2 from polynomial regression, symmetrized by Eq. 5, versus A^{MI,UniversalVersion2}. The red line predicts A^{MI,UniversalVersion2} from R^2. (C-D) The same plots for spline regression models.

Format: PDF Size: 558KB
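As a rough illustration of panel (A) in Additional file 4, the sketch below compares the square root of a symmetrized polynomial R^2 with the absolute Pearson correlation on linearly related data. Eq. 5 and the Figure 1 simulation are not reproduced on this page, so the symmetrization (averaging the R^2 of y on x and of x on y), the cubic degree, the noise model, and all function names are assumptions made for illustration only.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def polyfit_r2(x, y, degree=3):
    """R^2 of an ordinary least-squares polynomial fit of y on x."""
    n = degree + 1
    # Normal equations (X^T X) coef = X^T y for the design matrix [1, x, x^2, ...].
    A = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    b = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - s) / A[r][r]
    fitted = [sum(cf * xi ** k for k, cf in enumerate(coef)) for xi in x]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def symmetrized_r2(x, y, degree=3):
    """ASSUMPTION: symmetrize by averaging both regression directions (not Eq. 5)."""
    return 0.5 * (polyfit_r2(x, y, degree) + polyfit_r2(y, x, degree))


def simulate_pair(m=1000, slope=0.7, seed=1):
    """ASSUMPTION: a simple linear model with Gaussian noise, not the Figure 1 design."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(m)]
    y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_pair()
    print("sqrt(symmetrized R^2):", symmetrized_r2(x, y) ** 0.5)
    print("|Pearson correlation|:", abs(pearson(x, y)))
```

On linear data of this kind the two printed values should nearly coincide, which is the behaviour the caption describes for panel (A).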

Additional file 5:

Polynomial and spline regression models for estimating non-linear relationships in real data applications. This document uses polynomial and spline regression models to estimate non-linear relationships in real data applications.

Format: PDF Size: 1.7MB

Additional file 6:

The relationship between module size and gene ontology enrichment p-values in 8 real data applications. In each panel, module size (x-axis) is plotted against −log10 GO enrichment p-values (y-axis) as dots. Loess regression lines show the trend. Red and black represent network modules constructed using TOM and A^{MI,UniversalVersion2} based measures, respectively. In most data sets, modules defined by TOM show better enrichment than comparably sized modules defined by A^{MI,UniversalVersion2}.

Format: PDF Size: 155KB

Additional file 7:

Comparison of bicor, Pearson correlation and Spearman correlation based signed adjacency in 8 empirical data sets. Each panel shows the −log10 transformed 5 best gene ontology enrichment p-values of all modules identified using each type of adjacency. Error bars represent 95% confidence intervals. The p-value at the top of each panel is based on a multi-group comparison test. All three types of correlation perform similarly in terms of GO enrichment.

Format: PDF Size: 15KB
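To make the notion of a correlation based signed adjacency in Additional file 7 more concrete, here is a minimal sketch that turns a Pearson or Spearman correlation into an adjacency. It assumes the signed form commonly used in weighted co-expression network analysis, a = ((1 + cor) / 2)^beta; the exact definition and power used for these figures are not stated on this page, so the formula, the default beta = 12, and the function names are assumptions. Bicor is omitted for brevity.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def signed_adjacency(cor, beta=12):
    """ASSUMPTION: signed adjacency ((1 + cor) / 2) ** beta, with an assumed beta."""
    return ((1.0 + cor) / 2.0) ** beta


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [v ** 3 for v in x]  # monotone but non-linear in x
    print("Pearson adjacency: ", signed_adjacency(pearson(x, y)))
    print("Spearman adjacency:", signed_adjacency(spearman(x, y)))
```

Under this form, strongly negative correlations map to adjacencies near 0 and strongly positive correlations to adjacencies near 1, so the sign of the relationship is preserved in the network.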