Sequencing and arrays show correlated differential expression, but sequencing is more susceptible to sampling error. Read counts are not evenly distributed across genes: for the RMg sample, log10 read counts per gene are shown with genes ordered by abundance (A). The log2 ratio of the medians of six replicate microarray experiments for RM in ethanol vs. RM in glucose is compared to the log2 ratio of sequencing read counts; the two methods are correlated (R = 0.75356, 95% CI: 0.7236–0.785). Colors indicate genes that are significantly differentially expressed at an FDR < 1% with a 1.5-fold or greater change, where significance is determined using Fisher's exact test for the sequencing data and the Mann-Whitney test for the array data. Purple indicates genes significant by both methods, green by sequencing only, blue by microarrays only, and red by both methods but with opposite directionality (B). The data from (B) are represented as a Venn diagram of significant differences; the 9 genes measured as significantly changed but in opposite directions are shown in red (C). The results from (B) can be modeled by sampling from a binomial distribution for each gene; a single random sampling is shown (D). The correlation between log2 expression ratios determined by microarrays and by sequencing depends strongly on the number of read counts per gene: for both the actual data (black) and the simulated data (green, with 95% confidence intervals in light green), the correlation improves as the threshold for sequence coverage increases (E).
Bloom et al. BMC Genomics 2009 10:221 doi:10.1186/1471-2164-10-221
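The binomial sampling model in (D) and the coverage dependence in (E) can be illustrated with a minimal sketch. It is not the authors' code, and the caption does not give the details of their simulation. The sketch assumes that each gene's array log2 ratio is the true ratio, that the two libraries are the same size, and that a pseudocount of 0.5 is added before taking logs. The per-gene read totals and array ratios are synthetic, and all function names are hypothetical.

```python
import math
import random


def binomial_draw(rng, n, p):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_genes=600, seed=0):
    """Return a list of (total_reads, array_log2_ratio, simulated_seq_log2_ratio)."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        # Assumption: synthetic coverage, log-uniform from 1 to about 3000 reads.
        total = int(10 ** rng.uniform(0.0, 3.5))
        # Assumption: synthetic array log2 ratio, treated as the true ratio.
        array_lr = rng.gauss(0.0, 1.0)
        # With equal library sizes, a read falls in condition 1 with probability p.
        p = 2 ** array_lr / (1 + 2 ** array_lr)
        k = binomial_draw(rng, total, p)
        # Assumption: a 0.5 pseudocount keeps the ratio finite when a count is 0.
        seq_lr = math.log2((k + 0.5) / (total - k + 0.5))
        genes.append((total, array_lr, seq_lr))
    return genes


def correlation_at_threshold(genes, min_reads):
    """Correlate array and simulated sequencing ratios for genes with enough reads."""
    kept = [(a, s) for total, a, s in genes if total >= min_reads]
    return pearson([a for a, _ in kept], [s for _, s in kept])


if __name__ == "__main__":
    genes = simulate()
    for min_reads in (1, 10, 100, 1000):
        r = correlation_at_threshold(genes, min_reads)
        print(f"min reads per gene >= {min_reads:4d}: R = {r:.3f}")
```

Under these assumptions the only source of disagreement between the two ratios is binomial sampling noise, which shrinks as the number of reads per gene grows. Repeating the simulation with different seeds would give a spread of correlations at each threshold, analogous to the confidence band described for (E).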