## Figure 1.
Sequencing and arrays show correlated differential expression, but sequencing is more
susceptible to sampling error. Read counts are not evenly distributed across genes. For the RMg sample, log10 read counts per gene are shown (A), with genes ordered by abundance. The log2 ratio of the medians of six replicate microarray experiments for RM in ethanol vs.
RM in glucose is compared to the log2 ratio of sequencing read counts. The methods are correlated (R = 0.75356, 95% CI:
0.7236–0.785). Colors indicate significantly differentially expressed genes at an FDR < 1%
and a 1.5-fold or greater change, where significance is determined using Fisher's exact
test for the sequencing data and the Mann-Whitney test for the array data. Purple
indicates significantly different by both methods, green is significantly different
by sequencing only, blue is significantly different by microarrays only, and red is
significant by both methods but with opposite directionality (B). The data from (B) are
represented as a Venn diagram of significant differences; note in red the 9 genes
measured as significantly changed but in opposite directions (C). The results from
(B) can be modeled by sampling from binomial distributions for each gene. Here a single
random sampling is shown (D). The correlation of log2 expression ratios determined by microarrays and sequencing is highly dependent on
the number of read counts per gene. For both the actual data (black) and simulated
data (green) with 95% confidence intervals (light green), correlation improves as
the threshold for sequence coverage increases (E).
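
The binomial sampling model behind panels (D) and (E) can be illustrated with a minimal sketch. This is not the authors' code and uses synthetic data. It assumes each gene has a reference log2 ratio (standing in for the array measurement), a total read count shared between the two samples, and equal sequencing depth in both samples. The function names, the pseudocount of 0.5, and the distributions used to generate the synthetic genes are all hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_log2_ratio(ref_log2_ratio, total_reads, rng, pseudocount=0.5):
    """Draw one binomial split of a gene's reads and return its log2 ratio.

    Assumption: both samples are sequenced to equal depth, so a read comes
    from the ethanol sample with probability fold / (1 + fold).
    """
    fold = 2.0 ** ref_log2_ratio
    p = fold / (1.0 + fold)
    # Binomial(total_reads, p) drawn as a sum of Bernoulli trials.
    ethanol = sum(rng.random() < p for _ in range(total_reads))
    glucose = total_reads - ethanol
    # Hypothetical pseudocount to avoid taking the log of zero.
    return math.log2((ethanol + pseudocount) / (glucose + pseudocount))


def correlation_by_threshold(ref_ratios, totals, thresholds, rng):
    """Correlate reference and sampled ratios for genes with enough reads."""
    sampled = [sample_log2_ratio(r, n, rng) for r, n in zip(ref_ratios, totals)]
    result = {}
    for t in thresholds:
        kept = [(r, s) for r, s, n in zip(ref_ratios, sampled, totals) if n >= t]
        result[t] = pearson([r for r, _ in kept], [s for _, s in kept])
    return result


# Synthetic genes (not the paper's data): normally distributed reference
# ratios and read totals spread unevenly over three orders of magnitude.
rng = random.Random(0)
n_genes = 2000
ref_ratios = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
totals = [int(10 ** rng.uniform(0.0, 3.0)) for _ in range(n_genes)]

corr = correlation_by_threshold(ref_ratios, totals, [1, 10, 100], rng)
for t, r in corr.items():
    print(f"reads >= {t:>3}: R = {r:.3f}")
```

Under these assumptions, genes with few reads contribute noisy ratios, so restricting the comparison to genes above a higher read-count threshold should raise the correlation, which is the qualitative trend described for panel (E).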
Bloom |