Figure 1.

Depth of transcriptome sequencing coverage. The number of RefSeq NM genes detected by transcriptome sequencing increases with N, the total number of reads that hit any NM gene, accumulated over multiple sequencing runs. The coverage curves, C, show how the number of genes detected approaches saturation for (a) the A sample and (b) the B sample as the total number of reads that align to any NM gene (with $e \le 10^{-20}$) increases, for both the ODT (blue diamonds, solid line) and TSEQ (red squares, dashed line) methods of sample preparation. The panels also show the numbers of genes that receive at least 10 and at least 100 BLAST hits from the GS FLX reads (for ~1× sequence coverage of a typical 2500 bp mRNA). The single points at the far right of each panel show the combined results from both sample preparation methods. Since the coverage function for random sampling obeys the approximate scaling relationship $C_n(N) \sim C_{n/x}(N/x)$, where n is the minimum number of hits, the coverage curve for 10 hits can be predicted from the empirical results for 1 hit, and the coverage curve for 100 hits from the results for 10 hits, as indicated by the dotted curves. [See Additional file 1 for Supplementary Analysis]
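
The scaling relationship in the caption can be read as a simple rescaling of the read-count axis: a point (N, C) on the curve for n/x hits maps to the point (xN, C) on the predicted curve for n hits. The following is a minimal sketch of that reading, not the authors' implementation. The function name `predict_coverage_curve` and the numbers in `one_hit_curve` are hypothetical and chosen only for illustration; they are not data from the figure.

```python
def predict_coverage_curve(points, x):
    """Predict the coverage curve for a minimum of n hits from the
    empirical curve for n/x hits, using C_n(N) ~ C_{n/x}(N/x).

    points: list of (N, C) pairs, where N is the total number of aligned
            reads and C is the number of genes with at least n/x hits.
    x:      scaling factor (e.g. 10 to go from 1 hit to 10 hits).

    Returns a list of (N * x, C) pairs: the same gene count is expected
    to be reached only after x times as many reads.
    """
    if x <= 0:
        raise ValueError("scaling factor x must be positive")
    return [(n_reads * x, genes) for n_reads, genes in points]


# Hypothetical 1-hit curve: (total aligned reads, genes detected).
# These values are made up for illustration only.
one_hit_curve = [(10_000, 4_000), (50_000, 8_000), (100_000, 10_000)]

# Predicted 10-hit curve under the approximate scaling relationship.
ten_hit_prediction = predict_coverage_curve(one_hit_curve, x=10)
```

Applying the same function to an empirical 10-hit curve with x = 10 would give the corresponding prediction for 100 hits. Because the relationship is approximate, the predicted points are a guide to the expected shape of the curve rather than exact values.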

Mane et al. BMC Genomics 2009 10:264   doi:10.1186/1471-2164-10-264
