Simulation results and junction detection. (A) Actual mismatch frequency by read position. Technical and biological replicates of female (red) and male (blue) samples are shown. (B) Accuracy of annotated junction detection: recovered junction coverage (y-axis) versus actual coverage in the simulated input (x-axis). Read counts (+1) are plotted on a log10 scale. (C) Sensitivity of junction detection (1 – false negative rate). The receiver operating characteristic (ROC) curve shows sensitivity as a function of sequencing depth for TopHat mapping with annotation (dashed line) and without annotation (solid line). (D) Junction detection in subsamples of real data, in read pools of increasing sequencing depth (10–100 million reads in increments of 10 million). Junctions detected with at least one read (black line) or with ≥10 reads (green line) are shown, and for each pool the number of additional junctions detected relative to the previous pool is indicated. The dashed line shows the total cumulative false positive junction detections in each pool. (E) Transcript coverage in subsamples of real data: annotated transcripts detected with at least 6x coverage (black line) in each subsampled pool. (F) False positive rate of junction detection in simulated data before (solid line) and after filtering (dashed line). The false positive rate is expressed as a percentage of all annotated junctions with simulated reads and is not cumulative. (G) False negative rate of junction detection due to alignment failure (i.e., not due to sampling), for junctions with at least one junction-spanning read generated from the simulated transcripts. (H) False negative rate of junction detection due to sampling. Rates are shown for detecting at least one junction-spanning read (coverage ≥ 1, purple lines) or for reaching an entropy score ≥ 2 (orange lines).
Sturgill et al. BMC Bioinformatics 2013 14:320 doi:10.1186/1471-2105-14-320
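The per-pool quantities in panels (C), (D) and (F) come down to simple counting. The following is a minimal Python sketch, assuming a hypothetical input with one dictionary per subsampled pool that maps a junction ID to its number of junction-spanning reads. It also assumes that "additional junctions relative to the previous pool" means junctions not detected in the previous pool. The function names and toy data are illustrative only. They are not taken from the paper or from TopHat output.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: one dict per subsampled pool (e.g. 10M, 20M, ... reads),
# mapping a junction ID to its count of junction-spanning reads.
Pool = Dict[str, int]


def detected(pool: Pool, min_reads: int) -> Set[str]:
    """Junctions supported by at least `min_reads` spanning reads."""
    return {jid for jid, n in pool.items() if n >= min_reads}


def detection_curve(pools: List[Pool], min_reads: int) -> List[Tuple[int, int]]:
    """Per pool: (junctions detected, junctions not detected in the previous pool)."""
    curve = []
    previous: Set[str] = set()
    for pool in pools:
        current = detected(pool, min_reads)
        curve.append((len(current), len(current - previous)))
        previous = current
    return curve


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity = 1 - false negative rate = TP / (TP + FN)."""
    return 1.0 - false_neg / (true_pos + false_neg)


def false_positive_percent(false_pos: int, annotated_with_reads: int) -> float:
    """False positives as a percent of annotated junctions with simulated reads."""
    return 100.0 * false_pos / annotated_with_reads


# Toy pools of increasing depth (illustrative numbers, not real data).
pools = [
    {"j1": 3, "j2": 12},
    {"j1": 7, "j2": 25, "j3": 1},
    {"j1": 11, "j2": 40, "j3": 4, "j4": 2},
]
print(detection_curve(pools, min_reads=1))    # [(2, 2), (3, 1), (4, 1)]
print(detection_curve(pools, min_reads=10))   # [(1, 1), (1, 0), (2, 1)]
print(sensitivity(true_pos=90, false_neg=10))  # 0.9
print(false_positive_percent(false_pos=5, annotated_with_reads=200))  # 2.5
```

Using set differences rather than differences in counts keeps the "additional junctions" tally correct even if a junction's support in a later pool fell below the threshold. With nested subsamples of increasing depth, this should not happen.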