|
Figure 1.
Sensitivity and calibration analysis of candidate statistics on simulated contaminated
and duplicated RNA-Seq replicate datasets. One in silico replicate of a pair was progressively contaminated with reads from
a biological replicate. Pearson’s r and Kappa showed no obvious change between the different degrees of contamination and
perfect replication (0% contamination). SERE, by contrast, responded once as few
as 25% of the reads originated from the biological replicate. Marked differences
appeared as soon as the contamination reached 50% (SERE = 1.04). For duplicated datasets
(“D”, identical data), which can result from either a “copy and paste” error or data
falsification, Pearson’s r and Kappa both equal 1.0, suggesting perfect replication even though duplicates are not true
replicates. SERE clearly discriminates between duplicated replicates (SERE = 0.0)
and perfect replicates (SERE = 1.0). All computations were performed on RNA-Seq sample
“control 1”, which was randomly split into a pair of in silico replicates (5 million
reads per sample). One in silico sample was then contaminated to different degrees
with reads originating from “control 2”. The procedure was repeated 200 times. D: duplicates.
Schulze et al. BMC Genomics 2012 13:524 doi:10.1186/1471-2164-13-524 |
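
The split-and-contaminate procedure described in the caption can be sketched in Python as follows. This is a minimal illustration and not the authors' code. It assumes that SERE is the square root of the summed per-gene chi-square-like deviations from the counts expected under pure sampling variation, divided by the number of genes times (number of samples minus 1); the caption itself only states the reference values 0.0 and 1.0. The expression profiles `p1` and `p2`, the gene count, the read depth, and the helper names `sere`, `draw`, and `contaminate` are all hypothetical, and the depth is far smaller than the 5 million reads per sample used in the figure. Only a single run is shown, not the 200 repetitions. `Generator.multivariate_hypergeometric` requires NumPy 1.18 or later.

```python
import numpy as np

rng = np.random.default_rng(0)


def sere(counts):
    """Assumed SERE: observed variation relative to pure sampling variation.

    counts is a (genes, samples) array of read counts.
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts.sum(axis=1) > 0]      # drop genes with no reads
    n_genes, n_samples = counts.shape
    lib = counts.sum(axis=0)                     # total reads per sample
    # Expected count per gene and sample if reads are shared in proportion
    # to library size.
    expected = np.outer(counts.sum(axis=1), lib / lib.sum())
    s2 = ((counts - expected) ** 2 / expected).sum(axis=1)
    return float(np.sqrt(s2.sum() / (n_genes * (n_samples - 1))))


def draw(pool, n):
    """Draw n reads without replacement from a vector of per-gene counts."""
    n = int(n)
    if n == 0:
        return np.zeros_like(pool)
    return rng.multivariate_hypergeometric(pool, n)


def contaminate(sample, donor, fraction):
    """Replace a fraction of the reads in sample with reads drawn from donor."""
    n_foreign = int(round(fraction * sample.sum()))
    kept = draw(sample, sample.sum() - n_foreign)
    return kept + draw(donor, n_foreign)


# Hypothetical stand-ins for "control 1" and "control 2": two related
# expression profiles, the second perturbed to mimic biological variation.
n_genes, depth = 2000, 200_000
p1 = rng.lognormal(0.0, 1.5, n_genes)
p1 /= p1.sum()
p2 = p1 * rng.lognormal(0.0, 0.3, n_genes)
p2 /= p2.sum()
control1 = rng.multinomial(2 * depth, p1)
control2 = rng.multinomial(2 * depth, p2)

# Randomly split "control 1" into a pair of in silico replicates.
rep_a = draw(control1, depth)
rep_b = control1 - rep_a

# SERE at increasing contamination of the second replicate.
results = {
    f: sere(np.column_stack([rep_a, contaminate(rep_b, control2, f)]))
    for f in (0.0, 0.25, 0.5, 1.0)
}

# Duplicated data: the same replicate used twice.
duplicate_sere = sere(np.column_stack([rep_a, rep_a]))
```

Under these assumptions, the uncontaminated pair should give a value near 1, the duplicated pair a value of 0, and the contaminated pairs values above 1 that depend on how strongly `p2` differs from `p1`. The exact numbers will not match the figure, because the profiles here are synthetic.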