Figure 1.

Sensitivity and calibration analysis of candidate statistics on RNA-Seq datasets with simulated contamination and duplicated replicates. One in silico replicate of a pair was progressively contaminated with reads from a biological replicate. Pearson’s r and Kappa showed no obvious change between perfect replication (0% contamination) and any degree of contamination. SERE, by contrast, responded once as few as 25% of the reads originated from the biological replicate, and marked differences appeared once contamination reached 50% (SERE = 1.04). For duplicated datasets (“D”, identical data), which can result from either a “copy and paste” error or data falsification, Pearson’s r and Kappa are both 1.0, suggesting perfect replication, although duplicates are imperfect replicates. SERE clearly discriminates between duplicated replicates (SERE = 0.0) and perfect replicates (SERE = 1.0). All computations were performed on RNA-Seq sample “control 1”, which was randomly split into a pair of in silico replicates (5 million reads per sample). One in silico sample was then contaminated to different degrees with reads originating from “control 2”. The procedure was repeated 200 times. D: duplicates.
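
The split-and-contaminate procedure described above can be sketched on toy data. This is a minimal illustration, not the authors' code. It assumes SERE takes the form of the square root of the mean, over genes, of the ratio between observed variation and the variation expected under Poisson sampling (the caption itself does not state the formula). The function names `sere`, `binomial_draw` and `contaminated_pair`, the gene counts, and the contamination scheme are all hypothetical.

```python
import math
import random


def sere(counts_a, counts_b):
    """Assumed SERE form for a pair of samples (per-gene read counts)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    ratios = []
    for y_a, y_b in zip(counts_a, counts_b):
        gene_total = y_a + y_b
        if gene_total == 0:
            continue  # genes without reads carry no information
        # Expected counts if both samples draw from one common pool
        exp_a = gene_total * total_a / grand_total
        exp_b = gene_total * total_b / grand_total
        # Observed vs. expected variation; divisor (samples - 1) is 1 for a pair
        ratios.append((y_a - exp_a) ** 2 / exp_a + (y_b - exp_b) ** 2 / exp_b)
    return math.sqrt(sum(ratios) / len(ratios))


def binomial_draw(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (standard library only)."""
    return sum(rng.random() < p for _ in range(n))


def contaminated_pair(control_1, control_2, fraction, rng):
    """Split control_1 into two in silico replicates, then replace roughly
    `fraction` of the second replicate's reads with reads from control_2."""
    rep_a, rep_b = [], []
    for c1, c2 in zip(control_1, control_2):
        a = binomial_draw(c1, 0.5, rng)                   # first half of control 1
        kept = binomial_draw(c1 - a, 1.0 - fraction, rng)  # retained control 1 reads
        foreign = binomial_draw(c2, 0.5 * fraction, rng)   # contaminating reads
        rep_a.append(a)
        rep_b.append(kept + foreign)
    return rep_a, rep_b


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "control 1" counts and a biologically different "control 2"
    control_1 = [rng.randint(50, 400) for _ in range(300)]
    control_2 = [max(1, round(c * rng.lognormvariate(0.0, 0.5))) for c in control_1]

    print("duplicates:", sere(control_1, control_1))
    for fraction in (0.0, 0.25, 0.5, 1.0):
        rep_a, rep_b = contaminated_pair(control_1, control_2, fraction, rng)
        print(f"{fraction:.0%} contamination:", round(sere(rep_a, rep_b), 3))
```

Under these assumptions, identical inputs give exactly 0, an uncontaminated random split gives a value near 1, and the value rises as the contamination fraction grows. The absolute numbers depend on the toy data and are not comparable to those in the figure.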

Schulze et al. BMC Genomics 2012 13:524   doi:10.1186/1471-2164-13-524