Dependence of candidate statistics on total read count (sample size) when comparing perfect replicate RNA-Seq datasets. The Simple Error Ratio Estimate (SERE) was 1 when two replicate RNA-Seq datasets of different sizes were compared. Across repeated computations on independent replicate dataset pairs, SERE showed a stable 99% confidence interval (CI) of approximately ±0.01 at each total read count. The Pearson correlation coefficient fell as read counts decreased, and Kappa also depended strongly on the total read count. All computations were performed on 200 model RNA-Seq datasets obtained by drawing reads randomly from a universal read set (described in Methods).
Schulze et al. BMC Genomics 2012 13:524 doi:10.1186/1471-2164-13-524
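
As a rough illustration of the comparison described in the caption, the following Python sketch draws model datasets of different sizes from one shared gene distribution and computes SERE and the Pearson correlation for each pair. The SERE formula used here (the square root of the Pearson chi-square dispersion divided by the number of retained genes times the number of datasets minus one) is an assumption, since the caption does not define it. The lognormal gene weights, the function names, and the read counts are likewise hypothetical choices, and Kappa is not covered.

```python
import math
import random


def draw_dataset(weights, n_reads, rng):
    """Draw n_reads reads at random from a shared gene distribution.

    `weights` is a hypothetical stand-in for the universal read set.
    """
    counts = [0] * len(weights)
    for gene in rng.choices(range(len(weights)), weights=weights, k=n_reads):
        counts[gene] += 1
    return counts


def sere(counts_a, counts_b):
    """SERE for two datasets, under the assumed chi-square-based definition."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both datasets need at least one read")
    grand = total_a + total_b
    dispersion = 0.0
    n_genes = 0
    for a, b in zip(counts_a, counts_b):
        row = a + b
        if row == 0:
            continue  # assumption: genes with no reads in either dataset are skipped
        # Expected counts if both datasets sample the same proportions.
        exp_a = row * total_a / grand
        exp_b = row * total_b / grand
        dispersion += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
        n_genes += 1
    # Two datasets, so (number of datasets - 1) equals 1.
    return math.sqrt(dispersion / n_genes)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length count vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical expression profile: 2000 genes with lognormal abundances.
    weights = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    reference = draw_dataset(weights, 200_000, rng)
    for n_reads in (200_000, 20_000, 2_000):
        other = draw_dataset(weights, n_reads, rng)
        print(n_reads, round(sere(reference, other), 3),
              round(pearson(reference, other), 3))
```

Under these assumptions, the SERE values should stay close to 1 at every depth while the Pearson coefficient drops as the smaller dataset shrinks, which mirrors the trend the caption reports. Two datasets with exactly proportional counts give 0 under this formula.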