Error probability symmetries for Divergent (a) and (d), Artificial (b) and (e), and Titanium (c) and (f) data sets. (a)-(c): context-independent substitution error probabilities inferred by DADA with 95% confidence intervals based on binomial sampling error. Note the approximate symmetry between i→jand probabilities (which show up contiguously along the y-axis), where denotes the complement of nucleotide i. (d)-(f): All 96 reverse-complementary pairs of context-dependent error probabilities inferred by DADA for each data set. For each pair, the probability of the error away from an A or C is plotted on the x-axis and the error probability away from T or G is plotted on the y-axis. The pairing between these probabilities – seen by the tendency to lie along the diagonal – is stronger for the largest probabilities, which have the least sampling noise. The colors signify complementary pairs of errors red = (A→G,T→C) cyan=(C→T,G→A) green=(A→T,T→A) black=(C→A,G→T) blue=(A→C,T→G) purple=(C→G,G→C).
Rosen et al. BMC Bioinformatics 2012 13:283 doi:10.1186/1471-2105-13-283