Error profile of Pacific Biosciences data. a) A chart showing the fraction of observations supporting the alternate allele at all heterozygous sites, and how reference bias pulls the median significantly below the expected 0.5. This combination creates multiple possible alignments that share the highest alignment score, allowing the aligner in some cases to hide the true alternate allele inside an insertion, which maximizes the alignment score at the cost of reference bias. b) IGV browser (http://www.broadinstitute.org/igv/) screenshot of the validation dataset showing an example of aligner-created reference bias in Pacific Biosciences RS data. The true SNPs (C) are correctly called in individual reads. c) An IGV browser [18,19] screenshot of a region in the discovery dataset where Illumina HiSeq data suffer from context-specific errors that make the region appear to contain a true heterozygous site, whereas Pacific Biosciences RS data (whose errors are more frequent but nearly random) clearly show no event in this region.
Carneiro et al. BMC Genomics 2012 13:375 doi:10.1186/1471-2164-13-375
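The quantity plotted in panel (a) can be illustrated with a minimal sketch, assuming per-site read counts for the reference and alternate alleles are already available. The function name `alt_allele_fractions` and the counts in `example_sites` are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from statistics import median


def alt_allele_fractions(site_counts):
    """Return alt / (ref + alt) for every site with at least one observation."""
    fractions = []
    for ref_count, alt_count in site_counts:
        depth = ref_count + alt_count
        if depth > 0:  # skip sites with no coverage
            fractions.append(alt_count / depth)
    return fractions


# Hypothetical (ref_count, alt_count) pairs at heterozygous sites.
example_sites = [(12, 8), (10, 10), (15, 5), (9, 6), (0, 0)]

fractions = alt_allele_fractions(example_sites)
median_fraction = median(fractions)

# Without bias, a heterozygous site is expected to show a fraction near 0.5;
# a median clearly below 0.5 is the signature of reference bias.
print(f"median alternate-allele fraction: {median_fraction:.2f}")
```

In this toy input the median falls below 0.5 because most sites have more reference than alternate observations, which is the pattern the caption attributes to reference bias.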