Table 2

SAMQA Biological Tests

Biological Tests                               Inclusion Criteria

Mapping quality                                Low Phred-adjusted mapping quality score
Read length                                    Shortened read lengths for a given sequencing technology
Read count                                     Low aggregate number of reads for a given sequencing technology
Read frequency                                 Low number of reads for a given set of kilobase regions
Coverage                                       Low coverage for a given read group, chromosome, or kilobase region
Structural variations                          High numbers of localized structural variation
Anomalous sequence data                        Instances of "random" chromosomes from human assembly [8]
Population estimates of structural variation   Very high projected structural variation across different platform units
Read group correlation                         Low mapping quality correlation for megabase regions, across read groups
                                               Low coverage correlation of megabase regions, across read groups


These tests extract useful biological features from the data for expert analysis. Other extraction tools (e.g. detection of polyadenylation within individual sequences, or determinants of the feature-dimensional "shape" of the data, as through multidimensional Bayesian analysis) may be added as the data or downstream analysis requires.
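
As an illustration only, several of the criteria in Table 2 (mapping quality, read length, anomalous sequence data, and read group correlation) can be sketched in a few lines of Python over plain-text SAM records. This is not the SAMQA implementation. The thresholds `MIN_MAPQ` and `MIN_READ_LEN`, the megabase `BIN_SIZE`, the function names, and the rule that a "random" chromosome is any reference name containing `random` are all assumptions made for the example, and per-bin read counts stand in as a crude proxy for coverage.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical thresholds for illustration; not taken from the paper.
MIN_MAPQ = 20          # reads below this mapping quality are flagged
MIN_READ_LEN = 30      # reads shorter than this are flagged
BIN_SIZE = 1_000_000   # megabase bins for the read group comparison


def parse_sam_line(line):
    """Extract (rname, pos, mapq, read_len, read_group) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    rname, pos, mapq, seq = fields[2], int(fields[3]), int(fields[4]), fields[9]
    read_group = "unknown"
    for tag in fields[11:]:            # optional fields follow the 11 mandatory columns
        if tag.startswith("RG:Z:"):
            read_group = tag[5:]
    # Note: a SEQ of "*" (sequence not stored) is counted as length 1 here.
    return rname, pos, mapq, len(seq), read_group


def summarize(lines):
    """Count flagged reads and tally reads per (chromosome, megabase bin) per read group."""
    flags = Counter()
    bins = defaultdict(Counter)        # read group -> {(rname, bin index): read count}
    for line in lines:
        if line.startswith("@"):       # skip SAM header lines
            continue
        rname, pos, mapq, length, read_group = parse_sam_line(line)
        if mapq < MIN_MAPQ:
            flags["low_mapq"] += 1
        if length < MIN_READ_LEN:
            flags["short_read"] += 1
        if "random" in rname:          # assumed naming convention, e.g. chr1_random
            flags["random_chromosome"] += 1
        bins[read_group][(rname, (pos - 1) // BIN_SIZE)] += 1
    return flags, bins


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def read_group_correlation(bins, group_a, group_b):
    """Correlate per-bin read counts of two read groups over the union of their bins."""
    keys = sorted(set(bins[group_a]) | set(bins[group_b]))
    xs = [bins[group_a][key] for key in keys]
    ys = [bins[group_b][key] for key in keys]
    return pearson(xs, ys)
```

A low value returned by `read_group_correlation` for a pair of read groups would correspond to the "low coverage correlation" criterion in the table; what counts as low is left to the analyst, since the table does not state a cutoff.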

Robinson et al. BMC Genomics 2011 12:419   doi:10.1186/1471-2164-12-419
