Genomic mapping and sequence length. Sets of 50,000 randomly generated sequences (built from A, G, C and T) of the lengths shown on the X-axis were mapped to the mouse genome using bowtie. For each sequence length, the number of sequences that mapped to no region (black triangles), mapped to multiple regions (blue diamonds), or mapped to a single region (red squares) was plotted, with bowtie allowing A) 3 mismatches, B) 2 mismatches, C) 1 mismatch and D) 0 mismatches. At intermediate sequence lengths, for a fixed number of mismatches, ~20% of randomly generated sequences mapped uniquely to the genome.
Sarver et al. BMC Bioinformatics 2012 13:154 doi:10.1186/1471-2105-13-154
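The simulation in this figure has two steps outside the aligner itself: generating uniformly random A/C/G/T sequences of a fixed length, and sorting each sequence into one of three categories (no hit, one hit, more than one hit) according to how many genomic locations bowtie reports for it. The Python sketch below illustrates only those two steps, under stated assumptions. The function names (`random_sequences`, `to_fasta`, `classify`) are hypothetical, each base is assumed to be drawn with equal probability, and the per-sequence hit counts are assumed to have already been parsed from bowtie output. The mismatch allowance (0 to 3) is a bowtie alignment setting and is not modeled here.

```python
import random
from collections import Counter

BASES = "ACGT"


def random_sequences(n, length, seed=None):
    """Return n random sequences of a fixed length.

    Assumption: each position is drawn uniformly from A, C, G and T.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


def to_fasta(seqs):
    """Format sequences as FASTA text (hypothetical read names r0, r1, ...)."""
    return "".join(f">r{i}\n{s}\n" for i, s in enumerate(seqs))


def classify(hit_counts):
    """Tally sequences by number of reported genomic hits.

    Assumption: hit_counts holds one integer per sequence, already
    parsed from the aligner's output.
    0 hits -> unmapped, 1 hit -> unique, >1 hits -> multi.
    """
    tally = Counter()
    for hits in hit_counts:
        if hits == 0:
            tally["unmapped"] += 1
        elif hits == 1:
            tally["unique"] += 1
        else:
            tally["multi"] += 1
    return {key: tally[key] for key in ("unmapped", "unique", "multi")}


if __name__ == "__main__":
    # One sequence length from the sweep: 50,000 random 20-mers.
    reads = random_sequences(50_000, 20, seed=1)
    fasta_text = to_fasta(reads)  # would be written to disk and given to bowtie
    # Illustrative hit counts only, not real alignment results.
    print(classify([0, 1, 3, 1, 0]))
```

Repeating this for each length on the X-axis, and for each mismatch setting, yields one point per category per length. Dividing the `unique` count by the number of sequences gives the fraction that maps to a single genomic location.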