Effect of the random sampling approach on minimizing the influence of GC content. (A) The average GC content in 100-bp bins across the region from -1 kb to +1 kb is shown for 159 promoters of genes with small and large intestine-specific expression (black), and for the same number of sequences randomly sampled from the genomic set of promoters with k = 1 (blue), k = 2 (red), and k = 3 (green). Values for the sampled sets are means, with bars representing the standard deviation over 500 sampled sets. (B) For the same dataset, the average RMSD of GC content is shown for k = 1 to 10. In this case, k* is set to 2.
Vandenbon et al. BMC Bioinformatics 2013 14:26 doi:10.1186/1471-2105-14-26
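The two quantities plotted in the figure, a binned GC profile and an RMSD between profiles, can be illustrated with a minimal sketch. The caption does not specify how the RMSD is computed, so the code assumes it is taken between the per-bin mean GC profiles of the target set and a sampled set. The function names (`binned_gc`, `mean_gc_profile`, `rmsd`) are hypothetical and are not taken from the paper. The sampling procedure and the role of k are not modelled here.

```python
from math import sqrt


def binned_gc(seq, bin_size=100):
    """GC fraction of each consecutive, non-overlapping bin of one sequence.

    Trailing bases that do not fill a whole bin are ignored. Ambiguous
    bases such as N simply count as non-GC (a simplifying assumption).
    """
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + bin_size]) / bin_size
        for i in range(0, len(seq) - bin_size + 1, bin_size)
    ]


def mean_gc_profile(seqs, bin_size=100):
    """Average the per-bin GC fractions over a set of equal-length sequences."""
    profiles = [binned_gc(s, bin_size) for s in seqs]
    return [sum(column) / len(column) for column in zip(*profiles)]


def rmsd(profile_a, profile_b):
    """Root-mean-square deviation between two equal-length profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of bins")
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return sqrt(sum(squared) / len(squared))


# Toy data with 4-bp bins; real input would be 2 kb promoter regions
# (-1 kb to +1 kb) with 100-bp bins, giving 20 bins per sequence.
target_set = ["GGCCAATT", "GCATATAT"]
sampled_set = ["GCGCATAT", "ATATATAT"]

target_profile = mean_gc_profile(target_set, bin_size=4)
sampled_profile = mean_gc_profile(sampled_set, bin_size=4)
deviation = rmsd(target_profile, sampled_profile)
```

Under the same assumption, the average RMSD of panel (B) would be this deviation averaged over the sampled sets for a given k, repeated for each value of k.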