Figure 5.

Distributions of cumulative information in LogoAlign results. A. The cumulative information (gap penalized) for the 936 non-redundant motifs identified by LogoAlign (1) is compared to 65,000 alignments of randomly chosen 12-bp windows from each input Logo, with a maximum of 20% gaps per position per input Logo (2), and to 20,000 biased samples in which the most information-rich 12-bp window from one randomly chosen input Logo is aligned to randomly chosen gap-free 12-bp windows from the other nine input Logos (3). The LogoAlign results range from 13.6 to 22.6 standard deviations above the mean of the unbiased distribution. B. The distribution of results produced by LogoAlign (black) is compared to results from a search of random sequences (gray) with similar base composition and phylogenetic structure (see Methods). All sequences used to produce these distributions were trimmed (i.e., end-gaps were removed) to prevent over-accumulation of gaps within the scrambled control set. The result is bimodal (one-tailed t-test; p-value << 0.0001), and the best motifs found within V1R promoters are significantly more information-rich than those identified in the control (Z-score ≈ 8.0; p-value = 10^-15).
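As a rough check on how the reported Z-score and p-value relate, the sketch below converts a Z-score to a one-tailed p-value under a standard normal approximation. The caption does not state how the p-value was computed, so the normal approximation is an assumption, and the function names `z_score` and `one_tailed_p` are hypothetical helpers, not part of LogoAlign.

```python
import math
from statistics import mean, stdev


def z_score(value, background):
    """Number of sample standard deviations by which `value` exceeds the background mean."""
    return (value - mean(background)) / stdev(background)


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variable at z (assumed null model)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Z-score reported in the caption for the best V1R promoter motifs vs. the control.
p = one_tailed_p(8.0)
print(f"one-tailed p for Z = 8.0: {p:.1e}")
```

Under this assumption, a Z-score of about 8.0 gives a one-tailed p-value just below 10^-15, which agrees with the order of magnitude quoted in the caption.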

Stewart and Lane BMC Genomics 2007 8:253   doi:10.1186/1471-2164-8-253
