Figure 4.

Effect of the number of k-mers on the three modeling approaches. The performance of each approach was measured by 10-fold cross-validation. Each bar shows the AUC value of one experiment; the x-axis gives the number of most significant variables (ranked by t-test p-value) used in that experiment. Segment modeling consistently outperformed the other modeling approaches for 4-mers through 6-mers, regardless of the number of patterns. More importantly, varying the number of k-mers from 10 to 100 had little impact on model performance, which suggests that the higher accuracy of the segment modeling approach, compared to the promoter and site-specific modeling approaches, is likely due to the effectiveness of the segment model itself rather than to the selection of k-mers.
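The variable-selection and scoring steps named in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes a pooled-variance Student t-test (the caption does not say which variant was used), in which case ranking by |t| gives the same order as ranking by p-value because every k-mer shares the same degrees of freedom. The function names `kmer_counts`, `pooled_t`, `top_kmers`, and `auc` are hypothetical, and the classifier and the 10-fold cross-validation loop are omitted.

```python
from itertools import product
from math import copysign, inf, sqrt
from statistics import mean, variance


def kmer_counts(seq, k):
    """Count overlapping occurrences of every DNA k-mer in seq."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    return counts


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance (assumed variant).

    Each group needs at least two values.
    """
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if sp2 == 0:
        # No within-group spread: the statistic is 0 or unbounded.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(sp2 * (1 / na + 1 / nb))


def top_kmers(pos_seqs, neg_seqs, k, n):
    """Return the n k-mers with the largest |t| between the two classes.

    Ties are broken alphabetically so the result is deterministic.
    """
    pos = [kmer_counts(s, k) for s in pos_seqs]
    neg = [kmer_counts(s, k) for s in neg_seqs]
    scores = {
        kmer: abs(pooled_t([c[kmer] for c in pos], [c[kmer] for c in neg]))
        for kmer in pos[0]
    }
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In an experiment of the kind described, `top_kmers` would be called with n between 10 and 100 on the training folds only, and `auc` would be computed on the held-out fold's predicted scores.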

Yang et al. BMC Bioinformatics 2012 13(Suppl 3):S15 doi:10.1186/1471-2105-13-S3-S15