Assessing feature combinations as enhancer signatures with cross validation using Naive Bayes classification. (A) Pie charts representing the genomic distributions of the co-OSN, co-MYC and unknown training sets. Intergenic regions are defined as regions ≥ 10 kb away from the closest TSS or transcription end site, whereas upstream regions are regions within 10 kb upstream of a TSS. (B) Each row represents one Naive Bayes classifier. The first 11 columns indicate the features used by that classifier, and the 12th column (Others) represents the remaining features listed in Table 1. The ability of each classifier to distinguish co-OSN regions (Enh training set) from co-MYC regions (PrL training set) and unknown regions is assessed using 10-fold cross validation. The last four columns list the area under the ROC curve (AUC), precision, modified precision (precision*) and recall, color-coded with red indicating good model performance and blue indicating poor performance. Naive Bayes classifiers with different feature combinations are sorted by their average ranking across the four indices.
Chen et al. BMC Genomics 2012 13:152 doi:10.1186/1471-2164-13-152
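The sorting described in panel (B), ordering classifiers by their average rank across the four indices, can be sketched as follows. This is a minimal illustration, not the authors' code: the classifier names and score values are invented for the example, and the handling of ties (tied classifiers share the average of their positions) is an assumption, since the caption does not specify it.

```python
from statistics import mean


def rank_descending(values):
    """Return 1-based ranks (1 = largest value).

    Assumption: tied values share the average of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def sort_by_average_rank(scores):
    """scores: {classifier: {index_name: value}}, higher is better for every index.

    Returns (classifier names sorted best-first, {classifier: average rank}).
    """
    names = list(scores)
    indices = list(next(iter(scores.values())))
    per_index = [rank_descending([scores[n][m] for n in names]) for m in indices]
    avg = {n: mean(r[i] for r in per_index) for i, n in enumerate(names)}
    return sorted(names, key=lambda n: avg[n]), avg


# Hypothetical scores for three classifiers (illustrative values only).
scores = {
    "clf_A": {"auc": 0.90, "precision": 0.80, "precision_star": 0.70, "recall": 0.60},
    "clf_B": {"auc": 0.85, "precision": 0.85, "precision_star": 0.75, "recall": 0.65},
    "clf_C": {"auc": 0.70, "precision": 0.60, "precision_star": 0.50, "recall": 0.90},
}

order, avg = sort_by_average_rank(scores)
print(order, avg)
```

Ranking each index separately before averaging keeps any one index from dominating because of its scale, which is presumably why a rank-based ordering is used here and not a sum of raw scores.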