Variation of the area under the ROC curve (AUC) as the number of features used varies. The features are sorted by applying FeaLect to 20 random training samples. The training samples and the highly scored features are then used to build linear classifiers with lars. The best AUC is reported, measured on a set of validation samples disjoint from the training set. For both the lymphoma and colon datasets, the performance of the optimum classifier decreases if all features are provided to lars. This observation shows empirically the advantage of using a limited number of highly scored features over plain lars.
Zare et al. BMC Genomics 2013 14(Suppl 1):S14 doi:10.1186/1471-2164-14-S1-S14
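The evaluation loop described in the caption can be illustrated with a minimal sketch, under several assumptions. The feature ranking is taken as given (here a hypothetical list standing in for FeaLect scores), and the linear classifier is a simple class-mean-difference scorer used as a stand-in, not lars. The function names and the toy data are invented for illustration and are not taken from the paper; only the AUC computation (the Mann-Whitney rank statistic) is standard.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_difference_weights(X, y, features):
    """Stand-in linear classifier (NOT lars): one weight per selected feature,
    equal to the difference between the two class means on the training set."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    weights = {}
    for j in features:
        mu_pos = sum(row[j] for row, label in zip(X, y) if label == 1) / n_pos
        mu_neg = sum(row[j] for row, label in zip(X, y) if label == 0) / n_neg
        weights[j] = mu_pos - mu_neg
    return weights


def auc_by_feature_count(X_train, y_train, X_val, y_val, ranking):
    """Validation AUC when only the top-k ranked features are used, for each k."""
    results = {}
    for k in range(1, len(ranking) + 1):
        w = mean_difference_weights(X_train, y_train, ranking[:k])
        scores = [sum(w[j] * row[j] for j in w) for row in X_val]
        results[k] = auc(scores, y_val)
    return results


# Toy data, made up for illustration: feature 0 is informative,
# feature 1 looks useful on the training set but misleads on validation.
X_train = [[2.0, 5.0], [2.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
y_train = [1, 1, 0, 0]
X_val = [[2.0, -3.0], [1.5, 0.0], [0.0, 0.0], [0.5, 1.0]]
y_val = [1, 1, 0, 0]
ranking = [0, 1]  # hypothetical ranking, best feature first

results = auc_by_feature_count(X_train, y_train, X_val, y_val, ranking)
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

On this toy input the validation AUC is highest with the single top-ranked feature and drops once the second feature is added, which mirrors the qualitative pattern reported in the caption without reproducing any of its results.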