The effect of cross-validation on simulated datasets. Percolator was run with and without the cross-validation protocol enabled (blue and green lines, respectively) on 100 simulated datasets. Each dataset contained 2500 synthetic target and decoy PSMs, each represented by 50 randomly generated features. Of the target PSMs, 1000 were intentionally made to differ from the rest, serving as synthetic "correct" matches. Solid lines show the medians over the 100 runs, and the lower and upper dashed lines show the 5th and 95th percentiles. (A) shows the number of synthetic PSMs deemed significant at each q value threshold. (B) shows the fraction of synthetic incorrect PSMs among the accepted PSMs, plotted against the estimated q value.
Granholm et al. BMC Bioinformatics 2012 13(Suppl 16):S3 doi:10.1186/1471-2105-13-S16-S3
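
The simulation setup described in the caption can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the caption does not state the feature distribution, how the "correct" PSMs were made different, or whether 2500 is the count per class, so the standard normal features, the additive `shift`, and the choice of 2500 targets plus 2500 decoys are all assumptions, and the function names `simulate_psms` and `percentile_bands` are hypothetical.

```python
import numpy as np


def simulate_psms(n_psms=2500, n_features=50, n_correct=1000, shift=1.0, seed=0):
    """Generate one synthetic dataset of target and decoy PSMs.

    Assumptions (not given in the caption): features are standard normal,
    n_psms is the count per class, and "correct" targets differ by a
    constant additive shift on every feature.
    """
    rng = np.random.default_rng(seed)
    targets = rng.normal(size=(n_psms, n_features))
    decoys = rng.normal(size=(n_psms, n_features))

    # Make the first n_correct targets different: the synthetic "correct" matches.
    targets[:n_correct] += shift

    is_correct = np.zeros(n_psms, dtype=bool)
    is_correct[:n_correct] = True
    return targets, decoys, is_correct


def percentile_bands(curves):
    """Summarise per-run curves (shape: runs x thresholds).

    Returns the 5th percentile, median, and 95th percentile at each
    threshold, matching the dashed and solid lines in the figure.
    """
    curves = np.asarray(curves, dtype=float)
    return np.percentile(curves, [5, 50, 95], axis=0)
```

Calling `simulate_psms(seed=i)` for `i` in `range(100)` would give 100 independent datasets; the curves computed from each run could then be stacked and passed to `percentile_bands` to obtain the summary lines.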