Figure 2.

Schematic layout of the datasets used for SVM training and testing. The primary dataset consists of non-redundant tetrapeptide caspase substrate cleavage sites obtained from the literature (see 1) and an equal number of non-cleavage sites. ¹The P4P1 sequences consist of all the sequences in the primary tetrapeptide cleavage site dataset. The P4P2' and P14P10' datasets were derived by extracting subsequence segments from the parent protein chains in the vicinity of the tetrapeptide cleavage sites, as shown in Figure 1. All datasets contain equal numbers of positive and negative examples.

Wee et al. BMC Bioinformatics 2006 7(Suppl 5):S14   doi:10.1186/1471-2105-7-S5-S14
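
The window extraction described in the caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard protease nomenclature in which cleavage occurs between P1 and P1', so that P4P1 spans 4 residues, P4P2' spans 6, and P14P10' spans 24. The function name `extract_window`, the `WINDOWS` table, the toy sequence, and the choice to skip windows that run past the ends of the chain are all hypothetical and are not taken from the paper.

```python
from typing import Optional

# Residues taken before and after the scissile bond for each dataset.
# Assumption: names follow the caption; sizes follow P/P' nomenclature.
WINDOWS = {
    "P4P1": (4, 0),
    "P4P2'": (4, 2),
    "P14P10'": (14, 10),
}


def extract_window(sequence: str, p1_index: int,
                   n_before: int, n_after: int) -> Optional[str]:
    """Return the segment covering n_before residues up to and including
    P1 and n_after residues after it; p1_index is the 0-based position of P1.

    Returns None when the window would run past either end of the chain
    (an assumption made here; the paper may handle such cases differently).
    """
    start = p1_index - n_before + 1
    end = p1_index + 1 + n_after
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


# Toy parent chain (hypothetical), with a DEVD site whose P1 residue
# (the final D) sits at 0-based index 17.
toy_chain = "ACDEFGHIKLMNPQDEVDGSTVWYACDEFG"
toy_p1 = 17

for name, (n_before, n_after) in WINDOWS.items():
    print(name, extract_window(toy_chain, toy_p1, n_before, n_after))
# P4P1 DEVD
# P4P2' DEVDGS
# P14P10' FGHIKLMNPQDEVDGSTVWYACDE
```

Negative examples could be built with the same function by passing positions that are not known cleavage sites, though how the non-cleavage sites were actually sampled is not specified in this caption.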