Building Classifier Training Data. This diagram describes how the training data are constructed. A set of protein sequences undergoes in silico trypsin digestion to produce a collection of tryptic peptides. For each peptide, m physicochemical properties are computed (for peptide i, properties 1 to m are denoted in the figure as $V_{i,1}, V_{i,2}, V_{i,3}, \ldots, V_{i,m}$), and prior MS results are searched to determine whether the peptide has been observed (for peptide i, this detection call is denoted $D_i$). The resulting training data form a matrix in which each row holds the property values and the detection call for one peptide. These data associate peptide properties with the MS detection call and are later used to train a classifier that estimates peptide detection probabilities from physicochemical properties.
Braisted et al. BMC Bioinformatics 2008 9:529 doi:10.1186/1471-2105-9-529
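A minimal Python sketch of the pipeline in the diagram follows. It rests on several assumptions that are not taken from the figure. Trypsin is modeled with the simple rule "cleave after K or R, but not before P" with no missed cleavages. The three properties (peptide length, GRAVY hydropathy on the Kyte-Doolittle scale, and a crude charge count) are illustrative stand-ins for the m properties, which the caption does not list. The `observed` set is a hypothetical stand-in for peptides found in prior MS results. Duplicate peptides are collapsed to a single row. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def trypsin_digest(protein: str, min_len: int = 1) -> List[str]:
    """Assumed trypsin rule: cut after K or R unless the next residue is P.
    No missed cleavages; peptides shorter than min_len are dropped."""
    peptides: List[str] = []
    start = 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide that does not end in a cut site
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def peptide_properties(peptide: str) -> List[float]:
    """Hypothetical property vector V_i,1..V_i,m with m = 3.
    Raises KeyError for residues outside the 20 standard amino acids."""
    length = float(len(peptide))
    gravy = sum(KD[aa] for aa in peptide) / len(peptide)  # mean hydropathy
    basic = sum(peptide.count(aa) for aa in "KRH")
    acidic = sum(peptide.count(aa) for aa in "DE")
    return [length, gravy, float(basic - acidic)]  # crude charge count


def build_training_data(
    proteins: Iterable[str], observed: Set[str]
) -> Tuple[List[str], List[List[float]]]:
    """Return peptide sequences and a matrix whose row i is
    [V_i,1, ..., V_i,m, D_i], with D_i = 1.0 if peptide i was observed."""
    peptides: List[str] = []
    matrix: List[List[float]] = []
    seen: Set[str] = set()
    for protein in proteins:
        for pep in trypsin_digest(protein):
            if pep in seen:  # assumption: one row per distinct sequence
                continue
            seen.add(pep)
            detection = 1.0 if pep in observed else 0.0  # detection call D_i
            peptides.append(pep)
            matrix.append(peptide_properties(pep) + [detection])
    return peptides, matrix
```

For example, `build_training_data(["MKRPAKGE"], {"RPAK"})` yields the peptides MK, RPAK and GE with detection calls 0, 1 and 0; RPAK stays intact because the R is followed by P. Rows in this form could then be passed to any classifier that outputs probabilities; the sketch deliberately stops before that step.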