Transformation and other factors of the peptide mass spectrometry pairwise peak-list comparison process
- Equal contributors
1 Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, D-14195 Berlin, Germany
2 Institute for Computer Science, Free University Berlin, Takustr. 9, D-14195 Berlin, Germany
3 DFG Research Center for Experimental Biomedicine, University of Würzburg, Versbacherstr. 9, D-97078 Würzburg, Germany
4 Max Delbrück Center for Molecular Medicine, Robert-Roessle-Str. 10, D-13125 Berlin-Buch, Germany
5 Boyce Thompson Institute for Plant Research, Tower Road, Ithaca 14850, NY, USA
6 Institute for Medical Informatics, Biometry and Epidemiology; Charité University Medicine Berlin, Hindenburgdamm 30 (HBD 30), 12200 Berlin
7 School of Mathematics and Statistics, Merz Court, University of Newcastle upon Tyne, NE1 7RU, UK
BMC Bioinformatics 2005, 6:285 doi:10.1186/1471-2105-6-285
Published: 30 November 2005
Biological mass spectrometry is used to analyse peptides and proteins. A mass spectrometric measurement yields a list of measured mass-to-charge ratios and intensities of ionised peptides, called a peak-list. To classify the underlying amino acid sequence, the acquired spectra are usually compared with synthetic ones. Developing suitable methods for direct peak-list comparison may benefit many applications.
Pairwise peak-list comparison is a multistage process comprising the matching of peaks between two peak-lists, normalisation, scaling of peak intensities, and the computation of a dissimilarity measure. In our analysis, we focused on binary and intensity-based measures. We modified the measures to account for two mass-spectrometry-specific properties: mass measurement accuracy and non-matching peaks. We compared the labelling of peak-list pairs as same or different, obtained using different factors of the pairwise peak-list comparison, with the labelling determined by sequence database searches. To elucidate how these factors influence the peak-list comparison, we adopted an analysis-of-variance-type method with the partial area under the ROC curve as the dependent variable.
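The stages named above (peak matching, scaling, normalisation, dissimilarity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the greedy two-pointer matching, the fixed mass tolerance `tol` standing in for mass measurement accuracy, the square-root intensity scaling, and the choice of a Jaccard-type binary measure and a cosine-type intensity-based measure. Non-matching peaks are treated as paired with zero intensity in the other list.

```python
import math

def match_peaks(a, b, tol=0.5):
    """Greedy one-to-one matching of two peak-lists of (mz, intensity) tuples.

    Two peaks match if their m/z values differ by at most `tol`
    (hypothetical stand-in for the mass measurement accuracy).
    Returns matched intensity pairs and the unmatched intensities of each list.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    pairs, only_a, only_b = [], [], []
    while i < len(a) and j < len(b):
        diff = a[i][0] - b[j][0]
        if abs(diff) <= tol:
            pairs.append((a[i][1], b[j][1]))
            i += 1
            j += 1
        elif diff < 0:
            only_a.append(a[i][1])  # peak in a has no partner in b
            i += 1
        else:
            only_b.append(b[j][1])  # peak in b has no partner in a
            j += 1
    only_a.extend(p[1] for p in a[i:])
    only_b.extend(p[1] for p in b[j:])
    return pairs, only_a, only_b

def binary_dissimilarity(pairs, only_a, only_b):
    """Jaccard-type distance: uses only peak presence, ignores intensities."""
    total = len(pairs) + len(only_a) + len(only_b)
    return 1.0 - len(pairs) / total if total else 0.0

def intensity_dissimilarity(pairs, only_a, only_b, scale=math.sqrt):
    """Cosine-type distance on scaled, unit-length-normalised intensities."""
    dot = sum(scale(x) * scale(y) for x, y in pairs)
    norm_a = math.sqrt(sum(scale(x) ** 2 for x in [p[0] for p in pairs] + only_a))
    norm_b = math.sqrt(sum(scale(y) ** 2 for y in [p[1] for p in pairs] + only_b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

# Two small made-up peak-lists: two peaks agree within tolerance, one does not.
list_a = [(500.0, 100.0), (600.0, 400.0), (700.0, 25.0)]
list_b = [(500.2, 100.0), (600.1, 400.0), (800.0, 25.0)]
matched = match_peaks(list_a, list_b, tol=0.5)
d_binary = binary_dissimilarity(*matched)
d_intensity = intensity_dissimilarity(*matched)
```

In this toy example the binary measure penalises the two non-matching peaks heavily, while the intensity-based measure barely changes because the non-matching peaks are weak. Differences of this kind between factor choices are what the analysis of variance is intended to quantify.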
The analysis of variance provides insight into the relevance of the various factors influencing the outcome of the pairwise peak-list comparison. For large MS/MS and PMF data sets, the outcome of the ANOVA was consistent, which strongly indicates that the results presented here may hold for many types of peptide mass measurements.