This article is part of the supplement: Selected articles from the Second IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2012): Genomics
Filtering of MS/MS data for peptide identification
1 SUNY Binghamton Computer Science Department, Binghamton, NY, USA
2 SUNY Binghamton Biological Sciences Department, Binghamton, NY, USA
BMC Genomics 2013, 14(Suppl 7):S2 doi:10.1186/1471-2164-14-S7-S2Published: 5 November 2013
The identification of proteins based on analysis of tandem mass spectrometry (MS/MS) data is a valuable tool that is not fully realized because of the difficulty in carrying out automated analysis of large numbers of spectra. MS/MS spectra consist of peaks that represent each peptide fragment, usually b and y ions, with experimentally determined mass to charge ratios. Whether the strategy employed is database matching or De Novo sequencing, a major obstacle is distinguishing signal from noise. Improved ability to distinguish signal peaks of low intensity from background noise increases the likelihood of correctly identifying the peptide, as valuable information is preserved while extraneous information is not left to mislead.
This paper introduces an automated noise filtering method based on the construction of orthogonal polynomials. By subdividing the spectrum into a variable number (3 to 11) of bins, peaks that are considered "noise" are identified at a local level. Using a De Novo sequencing algorithm that we are developing, this filtering method was applied to a published dataset of more than 3000 mass spectra and an original dataset of more than 300 spectra. The samples were peptides from purified known proteins; therefore, the solutions could be compared to the correct sequences and the peaks corresponding to b, y and other fragments of significance could be identified. The same procedure was applied using two other published filtering methods. The ratios of the number of significant peaks that were preserved relative to the total number of peaks in each spectrum were determined. In the event that filtering out too many or too few signal peaks can lead to inaccuracy in sequence determination, the percentage of amino acid residues in the correct positions relative to the total number of amino acid residues in the correct sequence was also calculated for each sequence determined.
The results show that an orthogonal polynomial-based method of distinguishing signal peaks from background in mass spectra preserves a greater portion of signal peaks than compared methods, improving accuracy in sequence determination.