Open Access Research article

A novel scoring schema for peptide identification by searching protein sequence databases using tandem mass spectrometry data

Zhuo Zhang1*, Shiwei Sun2*, Xiaopeng Zhu1, Suhua Chang2, Xiaofei Liu2, Chungong Yu2, Dongbo Bu2 and Runsheng Chen1,2

Institute of Biophysics, Chinese Academy of Sciences, Beijing, P. R. China

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P. R. China

* Contributed equally

BMC Bioinformatics 2006, 7:222 doi:10.1186/1471-2105-7-222

Published: 26 April 2006

Abstract

Background

Tandem mass spectrometry (MS/MS) is a powerful tool for protein identification. Although great efforts have been made in scoring the correlation between tandem mass spectra and an amino acid sequence database, improvements could be made in three aspects: characterization of peaks in spectra, adoption of effective scoring functions, and assessment of the reliability of matches between peptides and spectra.

Results

A novel scoring function is presented, along with criteria to estimate the confidence of its results. By learning the types of product ions and the probabilities of generating them, a hypothetical spectrum was generated for each candidate peptide. Relative entropy was then introduced to measure the similarity between the hypothetical and the observed spectra. Based on extreme value distribution (EVD) theory, a threshold was chosen to distinguish a true peptide assignment from a random one. Tests on a public MS/MS dataset demonstrated that this method performs better than the well-known SEQUEST.
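
To make the two ideas above concrete, the following is a minimal sketch and not the authors' implementation. It assumes that both spectra have already been binned onto the same m/z grid, that a small pseudocount is added so that empty bins do not break the logarithm, and that the EVD is a Gumbel distribution fitted by the method of moments to scores from random peptide matches. The function names, the pseudocount, and the moment-based fit are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def normalize(intensities, pseudocount=1e-6):
    """Turn binned intensities into a probability distribution.

    The pseudocount is an assumption of this sketch: it keeps every
    bin strictly positive so the logarithm below is always defined.
    """
    smoothed = [x + pseudocount for x in intensities]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def relative_entropy(observed, hypothetical):
    """Relative entropy D(observed || hypothetical), in nats.

    Zero means the two normalized spectra are identical; larger
    values mean they differ more.
    """
    if len(observed) != len(hypothetical):
        raise ValueError("spectra must share the same m/z bins")
    p = normalize(observed)
    q = normalize(hypothetical)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def gumbel_threshold(random_scores, p_value=0.01):
    """Score cutoff that a random match exceeds with probability p_value.

    Assumes higher scores are better (for example, the negated
    relative entropy) and fits a Gumbel distribution by matching
    the sample mean and variance.
    """
    n = len(random_scores)
    mean = sum(random_scores) / n
    variance = sum((s - mean) ** 2 for s in random_scores) / (n - 1)
    beta = math.sqrt(6.0 * variance) / math.pi   # scale parameter
    mu = mean - EULER_GAMMA * beta               # location parameter
    # Invert 1 - exp(-exp(-(x - mu) / beta)) = p_value for x.
    return mu - beta * math.log(-math.log(1.0 - p_value))
```

Under these assumptions, a candidate peptide could be scored as `-relative_entropy(observed_bins, hypothetical_bins)` and accepted only if that score exceeds `gumbel_threshold(scores_of_random_matches)`.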

Conclusion

A reliable identification of proteins from the spectra promises a more efficient application of tandem mass spectrometry to proteomes with high complexity.

