This article is part of the supplement: International Workshop on Computational Systems Biology: Approaches to Analysis of Genome Complexity and Regulatory Gene Networks

Open Access Research

Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction

Nedim Mujezinovic1, Georg Schneider2, Michael Wildpaner3, Karl Mechtler4 and Frank Eisenhaber256*

Author Affiliations

1 Sarajevo School of Science and Technology, Bistrik 7, Sarajevo 71000, Bosnia-Herzegovina

2 Bioinformatics Institute (BII), A*STAR, Biopolis, 30 Biopolis Street, #07-01 Matrix Bldg., Singapore 138671

3 Google Switzerland GmbH, Brandschenkestraße 110, 8002 Zuerich, Switzerland

4 Research Institute of Molecular Pathology, Dr. Bohr-Gasse 7, A-1030 Vienna, Austria

5 Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597

6 School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore 637553


BMC Genomics 2010, 11(Suppl 1):S13  doi:10.1186/1471-2164-11-S1-S13

Published: 10 February 2010



Tandem mass spectrometry (MS/MS) has become a standard method for identifying proteins extracted from biological samples, but the huge number of MS/MS spectra and their contamination with noise obstruct swift and reliable computer-aided interpretation. Typically, only a minor fraction of the spectra per sample (most often, just a few percent) and about 10% of the peaks per spectrum contribute to the final result, provided that noise does not prevent protein identification altogether.


Two fast preprocessing screens can substantially reduce the haystack of MS/MS data. (1) Simple sequence ladder rules remove spectra that cannot be interpreted as peptide sequences. (2) Modified Fourier-transform-based criteria clear the background from the remaining data. On average, only 35% of the MS/MS spectra (each reduced in size by about one quarter) have to be handed over to the interpretation software for reliable protein identification. This comes essentially without loss of information, with a trend toward improved sequence coverage, and with a proportional decrease in computer resource consumption.
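To illustrate the idea behind screen (1), the following is a minimal Python sketch of a generic sequence-ladder check. It is not the authors' rule set. It rests on several assumptions: fragment ions are singly charged (so m/z differences equal mass differences), residue masses are unmodified monoisotopic values, the matching tolerance is 0.5 Da, and a spectrum counts as interpretable if it contains at least four consecutive residue steps. The function names `longest_ladder` and `is_interpretable` and their parameters are hypothetical. Peaks are connected whenever their m/z gap matches an amino acid residue mass, and the longest chain is then found by dynamic programming over the sorted peaks.

```python
from typing import Sequence

# Monoisotopic residue masses in Da (assumption: Cys unmodified; I and L share one value).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_ladder(mz: Sequence[float], tol: float = 0.5) -> int:
    """Return the number of residue steps in the longest chain of peaks
    whose consecutive m/z differences each match a residue mass.

    Hypothetical helper; assumes singly charged fragment ions.
    """
    peaks = sorted(mz)
    n = len(peaks)
    # best[j] = length (in residue steps) of the longest ladder ending at peak j
    best = [0] * n
    for j in range(n):
        for i in range(j):
            gap = peaks[j] - peaks[i]
            # Extend the ladder ending at peak i if the gap is one residue mass
            if any(abs(gap - m) <= tol for m in RESIDUE_MASSES.values()):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


def is_interpretable(mz: Sequence[float], min_steps: int = 4, tol: float = 0.5) -> bool:
    """Hypothetical screen: keep a spectrum only if it contains a ladder
    of at least `min_steps` consecutive residue steps."""
    return longest_ladder(mz, tol) >= min_steps
```

For example, a spectrum holding the fragment series 200.0, 257.02146, 328.05857, 415.0906, 512.14336, 611.21177 (steps G, A, S, P, V) plus two unrelated peaks yields a ladder of five steps and passes the screen, whereas a handful of peaks with no residue-sized gaps yields zero and would be discarded before database searching. The double loop is quadratic in the number of peaks, which is acceptable for typical MS/MS peak lists but is a simplification rather than a tuned implementation.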


The search for sequence ladders in MS/MS spectra with subsequent noise suppression is a promising strategy to reduce the number of MS/MS spectra from electrospray instruments and to enhance the reliability of protein matches. Supplementary material and the software are available from an accompanying website.