Open Access Research article

Speeding up tandem mass spectrometry-based database searching by longest common prefix

Chen Zhou (1,2,3), Hao Chi (1,2,3), Le-Heng Wang (1,2), You Li (1,2), Yan-Jie Wu (1,2,3), Yan Fu (1,2), Rui-Xiang Sun (1,2) and Si-Min He (1,2)*

Author Affiliations

1 Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China

2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

3 Graduate University of Chinese Academy of Sciences, Beijing 100049, China

BMC Bioinformatics 2010, 11:577  doi:10.1186/1471-2105-11-577

Published: 25 November 2010

Abstract

Background

Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the sharp increase in computational demand brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, building a peptide index requires a large amount of time and space, especially for non-specific digestion. In addition, peptide indexing is inflexible to use.

Results

We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix (LCP) is a data structure that is typically paired with the suffix array. ABLCP eliminates redundant candidate peptides from the database and reduces the number of peptide-spectrum matches performed, thereby decreasing identification time. The algorithm relies on a property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.
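The general idea of using a suffix array with an LCP array to skip redundant candidates can be illustrated with a small sketch. This is not the ABLCP implementation from the paper; it is a minimal illustration under stated assumptions. Proteins are assumed to be concatenated with a hypothetical `$` separator. The suffix array is built naively by sorting, and the LCP array uses Kasai's algorithm. Walking the suffixes in sorted order, every prefix of length at most the LCP with the previously processed suffix has already been seen, so only longer prefixes are new candidates. For enzyme-specific digestion, some suffixes are skipped because they do not start at a cleavage site. The sketch therefore compares each kept suffix with the last *kept* suffix, using the minimum LCP across the skipped ones. This is one plausible way to avoid omitting candidates, not necessarily the adjustment the authors use. The names `distinct_candidates`, `tryptic_start` and `tryptic_end` are hypothetical, and the tryptic rule is deliberately simplified.

```python
from typing import Callable, Iterator, List

SEP = "$"  # hypothetical protein separator; sorts before 'A'


def suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes: fine for a sketch, far too slow for real databases.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[k] is the LCP length of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # text order guarantees h shrinks by at most 1 per step
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def any_start(text: str, i: int) -> bool:
    return True  # non-specific digestion: every position may start a peptide


def any_end(peptide: str) -> bool:
    return True  # non-specific digestion: every position may end a peptide


def distinct_candidates(
    text: str,
    min_len: int,
    max_len: int,
    valid_start: Callable[[str, int], bool] = any_start,
    valid_end: Callable[[str], bool] = any_end,
) -> Iterator[str]:
    """Yield each distinct candidate peptide exactly once, in suffix-array order."""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    n = len(text)
    shared = 0  # LCP between the current suffix and the last kept suffix
    for k, i in enumerate(sa):
        shared = min(shared, lcp[k])  # running minimum over suffixes skipped since then
        if not valid_start(text, i):
            continue
        # Prefixes of length <= shared were already handled at an earlier kept suffix.
        for length in range(max(min_len, shared + 1), max_len + 1):
            peptide = text[i:i + length]
            if len(peptide) < length or SEP in peptide:
                break  # would cross a protein boundary
            if valid_end(peptide):
                yield peptide
        shared = n  # reset so the next kept suffix starts a fresh minimum


def tryptic_start(text: str, i: int) -> bool:
    # Simplified, hypothetical rule: protein N-terminus or right after K/R (proline rule ignored).
    return i == 0 or text[i - 1] in "KR" + SEP


def tryptic_end(peptide: str) -> bool:
    # Simplified, hypothetical rule: must end in K/R (protein C-termini and proline rule ignored).
    return peptide[-1] in "KR"


if __name__ == "__main__":
    db = "AAKGGR" + SEP + "AAKLLK" + SEP
    print(list(distinct_candidates(db, 2, 12, tryptic_start, tryptic_end)))
    # ['AAK', 'AAKGGR', 'AAKLLK', 'GGR', 'LLK']
```

In this toy database, `AAK` occurs in both proteins but is generated, and would therefore be scored against a spectrum, only once. Taking a running minimum of the LCP over skipped suffixes is what keeps the enzyme-specific case complete: suffixes that share a prefix are contiguous in sorted order, so a candidate is skipped only if an earlier kept suffix has already produced it.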

Conclusions

The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm