This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Bioinformatics

Mass spectrometry-based protein identification by integrating de novo sequencing with database searching

Penghao Wang1* and Susan R Wilson1,2

Author affiliations

1 Prince of Wales Clinical School, University of New South Wales, Australia

2 Mathematical Sciences Institute, Australian National University, Australia

Citation and License

BMC Bioinformatics 2013, 14(Suppl 2):S24  doi:10.1186/1471-2105-14-S2-S24

Published: 21 January 2013



Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach first infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then submits these tags to a database search to find a closely matching peptide. However, current implementations of this integrative approach have several limitations. First, they apply simplistic de novo sequencing and use only very short sequence tags. Second, most integrative methods apply a BLAST-like algorithm to search for exact sequence matches and do not accommodate sequence errors well. Third, in these methods the integrated de novo sequencing contributes little to the scoring model, which is still based largely on database searching.
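To make the tag-then-search idea concrete, the following is a minimal sketch of looking up a de novo sequence tag in a peptide database while tolerating a small number of residue errors. It is an illustration only and not the method described in this article: the function names, the toy database `EXAMPLE_DB`, the example tag, and the mismatch threshold are all assumptions, and a plain sliding-window mismatch count stands in for any real scoring model.

```python
from typing import Dict, List, Tuple

# Hypothetical toy database: peptide name -> amino acid sequence.
EXAMPLE_DB: Dict[str, str] = {
    "pep_A": "LVNELTEFAK",
    "pep_B": "HLVDEPQNLIK",
    "pep_C": "YLYEIAR",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_tag_hit(tag: str, peptide: str) -> Tuple[int, int]:
    """Slide `tag` along `peptide`; return (fewest mismatches, offset).

    Returns an offset of -1 if the peptide is shorter than the tag.
    """
    best = (len(tag) + 1, -1)
    for offset in range(len(peptide) - len(tag) + 1):
        window = peptide[offset:offset + len(tag)]
        mismatches = hamming(tag, window)
        if mismatches < best[0]:
            best = (mismatches, offset)
    return best


def search_by_tags(
    tags: List[str],
    database: Dict[str, str],
    max_mismatches: int = 1,
) -> Dict[str, List[Tuple[str, int, int]]]:
    """Map each peptide name to the tags that match it within tolerance.

    Each hit is recorded as (tag, offset, mismatches).
    """
    hits: Dict[str, List[Tuple[str, int, int]]] = {}
    for name, peptide in database.items():
        for tag in tags:
            mismatches, offset = best_tag_hit(tag, peptide)
            if offset >= 0 and mismatches <= max_mismatches:
                hits.setdefault(name, []).append((tag, offset, mismatches))
    return hits


if __name__ == "__main__":
    # The tag "NELS" contains one sequencing error relative to "NELT".
    print(search_by_tags(["NELS"], EXAMPLE_DB, max_mismatches=1))
    print(search_by_tags(["NELS"], EXAMPLE_DB, max_mismatches=0))
```

With an exact-match requirement (`max_mismatches=0`) the erroneous tag finds nothing, whereas allowing one mismatch recovers the intended peptide. This is the kind of sensitivity to sequence errors that the second limitation above refers to.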


We have developed a new integrative protein identification method that integrates de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.