Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Research article

A proteogenomic update to Yersinia: enhancing genome annotation

Samuel H Payne*, Shih-Ting Huang and Rembert Pieper

Author Affiliations

J Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA

For all author emails, please log on.

BMC Genomics 2010, 11:460  doi:10.1186/1471-2164-11-460

Published: 5 August 2010

Abstract

Background

Modern biomedical research depends on a complete and accurate proteome. With the widespread adoption of new sequencing technologies, genome sequences are generated at a near exponential rate, diminishing the time and effort that can be invested in genome annotation. The resulting gene set contains numerous errors in even the most basic form of annotation: the primary structure of the proteins.

Results

The application of experimental proteomics data to genome annotation, called proteogenomics, can quickly and efficiently discover misannotations, yielding a more accurate and complete genome annotation. We present a comprehensive proteogenomic analysis of the plague bacterium, Yersinia pestis KIM. We discover non-annotated genes, correct protein boundaries, remove spuriously annotated ORFs, and make major advances towards accurate identification of signal peptides. Finally, we apply our data to 21 other Yersinia genomes, correcting and enhancing their annotations.

Conclusions

In total, 141 gene models were altered and have been updated in RefSeq and Genbank, which can be accessed seamlessly through any NCBI tool (e.g. blast) or downloaded directly. Along with the improved gene models we discover new, more accurate means of identifying signal peptides in proteomics data.