High-throughput proteogenomics of Ruegeria pomeroyi: seeding a better genomic annotation for the whole marine Roseobacter clade
Citation and License
BMC Genomics 2012, 13:73 doi:10.1186/1471-2164-13-73Published: 15 February 2012
The structural and functional annotation of genomes is now heavily based on data obtained using automated pipeline systems. The key for an accurate structural annotation consists of blending similarities between closely related genomes with biochemical evidence of the genome interpretation. In this work we applied high-throughput proteogenomics to Ruegeria pomeroyi, a member of the Roseobacter clade, an abundant group of marine bacteria, as a seed for the annotation of the whole clade.
A large dataset of peptides from R. pomeroyi was obtained after searching over 1.1 million MS/MS spectra against a six-frame translated genome database. We identified 2006 polypeptides, of which thirty-four were encoded by open reading frames (ORFs) that had not previously been annotated. From the pool of 'one-hit-wonders', i.e. those ORFs specified by only one peptide detected by tandem mass spectrometry, we could confirm the probable existence of five additional new genes after proving that the corresponding RNAs were transcribed. We also identified the most-N-terminal peptide of 486 polypeptides, of which sixty-four had originally been wrongly annotated.
By extending these re-annotations to the other thirty-six Roseobacter isolates sequenced to date (twenty different genera), we propose the correction of the assigned start codons of 1082 homologous genes in the clade. In addition, we also report the presence of novel genes within operons encoding determinants of the important tricarboxylic acid cycle, a feature that seems to be characteristic of some Roseobacter genomes. The detection of their corresponding products in large amounts raises the question of their function. Their discoveries point to a possible theory for protein evolution that will rely on high expression of orphans in bacteria: their putative poor efficiency could be counterbalanced by a higher level of expression. Our proteogenomic analysis will increase the reliability of the future annotation of marine bacterial genomes.