RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank
1 Department of Molecular Biology and Functional Genomics, Stockholm University, Stockholm, Sweden
2 Cellular Biotechnology, Institute for Bioengineering of Catalonia (IBEC), Scientific Park of Barcelona, Barcelona, Spain
3 School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
BMC Genomics 2009, 10:589 doi:10.1186/1471-2164-10-589Published: 8 December 2009
Ribonucleotide reductases (RNRs) catalyse the only known de novo pathway for deoxyribonucleotide synthesis, and are therefore essential to DNA-based life. While ribonucleotide reduction has a single evolutionary origin, significant differences between RNRs nevertheless exist, notably in cofactor requirements, subunit composition and allosteric regulation. These differences result in distinct operational constraints (anaerobicity, iron/oxygen dependence and cobalamin dependence), and form the basis for the classification of RNRs into three classes.
In RNRdb (Ribonucleotide Reductase database), we have collated and curated all known RNR protein sequences with the aim of providing a resource for exploration of RNR diversity and distribution. By comparing expert manual annotations with annotations stored in Genbank, we find that significant inaccuracies exist in larger databases. To our surprise, only 23% of protein sequences included in RNRdb are correctly annotated across the key attributes of class, role and function, with 17% being incorrectly annotated across all three categories. This illustrates the utility of specialist databases for applications where a high degree of annotation accuracy may be important. The database houses information on annotation, distribution and diversity of RNRs, and links to solved RNR structures, and can be searched through a BLAST interface. RNRdb is accessible through a public web interface at http://rnrdb.molbio.su.se webcite.
RNRdb is a specialist database that provides a reliable annotation and classification resource for RNR proteins, as well as a tool to explore distribution patterns of RNR classes. The recent expansion in available genome sequence data have provided us with a picture of RNR distribution that is more complex than believed only a few years ago; our database indicates that RNRs of all three classes are found across all three cellular domains. Moreover, we find a number of organisms that encode all three classes.