Comments (1)

Good to see criticism of this type of data

Neil Saunders (20 April 2006), University of Queensland

It's good to see someone take a critical look at environmental sequence data. One thing that the authors don't mention is the questionable validity of many protein sequences that are annotated as "hypothetical". The Sargasso sequences in GenBank do not appear to have been annotated for 23S rRNA genes, and many of the short, hypothetical ORFs are in fact just translated 23S regions. You can see this for yourself: BLAST a 23S sequence (e.g. from E. coli) against the env_nt dataset, note the sequence coordinates of the 23S hit, then visit the GenBank entry for that hit (e.g. gi 44249358). In many cases the so-called hypothetical ORFs lie within a 23S rDNA gene. Perhaps NCBI and the other databases should consider segregating environmental data from the bulk of the nr dataset to avoid contamination with junk.

Competing interests: None declared
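The coordinate check described in the comment (does an annotated hypothetical ORF fall inside the region matched by a 23S BLAST hit?) can be sketched in a few lines of Python. This is only an illustration: the names `Feature`, `overlap_length` and `orfs_in_rrna_hit` are hypothetical, the coordinates are made up rather than taken from any GenBank entry, and the 50% overlap threshold is an arbitrary assumption.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """An annotated feature with 1-based, inclusive coordinates (hypothetical type)."""
    name: str
    start: int
    end: int


def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two intervals.

    Start may be greater than end (as BLAST reports for minus-strand hits),
    so both intervals are normalised first.
    """
    a_lo, a_hi = sorted((a_start, a_end))
    b_lo, b_hi = sorted((b_start, b_end))
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


def orfs_in_rrna_hit(hit_start, hit_end, orfs, min_fraction=0.5):
    """Return names of ORFs whose length is covered by the hit to at least
    min_fraction (an assumed threshold, not one from the comment)."""
    flagged = []
    for orf in orfs:
        orf_len = abs(orf.end - orf.start) + 1
        shared = overlap_length(hit_start, hit_end, orf.start, orf.end)
        if shared / orf_len >= min_fraction:
            flagged.append(orf.name)
    return flagged


# Made-up coordinates for illustration only.
example_orfs = [
    Feature("orf_A", 120, 419),   # lies entirely inside the hit
    Feature("orf_B", 900, 1100),  # lies outside the hit
    Feature("orf_C", 700, 450),   # minus strand, inside the hit
]
print(orfs_in_rrna_hit(100, 800, example_orfs))
```

In practice the hit coordinates would come from the BLAST report and the ORF coordinates from the CDS features of the GenBank entry; any ORF flagged this way is a candidate for a translated rRNA region rather than a real protein.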








