Open Access Methodology article

Bcheck: a wrapper tool for detecting RNase P RNA genes

Dilmurat Yusuf (1), Manja Marz (2,3), Peter F Stadler (1,3,4,5,6) and Ivo L Hofacker (1)*

Author Affiliations

1 Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria

2 Institut für Pharmazeutische Chemie, Philipps Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany

3 Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-01407, Leipzig, Germany

4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22 D-04103 Leipzig, Germany

5 Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany

6 Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA

BMC Genomics 2010, 11:432  doi:10.1186/1471-2164-11-432

Published: 13 July 2010

Abstract

Background

Effective bioinformatics solutions are needed to tackle the challenges posed by industrial-scale genome annotation. We present Bcheck, a wrapper tool that predicts RNase P RNA genes by combining the speed of pattern matching with the sensitivity of covariance models. The core of Bcheck is a library of subfamily-specific descriptor models and covariance models.

Results

A scan of all microbial genomes in GenBank identified RNase P RNA genes in 98% of 1024 microbial chromosomal sequences within just 4 hours on a single CPU. Compared with the existing annotations found in 387 of the GenBank files, Bcheck predictions have more intact structures and are automatically classified by subfamily membership. For eukaryotic chromosomes, Bcheck identified the known RNase P RNA genes in 84 out of 85 metazoan genomes and 19 out of 21 fungal genomes. Bcheck predicted 37 novel eukaryotic RNase P RNA genes, 32 of which are from fungi. Gene duplication events are observed in at least 20 metazoan organisms. A scan of metagenomic data from the Global Ocean Sampling Expedition, comprising over 10 million sample sequences (18 gigabases), predicted 2909 unique genes, 98% of which fall into the ancestral bacterial type A of RNase P RNA and 66% of which have no close homolog among known prokaryotic RNase P RNAs.

Conclusions

The combination of efficient filtering by means of a descriptor-based search and subsequent construction of a high-quality gene model by means of a covariance model provides an efficient method for the detection of RNase P RNA genes in large-scale sequencing data.
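
The filter-then-model idea described above can be illustrated with a minimal Python sketch. It is not Bcheck's implementation: the regular expression stands in for a descriptor model, the GC-fraction scorer stands in for a covariance model, and the names CANDIDATE_PATTERN, prefilter, toy_score and two_stage_scan, as well as the threshold and flank values, are hypothetical choices made only for illustration.

```python
import re
from typing import Callable, Iterator, List, Pattern, Tuple

# Hypothetical stand-in for a descriptor: a cheap pattern that flags
# candidate regions. A real descriptor encodes conserved structure motifs.
CANDIDATE_PATTERN = re.compile(r"GG[ACGT]{4,8}CC")


def prefilter(seq: str, pattern: Pattern[str], flank: int) -> Iterator[Tuple[int, int]]:
    """Stage 1: fast scan that yields (start, end) windows around pattern hits."""
    for match in pattern.finditer(seq):
        start = max(0, match.start() - flank)
        end = min(len(seq), match.end() + flank)
        yield start, end


def toy_score(window: str) -> float:
    """Stage 2 stand-in: GC fraction of the window.

    A real pipeline would score the window against a covariance model here;
    this toy function only marks where the expensive step would run.
    """
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def two_stage_scan(
    seq: str,
    pattern: Pattern[str] = CANDIDATE_PATTERN,
    scorer: Callable[[str], float] = toy_score,
    threshold: float = 0.3,
    flank: int = 5,
) -> List[Tuple[int, int, float]]:
    """Run the expensive scorer only on windows that pass the cheap filter."""
    hits = []
    for start, end in prefilter(seq, pattern, flank):
        score = scorer(seq[start:end])
        if score >= threshold:
            hits.append((start, end, score))
    return hits


if __name__ == "__main__":
    genome = "ATATATATGGACGTCCATATATAT"
    for start, end, score in two_stage_scan(genome):
        print(f"candidate {start}-{end} score={score:.3f}")
```

The point of the design is that the costly scoring step is called only on the few windows the cheap filter lets through, so total run time is dominated by the fast scan when candidates are rare.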

Bcheck is implemented as a web server and can also be downloaded for local use from http://rna.tbi.univie.ac.at/bcheck