Database (Open Access)

ELM: enhanced lowest common ancestor based method for detecting a pathogenic virus from a large sequence dataset

Keisuke Ueno¹, Akihiro Ishii² and Kimihito Ito¹*

Author Affiliations

1 Division of Bioinformatics, Research Center for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan

2 Hokudai Center for Zoonosis Control in Zambia, Research Center for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan

BMC Bioinformatics 2014, 15:254  doi:10.1186/1471-2105-15-254

Published: 28 July 2014

Abstract

Background

Emerging viral diseases, most of which are caused by the transmission of viruses from animals to humans, pose a threat to public health. Discovering pathogenic viruses through surveillance is the key to preparedness for this potential threat. Next-generation sequencing (NGS) makes it possible to identify viruses without designing specific PCR primers. The major task in NGS data analysis is the taxonomic identification of vast numbers of sequences. However, taxonomic identification via a BLAST search against all known sequences is a computational bottleneck.

Description

Here we propose an enhanced lowest-common-ancestor-based method (ELM) to identify viruses efficiently from massive sequence data. To reduce the computational cost, ELM runs the BLAST search against a customized database composed only of viral sequences. At the same time, ELM adopts a novel criterion to suppress the increase in false-positive assignments caused by the small database. As a result, identification by ELM is more than 1,000 times faster than conventional methods without loss of accuracy.
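For readers unfamiliar with lowest-common-ancestor (LCA) assignment, the sketch below shows the generic idea that ELM builds on: a read's BLAST hits are filtered, and the read is assigned to the deepest taxon shared by all remaining hits. This is not ELM's own false-positive criterion, which the abstract does not describe. The top-percent filter (a MEGAN-style heuristic), the parameter values `top_percent` and `min_score`, the helper names, and the heavily simplified toy taxonomy are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical, heavily simplified taxonomy: child taxon -> parent taxon.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Viruses": "root",
    "Orthomyxoviridae": "Viruses",
    "Influenza A virus": "Orthomyxoviridae",
    "Influenza B virus": "Orthomyxoviridae",
    "Flaviviridae": "Viruses",
    "Dengue virus": "Flaviviridae",
}


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path[::-1]  # root first


def lca(taxa: Iterable[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Deepest taxon that lies on the lineage of every input taxon."""
    lineages = [lineage(t, parent) for t in taxa]
    common = None
    # zip stops at the shortest lineage, so we only compare shared depths.
    for nodes in zip(*lineages):
        if all(n == nodes[0] for n in nodes):
            common = nodes[0]
        else:
            break
    return common


def assign_read(
    hits: List[Tuple[str, float]],
    parent: Dict[str, Optional[str]],
    top_percent: float = 10.0,  # hypothetical: keep hits within 10% of best
    min_score: float = 50.0,    # hypothetical: ignore weak BLAST hits
) -> Optional[str]:
    """Assign one read, given (taxon, bitscore) BLAST hits, to the LCA."""
    good = [(t, s) for t, s in hits if s >= min_score]
    if not good:
        return None  # unassigned
    best = max(s for _, s in good)
    cutoff = best * (1.0 - top_percent / 100.0)
    kept = {t for t, s in good if s >= cutoff}
    return lca(kept, parent)


if __name__ == "__main__":
    hits = [("Influenza A virus", 200.0), ("Influenza B virus", 190.0),
            ("Dengue virus", 60.0)]
    print(assign_read(hits, PARENT))  # Orthomyxoviridae
```

A read whose two best hits are in different influenza species is placed at their shared family rather than forced onto one species; widening `top_percent` pushes assignments toward the root, which trades specificity for fewer false-positive species calls.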

Conclusions

We anticipate that ELM will contribute to direct diagnosis of viral infections. The web server and the customized viral database are freely available at http://bioinformatics.czc.hokudai.ac.jp/ELM/

Keywords:
Next generation sequencing; Virus discovery; Diagnostic virology; Virome; Taxonomic identification