This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics

Open Access Proceedings

Accelerating read mapping with FastHASH

Hongyi Xin1, Donghyuk Lee1, Farhad Hormozdiari2, Samihan Yedkar1, Onur Mutlu1* and Can Alkan3*

Author Affiliations

1 Depts. of Computer Science and Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA

2 Dept. of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA

3 Dept. of Computer Engineering, Bilkent University, Ankara, 06800, Turkey

BMC Genomics 2013, 14(Suppl 1):S13  doi:10.1186/1471-2164-14-S1-S13

Published: 21 January 2013

Abstract

With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of NGS critically depends on computational techniques that can process and analyze this enormous amount of sequence data quickly and accurately. Unfortunately, current read mapping algorithms have difficulty coping with the massive amounts of data generated by NGS.

We propose a new algorithm, FastHASH, which drastically improves the performance of hash-table-based seed-and-extend read mapping algorithms while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all read mapping algorithms of the seed-and-extend class. It introduces two main techniques: Adjacency Filtering and Cheap K-mer Selection.
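The abstract names the two techniques without describing them, so the following is only a rough Python sketch of how a hash-table-based seed-and-extend mapper might apply ideas of this kind, not the FastHASH or mrFAST implementation. It assumes that Cheap K-mer Selection means preferring read k-mers with the fewest hits in the reference index, and that Adjacency Filtering means checking that the read's other k-mers occur at their expected neighbouring reference positions before any costly alignment. All function names are hypothetical, and the sketch handles substitutions only (an insertion or deletion would shift the expected positions).

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table mapping each k-mer to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def split_read(read, k):
    """Non-overlapping k-mers of the read, paired with their read offsets."""
    return [(off, read[off:off + k]) for off in range(0, len(read) - k + 1, k)]


def cheap_seeds(read, index, k, num_seeds):
    """Assumed 'cheap' selection: the k-mers with the fewest reference hits."""
    kmers = split_read(read, k)
    return sorted(kmers, key=lambda item: len(index.get(item[1], ())))[:num_seeds]


def passes_adjacency(read, index, k, start, max_errors):
    """Assumed adjacency check: enough k-mers sit at their expected positions."""
    kmers = split_read(read, k)
    matched = sum(1 for off, kmer in kmers if (start + off) in index.get(kmer, ()))
    # Each substitution can spoil at most one non-overlapping k-mer.
    return matched >= len(kmers) - max_errors


def candidate_locations(read, index, k, max_errors):
    """Reference start positions that survive both filters (substitutions only)."""
    # With max_errors + 1 non-overlapping seeds, at least one seed is error-free.
    seeds = cheap_seeds(read, index, k, max_errors + 1)
    candidates = set()
    for off, kmer in seeds:
        for pos in index.get(kmer, ()):
            start = pos - off
            if start >= 0 and passes_adjacency(read, index, k, start, max_errors):
                candidates.add(start)
    return sorted(candidates)
```

In a real mapper, only the surviving candidate locations would be passed to the expensive verification (extend) step; the point of both filters in this sketch is to shrink that candidate set cheaply.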

We implemented FastHASH and merged it into the codebase of the popular read mapping program mrFAST. Depending on the edit distance cutoff, we observed a speedup of up to 19-fold while still maintaining 100% sensitivity and high comprehensiveness.