This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics

Open Access Proceedings

Genome reassembly with high-throughput sequencing data

Nathaniel Parrish1, Benjamin Sudakov2 and Eleazar Eskin1*

Author affiliations

1 Department of Computer Science, University of California Los Angeles, Los Angeles, California, USA

2 Department of Mathematics, University of California Los Angeles, Los Angeles, California, USA

Citation and License

BMC Genomics 2013, 14(Suppl 1):S8. doi:10.1186/1471-2164-14-S1-S8

Published: 21 January 2013

Abstract

Motivation

Recent studies in genomics have highlighted the importance of structural variation in shaping genetic differences between individuals. Current methods for identifying structural variation, however, focus predominantly on either assembling whole genomes from scratch or identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them.

Results

In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulated genomes derived from both mammalian and bacterial references. We present the results on these simulated data and discuss the benefits and drawbacks of the technique.
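
For readers unfamiliar with the de Bruijn graph mentioned above, the following is a minimal illustrative sketch and not the graph construction proposed in this paper. It builds a plain de Bruijn graph from reads and then flags the edges whose k-mer also occurs in a reference sequence, which is one simple way a reference could be related to such a graph. The function names, the toy reads, the toy reference, and the choice k = 4 are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def reference_supported_edges(graph, reference, k):
    """Return the edges whose underlying k-mer also occurs in the reference."""
    ref_kmers = set(kmers(reference, k))
    return {
        (u, v)
        for u, targets in graph.items()
        for v in targets
        if u + v[-1] in ref_kmers  # rebuild the k-mer from the two nodes
    }


# Toy data (hypothetical): two short reads and a reference that differs from them.
reads = ["ACGTAC", "GTACGG"]
reference = "ACGTTCGG"
k = 4

graph = build_de_bruijn(reads, k)
supported = reference_supported_edges(graph, reference, k)
```

In this toy example the node ACG has two outgoing edges, and only one of them is backed by the reference. A reference-aware assembler could in principle use that kind of evidence to choose between branches, although the message-passing algorithm described in the paper is not reproduced here.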