This article is part of the supplement: Italian Society of Bioinformatics (BITS): Annual Meeting 2012

Open Access Highly Accessed Research

GAM-NGS: genomic assemblies merger for next generation sequencing

Riccardo Vicedomini1,2*, Francesco Vezzi3, Simone Scalabrin2, Lars Arvestad3,4 and Alberto Policriti1,2

Author Affiliations

1 Department of Mathematics and Computer Science, University of Udine, 33100 Udine, Italy

2 IGA, Institute of Applied Genomics, 33100 Udine, Italy

3 KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, 17121 Solna, Sweden

4 Swedish e-Science Research Centre, Dept. of Computer Science and Numerical Analysis, Stockholm University, 17121 Solna, Sweden

BMC Bioinformatics 2013, 14(Suppl 7):S6  doi:10.1186/1471-2105-14-S7-S6

Published: 22 April 2013

Abstract

Background

In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to run several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers yield better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To address these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance the contiguity and correctness of the result. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is then carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions.
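The block idea described above can be illustrated with a toy sketch. This is not the GAM-NGS implementation: the input format (a dictionary from read id to contig name for each assembly), the function names `find_blocks` and `build_weighted_graph`, and the `min_shared_reads` threshold are all assumptions made for illustration. The sketch treats a pair of contigs, one from each assembly, as a candidate block when enough reads align to both, and uses the shared-read count as the edge weight.

```python
from collections import defaultdict


def find_blocks(aln_a, aln_b, min_shared_reads=2):
    """Count reads shared by each (contig in A, contig in B) pair.

    aln_a, aln_b: dict mapping read id -> contig name (hypothetical format).
    Returns a dict mapping (contig_a, contig_b) -> number of shared reads,
    keeping only pairs supported by at least min_shared_reads reads.
    """
    shared = defaultdict(int)
    for read_id, contig_a in aln_a.items():
        contig_b = aln_b.get(read_id)
        if contig_b is not None:  # read is aligned in both assemblies
            shared[(contig_a, contig_b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared_reads}


def build_weighted_graph(blocks):
    """Build an undirected weighted graph as nested dicts.

    Nodes are ("A", contig) or ("B", contig); an edge links two contigs
    that form a candidate block, weighted by the shared-read count.
    """
    graph = defaultdict(dict)
    for (contig_a, contig_b), weight in blocks.items():
        u, v = ("A", contig_a), ("B", contig_b)
        graph[u][v] = weight
        graph[v][u] = weight
    return dict(graph)


# Toy alignments: reads r1 and r2 map to a1 in assembly A and b1 in assembly B.
aln_a = {"r1": "a1", "r2": "a1", "r3": "a1", "r4": "a2"}
aln_b = {"r1": "b1", "r2": "b1", "r3": "b2", "r4": "b2", "r5": "b3"}

blocks = find_blocks(aln_a, aln_b)
graph = build_weighted_graph(blocks)
```

In this toy input only the pair (a1, b1) is supported by two reads, so it is the only candidate block kept. A real merger would also need read positions and orientations to delimit the regions and to detect conflicting blocks, which this sketch deliberately omits.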

Results

GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS outputs an improved and reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially fewer computational resources than comparable tools. To achieve this, GAM-NGS avoids global alignment between contigs, which makes its strategy unique among assembly reconciliation tools.

Conclusions

The difficulty of obtaining correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects. With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the sequence that is most likely to be correct.