
This article is part of the supplement: Proceedings of the Eleventh Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics

Open Access Highly Accessed Proceedings

Finishing bacterial genome assemblies with Mix

Hayssam Soueidan1, Florence Maurier2, Alexis Groppi2, Pascal Sirand-Pugnet3,4, Florence Tardy5, Christine Citti6,7, Virginie Dupuy8 and Macha Nikolski2,9*

Author Affiliations

1 Molecular Carcinogenesis, The Netherlands Cancer Institute, 1066CX Amsterdam, The Netherlands

2 Univ. Bordeaux, CBiB, F-33000 Bordeaux, France

3 Univ. Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d'Ornon, France

4 INRA, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d'Ornon, France

5 Anses, Laboratoire de Lyon, UMR Mycoplasmoses des Ruminants, F-69364 Lyon, France

6 INRA, UMR1225, F-31076 Toulouse, France

7 Univ. Toulouse, INP-ENVT, UMR1225, F-31076 Toulouse, France

8 CIRAD, UMR CMAEE, Campus de Baillarguet, F-34398 Montpellier, France

9 Univ. Bordeaux, CNRS / LaBRI, F-33405 Talence, France


BMC Bioinformatics 2013, 14(Suppl 15):S16. doi:10.1186/1471-2105-14-S15-S16

Published: 15 October 2013

Abstract

Motivation

Unfinished assemblies, and the experimental costs they entail, are among the challenges that prevent researchers from reaping the full benefits of genome assembly. First, numerous software solutions for de novo genome assembly are available, each with its own advantages and drawbacks, but without clear guidelines on how to choose among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase.

Methods

In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph in which vertices represent contig extremities and edges represent existing alignments between these extremities. These alignment edges are used to extend contigs. The output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length.
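The extension-graph idea can be illustrated with a toy sketch. It is not the Mix implementation: the vertex encoding ("5p"/"3p" extremities), the (extremity, extremity, overlap) alignment format and the functions `build_extension_graph`, `best_path` and `greedy_paths` are hypothetical, and contig orientation is ignored so that only lengths are tracked. A path enters a contig at one extremity, leaves through the other, and follows an alignment edge to the next contig; its merged length is the sum of contig lengths minus the aligned overlaps. Because the longest-path problem is NP-hard in general, the sketch swaps in an exhaustive search for the single heaviest path, applied greedily until every contig has been placed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical vertex encoding: one extremity of a contig, e.g. ("A", "3p").
Vertex = Tuple[str, str]
Alignment = Tuple[Vertex, Vertex, int]  # (extremity, extremity, overlap length)
Graph = Dict[Vertex, List[Tuple[Vertex, int]]]


def other_end(v: Vertex) -> Vertex:
    """Return the opposite extremity of the same contig."""
    name, end = v
    return (name, "3p" if end == "5p" else "5p")


def build_extension_graph(alignments: List[Alignment]) -> Graph:
    """Undirected adjacency map: extremity -> [(aligned extremity, overlap)]."""
    graph: Graph = {}
    for u, v, overlap in alignments:
        graph.setdefault(u, []).append((v, overlap))
        graph.setdefault(v, []).append((u, overlap))
    return graph


def best_path(lengths: Dict[str, int], graph: Graph,
              available: Set[str]) -> Tuple[int, List[Vertex]]:
    """Exhaustively find the path with the largest merged length (toy graphs only)."""
    best: Tuple[int, List[Vertex]] = (-1, [])

    def extend(exit_end: Vertex, used: Set[str], path: List[Vertex], total: int) -> None:
        nonlocal best
        if total > best[0]:
            best = (total, list(path))
        # Follow every alignment edge leaving the current contig's exit extremity.
        for entry, overlap in graph.get(exit_end, []):
            contig = entry[0]
            if contig in used or contig not in available:
                continue
            used.add(contig)
            path.append(entry)
            # Enter the next contig at `entry` and leave through its other extremity.
            extend(other_end(entry), used, path, total + lengths[contig] - overlap)
            path.pop()
            used.remove(contig)

    for contig in sorted(available):  # sorted for deterministic tie-breaking
        for start in ((contig, "5p"), (contig, "3p")):
            extend(other_end(start), {contig}, [start], lengths[contig])
    return best


def greedy_paths(lengths: Dict[str, int],
                 alignments: List[Alignment]) -> List[Tuple[int, List[str]]]:
    """Repeatedly take the heaviest remaining path until every contig is used."""
    graph = build_extension_graph(alignments)
    remaining = set(lengths)
    paths = []
    while remaining:
        total, path = best_path(lengths, graph, remaining)
        contigs = [v[0] for v in path]
        paths.append((total, contigs))
        remaining -= set(contigs)
    return paths


# Hypothetical toy data: contig lengths (bp) and aligned extremities with overlaps.
LENGTHS = {"A": 1000, "B": 800, "C": 500, "D": 300}
ALIGNMENTS = [
    (("A", "3p"), ("B", "5p"), 100),
    (("B", "3p"), ("C", "5p"), 50),
    (("A", "3p"), ("D", "5p"), 20),
]

paths = greedy_paths(LENGTHS, ALIGNMENTS)
print(paths)  # [(2150, ['A', 'B', 'C']), (300, ['D'])]
```

On this toy input, A, B and C merge into one 2,150 bp path (1000 + 800 - 100 + 500 - 50), while D, whose alignment competes with B for the 3' end of A, stays a singleton. The exhaustive search is exponential and only suitable for tiny graphs.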

Results

We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. The resulting final assemblies show a significant improvement in overall assembly quality. In particular, Mix consistently yields better overall quality, even when the choice of assemblies is guided solely by standard assembly statistics, as is the case in de novo projects.
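In a de novo project there is no reference to compare against, so drafts are usually judged by reference-free statistics such as contig count, total length and N50. The sketch below assumes N50 as the selection criterion and uses made-up contig lengths; the names `summarize`, `assembly_x` and `assembly_y` are hypothetical, and this is only an illustration, not the evaluation protocol of the study.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(lengths: List[int]) -> Dict[str, int]:
    """A few standard, reference-free assembly statistics."""
    return {"contigs": len(lengths), "total_length": sum(lengths), "n50": n50(lengths)}


# Hypothetical contig lengths (bp) for two draft assemblies of the same genome.
candidates = {
    "assembly_x": [1000, 800, 500, 300],
    "assembly_y": [1200, 700, 400, 200, 100],
}

for name, contigs in candidates.items():
    print(name, summarize(contigs))

# Without a reference, one might simply keep the draft with the highest N50.
chosen = max(candidates, key=lambda name: n50(candidates[name]))
print("chosen:", chosen)  # chosen: assembly_x
```

Both hypothetical drafts have the same total length (2,600 bp), so N50 alone decides between them (800 bp versus 700 bp).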

Availability

Mix is implemented in Python and is available at https://github.com/cbib/MIX. The novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.