Genome assembly in the spotlight
Biggest ever contest puts genome assemblers through their paces
22 Jul 2013
The largest systematic assessment of the process of genome assembly is published today in BioMed Central and BGI’s open access journal GigaScience. The second Assemblathon competition saw 21 teams submit 43 entries based on data from three unassembled genomes (a bird, a fish, and a snake) sequenced using three different technologies. Ten key metrics are outlined, based on over 100 different measures for each assembly, and they focus on different aspects of an assembly’s quality.
The research came to publication via an unusual peer review process. Assemblathon2 is on a preprint server (http://arxiv.org/abs/1301.5406) and the named reviewers have blogged and commented on their reviews of the paper. Since the data was in the public domain and the authors enjoyed the discussion, GigaScience’s editors encouraged open discussion of the peer review of this article.
With a new species genome announced almost daily, genomics is getting faster and cheaper all the time. Piecing together genomes from raw sequencing data to produce high quality finished genome sequences without the aid of a previously assembled reference is still technically challenging and requires a huge amount of computational power and resources, yet it is performed by more and more labs around the world. With new sequencing tools appearing every month, and nearly limitless ways of carrying out this complex process, it is not clear which method of piecing a genome together is best. The Assemblathon is a set of periodic collaborative efforts that aim to address this issue and help improve how genomics is carried out.
The logistics of carrying out such a large competition were challenging: large volumes of test and entry data were hosted by supercomputing centers and mirrored in the cloud, and automated scripts calculated and presented the many results. Reviewing the paper was equally challenging and novel; everyone embraced GigaScience’s open and transparent review process, with authors and reviewers tweeting and posting comments online and in blogs during the review process. The results of this real-time, open peer review are available to view on the Assemblathon website, with the signed reviewer reports and history also archived and viewable alongside the article. To boost reproducibility, the supporting data and 27 GB of entries are hosted in the GigaScience GigaDB database and in the NCBI SRA database.
- ENDS -
Head of Communication, BioMed Central
Tel: +44 20 3192 2737
Mobile: +44 7825 287 546
Notes to Editors
1. GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources.
2. BioMed Central (http://www.biomedcentral.com/) is an STM (Science, Technology and Medicine) publisher which has pioneered the open access publishing model. All peer-reviewed research articles published by BioMed Central are made immediately and freely accessible online, and are licensed to allow redistribution and reuse. BioMed Central is part of Springer Science+Business Media, a leading global publisher in the STM sector. @BioMedCentral