Methodology article (Open Access)

Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

Jeffrey G Reid1*, Andrew Carroll2, Narayanan Veeraraghavan1, Mahmoud Dahdouli1, Andreas Sundquist2, Adam English1, Matthew Bainbridge1, Simon White1, William Salerno1, Christian Buhay1, Fuli Yu1,3, Donna Muzny1, Richard Daly2, Geoff Duyk2, Richard A Gibbs1,3 and Eric Boerwinkle1,4

Author Affiliations

1 Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

2 DNAnexus, Mountain View, CA 94040, USA

3 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA

4 Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA

BMC Bioinformatics 2014, 15:30  doi:10.1186/1471-2105-15-30

Published: 29 January 2014

Abstract

Background

Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.

Results

To address these challenges, we have developed the Mercury analysis pipeline and deployed it both on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.

Conclusions

By implementing Mercury on the DNAnexus platform and taking advantage of cloud computing, we have demonstrated a powerful combination of a robust, fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

Keywords:
NGS data; Variant calling; Annotation; Clinical sequencing; Cloud computing