Software article (Open Access)

Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE

Quang M Trinh (1), Fei-Yang Arthur Jen (1), Ziru Zhou (1), Kar Ming Chu (1), Marc D Perry (1), Ellen T Kephart (1), Sergio Contrino (2), Peter Ruzanov (1) and Lincoln D Stein (1,3)*

Author Affiliations

1 Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON, M5G 0A3, Canada

2 Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK

3 Department of Molecular Genetics, University of Toronto, 1 Kings College Circle, Toronto, ON, M5S 1A8, Canada

BMC Genomics 2013, 14:494  doi:10.1186/1471-2164-14-494

Published: 22 July 2013

Abstract

Background

Funded by the National Institutes of Health (NIH), the Model Organism ENCyclopedia of DNA Elements (modENCODE) project aims to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for two model organisms, C. elegans (worm) and D. melanogaster (fly). With just under 10 terabytes of data collected and released to the public, a key challenge for researchers is extracting biologically meaningful knowledge from this large data set. Although members of the modENCODE consortium have already performed basic quality control, pre-processing, and analysis of the data, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or to combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging undertaking.

Results

In recognition of this challenge, the modENCODE Data Coordination Center (DCC) has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data that use the same quality control (QC) and peak-calling standards adopted by the modENCODE and ENCODE communities. For convenience, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software, and other dependencies.
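The abstract does not list which QC metrics the workflows compute. As one illustration of an ENCODE-style ChIP-seq QC check, the minimal Python sketch below derives the strand cross-correlation scores NSC (normalized strand coefficient) and RSC (relative strand coefficient) from a precomputed cross-correlation profile. It assumes the profile (strand shift in bp mapped to correlation) was produced elsewhere; the function name `strand_cc_metrics`, the `exclude` window, and the assumption that these exact metrics appear in the released Galaxy workflows are all hypothetical.

```python
"""Sketch of ENCODE-style strand cross-correlation QC metrics (NSC, RSC).

Assumption: the cross-correlation profile was computed elsewhere.
The function and parameter names are hypothetical, for illustration only.
"""
from typing import Dict, Tuple


def strand_cc_metrics(profile: Dict[int, float], read_len: int,
                      exclude: int = 10) -> Tuple[int, float, float]:
    """Return (fragment_shift, NSC, RSC) for a cross-correlation profile.

    profile  -- maps strand shift (bp) to cross-correlation value
    read_len -- read length (bp); the "phantom" peak sits at this shift
    exclude  -- shifts within this many bp of read_len are skipped when
                searching for the fragment-length peak (hypothetical knob)
    """
    if read_len not in profile:
        raise ValueError("profile must contain the read-length shift")
    # Background level: the minimum correlation anywhere in the profile.
    cc_min = min(profile.values())
    # Fragment-length peak: highest correlation away from the phantom peak.
    candidates = {s: cc for s, cc in profile.items()
                  if abs(s - read_len) > exclude}
    if not candidates:
        raise ValueError("no shifts outside the phantom-peak window")
    frag_shift = max(candidates, key=candidates.get)
    cc_frag = candidates[frag_shift]
    cc_read = profile[read_len]
    # NSC: fragment-length peak relative to background.
    nsc = cc_frag / cc_min
    # RSC: fragment-length peak vs. phantom peak, both measured above background.
    rsc = (cc_frag - cc_min) / (cc_read - cc_min)
    return frag_shift, nsc, rsc


if __name__ == "__main__":
    # Toy profile with made-up values, not modENCODE data.
    toy = {0: 0.02, 36: 0.05, 150: 0.10, 300: 0.03, 500: 0.01}
    print(strand_cc_metrics(toy, read_len=36))
```

On the toy profile the fragment peak is found at a shift of 150 bp, with NSC close to 10 and RSC close to 2.25. Higher values of both indicate stronger enrichment relative to background and to the read-length artifact.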

Conclusions

These resources provide a framework for running consistent and reproducible analyses of modENCODE data, allowing researchers to spend more of their time using modENCODE data and less time moving it around.