Open Access Highly Accessed Software

SRAdb: query and use public next-generation sequencing data from within R

Yuelin Zhu1,2, Robert M Stephens2, Paul S Meltzer1 and Sean R Davis1*

Author Affiliations

1 Genetics Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA

2 Advanced Biomedical Computing Center, National Cancer Institute-Frederick, SAIC-Frederick Inc., Frederick, MD 21702, USA

BMC Bioinformatics 2013, 14:19  doi:10.1186/1471-2105-14-19

Published: 17 January 2013

Abstract

Background

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from next-generation sequencing platforms, including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos HeliScope, PacBio RS, and others.

Results

SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and to make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then uses this SQLite database for querying and accessing metadata. Full-text search functionality makes querying metadata flexible and powerful. FASTQ files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer (IGV).
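
To illustrate the general idea of full-text search over sequencing metadata held in SQLite, the following is a minimal sketch in Python using only the standard library. It is not the SRAdb API or its schema: the table name `run_meta`, its columns, the accessions and the study titles are all made up for illustration, and the sketch assumes the local SQLite build ships a full-text module (FTS5 or FTS4).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a tiny, invented metadata table."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: table and column names are illustrative only.
    # Try FTS5 first and fall back to FTS4, depending on the SQLite build.
    for module in ("fts5", "fts4"):
        try:
            con.execute(
                f"CREATE VIRTUAL TABLE run_meta USING {module}"
                "(run, platform, study_title)"
            )
            break
        except sqlite3.OperationalError:
            continue
    else:
        raise RuntimeError("this SQLite build has no full-text search module")

    # Invented example rows, not real SRA records.
    rows = [
        ("RUN0001", "ILLUMINA", "Breast cancer transcriptome profiling"),
        ("RUN0002", "LS454", "Soil microbial community survey"),
        ("RUN0003", "ILLUMINA", "Whole genome sequencing of a cancer cell line"),
    ]
    con.executemany("INSERT INTO run_meta VALUES (?, ?, ?)", rows)
    return con


def search(con, terms):
    """Return rows where every space-separated term appears in some column."""
    cur = con.execute(
        "SELECT run, platform, study_title FROM run_meta "
        "WHERE run_meta MATCH ? ORDER BY run",
        (terms,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    for row in search(con, "cancer transcriptome"):
        print(row)
```

Matching against the whole table rather than a single column is what lets one query term hit a title, a platform name or an accession alike, which is the kind of flexibility the paragraph above attributes to full-text search.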

Conclusions

The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.