Open Access Highly Accessed Methodology article

Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki)

David Sturgill1,2*, John H Malone3, Xia Sun4, Harold E Smith1, Leonard Rabinow4, Marie-Laure Samson4 and Brian Oliver1,2

Author Affiliations

1 National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA

2 Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, MD 20742, USA

3 Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut 06269, USA

4 CNRS UMR 8195, Centre de Neurosciences Paris-Sud, Univ Paris-Sud, Orsay F-91405, CEDEX, France

BMC Bioinformatics 2013, 14:320  doi:10.1186/1471-2105-14-320

Published: 9 November 2013

Abstract

Background

The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment.

Results

We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat-induced gaps. Using a series of quantitative and qualitative filters, we eliminated the diagnosed errors in the simulation, then applied the same filters to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method outperformed commonly used RNA-Seq methods at identifying known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package that performs these tasks, the Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki.
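To make the idea of post hoc junction filtering concrete, the following is a minimal Python sketch of one qualitative filter (splice motif) combined with one quantitative filter (read support). It is not Spanki's implementation or interface: the `Junction` record, its field names, the `passes_filters` function, the `min_reads` threshold, and the choice to keep annotated junctions unconditionally are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    """Hypothetical junction record; field names are illustrative only."""
    chrom: str
    start: int
    end: int
    strand: str
    motif: str       # donor + acceptor dinucleotides, e.g. "GTAG"
    reads: int       # number of reads spanning the junction
    annotated: bool  # present in the reference annotation


def passes_filters(j, min_reads=2, allowed_motifs=("GTAG",)):
    """Return True if a junction survives the illustrative filters."""
    # Assumption: junctions already in the annotation are trusted as-is.
    if j.annotated:
        return True
    # Qualitative filter: reject unannotated junctions on non-allowed motifs.
    if j.motif not in allowed_motifs:
        return False
    # Quantitative filter: require a minimum number of supporting reads.
    return j.reads >= min_reads


# Invented example records, not data from the study.
junctions = [
    Junction("2L", 100, 200, "+", "GTAG", 10, False),  # canonical, well supported
    Junction("2L", 300, 400, "+", "ATAC", 50, False),  # minor motif, unannotated
    Junction("2L", 500, 600, "-", "GTAG", 1, False),   # canonical, low support
    Junction("2L", 700, 800, "-", "ATAC", 1, True),    # annotated, kept regardless
]

kept = [j for j in junctions if passes_filters(j)]
kept_starts = [j.start for j in kept]
```

In this sketch the threshold and the allowed motif set are parameters, so the same function could be tuned against simulated data with known input before being applied to real alignments.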

Conclusions

Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.