Open Access Highly Accessed Software

PASTA: splice junction identification from RNA-Sequencing data

Shaojun Tang1,2,3,4 and Alberto Riva1,2*

Author Affiliations

1 Department of Molecular Genetics and Microbiology, College of Medicine, University of Florida, Gainesville, FL, USA

2 University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA

3 Current address: Department of Pathology, Children's Hospital Boston and Harvard Medical School, Boston, MA, USA

4 Current address: Proteomics Center at Children's Hospital Boston, Boston, MA, USA

BMC Bioinformatics 2013, 14:116  doi:10.1186/1471-2105-14-116

Published: 4 April 2013

Abstract

Background

Next-generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but it requires ad hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data. It relies on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions.

Results

Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it is able to handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions.

Conclusions

PASTA is a highly efficient and sensitive tool for identifying splice junctions from RNA-Seq data. Compared to similar programs, it identifies a higher number of real splice junctions and provides highly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitate the reconstruction of splicing isoforms and the analysis of their expression levels, which will be performed by the remaining modules of the PASTA pipeline, still under development. PASTA can therefore enable the large-scale investigation of transcription and alternative splicing.

Keywords:
RNA-Seq; Next-generation sequencing; Alternative splicing; Computational analysis of alternative splicing