Software (Open Access)

MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing

Krishna R Kalari (1), Asha A Nair (1), Jaysheel D Bhavsar (1), Daniel R O’Brien (1), Jaime I Davila (1), Matthew A Bockol (1), Jinfu Nie (1), Xiaojia Tang (1), Saurabh Baheti (1), Jay B Doughty (1), Sumit Middha (1), Hugues Sicotte (1), Aubrey E Thompson (2), Yan W Asmann (3) and Jean-Pierre A Kocher (1,4)*

Author Affiliations

1 Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA

2 Department of Cancer Biology, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL 32224, USA

3 Department of Health Sciences Research, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL 32224, USA

4 Present Address: Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA

BMC Bioinformatics 2014, 15:224  doi:10.1186/1471-2105-15-224

Published: 27 June 2014

Abstract

Background

Although the cost of next-generation sequencing has decreased in recent years, simple-to-use applications for comprehensive analysis of RNA sequencing data are still lacking, and no single tool covers the full range of transcriptomic analyses. We have developed MAP-RSeq, a comprehensive computational workflow for obtaining genomic features from transcriptomic sequencing data for any genome.

Results

To optimize tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, and summarization of transcriptomics data with a final report. The workflow is available for human transcriptome analysis and can be easily adapted for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow to RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients.

Conclusions

Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from http://bioinformaticstools.mayo.edu/research/maprseq/.

Keywords:
Transcriptomic sequencing; RNA-Seq; Bioinformatics workflow; Gene expression; Exon counts; Fusion transcripts; Expressed single nucleotide variants; RNA-Seq reports