This article is part of the supplement: Proceedings of the Third Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq 2013)

Open Access Highly Accessed Proceedings

Gene set enrichment analysis of RNA-Seq data: integrating differential expression and splicing

Xi Wang1,2 and Murray J Cairns1,2,3*

Author Affiliations

1 School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, New South Wales, Australia

2 Hunter Medical Research Institute, New Lambton, New South Wales, Australia

3 Schizophrenia Research Institute, Sydney, New South Wales, Australia

BMC Bioinformatics 2013, 14(Suppl 5):S16  doi:10.1186/1471-2105-14-S5-S16

Published: 10 April 2013

Abstract

Background

RNA-Seq has become a key technology in transcriptome studies because it can quantify overall expression levels and the degree of alternative splicing for each gene simultaneously. To interpret high-throughput transcriptome profiling data, functional enrichment analysis is critical. However, existing functional analysis methods can only account for differential expression, leaving differential splicing out altogether.

Results

In this work, we present a novel approach to derive biological insight by integrating differential expression and splicing from RNA-Seq data with functional gene set analysis. This approach, designated SeqGSEA, models count data with negative binomial distributions to score differential expression and differential splicing separately for each gene, and then offers two strategies for combining the two scores for integrated gene set enrichment analysis. Method comparisons and biological insight analysis on an artificial data set and three real RNA-Seq data sets indicate that our approach outperforms alternative analysis pipelines and can detect biologically meaningful gene sets with high confidence, and that it can determine whether transcription or splicing is their predominant regulatory mechanism.
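
The score-integration step described above can be illustrated with a minimal sketch. The abstract does not specify the two combination strategies or the exact enrichment statistic, so the code below makes assumptions: it combines the per-gene scores as a weighted sum of max-normalized values, and it scores a gene set with a GSEA-style weighted running-sum statistic. The function names `combine_scores` and `enrichment_score`, the `weight` and `p` parameters, and the toy scores are all hypothetical and are not taken from SeqGSEA.

```python
def combine_scores(de, ds, weight=0.5):
    """Combine per-gene differential expression (de) and differential
    splicing (ds) scores into one score per gene.

    Assumption: scores are non-negative with a positive maximum; each is
    divided by its maximum so both lie in [0, 1] before the weighted sum.
    """
    de_max, ds_max = max(de), max(ds)
    return [weight * d / de_max + (1 - weight) * s / ds_max
            for d, s in zip(de, ds)]


def enrichment_score(gene_scores, gene_set, p=1.0):
    """GSEA-style running-sum statistic for one gene set.

    gene_scores: dict mapping gene name to integrated score.
    gene_set: set of gene names.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    # Rank genes from highest to lowest integrated score.
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    hit_total = sum(abs(gene_scores[g]) ** p for g in ranked if g in gene_set)
    n_miss = sum(1 for g in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0

    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            # Step up, weighted by the gene's score.
            running += abs(gene_scores[g]) ** p / hit_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example with made-up scores for six genes.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
de = [5.0, 4.0, 0.5, 0.2, 0.1, 0.1]
ds = [0.1, 3.0, 0.2, 0.1, 2.5, 0.1]
integrated = dict(zip(genes, combine_scores(de, ds, weight=0.5)))
es_top = enrichment_score(integrated, {"g1", "g2"})
es_bottom = enrichment_score(integrated, {"g4", "g6"})
```

In this toy ranking, a set made of the top-ranked genes gets a positive score and a set made of the bottom-ranked genes gets a negative one. In practice, the significance of such a score would be assessed against a null distribution, for example by permuting sample labels.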

Conclusions

By integrating differential expression and splicing, the proposed method SeqGSEA is particularly useful for efficiently translating RNA-Seq data to biological discoveries.