This article is part of the supplement: Selected articles from the 9th Annual Biotechnology and Bioinformatics Symposium (BIOT 2012)

Open Access Research

Comparative studies of differential gene calling using RNA-Seq data

Ximeng Zheng1 and Etsuko N Moriyama1,2*

Author Affiliations

1 School of Biological Sciences, University of Nebraska-Lincoln, Nebraska, 68588-0118, USA

2 Center for Plant Science Innovation, University of Nebraska-Lincoln, Nebraska, 68588-0118, USA

BMC Bioinformatics 2013, 14(Suppl 13):S7  doi:10.1186/1471-2105-14-S13-S7

Published: 1 October 2013

Abstract

Background

With its massive amount of data, gene-expression profiling by RNA-Seq has many advantages compared with microarray experiments. RNA-Seq analysis, however, is fundamentally different from microarray data analysis. Techniques developed for analyzing microarray data thus cannot be directly applied to digital gene expression data. Over the past few years, several statistical methods have been developed specifically for identifying differentially expressed genes from RNA-Seq data.

Results

In this study, we examined the performance of differential gene-calling methods using RNA-Seq data in practical situations. We focused on two representative methods: one parametric method, DESeq, and one nonparametric method, NOISeq. We examined their performance using both simulated and real datasets. Our simulation followed the RNA-Seq process and produced more realistic short-read data. Both DESeq and NOISeq identified over-expressed genes more accurately than under-expressed genes. While DESeq was more likely to call longer genes as differentially expressed than shorter ones, NOISeq showed no such bias. When the underlying variation increased, both methods showed higher rates of false positives. When replicates were not available in the experiments, both methods showed lower rates of true positives and higher rates of false positives.

Conclusions

The level of variation clearly affected the performance of both methods, showing the importance of understanding the variation in the data as well as having replicates in RNA-Seq experiments. We showed that differential gene-calling results can be improved by combining the results obtained by the two methods. We suggest strategies for using these two methods individually or in combination, according to the characteristics of the data.
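
As a rough illustration of what combining two sets of calls can mean in terms of true and false positive rates, the following minimal Python sketch compares the intersection and the union of two call sets against a known truth set. The abstract does not state how the results were combined, so both combination rules are assumptions, and the gene identifiers, call sets, and function names are hypothetical toy values rather than data or code from the study.

```python
def rates(called, truly_de, all_genes):
    """Return (true positive rate, false positive rate) for a set of called genes."""
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    not_de = all_genes - truly_de
    tpr = len(called & truly_de) / len(truly_de) if truly_de else 0.0
    fpr = len(called & not_de) / len(not_de) if not_de else 0.0
    return tpr, fpr


def combine_calls(calls_a, calls_b, mode="intersection"):
    """Combine two call sets. Both modes are illustrative assumptions."""
    calls_a, calls_b = set(calls_a), set(calls_b)
    if mode == "intersection":
        return calls_a & calls_b  # genes called by both methods
    if mode == "union":
        return calls_a | calls_b  # genes called by either method
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical toy data: 10 genes, 4 of which are truly differentially expressed.
all_genes = {f"g{i}" for i in range(1, 11)}
truly_de = {"g1", "g2", "g3", "g4"}
deseq_calls = {"g1", "g2", "g3", "g5"}   # made-up calls standing in for DESeq
noiseq_calls = {"g1", "g2", "g4", "g6"}  # made-up calls standing in for NOISeq

both = combine_calls(deseq_calls, noiseq_calls, "intersection")
either = combine_calls(deseq_calls, noiseq_calls, "union")

intersection_rates = rates(both, truly_de, all_genes)
union_rates = rates(either, truly_de, all_genes)
```

In this toy case the intersection removes every false positive at the cost of missing half of the true positives, while the union recovers all true positives but keeps the false positives of both methods. Which trade-off is preferable depends on the variation in the data and on whether replicates are available.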