This article is part of the supplement: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Genomics
Improved moderation for gene-wise variance estimation in RNA-Seq via the exploitation of external information
1 School of Mathematics and Statistics, University of Sydney, Sydney NSW 2006, Australia
2 CSIRO Mathematical and Information Sciences, Private Bag 33, Clayton South 3168, Australia
3 Department of Biomedical Sciences, Cornell University, Ithaca, NY, USA
BMC Genomics 2013, 14(Suppl 1):S9 doi:10.1186/1471-2164-14-S1-S9Published: 21 January 2013
The cost of RNA-Seq has been decreasing over the last few years. Despite this, experiments with four or less biological replicates are still quite common. Estimating the variances of gene expression estimates becomes both a challenging and interesting problem in these situations of low replication. However, with the wealth of microarray and other publicly available gene expression data readily accessible on public repositories, these sources of information can be leveraged to make improvements in variance estimation.
We have proposed a novel approach called Tshrink+ for inferring differential gene expression through improved modelling of the gene-wise variances. Existing methods share information between genes of similar average expression by shrinking, or moderating, the gene-wise variances to a fitted common variance. We have been able to achieve improved estimation of the common variance by using gene-wise sample variances from external experiments, as well as gene length.
Using biological data we show that utilising additional external information can improve the modelling of the common variance and hence the calling of differentially expressed genes. These sources of additional information include gene length and gene-wise sample variances from other RNA-Seq and microarray datasets, of both related and seemingly unrelated tissue types. The results of this are promising, with our differential expression test, Tshrink+, performing favourably when compared to existing methods such as DESeq and edgeR when considering both gene ranking and sensitivity. These improved variance models could easily be implemented in both DESeq and edgeR and highlight the need for a database that offers a profile of gene variances over a range of tissue types and organisms.