Differential expression analysis for paired RNA-seq data
1 Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
2 Department of Statistics, George Washington University, Washington, DC, USA
3 Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, USA
4 Section of Rheumatology, Yale School of Medicine, New Haven, Connecticut, USA
5 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, USA
BMC Bioinformatics 2013, 14:110 doi:10.1186/1471-2105-14-110
Published: 27 March 2013
RNA-Seq technology measures transcript abundance by generating sequence reads and counting their frequencies across different biological conditions. To identify genes that are differentially expressed between two conditions, it is important to consider both the experimental design and the distributional properties of the data. In many RNA-Seq studies, the expression data are obtained as multiple pairs, e.g., pre- vs. post-treatment samples from the same individual. We seek to incorporate this paired structure into the analysis.
We present a Bayesian hierarchical mixture model for RNA-Seq data that separately accounts for the variability within and between individuals in a paired data structure. The method assumes a Poisson distribution for the data, mixed with a gamma distribution to account for variability between pairs. The effect of differential expression is modeled by a two-component mixture model. The performance of this approach is examined using simulated and real data.
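To make the paired structure concrete, the following is a minimal simulation sketch of data with this general shape, not the authors' implementation. It assumes a gamma-distributed baseline rate for each gene and pair, Poisson counts around that rate, and a log fold change that is exactly zero for non-differential genes and normally distributed otherwise. The function name, the parameter values, and the normal form of the non-null component are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_paired_counts(n_genes=1000, n_pairs=5, prop_de=0.1,
                           shape=2.0, mean_rate=50.0, effect_sd=1.0, seed=0):
    """Simulate paired pre/post counts (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Between-pair variability: one gamma baseline rate per gene and pair.
    # The scale is chosen so that the baseline has mean `mean_rate`.
    baseline = rng.gamma(shape, mean_rate / shape, size=(n_genes, n_pairs))

    # Two-component mixture for the treatment effect (assumed form):
    # log fold change is 0 for null genes, N(0, effect_sd^2) for DE genes.
    is_de = rng.random(n_genes) < prop_de
    log_fc = np.where(is_de, rng.normal(0.0, effect_sd, n_genes), 0.0)

    # Within-pair variability: Poisson counts around the shared baseline.
    # Both samples of a pair use the same baseline, which is what makes
    # them paired; the post-treatment rate is scaled by the fold change.
    pre = rng.poisson(baseline)
    post = rng.poisson(baseline * np.exp(log_fc)[:, None])
    return pre, post, is_de, log_fc
```

Calling `simulate_paired_counts(n_genes=200, n_pairs=4)` returns two 200 by 4 count matrices together with the true differential-expression labels and effects, which can serve as a ground truth when comparing detection methods on paired designs.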
In this setting, the proposed model provides higher sensitivity for detecting differential expression than existing methods. Application to real RNA-Seq data demonstrates the usefulness of the method for detecting expression changes in genes with low average expression levels or shorter transcript lengths.