Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Research article

Empirical Bayesian analysis of paired high-throughput sequencing data with a beta-binomial distribution

Thomas J Hardcastle* and Krystyna A Kelly

Author Affiliations

Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EA, UK

For all author emails, please log on.

BMC Bioinformatics 2013, 14:135  doi:10.1186/1471-2105-14-135

Published: 23 April 2013

Abstract

Background

Pairing of samples arises naturally in many genomic experiments; for example, gene expression in tumour and normal tissue from the same patients. Methods for analysing high-throughput sequencing data from such experiments are required to identify differential expression, both within paired samples and between pairs under different experimental conditions.

Results

We develop an empirical Bayesian method based on the beta-binomial distribution to model paired data from high-throughput sequencing experiments. We examine the performance of this method on simulated and real data in a variety of scenarios. Our methods are implemented as part of the RbaySeq package (versions 1.11.6 and greater) available from Bioconductor (http://www.bioconductor.org webcite).

Conclusions

We compare our approach to alternatives based on generalised linear modelling approaches and show that our method offers significant gains in performance on simulated data. In testing on real data from oral squamous cell carcinoma patients, we discover greater enrichment of previously identified head and neck squamous cell carcinoma associated gene sets than has previously been achieved through a generalised linear modelling approach, suggesting that similar gains in performance may be found in real data. Our methods thus show real and substantial improvements in analyses of high-throughput sequencing data from paired samples.