This article is part of the supplement: Selected articles from The 8th Annual Biotechnology and Bioinformatics Symposium (BIOT-2011)

Open Access Highly Accessed Research

Accuracy of RNA-Seq and its dependence on sequencing depth

Guoshuai Cai1, Hua Li2, Yue Lu3, Xuelin Huang4, Juhee Lee4, Peter Müller5, Yuan Ji4 and Shoudan Liang1*

Author Affiliations

1 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA

2 Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA

3 Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA

4 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA

5 Department of Mathematics, The University of Texas at Austin, Austin, Texas 78712, USA

BMC Bioinformatics 2012, 13(Suppl 13):S5  doi:10.1186/1471-2105-13-S13-S5

Published: 24 August 2012

Abstract

Background

The cost of DNA sequencing has fallen dramatically over the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq, which surveys gene expression by DNA sequencing, is becoming a common technique. Because it is not clear how increased sequencing capacity has affected the accuracy of mRNA measurements, we sought to investigate that relationship.

Results

We empirically evaluate the accuracy of repeated gene expression measurements using RNA-Seq. We identify the library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we show that accuracy does improve with sequencing depth. However, the rate of improvement as a function of the number of sequence reads is generally slower than predicted by the binomial distribution. We therefore use the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduce depend explicitly on the number of reads, so that the resulting statistical uncertainty is consistent with the empirical observation that measurement accuracy increases with sequencing depth. The overdispersion parameters are determined by maximizing the likelihood. We show that our modified beta-binomial model has a lower false discovery rate than either the binomial or the pure beta-binomial model.
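
The contrast between the three models can be illustrated with a minimal Python sketch. It compares the coefficient of variation (CV) of an estimated read proportion k/n under a binomial model, a beta-binomial model with fixed overdispersion rho, and a beta-binomial model whose overdispersion shrinks with the number of reads n. The depth-dependent form rho(n) = a / n**b, the function names, and the values of a, b and p are assumptions chosen only for illustration; they are not the parameterization used in the article.

```python
import math


def binomial_cv(n: int, p: float) -> float:
    """CV of the estimated proportion k/n when k ~ Binomial(n, p)."""
    return math.sqrt((1.0 - p) / (n * p))


def beta_binomial_cv(n: int, p: float, rho: float) -> float:
    """CV of k/n when k is beta-binomial with mean proportion p.

    rho = 1 / (alpha + beta + 1) is the overdispersion, and
    Var(k) = n * p * (1 - p) * (1 + (n - 1) * rho).
    """
    var_prop = p * (1.0 - p) * (1.0 + (n - 1) * rho) / n
    return math.sqrt(var_prop) / p


def depth_dependent_rho(n: int, a: float = 0.5, b: float = 0.9) -> float:
    """Hypothetical overdispersion that shrinks with depth: a / n**b.

    This form is an illustrative assumption, not the article's model.
    """
    return a / n ** b


if __name__ == "__main__":
    p = 1e-4          # assumed fraction of reads mapping to one gene
    fixed_rho = 1e-6  # assumed constant overdispersion
    print("reads      binomial  fixed-rho  depth-dependent")
    for n in (10**5, 10**6, 10**7, 10**8):
        print(
            f"{n:<10d} "
            f"{binomial_cv(n, p):<9.4f} "
            f"{beta_binomial_cv(n, p, fixed_rho):<10.4f} "
            f"{beta_binomial_cv(n, p, depth_dependent_rho(n)):.4f}"
        )
```

Under these assumptions the binomial CV falls as 1/sqrt(n), the fixed-rho CV levels off near sqrt(rho * (1 - p) / p) however many reads are added, and the depth-dependent CV keeps decreasing with n but more slowly than the binomial one.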

Conclusion

We propose a novel form of overdispersion that guarantees that accuracy improves with sequencing depth, and we demonstrate that this form provides a better fit to the data.