Open Access Highly Accessed Research article

Influence of RNA extraction methods and library selection schemes on RNA-seq data

Marc Sultan1,2*, Vyacheslav Amstislavskiy1, Thomas Risch1, Moritz Schuette1, Simon Dökel1, Meryem Ralser1, Daniela Balzereit1, Hans Lehrach1 and Marie-Laure Yaspo1*

Author Affiliations

1 Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin 14195, Germany

2 Present address: Novartis Institutes for Biomedical Research, Biomarker Development, Fabrikstr. 10, Basel, Switzerland

BMC Genomics 2014, 15:675  doi:10.1186/1471-2164-15-675

Published: 11 August 2014

Abstract

Background

Gene expression analysis by RNA sequencing is now widely used in a number of applications surveying the whole transcriptomes of cells and tissues. The recent introduction of ribosomal RNA depletion protocols, such as RiboZero, has extended the view of the polyadenylated transcriptome to the poly(A)- fraction of the RNA. However, substantial intronic transcriptional activity has been reported with RiboZero protocols, raising questions about the potential nuclear origin of these intronic reads and about their impact on the effective sequencing depth in exonic regions.

Results

Using HEK293 human cells as source material, we assessed the impact of two commonly used RNA extraction methods and of the library construction protocols (rRNA depletion versus mRNA) on 1) the relative abundance of intronic reads and 2) the estimation of gene expression values. We benchmarked rRNA depletion-based sequencing against a specific analysis of the cytoplasmic and nuclear transcriptome fractions; the results suggest that the large majority of intronic reads correspond to unprocessed nuclear transcripts rather than to independent transcriptional units. We show that the Qiagen and TRIzol extraction methods differentially retain nuclear RNA species and that, consequently, rRNA depletion-based RNA sequencing protocols are particularly sensitive to the extraction method.

Conclusions

We show that the combination of TRIzol-based RNA extraction with rRNA depletion sequencing protocols led to the largest fraction of intronic reads apart from sequencing of the nuclear transcriptome itself. We discuss the impact of the various strategies on gene expression and alternative splicing estimates. Further, we propose guidelines and a double selection strategy for minimizing expression biases without loss of information.

Keywords:
RNA-Seq; RNA extraction; rRNA depletion; poly(A)+ selection; Intronic reads