Methodology article (Open Access)

Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads

Jeffrey Martin (1,2), Vincent M Bruno (3), Zhide Fang (4), Xiandong Meng (1,2), Matthew Blow (1,2), Tao Zhang (1,2), Gavin Sherlock (5), Michael Snyder (5) and Zhong Wang (1,2)*

Author Affiliations

1 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA

2 Department of Energy, Joint Genome Institute, Walnut Creek, California, USA

3 Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA

4 School of Public Health, LSU-Health Sciences Center, New Orleans, LA 70112, USA

5 Department of Genetics, Stanford University Medical School, Stanford, CA 94305-5120, USA


BMC Genomics 2010, 11:663 doi:10.1186/1471-2164-11-663

Published: 24 November 2010



Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high-throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success depends on the availability and quality of reference genome sequences, which limits the organisms to which it can be applied.


Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well-annotated genomes, suggesting that Rnnotator complements reference-based analysis for comprehensive transcriptomics.
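To make the full-length statistic concrete, the following minimal Python sketch shows one way such a fraction could be computed once contigs have been aligned to a reference. It is an illustration only, not Rnnotator's evaluation code: the function name `full_length_fraction`, the interval representation, the toy coordinates, and the rule that a gene counts as full-length only when a single contig alignment spans its entire annotated interval are all assumptions made here.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (start, end), 0-based, end-exclusive.
Interval = Tuple[int, int]


def full_length_fraction(
    genes: Dict[str, Interval],
    contig_hits: Dict[str, List[Interval]],
) -> float:
    """Return the fraction of gene models fully spanned by one contig alignment.

    `contig_hits` maps a gene ID to the reference intervals of the contigs
    aligned to that gene's locus (an assumed pre-processing step).
    """
    if not genes:
        return 0.0
    covered = 0
    for gene_id, (g_start, g_end) in genes.items():
        hits = contig_hits.get(gene_id, [])
        # A gene counts only if a single contig covers its whole span.
        if any(c_start <= g_start and c_end >= g_end for c_start, c_end in hits):
            covered += 1
    return covered / len(genes)


# Toy data, invented for illustration only.
genes = {"geneA": (100, 900), "geneB": (1500, 2400), "geneC": (3000, 3600)}
contig_hits = {
    "geneA": [(50, 950)],                    # one contig spans the gene
    "geneB": [(1500, 2000), (2000, 2400)],   # fragmented across two contigs
    "geneC": [(3000, 3600)],                 # exact match
}
fraction = full_length_fraction(genes, contig_hits)
print(f"{fraction:.1%} of gene models reconstructed full-length")
```

Under this assumed definition, a gene fragmented across two contigs (geneB in the toy data) does not count as full-length, even though every base of it is covered.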


These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.