Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads
1 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
2 Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
3 Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA
4 School of Public Health, LSU-Health Sciences Center, New Orleans, LA 70112, USA
5 Department of Genetics, Stanford University Medical School, Stanford, CA 94305-5120, USA
BMC Genomics 2010, 11:663 doi:10.1186/1471-2164-11-663
Published: 24 November 2010
Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. High-throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, but its success depends on the availability and quality of a reference genome sequence, which limits the organisms to which it can be applied.
Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well-annotated genomes, suggesting that Rnnotator complements reference-based analysis for comprehensive transcriptomics.
These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
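As an illustration of how a full-length reconstruction percentage like the one reported above could be computed, the following is a minimal sketch, not the evaluation procedure used by Rnnotator. It assumes that contig-to-gene alignments are already available as half-open intervals on each gene, and that a gene counts as full-length when a single contig alignment spans at least a chosen fraction of it. The function name `full_length_fraction`, the `min_fraction` parameter, and the input layout are all hypothetical.

```python
from typing import Dict, List, Tuple

# Half-open interval [start, end) in gene coordinates (assumed convention).
Interval = Tuple[int, int]


def full_length_fraction(
    gene_lengths: Dict[str, int],
    alignments: Dict[str, List[Interval]],
    min_fraction: float = 1.0,
) -> float:
    """Return the fraction of genes spanned by a single contig alignment.

    A gene is counted when its longest single alignment covers at least
    `min_fraction` of the gene length (hypothetical criterion).
    """
    if not gene_lengths:
        return 0.0
    full = 0
    for gene, length in gene_lengths.items():
        # Clip each alignment to the gene boundaries and keep the longest one.
        best = max(
            (min(end, length) - max(start, 0)
             for start, end in alignments.get(gene, [])),
            default=0,
        )
        if best >= min_fraction * length:
            full += 1
    return full / len(gene_lengths)


# Toy data: three genes, only g1 is covered end to end by one contig.
genes = {"g1": 100, "g2": 200, "g3": 50}
hits = {"g1": [(0, 100)], "g2": [(0, 90), (100, 200)], "g3": []}
strict = full_length_fraction(genes, hits)
relaxed = full_length_fraction(genes, hits, min_fraction=0.5)
```

With the toy data, the strict criterion counts one of three genes, while relaxing `min_fraction` to 0.5 also counts g2, which shows how sensitive such a percentage is to the chosen definition of "full-length".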