Open Access Highly Accessed Research article

A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

Warren R Francis1,2*, Lynne M Christianson1, Rainer Kiko3, Meghan L Powers1,2, Nathan C Shaner4 and Steven H D Haddock1*

Author affiliations

1 Monterey Bay Aquarium Research Institute, 7700 Sandholdt Rd, Moss Landing, CA 95039, USA

2 Department of Ocean Sciences, University of California Santa Cruz, Santa Cruz, CA, USA

3 Helmholtz Center for Ocean Research Kiel, GEOMAR, Hohenbergstr. 2, 24105 Kiel, Germany

4 The Scintillon Institute, 9924 Mesa Rim Rd., San Diego, CA 92121, USA

Citation and License

BMC Genomics 2013, 14:167 doi:10.1186/1471-2164-14-167

Published: 12 March 2013

Abstract

Background

The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes.

Results

We assembled transcriptomes of animals from six phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads, using Velvet/Oases and Trinity, to determine how read count affects the assembly. The set included mouse heart reads, whose assemblies could be compared against the available reference genome. We found qualitative differences between assemblies of whole animals and assemblies of single tissues. With increasing reads, whole-animal assemblies show a rapid increase in transcripts and in the discovery of conserved genes, whereas single-tissue assemblies show slower discovery of conserved genes, although their assembled transcripts were often longer. A closer examination of the mouse assemblies shows that assembly errors become more frequent with more reads, but that such errors can be mitigated with more stringent assembly parameters.
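The idea of assembling "at regular increments of reads" can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the read subsets were drawn, so the nesting of subsets, the fixed random seed, and the names `incremental_subsets`, `step` and `toy_reads` are all assumptions made for illustration. Each subset would then be passed to an assembler separately.

```python
import random


def incremental_subsets(reads, step, seed=0):
    """Yield (size, subset) pairs of increasing size drawn from `reads`.

    Assumption (not stated in the abstract): the reads are shuffled once
    and each subset is a prefix of that shuffled order, so every smaller
    subset is contained in every larger one.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    for size in range(step, len(shuffled) + 1, step):
        yield size, shuffled[:size]


# Toy example: 10 placeholder read names, subsets of 3, 6 and 9 reads.
toy_reads = [f"read_{i}" for i in range(10)]
subsets = list(incremental_subsets(toy_reads, step=3))
sizes = [size for size, _ in subsets]
```

Nesting the subsets is one possible design choice: it means any difference between two assemblies comes from the added reads rather than from an unrelated random draw. Independent draws at each depth would be an equally plausible alternative.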

Conclusions

These assembly trends suggest that representative assemblies, with RNA-level coverage, are generated with as few as 20 million reads for tissue samples and 30 million reads for whole animals. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors in highly expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and may require alternative assembly strategies.