Open Access Highly Accessed Research article

De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

Eshchar Mizrachi1, Charles A Hefer2, Martin Ranik1, Fourie Joubert2 and Alexander A Myburg1*

Author Affiliations

1 Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Pretoria, 0002, South Africa

2 Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, 0002, South Africa

For all author emails, please log on.

BMC Genomics 2010, 11:681  doi:10.1186/1471-2164-11-681

Published: 1 December 2010



De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs.


We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes were assembled and annotated. Analysis of assembly quality, length and diversity show that this dataset represent the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions.


De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, webcite) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants.