Open Access Highly Accessed Research article

De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes

Roberto A Barrero1, Brett Chapman1, Yanfang Yang2, Paula Moolhuijzen1, Gabriel Keeble-Gagnère1, Nan Zhang2, Qi Tang2,3, Matthew I Bellgard1* and Deyou Qiu2*

Author Affiliations

1 Centre for Comparative Genomics, Murdoch University, WA 6150, Australia

2 State Key Laboratory of Tree Genetics and Breeding, The Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China

3 Guangxi Branch Institute, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Nanning 530023, China


BMC Genomics 2011, 12:600  doi:10.1186/1471-2164-12-600

Published: 13 December 2011

Additional files

Additional file 1:

Assessment of reads using FastQC before trimming. A) Quality of reads per base. The central red line is the median base quality, the yellow box represents the inter-quartile range (25-75%), the upper and lower whiskers represent the 10% and 90% points and the blue line represents the mean base quality. B) Distribution of mean quality scores over all sequenced reads. C) The GC content distribution over all sequenced reads as compared against the theoretical GC distribution. The blip in GC content above the theoretical GC distribution is most likely due to the primers used during RNA-seq library preparation and sequencing, which are present at the 5' end of the reads.

Format: EPS Size: 440KB

Open Data

Additional file 2:

Assessment of reads using FastQC after trimming. A) Quality of reads per base after adaptive window trimming using a quality average threshold of 20 and a minimum length threshold of 20. The central red line is the median value, the yellow box represents the inter-quartile range (25-75%), the upper and lower whiskers represent the 10% and 90% points and the blue line represents the mean base quality. B) The mean sequence quality scores over all reads. C) The GC content distribution over all sequenced reads as compared against the theoretical GC distribution.
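The trimming step named in this caption can be illustrated with a minimal Python sketch. The caption gives only the two thresholds (average quality 20, minimum length 20); the window size of 4, the left-to-right scan, and the function name `trim_read` are assumptions made for illustration and are not taken from the article or from any particular trimming tool.

```python
def trim_read(quals, window=4, min_avg_q=20, min_len=20):
    """Return how many leading bases to keep, or 0 if the read is discarded.

    quals      -- list of per-base Phred quality scores
    window     -- sliding window size (assumed value, not from the article)
    min_avg_q  -- minimum average quality within a window (20 in the caption)
    min_len    -- minimum read length after trimming (20 in the caption)
    """
    if not quals:
        return 0
    keep = len(quals)
    # Slide the window from the 5' end; cut at the first window whose
    # average quality falls below the threshold.
    for start in range(0, max(len(quals) - window + 1, 1)):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_avg_q:
            keep = start
            break
    # Discard reads that end up shorter than the length threshold.
    return keep if keep >= min_len else 0
```

For example, a read with 25 high-quality bases followed by 10 low-quality bases would be cut near the quality drop and retained, whereas a read whose quality drops after 10 bases would be discarded for falling below the length threshold.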

Format: EPS Size: 428KB

Additional file 3:

Comparison of Oases assemblies using various k-mers. A) Oases assemblies using k-mers ranging from 17 to 47 with a minimum transcript size of 100 bp. B) Oases assemblies with a minimum transcript size of 300 bp. C) Zoomed-in view of Oases assemblies with a minimum transcript size of 300 bp over k-mers 19 to 37. The maximum N50, the largest number of gene clusters and the largest number of transcripts were obtained using a k-mer of 25. Thus, this k-mer and a length threshold of ≥ 300 bp were selected as parameters to assemble the reference E. fischeriana root transcriptome.
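For readers unfamiliar with the N50 statistic used to compare these assemblies, the following Python sketch shows one common way to compute it from a list of transcript lengths. The function name `n50` and the `min_len` parameter are hypothetical; the sketch does not reproduce how the authors or Oases computed the values shown in the figure.

```python
def n50(lengths, min_len=300):
    """Return the N50 of the transcripts that are at least min_len long.

    N50 is the length L such that transcripts of length >= L together
    account for at least half of the total assembled bases.
    """
    # Apply the minimum transcript size filter, longest first.
    ordered = sorted((n for n in lengths if n >= min_len), reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no transcript passed the length filter
```

Computing this value for each assembly (one per k-mer setting) and comparing the results is one way to arrive at the kind of comparison summarized in the caption.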

Format: EPS Size: 451KB

Additional file 4:

Evaluation of k-mer coverage for Velvet assembly. The frequency of k-mers (number of appearances, or coverage) was determined using untrimmed Illumina reads and ESTs. The results show a large peak of k-mers with a coverage of one, most of which correspond to sequencing errors. Thus, a k-mer coverage threshold of 2 was utilized in the de novo transcriptome assembly.
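The k-mer coverage distribution described in this caption can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it counts exact k-mers on the forward strand only, the function name `kmer_coverage_histogram` is hypothetical, and it is not how Velvet computes coverage internally.

```python
from collections import Counter


def kmer_coverage_histogram(reads, k):
    """Map each coverage value to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        # Enumerate every k-mer of the read (forward strand only).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram: coverage -> number of distinct k-mers with that coverage.
    return Counter(counts.values())
```

A large bin at coverage 1 in such a histogram is the pattern the caption attributes mostly to sequencing errors, which motivates a coverage threshold of 2.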

Format: EPS Size: 110KB

Additional file 5:

Example of an Oases 'gene cluster'. A) Multiple sequence alignment of transcripts grouped into the same 'gene cluster'. Note that transcript 2 (T2) is a 5'-end truncated version of T6 and that T4 has significant sequence variation. B) Blast homology screening revealed that T1, T3 and T5 are mitochondria-encoded acetyl-CoA acetyltransferase transcripts.

Format: DOC Size: 58KB

Additional file 6:

Clustering of GGPS with unrelated hypothetical proteins. A) Blast results of transcripts clustered into the EFI_010585 isoform cluster using Oases. Note that only transcript 4 has similarity to GGPS and that it is encoded on the reverse strand. B) Multiple sequence alignment of the reverse-complemented GGPS transcript and a sequence representing the hypothetical transcripts.
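Because the GGPS transcript is encoded on the reverse strand, it has to be reverse-complemented before alignment. A minimal Python sketch of that operation is shown below; the function name `reverse_complement` is hypothetical and the sketch handles only A, C, G, T and N, leaving any other character unchanged.

```python
# Translation table for complementing nucleotides (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]
```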

Format: DOC Size: 52KB

Additional file 7:

Predicted tRNA genes in the E. fischeriana root transcriptome. To identify tRNA genes, the reference assembled root transcriptome was screened using tRNAscan-SE as previously described [17]. Additionally, we identified another four tRNA genes, highlighted with asterisks, in a transcriptome assembly conducted using a k-mer size of 17 and a length threshold of ≥ 100 bp. Their fasta sequences are appended at the bottom.

Format: DOC Size: 44KB

Additional file 8:

Predicted rRNA genes in the E. fischeriana root transcriptome. To find rRNAs, the reference root transcriptome was screened using RNAmmer as previously described [18].

Format: DOC Size: 44KB

Additional file 9:

Validation of RNA-seq expression trends using a real time RT-PCR approach. Reciprocal (1/Ct) real time PCR values were averaged for each enzyme mentioned in Table 3, then multiplied by 10000 and log2 transformed. Similarly, mean RNA-seq values were log2 transformed and compared against the real time RT-PCR results. A) A strong and significant correlation (R² = 0.91686) between RNA-seq and real time PCR expression levels was observed for six enzymes. From left to right, black boxes correspond to the GGPPS, DXS, AACT, HMGR, MDD and IDS enzymes; B) Upon inclusion of the HDS expression data (blue box), the linear correlation between RNA-seq and real time PCR expression levels was still significant (R² = 0.79392); C) Upon further addition of the CS expression data (red box), the linear correlation dropped significantly (R² = 0.2173). Abbreviations of enzymes are as shown in the legend of Figure 6.
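The transformation and correlation described in this caption can be sketched in Python as follows. The sketch assumes that the reciprocals (1/Ct) are averaged across replicates before scaling, which is one reading of the caption, and that R² is the squared Pearson correlation of a simple linear fit. The function names `transform_ct` and `r_squared` are hypothetical, and no values from the article are reproduced here.

```python
import math


def transform_ct(ct_values):
    """Average the reciprocal Ct values, multiply by 10000, then take log2."""
    mean_recip = sum(1.0 / ct for ct in ct_values) / len(ct_values)
    return math.log2(mean_recip * 10000)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)
```

With one transformed RT-PCR value and one log2 RNA-seq value per enzyme, `r_squared` gives the kind of statistic reported for panels A to C; adding or removing an enzyme from the two lists corresponds to moving between panels.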

Format: EPS Size: 258KB

Additional file 10:

Fasta sequences of 18,180 transcripts.

Format: TXT Size: 18.7MB