
Open Access Highly Accessed Research article

Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants

Nicole Gruenheit1*, Oliver Deusch1, Christian Esser2, Matthias Becker1, Claudia Voelckel1 and Peter Lockhart1

Author affiliations

1 Institute of Molecular Biosciences, Massey University, Palmerston North, New Zealand

2 Institute for Computer Science, Heinrich-Heine-University, 40225 Düsseldorf, Germany

Citation and License

BMC Genomics 2012, 13:92. doi:10.1186/1471-2164-13-92

Published: 14 March 2012

Abstract

Background

Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of whole genome duplication (WGD) and hybridisation resulting in allopolyploidy. The divergence of duplicated genes following WGD creates near-identical homeologues that can be problematic for de novo assembly and also for reference-based assembly protocols that use short reads (35-100 bp).

Results

Here we report a successful strategy for the assembly of two transcriptomes made using 75 bp Illumina reads from Pachycladon fastigiatum and Pachycladon cheesemanii. Both are allopolyploid plant species (2n = 20) that originated in the New Zealand Alps about 0.8 million years ago. In a systematic analysis of 19 different coverage cutoffs and 20 different k-mer sizes, we showed that (i) none of the genes could be assembled across all of the parameter space; (ii) assembly of each gene required an optimal set of parameter values; and (iii) these parameter values could be explained in part by different gene expression levels and different degrees of similarity between genes.
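
The size of the parameter space described above can be made concrete with a short sketch. The abstract does not list the actual k-mer sizes or cutoff values, so the ranges below are placeholders chosen only to give 20 and 19 settings, and `parameter_grid` is a hypothetical helper rather than part of the authors' pipeline.

```python
from itertools import product


def parameter_grid(kmer_sizes, coverage_cutoffs):
    """Return every (k, cutoff) pair, i.e. one assembly run per pair."""
    return list(product(kmer_sizes, coverage_cutoffs))


# Placeholder values (assumptions, not taken from the paper):
# 20 odd k-mer sizes below the 75 bp read length, and 19 coverage cutoffs.
kmer_sizes = list(range(21, 61, 2))    # 21, 23, ..., 59
coverage_cutoffs = list(range(2, 21))  # 2, 3, ..., 20

grid = parameter_grid(kmer_sizes, coverage_cutoffs)
print(len(kmer_sizes), len(coverage_cutoffs), len(grid))  # 20 19 380
```

Under these assumptions a full sweep is 380 assemblies per species, which is why result (i), that no gene assembles across the whole space, can only be seen by scanning both parameters together.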

Conclusions

To obtain optimal transcriptome assemblies for allopolyploid plants, k-mer size and k-mer coverage need to be considered simultaneously across a broad parameter space. This is important for assembling a maximum number of full length ESTs and for avoiding chimeric assemblies of homeologous and paralogous gene copies.
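
One reason the two parameters interact can be sketched with the usual relationship between base coverage and k-mer coverage in de Bruijn graph assemblers, as given for example in the Velvet manual: C_k = C × (L − k + 1) / L, where L is the read length. The abstract itself does not state this formula or name the assembler, so treat the following as an illustration under that assumption; the function name and the 30× base coverage are hypothetical.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage implied by a given base coverage: C * (L - k + 1) / L."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Illustrative only: a transcript sequenced at 30x with 75 bp reads.
for k in (21, 41, 61):
    print(k, kmer_coverage(30.0, 75, k))
```

For the same transcript, a larger k yields a lower k-mer coverage, so a fixed coverage cutoff removes more of a weakly expressed gene as k grows. This is consistent with the conclusion that the two settings should be explored jointly.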

Keywords:
EST library; mRNA-seq; Transcriptome assembly; Pachycladon; Allopolyploidy