Transposon fingerprinting using low coverage whole genome shotgun sequencing in Cacao (Theobroma cacao L.) and related species
1 Department of Botany and Biodiversity Research Centre, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
2 Department of Ecology & Evolutionary Biology, University of Colorado Boulder, 1800 Colorado Ave, Boulder, CO 80309, USA
BMC Genomics 2013, 14:502 doi:10.1186/1471-2164-14-502Published: 24 July 2013
Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species.
(i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c.10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low complexity satellite-like repeats in this genome.
(i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating intraspecific differences in TE composition. It is not appropriate for comparing repetitive elements across the species boundaries, for which de novo methods are more appropriate. (ii) Individual T. cacao accessions have unique spectra of TE composition indicating active evolution of TE abundance within this species. TE patterns could potentially be used as a “fingerprint” to identify and characterize cacao accessions.