
De novo characterization of the gametophyte transcriptome in bracken fern, Pteridium aquilinum

Joshua P Der1,3*, Michael S Barker2, Norman J Wickett3, Claude W dePamphilis3 and Paul G Wolf1

Author Affiliations

1 Department of Biology and Center for Integrated Biosystems, Utah State University, Logan, UT 84322-5305, USA

2 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA

3 Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA


BMC Genomics 2011, 12:99 doi:10.1186/1471-2164-12-99

Published: 8 February 2011

Abstract

Background

Because of their phylogenetic position and the unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns, combined with the absence of economically important species, have been a barrier to the development of genomic resources. However, high-throughput sequencing technologies are now being widely applied to non-model species. We used the Roche 454 GS-FLX Titanium pyrosequencing platform to sequence the gametophyte transcriptome of bracken fern (Pteridium aquilinum) and so develop genomic resources for evolutionary studies.

Results

681,722 quality- and adapter-trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp, a total assembly size of 30.8 Mbp, and an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned a GO functional annotation to 46.2% of the unigenes and an enzyme code annotation to 16.0%. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed that a substantial proportion of the genes expressed in bracken gametophytes are shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte-specific genes. We screened unigenes for repetitive sequences and identified 548 potentially amplifiable simple sequence repeat loci and 689 expressed transposable elements.
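As a rough consistency check on the figures above, the following Python sketch recomputes the total assembly size from the unigene count and mean length, and converts the reported percentages into approximate unigene counts. It is an illustration only: the function and variable names are hypothetical, the derived counts are not stated in the abstract, and because the inputs are rounded the results are approximate.

```python
# Figures as reported in the Results paragraph (rounded in the source).
N_UNIGENES = 56_256
MEAN_LENGTH_BP = 547.2
N_READS = 681_722
TOTAL_READ_BP = 254e6  # 254 Mbp

# Percentages of unigenes in each annotation category, as reported.
REPORTED_PERCENT = {
    "blastx hit in NCBI nr": 61.8,
    "InterProScan domain match (longest ORF)": 52.2,
    "GO functional annotation": 46.2,
    "enzyme code annotation": 16.0,
}


def assembly_size_mbp(n_unigenes: int, mean_length_bp: float) -> float:
    """Total assembly size in Mbp implied by unigene count and mean length."""
    return n_unigenes * mean_length_bp / 1e6


def approx_count(percent: float, total: int) -> int:
    """Approximate number of unigenes corresponding to a rounded percentage."""
    return round(total * percent / 100)


def summarize() -> dict:
    """Collect the derived (not reported) values in one dictionary."""
    summary = {
        "assembly_size_mbp": assembly_size_mbp(N_UNIGENES, MEAN_LENGTH_BP),
        # Derived from the rounded 254 Mbp total, so only approximate.
        "mean_read_length_bp": TOTAL_READ_BP / N_READS,
    }
    for label, percent in REPORTED_PERCENT.items():
        summary[label] = approx_count(percent, N_UNIGENES)
    return summary


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The recomputed assembly size rounds to the 30.8 Mbp given in the text, which suggests the unigene count and mean length are mutually consistent.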

Conclusions

This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.