Open Access | Highly Accessed | Research article

A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis)

Steven G Ralph1,5, Hye Jung E Chun2, Natalia Kolosova1,3, Dawn Cooper1, Claire Oddy1, Carol E Ritland4, Robert Kirkpatrick2, Richard Moore2, Sarah Barber2, Robert A Holt2, Steven JM Jones2, Marco A Marra2, Carl J Douglas3, Kermit Ritland4 and Jörg Bohlmann1

1 Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada

2 British Columbia Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, V5Z 4E6, Canada

3 Department of Botany, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada

4 Department of Forest Sciences, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada

5 Department of Biology, University of North Dakota, Grand Forks, ND, 58202-9019, USA


BMC Genomics 2008, 9:484 doi:10.1186/1471-2164-9-484

Published: 14 October 2008

Abstract

Background

Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers and other gymnosperms is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL) cDNAs and large collections of expressed sequence tags (ESTs) are therefore essential for gene discovery, functional genomics, and future conifer genome annotation.

Results

As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized, or full-length-enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries and developed a resource of 6,464 high-quality, sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs yielded 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.
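The PUT count is simply contigs plus singletons (19,941 + 26,804 = 46,745). The abstract does not name the clustering or assembly software, so the Python sketch below is only a toy illustration of that bookkeeping, not the authors' pipeline. It assumes a hypothetical criterion: ESTs that share an exact k-mer (transitively) are grouped with union-find, multi-member clusters stand in for contigs, single-member clusters are singletons, and PUTs are their sum. The function names and the default k = 20 are assumptions; real EST assemblers also check overlap quality and build consensus sequences, which this sketch skips.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=20):
    """Hypothetical toy clustering: ESTs sharing any exact k-mer (transitively)
    end up in the same cluster.

    ests: dict mapping EST id -> nucleotide sequence.
    Returns a sorted list of clusters, each a sorted list of EST ids.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first EST that contained each k-mer; any later EST with
    # the same k-mer is merged into that EST's cluster.
    first_owner = {}
    for name, seq in ests.items():
        for km in kmers(seq.upper(), k):
            if km in first_owner:
                union(first_owner[km], name)
            else:
                first_owner[km] = name

    groups = defaultdict(list)
    for name in ests:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


def summarize(clusters):
    """Count multi-member clusters (contig stand-ins), singletons, and PUTs."""
    contigs = sum(1 for c in clusters if len(c) > 1)
    singletons = sum(1 for c in clusters if len(c) == 1)
    return contigs, singletons, contigs + singletons


if __name__ == "__main__":
    toy = {
        "est1": "ACGTACGTTTGCAAGG",
        "est2": "TTGCAAGGCCATGA",    # shares the 8-mer TTGCAAGG with est1
        "est3": "GGGGCCCCAAAATTTT",  # shares no 8-mer with the others
    }
    clusters = cluster_ests(toy, k=8)
    print(clusters)             # [['est1', 'est2'], ['est3']]
    print(summarize(clusters))  # (1, 1, 2)
```

Union-find keeps the grouping near-linear in the number of k-mers, which is why it is a common choice for transitive clustering; a production tool would replace the exact-k-mer test with a proper overlap alignment.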

Conclusion

This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third-largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries of herbivore-, wound-, or elicitor-induced spruce tissues, together with normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources in comparative and functional genomics are discussed.
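Deciding that a transcript has "no obvious similarity" amounts to asking, for each transcript, whether any database hit passes a significance cutoff. The abstract does not state the search program or threshold, so the Python sketch below assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) from a single search and a hypothetical E-value cutoff of 1e-10. One could run it once per database (each of the five genomes and GenBank nr) and intersect the results to find transcripts lacking a hit in all of them.

```python
import io

# Default column order of BLAST+ tabular output ("-outfmt 6").
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def queries_without_hits(query_ids, blast_tab, max_evalue=1e-10):
    """Return the query ids that have no hit with E-value <= max_evalue.

    query_ids: every transcript id that was searched. This is required because
        queries with no hits at all do not appear in -outfmt 6 output.
    blast_tab: file-like object yielding tab-separated -outfmt 6 lines.
    max_evalue: hypothetical significance cutoff (an assumption, not the paper's).
    """
    with_hit = set()
    for line in blast_tab:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        if float(rec["evalue"]) <= max_evalue:
            with_hit.add(rec["qseqid"])
    # Preserve the caller's order in the result.
    return [q for q in query_ids if q not in with_hit]


if __name__ == "__main__":
    # Two made-up hit lines: put1 has a strong hit, put2 only a weak one,
    # and put3 has no line at all.
    tab = io.StringIO(
        "put1\tsubjA\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t300\n"
        "put2\tsubjB\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20\n"
    )
    print(queries_without_hits(["put1", "put2", "put3"], tab))  # ['put2', 'put3']
```

Passing the full list of query ids matters: a transcript with zero hits produces no output line, so it can only be detected by its absence.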

