De novo characterization of the Chinese fir (Cunninghamia lanceolata) transcriptome and analysis of candidate genes involved in cellulose and lignin biosynthesis
1 State Key Laboratory of Plant Physiology and Biochemistry, College of Life Science, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
2 Nurturing Station for the State Key Laboratory of Subtropical Silviculture, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, Zhejiang, 311300, P.R. China
3 School of Agricultural and Food Science, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, Zhejiang, 311300, P.R. China
BMC Genomics 2012, 13:648 doi:10.1186/1471-2164-13-648Published: 21 November 2012
Chinese fir (Cunninghamia lanceolata) is an important timber species that accounts for 20–30% of the total commercial timber production in China. However, the available genomic information of Chinese fir is limited, and this severely encumbers functional genomic analysis and molecular breeding in Chinese fir. Recently, major advances in transcriptome sequencing have provided fast and cost-effective approaches to generate large expression datasets that have proven to be powerful tools to profile the transcriptomes of non-model organisms with undetermined genomes.
In this study, the transcriptomes of nine tissues from Chinese fir were analyzed using the Illumina HiSeq™ 2000 sequencing platform. Approximately 40 million paired-end reads were obtained, generating 3.62 gigabase pairs of sequencing data. These reads were assembled into 83,248 unique sequences (i.e. Unigenes) with an average length of 449 bp, amounting to 37.40 Mb. A total of 73,779 Unigenes were supported by more than 5 reads, 42,663 (57.83%) had homologs in the NCBI non-redundant and Swiss-Prot protein databases, corresponding to 27,224 unique protein entries. Of these Unigenes, 16,750 were assigned to Gene Ontology classes, and 14,877 were clustered into orthologous groups. A total of 21,689 (29.40%) were mapped to 119 pathways by BLAST comparison against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The majority of the genes encoding the enzymes in the biosynthetic pathways of cellulose and lignin were identified in the Unigene dataset by targeted searches of their annotations. And a number of candidate Chinese fir genes in the two metabolic pathways were discovered firstly. Eighteen genes related to cellulose and lignin biosynthesis were cloned for experimental validating of transcriptome data. Overall 49 Unigenes, covering different regions of these selected genes, were found by alignment. Their expression patterns in different tissues were analyzed by qRT-PCR to explore their putative functions.
A substantial fraction of transcript sequences was obtained from the deep sequencing of Chinese fir. The assembled Unigene dataset was used to discover candidate genes of cellulose and lignin biosynthesis. This transcriptome dataset will provide a comprehensive sequence resource for molecular genetics research of C. lanceolata.