Transcriptome analysis reveals ginsenosides biosynthetic genes, microRNAs and simple sequence repeats in Panax ginseng C. A. Meyer
1 Chinese Academy of Medical Sciences & Peking Union Medical College, Institute of Medicinal Plant Development, Beijing, 100094, China
2 China Academy of Chinese Medical Sciences, Institute of Chinese Materia Medica, Beijing, 100700, China
BMC Genomics 2013, 14:245. doi:10.1186/1471-2164-14-245. Published: 11 April 2013
Panax ginseng C. A. Meyer is one of the most widely used medicinal plants. Complete genome information for this species remains unavailable due to its large genome size. At present, analysis of expressed sequence tags is still the most powerful tool for large-scale gene discovery. The global expressed sequence tags from P. ginseng tissues, especially those isolated from stems, leaves and flowers, are still limited, hindering in-depth study of P. ginseng.
Two 454 pyrosequencing runs generated a total of 2,423,076 reads from P. ginseng roots, stems, leaves and flowers. The high-quality reads were assembled both separately for each tissue and jointly across all four tissues. In the separately assembled databases, 45,849, 6,172, 4,041 and 3,273 unigenes were found exclusively in the root, stem, leaf and flower databases, respectively. In the jointly assembled database, 178,145 unigenes were observed, including 86,609 contigs and 91,536 singletons. Among the 178,145 unigenes, 105,522 were identified for the first time, of which 65.6% were identified in the stem, leaf or flower cDNA libraries of P. ginseng. After annotation, we discovered 223 unigenes involved in ginsenoside backbone biosynthesis. Additionally, a total of 326 potential cytochrome P450 and 129 potential UDP-glycosyltransferase sequences were predicted based on the annotation results, some of which may encode enzymes responsible for ginsenoside backbone modification. A BLAST search of the obtained high-quality reads identified 14 potential microRNAs in P. ginseng, which were estimated to target 100 protein-coding genes, including transcription factors, transporters and DNA binding proteins, among others. In addition, a total of 13,044 simple sequence repeats were identified from the 178,145 unigenes.
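As an illustration of what identifying simple sequence repeats in a unigene involves, the following is a minimal Python sketch that scans a nucleotide sequence for perfect tandem repeats. It is not the pipeline used in this study: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, and the search is limited to perfect di- to hexanucleotide repeats.

```python
import re

# Hypothetical thresholds (assumed for illustration, not taken from the study):
# motif length -> minimum number of tandem copies required to call an SSR.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
            # so each repeat is reported only under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence: an (AT)7 repeat and a (GAA)5 repeat with flanking bases.
example = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TT"
ssrs = find_ssrs(example)
```

Applied to every unigene in an assembly, a scan of this kind yields candidate marker loci; the number reported depends directly on the thresholds chosen.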
This study provides global expressed sequence tags for P. ginseng, which will contribute significantly to further genome-wide research and analyses in this species. The novel unigenes identified here enlarge the available P. ginseng gene pool and will facilitate gene discovery. In addition, the identification of microRNAs and the prediction of their targets in this study will provide information on gene transcriptional regulation in P. ginseng. Finally, the analysis of simple sequence repeats will provide genetic markers for molecular breeding and genetic applications in this species.