Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Highly Accessed Research article

Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim

Shaohua Zeng12, Gong Xiao12, Juan Guo12, Zhangjun Fei34, Yanqin Xu1, Bruce A Roe5 and Ying Wang1*

Author Affiliations

1 Key Laboratory of Pant Germplasm Enhancement and Speciality Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei 430074, China

2 The Graduate School, Chinese Academy of Sciences, Beijing, 100039, China

3 Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA

4 USDA Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA

5 Stephenson Research and Technology Center, University of Oklahoma, Norman, Oklahoma 73019, USA

For all author emails, please log on.

BMC Genomics 2010, 11:94  doi:10.1186/1471-2164-11-94

Published: 8 February 2010



Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal materials. Certain Epimedium species are endangered due to commercial overexploition, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium is less-studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum.


cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising of 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match to the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs is identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranuleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and high number of alleles per locus (11.9).


A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs is identified from EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species.