Open Access Research article

Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

Yu-Yun Hsiao1,2, Yun-Wen Chen3, Shi-Ching Huang1, Zhao-Jun Pan1, Chih-Hsiung Fu4, Wen-Huei Chen2, Wen-Chieh Tsai2,3* and Hong-Hwa Chen1,2,3*

Author Affiliations

1 Department of Life Sciences, National Cheng Kung University, Tainan 701, Taiwan

2 Orchid Research Center, National Cheng Kung University, Tainan 701, Taiwan

3 Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan

4 Department of Engineering Science, National Cheng Kung University, Tainan 701, Taiwan


BMC Genomics 2011, 12:360  doi:10.1186/1471-2164-12-360

Published: 12 July 2011



Orchids are one of the most diversified angiosperm families, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is regarded as economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome.


To maximize sequence diversity, we pooled RNA from 10 samples representing different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons, for a total of 42,863 unigenes. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, this search identified 22,234 different genes (E-value cutoff, 1e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified.
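As an illustration of the similarity-search step, the following minimal Python sketch filters BLAST hits at the E-value cutoff quoted above and keeps the best hit per unigene. It is not the pipeline used in this study: it assumes the search results are available in the standard 12-column BLAST tabular layout (E-value in the 11th column), and the function name `best_hits` and all sequence identifiers in the sample are hypothetical.

```python
import csv
import io

# Counts reported in the text: contigs plus singletons give the unigene set.
N_CONTIGS = 8_233
N_SINGLETONS = 34_630
N_UNIGENES = N_CONTIGS + N_SINGLETONS

# E-value cutoff quoted in the text (written "e-7" there, read here as 1e-7).
EVALUE_CUTOFF = 1e-7


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the lowest-E-value hit
    of each query, ignoring hits whose E-value exceeds the cutoff.

    Assumes 12 tab-separated columns per line, with the query ID in
    column 1, the subject ID in column 2 and the E-value in column 11.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # hit is too weak to support an annotation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical sample records, invented purely to exercise the function.
SAMPLE = "\n".join([
    "contig_1\tprot_A\t85.0\t120\t18\t0\t1\t360\t10\t129\t2e-40\t160",
    "contig_1\tprot_B\t60.0\t100\t40\t0\t1\t300\t5\t104\t1e-10\t70",
    "contig_2\tprot_A\t70.0\t90\t27\t0\t1\t270\t20\t109\t3e-12\t75",
    "singleton_1\tprot_C\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.002\t35",
])

hits = best_hits(SAMPLE)
annotated_unigenes = len(hits)
distinct_proteins = len({subject for subject, _ in hits.values()})
```

In this toy input, two unigenes pass the cutoff and both match the same protein, which shows why the number of distinct genes identified can be smaller than the number of annotated unigenes.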


Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for the discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between the study of model organisms, for which many genomic resources exist, and the study of species that are important for ecological and evolutionary research.