
Open Access Research article

RNA-Seq improves annotation of protein-coding genes in the cucumber genome

Zhen Li¹, Zhonghua Zhang², Pengcheng Yan¹, Sanwen Huang², Zhangjun Fei³ and Kui Lin¹*

Author affiliations

1 College of Life Sciences, Beijing Normal University, 19 Xinjiekouwai Street, Beijing, 100875, China

2 Key Laboratory of Horticultural Crops Genetic Improvement of Ministry of Agriculture, Sino-Dutch Joint Lab of Horticultural Genomics Technology, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, 12 Zhongguancunnan Street, Beijing, 100081, China

3 Boyce Thompson Institute and USDA Robert W. Holley Center for Agriculture and Health, Cornell University, Tower Road, Ithaca, New York, 14853-1801, USA



BMC Genomics 2011, 12:540. doi:10.1186/1471-2164-12-540

Published: 2 November 2011

Abstract

Background

As more and more genomes are sequenced, genome annotation becomes increasingly important for bridging the gap between sequence and biology. Gene prediction, the core of genome annotation, usually integrates multiple types of evidence to compute consensus gene structures. However, many newly sequenced genomes have only limited resources available for gene prediction. To create high-quality gene models for the cucumber genome (Cucumis sativus var. sativus), we extended the EVidenceModeler gene prediction pipeline by incorporating massively parallel cDNA sequencing (RNA-Seq) reads from 10 cucumber tissues. We applied the extended pipeline to the reassembled cucumber genome and compared the resulting protein-coding gene set with a previously published set.
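EVidenceModeler combines several kinds of evidence (for example ab initio predictions, protein alignments and transcript alignments), each with a user-assigned weight, into consensus gene structures. The sketch below only illustrates that weighted-consensus idea and is not EVidenceModeler's actual algorithm: it scores each candidate exon by summing the weights of the sources that report it, then picks the highest-scoring set of non-overlapping exons with weighted interval scheduling. It ignores strand, reading frame and splice-site consistency, and the source names and weights in `WEIGHTS` are hypothetical, not the values used in this study.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical evidence weights (assumption, not the study's settings):
# transcript evidence such as RNA-Seq alignments is trusted most.
WEIGHTS = {"ab_initio": 1.0, "protein": 5.0, "rnaseq": 10.0}


def score_exons(evidence):
    """Sum the weights of every source that reports each candidate exon.

    evidence: iterable of (source, start, end), 1-based inclusive coordinates.
    Returns {(start, end): total_weight}.
    """
    scores = defaultdict(float)
    for source, start, end in evidence:
        scores[(start, end)] += WEIGHTS[source]
    return dict(scores)


def best_consensus(scores):
    """Choose non-overlapping exons that maximise the summed evidence score.

    Classic weighted interval scheduling solved by dynamic programming.
    Returns (chosen exons sorted by coordinate, total score).
    """
    exons = sorted(scores, key=lambda e: e[1])  # order by end coordinate
    ends = [e[1] for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)   # best[i]: optimum using the first i exons
    take = [False] * (n + 1)  # whether exon i is used in best[i]
    prev = [0] * (n + 1)      # count of earlier exons ending before exon i starts
    for i in range(1, n + 1):
        start, _ = exons[i - 1]
        prev[i] = bisect_right(ends, start - 1, 0, i - 1)
        with_exon = best[prev[i]] + scores[exons[i - 1]]
        if with_exon > best[i - 1]:
            best[i], take[i] = with_exon, True
        else:
            best[i] = best[i - 1]
    # Walk back through the table to recover the chosen exons.
    chosen, i = [], n
    while i > 0:
        if take[i]:
            chosen.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    return sorted(chosen), best[n]


if __name__ == "__main__":
    evidence = [
        ("ab_initio", 100, 200), ("rnaseq", 100, 200),
        ("ab_initio", 150, 260),  # overlapping call with weak support
        ("protein", 300, 420), ("rnaseq", 300, 420),
        ("ab_initio", 300, 400),  # alternative end supported only ab initio
    ]
    print(best_consensus(score_exons(evidence)))
    # ([(100, 200), (300, 420)], 26.0)
```

In this toy case the exons backed by RNA-Seq reads win over the competing ab initio calls, which mirrors the intuition for adding transcript evidence; a real pipeline would score whole gene structures rather than independent exons.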

Results

The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, contains 23,248 identified protein-coding genes. Compared with the prediction published in 2009, approximately 8,700 genes show structural modifications and 5,285 genes appear only in the reassembled cucumber genome. All related results, including the genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/.
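The abstract does not state the exact criteria used to decide whether a gene was structurally modified or new. As a rough illustration only, the sketch below classifies each gene in a new set against an old set: a gene is "unchanged" if an overlapping old gene has an identical exon chain, "modified" if it overlaps old genes but none match exactly, and "new_only" if it overlaps no old gene. It assumes both sets already share the same sequence coordinates (in the study the two sets come from different assemblies, so a coordinate-mapping step would be needed first), and the category names and gene dictionary layout are hypothetical.

```python
from collections import Counter


def overlaps(a, b):
    """True if two gene spans share at least one base on the same sequence and strand.

    Each gene is a dict with 'seq', 'strand' and 'exons', a list of
    (start, end) tuples sorted by coordinate (hypothetical layout).
    """
    return (a["seq"] == b["seq"] and a["strand"] == b["strand"]
            and a["exons"][0][0] <= b["exons"][-1][1]
            and b["exons"][0][0] <= a["exons"][-1][1])


def classify(new_genes, old_genes):
    """Count new genes that are unchanged, modified, or absent from the old set."""
    counts = Counter()
    for gene in new_genes:
        # Linear scan for clarity; an interval index would be used at genome scale.
        hits = [old for old in old_genes if overlaps(gene, old)]
        if not hits:
            counts["new_only"] += 1
        elif any(old["exons"] == gene["exons"] for old in hits):
            counts["unchanged"] += 1
        else:
            counts["modified"] += 1
    return counts


if __name__ == "__main__":
    old = [
        {"seq": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]},
        {"seq": "chr1", "strand": "-", "exons": [(1000, 1500)]},
    ]
    new = [
        {"seq": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]},
        {"seq": "chr1", "strand": "-", "exons": [(1000, 1200), (1300, 1500)]},
        {"seq": "chr2", "strand": "+", "exons": [(50, 90)]},
    ]
    print(dict(classify(new, old)))
    # {'unchanged': 1, 'modified': 1, 'new_only': 1}
```

Here the second gene counts as modified because an intron now splits the old single-exon model, the kind of change RNA-Seq splice evidence can reveal.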

Conclusions

We conclude that RNA-Seq greatly improves the accuracy of protein-coding gene prediction in the reassembled cucumber genome. The comparison between the two gene sets also suggests that RNA-Seq reads can feasibly be used to annotate newly sequenced or less-studied genomes.