Open Access Research article

Finding the missing honey bee genes: lessons learned from a genome upgrade

Christine G Elsik1,2*, Kim C Worley3*, Anna K Bennett2, Martin Beye4, Francisco Camara5, Christopher P Childers2,6, Dirk C de Graaf7, Griet Debyser8, Jixin Deng3, Bart Devreese8, Eran Elhaik9, Jay D Evans10, Leonard J Foster11, Dan Graur12, Roderic Guigo5, HGSC production teams3, Katharina Jasmin Hoff13, Michael E Holder3, Matthew E Hudson14, Greg J Hunt15, Huaiyang Jiang16, Vandita Joshi3, Radhika S Khetani17, Peter Kosarev18, Christie L Kovar3, Jian Ma19, Ryszard Maleszka20, Robin F A Moritz21, Monica C Munoz-Torres2,22, Terence D Murphy23, Donna M Muzny3, Irene F Newsham3, Justin T Reese2,6, Hugh M Robertson24, Gene E Robinson25, Olav Rueppell26, Victor Solovyev27, Mario Stanke13, Eckart Stolle21, Jennifer M Tsuruda28, Matthias Van Vaerenbergh7, Robert M Waterhouse29, Daniel B Weaver30, Charles W Whitfield31, Yuanqing Wu3, Evgeny M Zdobnov29, Lan Zhang3, Dianhui Zhu3 and Richard A Gibbs3, on behalf of the Honey Bee Genome Sequencing Consortium

Author Affiliations

1 Division of Animal Sciences, Division of Plant Sciences, and MU Informatics Institute, University of Missouri, Columbia, MO 65211, USA

2 Department of Biology, Georgetown University, Washington, DC 20057, USA

3 Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, TX 77030, USA

4 Institute of Evolutionary Genetics, Heinrich Heine University Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany

5 Center for Genomic Regulation, Universitat Pompeu Fabra, C/Dr. Aiguader 88, E-08003 Barcelona, Catalonia, Spain

6 Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA

7 Laboratory of Zoophysiology, Ghent University, Krijgslaan 281 S2, B-9000 Ghent, Belgium

8 Laboratory of Protein Biochemistry and Biomolecular Engineering, Ghent University, K.L. Ledeganckstraat 35, B-9000 Ghent, Belgium

9 Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205-2103, USA

10 Bee Research Laboratory, BARC-E, USDA-Agricultural Research Service, Beltsville, MD 20705, USA

11 Department of Biochemistry & Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, 2125 East Mall, Vancouver, BC, Canada

12 Department of Biology and Biochemistry, University of Houston, Houston, TX 77204-5001, USA

13 Ernst Moritz Arndt University Greifswald, Institute for Mathematics and Computer Science, Walther-Rathenau-Str. 47, 17487 Greifswald, Germany

14 Department of Crop Sciences and Institute of Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

15 Department of Entomology, Purdue University, 901 West State Street, West Lafayette, IN 47907-2089, USA

16 Department of Obstetrics, Gynecology & Reproductive Sciences, University of Pittsburgh, MAGEE 0000, Pittsburgh, PA 15260, USA

17 High-Performance Biological Computing (HPCBio), Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

18 Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA

19 Institute for Genomic Biology and Department of Bioengineering, University of Illinois at Urbana-Champaign, 1270 DCL, MC-278, 1304 W Springfield Ave, Urbana, IL 61801, USA

20 Research School of Biology, The Australian National University, Canberra ACT 0200, Australia

21 Institut für Zoologie, Molekulare Ökologie, Martin-Luther-Universität Halle-Wittenberg, Hoher Weg 4, D-06099 Halle (Saale), Germany

22 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

23 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 45, 8600 Rockville Pike, Bethesda, MD 20894, USA

24 Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

25 Institute for Genomic Biology, Department of Entomology, Neuroscience Program, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, IL 61801, USA

26 Department of Biology, University of North Carolina at Greensboro, 321 McIver Street, Greensboro, NC 27412, USA

27 Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

28 Extension Field Operations, Clemson University, 120 McGinty Ct, Clemson, SC 29634, USA

29 University of Geneva and Swiss Institute of Bioinformatics, CMU, Michel-Servet 1, Geneva CH-1211, Switzerland

30 Genformatic, 6301 Highland Hills Drive, Austin, TX 78731, USA

31 Department of Entomology, Neuroscience Program, Program in Ecology and Evolutionary Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA


BMC Genomics 2014, 15:86  doi:10.1186/1471-2164-15-86

Published: 30 January 2014

Abstract

Background

First-generation genome sequence assemblies and annotations have had a significant impact on our understanding of the biology of the sequenced species, of the phylogenetic relationships among species, and of populations within and across species, and they have also informed human biology. Because only a few metazoan genomes (human, mouse, fly and worm) are approaching finished quality, most genome assemblies still have room for improvement. The honey bee (Apis mellifera) genome, published in 2006, was noted for two features: a bimodal GC content distribution that reduced the quality of the assembly in some regions, and an initial gene set (OGSv1.0) containing fewer genes than would be expected from other sequenced insect genomes.

Results

Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that, contrary to what OGSv1.0 suggested, the honey bee genome contains a number of genes similar to that of other insect genomes. The new genome assembly is more contiguous and complete, and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About one-sixth of the additional genes resulted from improvements to the assembly; the remainder were inferred from new RNA-seq and protein data.

Conclusions

Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and, through pollination, essential to global ecology.

Keywords:
Apis mellifera; GC content; Gene annotation; Gene prediction; Genome assembly; Genome improvement; Genome sequencing; Repetitive DNA; Transcriptome