Open Access Software

EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

Françoise Thibaud-Nissen [1,2], Matthew Campbell [1,3], John P Hamilton [1,2,4], Wei Zhu [1,2] and C Robin Buell [1,2,4]*

  • * Corresponding author: C R Buell buell@msu.edu

  • † Equal contributors

Author Affiliations

1 The Institute for Genomic Research, 9712 Medical Center Dr, Rockville, MD 20850, USA

2 J. Craig Venter Institute, 9704 Medical Center Dr, Rockville, MD 20850, USA

3 Pioneer Hi-Bred International, 7300 NW 62nd Ave, Johnston, IA 50131, USA

4 Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA

BMC Genomics 2007, 8:388  doi:10.1186/1471-2164-8-388

Published: 25 October 2007

Abstract

Background

Despite improvements in tools for the automated annotation of genome sequences, manual curation at the structural and functional level can further refine a genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models into the ongoing annotation effort.

Results

We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser.
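The submit, review and integrate workflow described above can be illustrated with a minimal sketch. It is not the EuCAP implementation: the table name, column names, status values and function names are all hypothetical, and Python's built-in sqlite3 module stands in for the MySQL database so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions,
# not the actual EuCAP Gene Model database layout.
SCHEMA = """
CREATE TABLE ca_model (
    id         INTEGER PRIMARY KEY,
    family     TEXT NOT NULL,
    chromosome TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    strand     TEXT NOT NULL CHECK (strand IN ('+', '-')),
    product    TEXT,
    status     TEXT NOT NULL DEFAULT 'submitted'
               CHECK (status IN ('submitted', 'approved', 'rejected'))
);
"""


def open_db():
    """Create an in-memory database (stand-in for MySQL) with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def submit_model(conn, family, chromosome, start_pos, end_pos, strand, product):
    """Store the alignment coordinates and functional annotation of a CA model."""
    if start_pos > end_pos:
        raise ValueError("start_pos must not exceed end_pos")
    cur = conn.execute(
        "INSERT INTO ca_model "
        "(family, chromosome, start_pos, end_pos, strand, product) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (family, chromosome, start_pos, end_pos, strand, product),
    )
    conn.commit()
    return cur.lastrowid


def review_model(conn, model_id, approve):
    """Record a project annotator's decision; only pending models can be reviewed."""
    status = "approved" if approve else "rejected"
    cur = conn.execute(
        "UPDATE ca_model SET status = ? WHERE id = ? AND status = 'submitted'",
        (status, model_id),
    )
    conn.commit()
    return cur.rowcount == 1


def approved_by_family(conn):
    """Group approved model ids by gene family, as a family-level display would."""
    rows = conn.execute(
        "SELECT family, id FROM ca_model "
        "WHERE status = 'approved' ORDER BY family, id"
    )
    grouped = {}
    for family, model_id in rows:
        grouped.setdefault(family, []).append(model_id)
    return grouped
```

In this sketch only models with status 'approved' would be carried into the genome annotation, mirroring the requirement that PAs approve CA models before integration.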

Conclusion

We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at http://sourceforge.net/projects/eucap/.