<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2006-7-7-r58</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Wilkerson</snm>
               <mi>D</mi>
               <fnm>Matthew</fnm>
               <insr iid="I1"/>
               <email>mwilkers@iastate.edu</email>
            </au>
            <au id="A2">
               <snm>Schlueter</snm>
               <mi>D</mi>
               <fnm>Shannon</fnm>
               <insr iid="I1"/>
               <email>sds@iastate.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Brendel</snm>
               <fnm>Volker</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>vbrendel@iastate.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Statistics, Iowa State University, Ames, IA 50011-3260, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>7</issue>
         <fpage>R58</fpage>
         <url>http://genomebiology.com/2006/7/7/R58</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16859520</pubid>
               <pubid idtype="doi">10.1186/gb-2006-7-7-r58</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>8</day>
               <month>6</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>5</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>07</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Wilkerson et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>A gene-structure annotation tool</p>
      </shorttitle>
      <shortabs>
         <p>yrGATE is a new web-based tool for community gene and genome annotation.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for worldwide web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at <url>http://www.plantgdb.org/prj/yrGATE</url>.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Rationale</p>
         </st>
         <p>Complete and accurate gene structure annotation is a prerequisite for the success of many types of genomic projects. For example, gene expression studies based on gene probes would be misleading unless the gene probes uniquely labelled distinct genes. Identification of potential transcription signals relies on correct determination of transcriptional start and termination sites. Characterization of orthologs or paralogs and other studies of molecular phylogeny are also compromised by incomplete or inaccurate gene structure annotation.</p>
         <p>Gene structure determination is particularly difficult for eukaryotic genomes. Here, we focus on protein-coding genes. In higher eukaryotes, most of these genes contain introns, and a large fraction of the genes appear to permit alternative splicing <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. High-throughput computational gene structure annotation has been highly successful in providing a first glimpse of the gene content of a genome, but current methods fall short of the goal of complete and accurate gene structure annotation (for example, <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>). Recent research has focused on improving prediction sensitivity and specificity by combining multiple sources of evidence <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. However, complexities of transcription and pre-mRNA processing, such as introns in non-coding regions, non-canonical splice sites, and utilization of alternative splice sites, still pose formidable challenges for merely computational methods. Re-annotation efforts for most eukaryotic model genomes have, therefore, relied in large part on manual inspection of gene structure evidence <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. However, manual annotation also has shortcomings, such as being typically time-consuming, having exclusive participation, and providing annotations only intermittently <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>A policy of 'open annotation', which uses the internet as the forum for annotation and brings annotation into the mainstream, has been suggested as a means to remove the constraints of manual annotation and to develop high quality gene annotation <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Several systems have successfully adopted this policy for prokaryote gene annotation (ASAP <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, PeerGAD <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, PseudoCAP <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>). Eukaryotic gene annotation projects have not been able to reap the full benefits of community manual annotation because of the absence of an open online community gene annotation system. Here, we describe newly developed software, Your Gene structure Annotation Tool for Eukaryotes (yrGATE), which seeks to compensate for the inadequacies of traditional manual annotation and to provide a community alternative and/or companion to computational gene annotation, specialized for eukaryotes. yrGATE provides functionality similar to that of the Apollo annotation tool <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and NCBI's ModelMaker <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, but adds community utilities, specialized portals to external gene finding and annotation software, and web browser accessibility.</p>
         <p>The yrGATE package consists of a web-based Annotation Tool for gene structure annotation creation and Community Utilities for regulating the acceptance of the annotations into a community gene set. The yrGATE Annotation Tool can be used without the Community Utilities for analysis of gene loci independent of a community. The Annotation Tool presents pre-calculated exon evidence in several summaries with different selection mechanisms and provides other methods for specifying custom exons, allowing thorough analysis and quick annotation of loci. Annotators access the tool over the web, where they create an annotation, decide to save the annotation in their personal account, or submit the annotation for review for acceptance into the community gene set. The online nature of yrGATE permits a large and nonexclusive group of annotators, ranging in expertise from professional curators to students <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. This also provides a continuous timeframe for gene annotation, allowing annotators to examine new sequence evidence as it becomes available and eliminating the delays of periodic annotation. yrGATE is particularly well suited for emerging genomes that are in the process of being sequenced, such as maize. Additionally, the user-friendly character of the yrGATE system contributes to its accessibility and to its potential for community adoption.</p>
      </sec>
      <sec>
         <st>
            <p>Annotation tool</p>
         </st>
         <p>The Annotation Tool of the yrGATE package is a web-based utility for creating gene structure annotations. The inputs and outputs of the Annotation Tool are depicted in Figure <figr fid="F1">1</figr>. The input consists of a genomic sequence, exon evidence, and evidence references. The output of the Annotation Tool is a gene annotation, which consists of a gene structure (coordinates of exons and introns), the inferred mRNA sequence, a corresponding protein coding region and its associated translation product, evidence attributes, description, and functional information. The input and output can be in several formats (indicated in Figure <figr fid="F1">1</figr>), which will be described in detail in the Implementation section below.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>The applications interface of yrGATE</p>
            </caption>
            <text>
               <p>The applications interface of yrGATE. Input to yrGATE is derived from either local database tables or distributed DAS sources. Output is either to local database tables or in the form of simple text or GFF3 files.</p>
            </text>
            <graphic file="gb-2006-7-7-r58-1"/>
         </fig>
         <p>Defining a gene's exon-intron structure is the central step in creating a eukaryotic gene annotation. The Annotation Tool provides two general categories to specify exons: pre-defined evidence-supported exons and novel user-defined exons. Pre-defined exons are provided by the Annotation Tool from prior computations and are supported by evidence derived from spliced alignments of expressed sequence tags (ESTs) and cDNAs, <it>ab initio </it>predictions, or a combination of sources. The evidence is filtered by stringent thresholds to provide exons suggestive of authentic genes. User-defined exons are exons not contained in the pre-defined evidence and are individually specified by the user. Annotators have several channels to designate both categories of exons.</p>
         <p>The Annotation Tool contains three representations of the evidence: the Evidence Plot, the Evidence Table, and links to evidence reference files. The Evidence Plot is a clickable graphic that presents evidence in a color-coded schematic (8 in Figure <figr fid="F2">2a</figr>). The Evidence Table (11 in Figure <figr fid="F2">2a</figr>) groups exons into mutually exclusive groups of exon variants. For each exon, the table lists its genomic coordinates, the maximum score from the method that generated the exon, and the evidence sources that support the exon. The evidence identifiers are hyperlinked to reference files for the exon, which could be an alignment or other program output. Annotators can select pre-defined exons by clicking on exon diagrams in the Evidence Plot or clicking on buttons in the Evidence Table. The annotator's developing gene structure is graphically displayed below the Evidence Plot for visual comparison (10 in Figure <figr fid="F2">2a</figr>).</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Novel gene annotation</p>
            </caption>
            <text>
               <p>Novel gene annotation. This yrGATE implementation at ZmGDB presents the region 158659-162032 of <it>Zea mays </it>BAC gi 51315585. <b>(a) </b>The main Annotation Tool window contains a completed gene structure annotation. The provided transcript evidence consists of two groups of ESTs (9, circled) separated by a region with no spanning evidence, 160260-160664 (8). User defined exons have been designated in this region. The User Defined Exons Table (2) lists each exon by coordinates and source. <b>(b) </b>Exon 5, 160575..160721, was defined using portals to (b) GENSCAN and GeneSeqer@PlantGDB (not shown). Yellow buttons in the GENSCAN portal (b) add exons to the gene structure in the Annotation Tool (6 in panel a), which are presented pictorially (10 in panel a) for comparison with the Evidence Plot. A protein-coding region was evaluated using the portal to the <b>(c) </b>ORF Finder and imported into the Annotation Tool (4 in panel a) using the yellow button.</p>
            </text>
            <graphic file="gb-2006-7-7-r58-2"/>
         </fig>
         <p>User-defined exons are specified through portals to exon-generating programs or through entry of the genomic coordinates of an exon. As these exons are defined, they are listed in the User Defined Exons Table (2 in Figure <figr fid="F2">2a</figr>). Acting as a type of web service, portals deliver the genome sequence of the annotation region to an online exon-generating program, with appropriate default parameters specified while allowing the user to change these parameters. The program's output is internally reformatted such that the user can directly add exons from the program's output window into the current gene structure displayed in the yrGATE Annotation Tool window. Currently, portals are available to the gene prediction programs GENSCAN <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and GeneMark <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and to the GeneSeqer spliced alignment web server <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Administrators can easily add new portals for other exon-generating programs or sequence analysis programs, such as folding programs for non-coding RNA annotations. A template portal is provided with the package.</p>
         <p>As an additional channel for designating gene structures, the tool allows a coordinate structure to be pasted into the mRNA structure field (6 in Figure <figr fid="F2">2a</figr>). The format for specifying an mRNA structure follows the conventional notation of designating exons by start and end coordinates separated by non-digits, with multiple exons separated by commas (for example, a two-exon gene structure matches the Perl regular expression \d+\D+\d+,\d+\D+\d+). This channel is appropriate for comparing external gene structures with the evidence. Exons not found in the pre-defined evidence are given an 'unknown' source in the User Defined Exons Table.</p>
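         <p>As a minimal sketch of this notation, the following Python fragment splits a pasted coordinate structure into exon start and end pairs. It is written in Python rather than Perl purely for illustration; the function name parse_structure and its error handling are assumptions and are not taken from the yrGATE code.</p>
         <p><![CDATA[
```python
import re

# One exon: start digits, one or more non-digit separators, end digits.
EXON_RE = re.compile(r"(\d+)\D+(\d+)")


def parse_structure(text):
    """Return a list of (start, end) tuples from a string such as '100..250,300..420'.

    Hypothetical helper: exons are separated by commas, and the two
    coordinates of each exon by any run of non-digit characters.
    """
    exons = []
    for piece in text.split(","):
        match = EXON_RE.fullmatch(piece.strip())
        if match is None:
            raise ValueError(f"not an exon coordinate pair: {piece!r}")
        exons.append((int(match.group(1)), int(match.group(2))))
    return exons
```
]]></p>
         <p>Splitting on commas before matching keeps the per-exon pattern simple, because a comma is itself a non-digit character and would otherwise be accepted as a separator inside an exon.</p>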
         <p>To document the annotator's procedure and parameters, the Exon Origins attribute of an annotation record automatically stores information about the source of each exon. The following information is stored: the method of exon-generation, a score associated with the method and exon, sequence identifiers used in the method, unique database identifiers to the specific output file or record, and a hyperlink to the program output yielding the exon. Exon Origins allows for complete re-creation of the gene structure annotation and for analysis of manual annotation procedures that could aid in future manual annotation efforts and techniques.</p>
         <p>After a gene structure has been defined, a user can specify the protein coding region of the annotation through entry of genomic coordinates (4 in Figure <figr fid="F2">2a</figr>) or by using the ORF Finder <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> portal. The ORF Finder portal (Figure <figr fid="F2">2c</figr>), operating similarly to the User Defined Exons portals, allows a user to select an open reading frame, which upon selection is imported into the Annotation Tool window and is graphically represented in the Preview Structure.</p>
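         <p>To illustrate what selecting an open reading frame involves, the sketch below scans the three forward reading frames of an inferred mRNA sequence for the longest stretch that begins with ATG and ends at the first in-frame stop codon. This is a simplified stand-in, not the algorithm used by the NCBI ORF Finder; the function name longest_orf and the 0-based, end-exclusive coordinate convention are assumptions made for the example.</p>
         <p><![CDATA[
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna):
    """Return (start, end) of the longest ATG-to-stop ORF on the given strand.

    Coordinates are 0-based and end-exclusive, and the ORF includes its
    stop codon. Returns None when no complete ORF is present.
    """
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for pos in range(frame, len(mrna) - 2, 3):
            codon = mrna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if best is None or (pos + 3 - start) > (best[1] - best[0]):
                    best = (start, pos + 3)
                start = None
    return best
```
]]></p>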
         <p>As the gene structure and protein coding region are designated and edited, the mRNA and protein sequence fields are updated accordingly (3 and 5 in Figure <figr fid="F2">2a</figr>). Hyperlinks attached to the appropriate sequence are provided to BLASTN, TBLASTX, BLASTX, TBLASTN and BLASTP at NCBI <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> so that an annotator can find similar sequences and/or assign a putative function. A description and alternative identifiers can also be added to a gene annotation.</p>
         <p>For cases in which genomic sequence requires editing, such as correction of sequencing errors or annotation of genes undergoing mRNA editing, the Sequence Editor Tool (7 in Figure <figr fid="F2">2a</figr>) enables annotators to insert, delete, or change bases through a web interface. These changes are incorporated into the Annotation Tool and stored with the annotation record.</p>
         <p>At the conclusion of a gene annotation session, an annotator decides what to do with the annotation record (1 in Figure <figr fid="F2">2a</figr>). Annotation records can be saved in the annotator's personal account, which limits access to the owner of the annotation. Annotations can be submitted for review, in which case the annotation is sent to administrators, who decide whether to accept it into a community database for sharing with the community or to reject it. Alternatively, annotations can be saved locally on the annotator's machine by displaying the annotation in simple text or GFF3 <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> format. Annotators are also able to delete stored annotations that have not been accepted.</p>
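         <p>For readers unfamiliar with GFF3, the following sketch shows how a gene structure could be written as GFF3 records, with one mRNA feature and one exon feature per exon, each on a line of nine tab-separated columns. The exact features and attributes that yrGATE writes are not specified here, so the function to_gff3, the 'yrGATE' source label, and the identifier scheme are illustrative assumptions; identifiers are assumed to contain no characters that GFF3 requires to be escaped.</p>
         <p><![CDATA[
```python
def to_gff3(seqid, annotation_id, exons, strand="+", source="yrGATE"):
    """Format a gene structure as GFF3 text.

    exons is a list of (start, end) pairs in 1-based genomic coordinates
    with start <= end, as GFF3 requires on either strand.
    """
    exons = sorted(exons)
    lines = ["##gff-version 3"]
    # The transcript spans from the first exon start to the last exon end.
    mrna_start, mrna_end = exons[0][0], exons[-1][1]
    lines.append("\t".join([
        seqid, source, "mRNA", str(mrna_start), str(mrna_end),
        ".", strand, ".", f"ID={annotation_id}",
    ]))
    for number, (exon_start, exon_end) in enumerate(exons, 1):
        attributes = f"ID={annotation_id}.exon{number};Parent={annotation_id}"
        lines.append("\t".join([
            seqid, source, "exon", str(exon_start), str(exon_end),
            ".", strand, ".", attributes,
        ]))
    return "\n".join(lines) + "\n"
```
]]></p>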
      </sec>
      <sec>
         <st>
            <p>Community annotation utilities</p>
         </st>
         <p>The yrGATE package includes community annotation utilities for sharing annotations among a public or private community. These utilities form a process for annotation management and review (diagrammed in Figure <figr fid="F3">3</figr>) for two different types of users, annotators and administrators. The types of users are distinguished by their actions: annotators create annotations and administrators review these annotations for acceptance into a community gene set. The community annotation process will be described from the perspective of a new annotation submission and review.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Community annotation review process</p>
            </caption>
            <text>
               <p>Community annotation review process. Individual Community Utilities are colored green in this diagram.</p>
            </text>
            <graphic file="gb-2006-7-7-r58-3"/>
         </fig>
         <p>A typical annotation submission begins with an annotator logging in to their private account, which contains all of the annotations created by the annotator. Then, the annotator creates a new annotation using the Annotation Tool and decides to submit the annotation to the community.</p>
         <p>This newly submitted annotation is listed in the Administration Tool, where an administrator can 'check out' this annotation for review, so that other administrators do not review this annotation concurrently. The administrator accesses the 'checked-out' annotation in a review version of the Annotation Tool. Then, the administrator reviews the annotation and is able to edit any attributes of the record. When satisfied with their analysis, the administrator accepts or rejects the annotation. If a decision cannot be reached, the annotation is returned to the to-be-reviewed group. Accepted annotations are added to the public community gene annotation database, where they are presented through the Community Annotation Central and Annotation Record facilities. Rejected annotations can be edited by the annotator to be resubmitted for review.</p>
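         <p>The review process described above can be viewed as a small state machine. The sketch below encodes it as a table of permitted transitions; the status and action names are hypothetical labels chosen for this example and do not necessarily correspond to the values stored by the yrGATE Community Utilities.</p>
         <p><![CDATA[
```python
# (current status, action) -> new status, following the described process.
TRANSITIONS = {
    ("saved", "submit"): "submitted",
    ("submitted", "check_out"): "checked_out",
    ("checked_out", "accept"): "accepted",
    ("checked_out", "reject"): "rejected",
    # No decision reached: back to the to-be-reviewed group.
    ("checked_out", "return"): "submitted",
    # A rejected annotation can be edited and resubmitted.
    ("rejected", "submit"): "submitted",
}


def apply_action(status, action):
    """Return the status that results from applying action, or raise ValueError."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed from status {status!r}"
        ) from None
```
]]></p>
         <p>Keeping the transitions in a single table makes it straightforward to drop or add steps for a specific implementation by editing the table rather than the control flow.</p>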
         <p>For specific implementations, the described community annotation process can be adjusted by dropping any of the steps, such as eliminating the user log in or eliminating the review process so that all submitted annotations are published. New steps can also be added to the review process, such as a voting utility for submitted annotations.</p>
      </sec>
      <sec>
         <st>
            <p>Implementations and case studies</p>
         </st>
         <p>The yrGATE package can be implemented in different configurations depending on the input and output (Figure <figr fid="F1">1</figr>) and on the annotation review process (Figure <figr fid="F3">3</figr>). The input can come from either a local database or a DAS server. The output can be an entry in a local database or a simple text or GFF3 file. The optional Community Utilities provide annotation review and community maintenance facilities. Two yrGATE implementations, having different configurations, are described below.</p>
         <sec>
            <st>
               <p>Community annotation at PlantGDB</p>
            </st>
            <p>PlantGDB includes a family of species-specific databases: AtGDB <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> for <it>Arabidopsis</it>, ZmGDB <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> for maize, and OsGDB <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> for rice. These species-specific databases each have an annotation community and an implementation of yrGATE. Input to the yrGATE annotation tool is supplied by the respective PlantGDB database. Pre-calculated exon evidence consists of spliced alignments of EST and cDNA sequences generated by the GeneSeqer program <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Evidence references consist of hyperlinks to GeneSeqer output files, which are a part of the respective databases. Genome sequence segments are also supplied by the database. In these PlantGDB implementations, yrGATE Community Utilities regulate user management and annotation curation according to the described default configuration (Figure <figr fid="F3">3</figr>). We illustrate yrGATE usage at PlantGDB with two gene annotation case studies.</p>
            <p>The first case study is a novel maize annotation using the ZmGDB yrGATE implementation. An unannotated genome region, 158659-162032 of BAC 51315585, was chosen by the annotator using the genome browsing function of ZmGDB. A screenshot of the Annotation Tool shows the completed annotation (Figure <figr fid="F2">2</figr>). Exons were initially selected from the pre-computed evidence. The evidence, though, consists of two separate groups of ESTs (9 in Figure <figr fid="F2">2a</figr>) with no spanning evidence in the region 160260-160664. The annotator decided to use the GENSCAN and the GeneSeqer@PlantGDB portals to explore potential exons in this region (2 in Figure <figr fid="F2">2a</figr>). After adding three user defined exons, a gene structure connecting both groups of ESTs was defined (6 and 10 in Figure <figr fid="F2">2a</figr>). The portal to the ORF Finder was used to define a protein-coding region, which spanned all eight exons of the putative transcript. Terminal exons, supported by ESTs 71435182 and 32859895, were selected to maximize the untranslated regions. The final step of the annotation session was a BLASTP search at NCBI to compare the novel gene annotation and to assign a putative gene product function. The protein of the annotation had high similarity over most of its length to rice protein NP_915525 and to <it>Arabidopsis </it>protein NP_190282. These proteins provided a putative functional assignment of 'sugar transporter' for the annotation. The annotator was satisfied with the annotation and submitted it for review. Administrators reviewed the annotation and accepted it because it was novel and of good quality. The annotation, ZM-yrGATE-sugar_transporter, is now accessible from the ZmGDB Community Annotation Central <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <p>The second PlantGDB case study concerns alternative splicing and correction of an inaccurate published annotation of an <it>Arabidopsis </it>gene model using the yrGATE implementation at AtGDB. A screenshot of the Transcript View of AtGDB presents two accepted community annotations (green structures in interior window, Figure <figr fid="F4">4</figr>). The annotator decided to investigate this genome region (chromosome 1, segment 30370180-30373939) because, upon visual inspection, the first exon of the published annotation At1g80810.1 conflicts with EST and cDNA evidence (3 in Figure <figr fid="F4">4</figr>). Initially, the annotator used cDNA 23270370 to define the gene structure and EST 496433 to extend the 3'-untranslated region. Through the Evidence Table and evidence reference links to GeneSeqer output of the Annotation Tool, the annotator recognized that exon 11 has an alternative size supported by EST 507078. The annotator examined open reading frames of both transcript structures and, seeing that both protein-coding regions extend over all exons except for the 5'-most untranslated exon, decided to create two annotations for this locus. An AtGDB administrator reviewed the annotations and accepted both into the community database because they corrected an inaccurate published annotation and captured alternative splicing variants. These alternative splicing variants are displayed in the Transcript View of AtGDB (1 in Figure <figr fid="F4">4</figr>), which displays sequence alignments coordinated to a diagram. In the Transcript View, the green vertical rectangle (2 in Figure <figr fid="F4">4</figr>) relates the diagram to the multiple sequence alignment, where nucleotides in introns are represented by '>' symbols. Comparing alignments for sequences 23270370 and 507078, a three-base difference at the start of exon 11 is apparent (4 in Figure <figr fid="F4">4</figr>). The upstream intron sequences reveal that both intron variants terminate with the standard AG dinucleotide, which suggests that this is a probable alternative splicing event. The Transcript View of AtGDB makes such minute differences, which are concealed in the diagram, distinguishable.</p>
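            <p>The check on intron termini described above can be expressed in a few lines. The sketch below returns the two genomic bases immediately upstream of an exon start, which for a forward-strand gene are the last two bases of the preceding intron. The function name, the forward-strand restriction, and the toy sequence in the usage note are assumptions for illustration and are not the At1g80810 sequence.</p>
            <p><![CDATA[
```python
def acceptor_dinucleotide(genome, exon_start):
    """Return the two bases immediately upstream of a forward-strand exon.

    exon_start is the 1-based genomic coordinate of the first exon base,
    so the preceding intron ends at exon_start - 1.
    """
    if exon_start < 3:
        raise ValueError("exon_start must leave room for two upstream bases")
    return genome[exon_start - 3:exon_start - 1].upper()


def same_acceptor_signal(genome, start_a, start_b, signal="AG"):
    """True when both alternative exon starts are preceded by the given signal."""
    return (acceptor_dinucleotide(genome, start_a) == signal
            and acceptor_dinucleotide(genome, start_b) == signal)
```
]]></p>
            <p>For the toy sequence AAGTAACAGCAGGCC, exon starts at positions 10 and 13 differ by three bases and are both preceded by AG, which is the pattern reported for the two exon 11 variants.</p>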
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Community implementation of yrGATE at the PlantGDB <it>Arabidopsis </it>genome browser, AtGDB, for correction of a public annotation and for alternative splicing</p>
               </caption>
               <text>
                  <p>Community implementation of yrGATE at the PlantGDB <it>Arabidopsis </it>genome browser, AtGDB, for correction of a public annotation and for alternative splicing. This two-window screenshot depicts yrGATE annotations in the AtGDB browser. The outer window contains a genome context view of AtGDB, which has links to the yrGATE Annotation Tool and to AtGDB's Transcript View (1). The inner window contains the Transcript View, which presents a genome context graphic and sequence alignments represented in the graphic. The graphic has the following color assignments: yrGATE annotations, green; the public annotation, blue; cDNAs, light blue; ESTs, red; annotation protein coding regions, green and red triangles. The multiple sequence alignment in the lower panel of the Transcript View corresponds to the region of graphic contained within the green rectangle (2). The first exon (3) of the public annotation, At1g80810.1, is not supported by expressed sequence evidence, which instead suggests a downstream exon. There are two yrGATE community annotations, yrGATE-At1g80810-1 and yrGATE-At1g80810-2, both of which contain the first exon supported by the evidence but differ at the 3'-end, because the evidence suggests two alternatives for exon 11 (as seen in the multiple alignment display (4)).</p>
               </text>
               <graphic file="gb-2006-7-7-r58-4"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>yrGATE with DAS input</p>
         </st>
         <p>DAS servers provide sequence and annotation information that can be queried and is in a standard format <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. The abundance of DAS servers for a variety of organisms provides rich and diverse sources of input for the yrGATE Annotation Tool. An implementation of yrGATE using input data from DAS servers is provided for general use <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. This implementation, 'yrGATE with DAS input', does not have a community aspect, although a different configuration could add community functionality. The 'yrGATE with DAS input' Selection Page allows an annotator to specify a DAS reference server and DAS evidence sources (Figure <figr fid="F5">5a</figr>). The green 'look up' buttons beside each text box provide a list for annotators to make selections. After these selections are stored, the Annotation Tool can be accessed with the selected input DAS data (Figure <figr fid="F5">5b</figr>).</p>
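         <p>As a rough illustration of how such input can be requested, the sketch below builds a DAS features query for a genome segment, assuming the DAS 1 convention of a 'features' command that takes a 'segment' parameter of the form reference:start,stop. The server address and data source name in the usage note are placeholders, and the function das_features_url is hypothetical; it is not part of yrGATE.</p>
         <p><![CDATA[
```python
from urllib.parse import quote


def das_features_url(server, source, reference, start, stop):
    """Build a DAS 1 style features request URL for one genome segment.

    server is the base address of the DAS server, source the name of the
    data source on that server, and reference the sequence identifier
    (for example a chromosome name).
    """
    if start > stop:
        raise ValueError("start must not exceed stop")
    base = server.rstrip("/")
    segment = f"{quote(str(reference))}:{int(start)},{int(stop)}"
    return f"{base}/das/{quote(source)}/features?segment={segment}"
```
]]></p>
         <p>For example, das_features_url("http://das.example.org", "chicken", "3", 86850000, 86990000) yields a request for the chromosome 3 region used in the case study below, against a placeholder server.</p>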
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>yrGATE with DAS input implementation</p>
            </caption>
            <text>
               <p>yrGATE with DAS input implementation. <b>(a) </b>The entrance to yrGATE is a selection page where a genome and associated evidence sources are specified. Chicken chromosome 3 region 86850000-86990000 is selected. <b>(b) </b>EST and mRNA are primary evidence sources (3). Additionally, secondary evidence sources of published annotations are selected for comparison including RefSeq, Ensembl, Twinscan, SGP, and Geneid genes. The novel annotation, GG-yrGATE-microcephalin, is based on EST and mRNA evidence and is distinct from all published chicken annotations in this region on this strand (2). This novel annotation (4) contains a known angiopoietin gene, NM_204817 (1), on the opposite strand within its 12th intron.</p>
            </text>
            <graphic file="gb-2006-7-7-r58-5"/>
         </fig>
         <p>Figure <figr fid="F5">5</figr> represents a case study of a novel chicken gene structure annotation. The Selection Page specifies the chicken genome chromosome 3 segment 86850000-86990000 as the genome entry point <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. The selected evidence sources include primary evidence of mRNA and EST BLAT alignments and, for comparison, annotations of types RefSeq <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>, TWINSCAN <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, Ensembl <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, Geneid <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, and SGP <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The published annotation evidence sources are selected so that the annotator can compare primary evidence against existing annotations. Inspection of the primary evidence in the Evidence Plot of the Annotation Tool suggests one gene on the forward strand (approximately 86887000-86934000; 1 in Figure <figr fid="F5">5b</figr>) and another gene on the reverse strand (approximately 86853000-86975000; 2 in Figure <figr fid="F5">5b</figr>). The gene on the forward strand (1 in Figure <figr fid="F5">5b</figr>; for example, RefSeq Gene angiopoietin-2, dark blue, labelled NM_204817.1) is accurately annotated based on mRNA and EST evidence. Additional alternative variants are also accurately annotated.</p>
         <p>The primary evidence also suggests a gene on the reverse strand that contains the angiopoietin-2 gene within one of its introns. However, judged against the mRNA and EST evidence (3 in Figure <figr fid="F5">5b</figr>), the current annotations on the reverse strand are inaccurate and incomplete. The first half of this potential gene is represented in some annotations (2 in Figure <figr fid="F5">5b</figr>; SGP, chr3_982.1; Geneid, chr3_1361.1; Ensembl, ENSGALT00000026345.2; TWINSCAN, chr3.87.019.a). Alignments of other species' RefSeq genes <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> (not pictured) indicate a larger gene extent than the displayed annotations, but this extent is still shorter than the primary evidence supports and does not include all of the exons supplied by the primary evidence. A novel gene annotation was created on the reverse strand by selecting compatible exons from the primary evidence using the Annotation Tool. An open reading frame was designated, and the protein sequence was used to find homologous genes in related species. Based on BLASTP results, this gene was assigned the putative function microcephalin. Interestingly, several species (including human and mouse) have an annotated microcephalin gene with high protein sequence similarity and also maintain the local genome structure, with angiopoietin-2 located on the opposite strand within an intron of the microcephalin gene.</p>
         <p>Links to these case study annotations are provided on the yrGATE website <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Usability and availability</p>
         </st>
         <p>The Annotation Tool was designed with an emphasis on usability for annotators. Annotators can immediately select from high-quality evidence that is likely to yield an accurate annotation, and can specify new custom evidence in cases where the supplied evidence is inadequate. These two categories support a sound annotation process in which high-quality evidence is examined first and additional evidence is checked afterwards. The tool's design keeps the number of mouse clicks and screen changes needed for this process to a minimum.</p>
         <p>The main components of the tool fit on a single standard 1,024 &#215; 768 resolution screen. The tool is loaded once per genomic region, and the form fields are updated dynamically, which allows annotators to quickly evaluate the impact of different exon variants and exon combinations on the gene structure, mRNA sequence, and protein sequence. yrGATE is compatible with several major operating systems, including Linux, Windows and Macintosh, and with several web browsers, of which Mozilla Firefox is the fastest.</p>
         <p>yrGATE is available for download <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The package consists of Perl, JavaScript, and HTML code and a MySQL schema. The Perl libraries required for a full implementation are CGI, DBI, LWP, HTTP, PHP::Session, GD, Bio::Graphics, Bio::SeqFeature::Generic, and Bio::Das. Template data are provided for testing and evaluation.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>yrGATE opens gene structure annotation to a large, nonexclusive community. Its characteristics contribute to its potential for user appeal and community adoption. Among other applications, it is particularly useful for annotating emerging genomes and for correcting inaccurate published annotations. yrGATE is easily adaptable to different input data, and its Community Utilities can support a community of annotators.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by the National Science Foundation Plant Genome Research Program grant DBI-0321600 to VB. MW worked in part under a cooperative agreement with University of Missouri, SCA #58 3622-3-152.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The evolving roles of alternative splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Lareau</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Bhatnagar</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>273</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.sbi.2004.05.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15193306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Function of alternative splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Stamm</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ben-Ari</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rafalska</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Toiber</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Thanaraj</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Soreq</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>344</volume>
            <fpage>1</fpage>
            <lpage>20</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.10.022</pubid>
                  <pubid idtype="pmpid" link="fulltext">15656968</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Genome-wide comparative analysis of alternative splicing in plants.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>B-B</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <inpress/>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Annotation of the <it>Drosophila melanogaster</it> euchromatic genome: a systematic review.</p>
            </title>
            <aug>
               <au>
                  <snm>Misra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Millburn</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Prochnik</snm>
                  <fnm>SE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0083</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151185</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537572</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Gene annotation: prediction and testing.</p>
            </title>
            <aug>
               <au>
                  <snm>Ashurst</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>69</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.4.070802.110300</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Community-based gene structure annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Schlueter</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Wilkerson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Huala</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rhee</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Trends Plant Sci</source>
            <pubdate>2005</pubdate>
            <volume>10</volume>
            <fpage>9</fpage>
            <lpage>14</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tplants.2004.11.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15642518</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>JIGSAW: integration of multiple sources of evidence for gene prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3596</fpage>
            <lpage>3603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti609</pubid>
                  <pubid idtype="pmpid" link="fulltext">16076884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>GAZE: a generic framework for the integration of gene-prediction data by dynamic programming.</p>
            </title>
            <aug>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1418</fpage>
            <lpage>1427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186661</pubid>
                  <pubid idtype="pmpid" link="fulltext">12213779</pubid>
                  <pubid idtype="doi">10.1101/gr.149502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Integrating alternative splicing detection into gene prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Foissac</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>25</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">550657</pubid>
                  <pubid idtype="pmpid" link="fulltext">15705189</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-25</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Complete reannotation of the <it>Arabidopsis</it> genome: methods, tools, protocols and the final release.</p>
            </title>
            <aug>
               <au>
                  <snm>Haas</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ronning</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Hannick</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>RK</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Maiti</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Farzad</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>BMC Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1082884</pubid>
                  <pubid idtype="pmpid" link="fulltext">15784138</pubid>
                  <pubid idtype="doi">10.1186/1741-7007-3-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The Institute for Genomic Research Osa1 rice genome annotation database.</p>
            </title>
            <aug>
               <au>
                  <snm>Yuan</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Ouyang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Maiti</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hamilton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sultana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cheung</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2005</pubdate>
            <volume>138</volume>
            <fpage>18</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1104156</pubid>
                  <pubid idtype="pmpid" link="fulltext">15888674</pubid>
                  <pubid idtype="doi">10.1104/pp.104.059063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The Vertebrate Genome Annotation (Vega) database.</p>
            </title>
            <aug>
               <au>
                  <snm>Ashurst</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Jekosch</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Keenan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Meidl</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Stalker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Storey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Trevanion</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D459</fpage>
            <lpage>D465</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540089</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608237</pubid>
                  <pubid idtype="doi">10.1093/nar/gki135</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Open annotation offers a democratic solution to genome sequencing.</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>825</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35002770</pubid>
                  <pubid idtype="pmpid" link="fulltext">10706254</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Sequencing solution: use volunteer annotators organized via Internet.</p>
            </title>
            <aug>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FSL</fnm>
               </au>
               <au>
                  <snm>Hancock</snm>
                  <fnm>REW</fnm>
               </au>
               <au>
                  <snm>Stover</snm>
                  <fnm>CK</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>933</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35023188</pubid>
                  <pubid idtype="pmpid" link="fulltext">10984027</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Genome annotation: from sequence to biology.</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>493</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35080529</pubid>
                  <pubid idtype="pmpid" link="fulltext">11433356</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>ASAP, a systematic annotation package for community analysis of genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Liss</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Darling</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Prasad</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Byrnes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gilson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Biehl</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>147</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165572</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519969</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg125</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PeerGAD: a peer-review-based and community-centric web application for viewing and annotating prokaryotic genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>D'Ascenzo</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Collmer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>3124</fpage>
            <lpage>3135</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">434426</pubid>
                  <pubid idtype="pmpid" link="fulltext">15184545</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p><it>Pseudomonas aeruginosa</it> Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Winsor</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sui</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Ung</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>WK</fnm>
               </au>
               <au>
                  <snm>Hancock</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D338</fpage>
            <lpage>D343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608211</pubid>
                  <pubid idtype="doi">10.1093/nar/gki047</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Apollo: a sequence annotation editor.</p>
            </title>
            <aug>
               <au>
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wiel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bayraktaroglu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0082</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151184</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537571</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Database resources of the National Center for Biotechnology Information.</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Canese</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Helmberg</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D39</fpage>
            <lpage>D45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540016</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608222</pubid>
                  <pubid idtype="doi">10.1093/nar/gki062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Annotation for Amateurs</p>
            </title>
            <url>http://www.plantgdb.org/tutorial/annotatemodule</url>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Besemer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>W451</fpage>
            <lpage>W454</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1160247</pubid>
                  <pubid idtype="pmpid" link="fulltext">15980510</pubid>
                  <pubid idtype="doi">10.1093/nar/gki487</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>GeneSeqer@PlantGDB: Gene structure prediction in plant genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Schlueter</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3597</fpage>
            <lpage>3600</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168940</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824374</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg533</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Generic Feature Format Version 3</p>
            </title>
            <url>http://song.sourceforge.net/gff3.shtml</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Refined annotation of the <it>Arabidopsis</it> genome by complete expressed sequence tag mapping.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schlueter</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2003</pubdate>
            <volume>132</volume>
            <fpage>469</fpage>
            <lpage>484</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">166990</pubid>
                  <pubid idtype="pmpid" link="fulltext">12805580</pubid>
                  <pubid idtype="doi">10.1104/pp.102.018101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>An <it>Arabidopsis thaliana</it> Plant Genome Database</p>
            </title>
            <url>http://www.plantgdb.org/AtGDB</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A <it>Zea mays</it> Plant Genome Database</p>
            </title>
            <url>http://www.plantgdb.org/ZmGDB</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>An <it>Oryza sativa</it> Genome Database</p>
            </title>
            <url>http://www.plantgdb.org/OsGDB</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus.</p>
            </title>
            <aug>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Xing</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1157</fpage>
            <lpage>1169</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth058</pubid>
                  <pubid idtype="pmpid" link="fulltext">14764557</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>yrGATE @ ZmGDB: Community Annotation Central</p>
            </title>
            <url>http://www.plantgdb.org/ZmGDB_yrGATE-cgi/CommunityCentral.pl</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The distributed annotation system.</p>
            </title>
            <aug>
               <au>
                  <snm>Dowell</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Jokerst</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">58584</pubid>
                  <pubid idtype="pmpid" link="fulltext">11667947</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-2-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The Distributed Annotation System</p>
            </title>
            <url>http://www.biodas.org</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>yrGATE with DAS input</p>
            </title>
            <url>http://www.plantgdb.org/DAS_yrGATE</url>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The UCSC Genome Browser Database.</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>51</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165576</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519945</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The UCSC Genome Database</p>
            </title>
            <url>http://genome.cse.ucsc.edu/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D501</fpage>
            <lpage>D504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539979</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608248</pubid>
                  <pubid idtype="doi">10.1093/nar/gki025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>UCSC Genome Browser RefSeq Genes Track</p>
            </title>
            <url>http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=galGal2&amp;g=refGene</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Integrating genomic homology into gene structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S140</fpage>
            <lpage>S148</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The Ensembl genome database project.</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>38</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99161</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752248</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.38</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Assembling genes from predicted exons in linear time with dynamic programming.</p>
            </title>
            <aug>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <fpage>681</fpage>
            <lpage>702</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10072084</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Comparative gene prediction in human and mouse.</p>
            </title>
            <aug>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Wiehe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>108</fpage>
            <lpage>117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430976</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529313</pubid>
                  <pubid idtype="doi">10.1101/gr.871403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>UCSC Genome Browser Non-Chicken RefSeq Genes Track</p>
            </title>
            <url>http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=galGal2&amp;g=xenoRefGene</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Your Gene structure Annotation Tool for Eukaryotes</p>
            </title>
            <url>http://www.plantgdb.org/prj/yrGATE</url>
         </bibl>
      </refgrp>
   </bm>
</art>
