<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-10-62</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Revisiting the missing protein-coding gene catalog of the domestic dog</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Derrien</snm>
               <fnm>Thomas</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>thomas.derrien@crg.es</email>
            </au>
            <au id="A2">
               <snm>Th&#233;z&#233;</snm>
               <fnm>Julien</fnm>
               <insr iid="I1"/>
               <email>theze.julien@gmail.com</email>
            </au>
            <au id="A3">
               <snm>Vaysse</snm>
               <fnm>Amaury</fnm>
               <insr iid="I1"/>
               <email>amaury.vaysse@univ-rennes1.fr</email>
            </au>
            <au id="A4">
               <snm>Andr&#233;</snm>
               <fnm>Catherine</fnm>
               <insr iid="I1"/>
               <email>catherine.andre@univ-rennes1.fr</email>
            </au>
            <au id="A5">
               <snm>Ostrander</snm>
               <mi>A</mi>
               <fnm>Elaine</fnm>
               <insr iid="I2"/>
               <email>eostrand@mail.nih.gov</email>
            </au>
            <au id="A6">
               <snm>Galibert</snm>
               <fnm>Francis</fnm>
               <insr iid="I1"/>
               <email>francis.galibert@univ-rennes1.fr</email>
            </au>
            <au ca="yes" id="A7">
               <snm>Hitte</snm>
               <fnm>Christophe</fnm>
               <insr iid="I1"/>
               <email>hitte@univ-rennes1.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institut de G&#233;n&#233;tique et D&#233;veloppement, CNRS UMR6061, Universit&#233; de Rennes1, 2 Av du Pr. L&#233;on Bernard, 35043 Rennes, France</p>
            </ins>
            <ins id="I2">
               <p>Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda MD 20892, USA</p>
            </ins>
            <ins id="I3">
               <p>Centre for Genomic Regulation (CRG), Bioinformatics Program C/Dr. Aiguader, 88 08003 Barcelona, Spain</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>1</issue>
         <fpage>62</fpage>
         <url>http://www.biomedcentral.com/1471-2164/10/62</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19193219</pubid>
               <pubid idtype="doi">10.1186/1471-2164-10-62</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>8</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>04</day>
               <month>2</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>04</day>
               <month>2</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Derrien et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Among mammals for which there is high sequence coverage, the whole-genome assembly of the dog is unique in that it predicts a low number of protein-coding genes, ~19,000, compared to the more than 20,000 reported for other mammalian species. Of particular interest are the more than 400 genes annotated in primate and rodent genomes but missing in dog.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using over 14,000 orthologous genes between human, chimpanzee, mouse, rat and dog, we built multiple pairwise synteny maps to infer short orthologous intervals that were targeted for characterizing the canine missing genes. Based on gene prediction and a functionality test using the ratio of replacement to silent nucleotide substitution rates (<it>d</it><sub>N</sub>/<it>d</it><sub>S</sub>), we provide compelling structural and functional evidence for the identification of 232 new protein-coding genes in the canine genome and 69 gene losses, characterized as undetected genes or pseudogenes. Gene loss phyletic pattern analysis using ten species from chicken to human allowed us to characterize 28 canine-specific gene losses that have functional orthologs continuously from chicken or marsupials through human, and 10 genes that arose specifically in the evolutionary lineage leading to rodents and primates.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This study demonstrates the central role of comparative genomics in refining gene catalogs and exploring the evolutionary history of gene repertoires, particularly as applied to the characterization of species-specific gene gains and losses.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="endnote" subtype="user_supplied_xml" type="bmc"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Comparative genomics plays a key role in understanding organism evolution, refining functional annotation and identifying orthology relationships. By taking advantage of whole-genome sequence assemblies with a high level of coverage <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, one can seek to provide exhaustive, genome-scale predictions regarding functional sequences <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The general approach relies on the exploitation of sequence similarities <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, phylogenetic data <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, evolutionary models <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> and evidence regarding conservation of gene order <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. These often complementary comparative approaches have been developed to estimate and improve the identification of functional sequences for both newly sequenced species and reference species, such as human and mouse <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Moreover, multispecies genome-scale comparisons allow the refinement of protein-coding gene annotation <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> as well as a better understanding of the timing and frequency of duplication events for lineage-specific genes, called in-paralogs <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>.</p>
         <p>Fine-scale comparative maps constructed using robust orthologous sequences are key to the identification, visualization and characterization of conserved segments as well as collinearity of gene order between species <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. Gene order between species is not random, and this has been shown to correlate with, for example, co-expressed and co-regulated genes, suggesting a functional significance <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. In addition, gene order conservation between species can be exploited to identify protein-coding genes relocated to non-syntenic chromosomal regions <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, as well as potentially retrotransposed genes, given that the latter correspond mostly to pseudogenes inserted in non-syntenic regions <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Consequently, as part of the characterization of the architecture of a genome, analysis of gene order conservation between species can be a strong indicator for both gene prediction <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and identification of gene loss <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
         <p>In this study, we have analyzed the sequence assembly of the domestic dog, for which the annotation process identified fewer protein-coding genes than expected compared to predictions from the primate and rodent genomes. We focused on a set of 412 genes that are all annotated in four closely related mammals (human, chimpanzee, mouse and rat) but absent from the most recent assembly of the dog genome (CanFam 2.0). We exploited the property of gene adjacency conservation between related species to target in-depth sequence alignments over a short genomic interval. In addition, our approach includes a functionality test that investigates the ratio of amino acid replacement (nonsynonymous, <it>d</it><sub>N</sub>) to silent (synonymous, <it>d</it><sub>S</sub>) substitution rates, which indicates the selective constraints acting on a given genomic region <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. As mutations causing amino acid replacements with functional consequences are selected against in genes, in contrast to mutations occurring in pseudogenes, we took advantage of the distinctive patterns of <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratios to refine the identification of new gene predictions and gene losses occurring in dog.</p>
         <p>Using the above strategies, we identified 232 canine genes for which synteny conservation, cross-species sequence analysis and the rate of evolution based on <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>results converged strongly to support their existence. In addition, we identified 69 gene-loss candidates, among which accumulated ORF-disrupting mutations and significant <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratios support scenarios in which 21 genes were lost as pseudogenes in the canine species. To further characterize gene losses, we inferred their phyletic pattern in ten species from chicken to human over a period of 310 million years. In this way, we were able to differentiate canine-specific losses from gene losses that have occurred in other lineages or genes formed after the evolutionary branchpoint leading to dog.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Using all annotated genes from human, chimpanzee, mouse, rat and dog (Ensembl v42) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, we extracted 412 genes annotated as protein-coding in all species but dog. These genes exhibit a '1:1:1:1:0' phyletic pattern, which indicates the presence/absence of genes with a one-to-one orthologous relationship among the five species. We refer to these as 'missing genes' for purposes of this study. We examined the structural features of the 412 missing genes in the four mammalian reference sequences and compared them to an independent and randomly selected set of 400 genes. The mean length of the protein products of the missing gene set was 722 amino acids (AA), which is significantly smaller than that of the random set at 905 AA (<it>t </it>test; <it>P </it>= 6.8e-11). Similarly, the mean transcript size was ~50% smaller than that observed in the random set (<it>t </it>test; <it>P </it>= 2.6e-9). The mean number of exons in missing genes was also smaller than in the random set (5.8 vs 9.8; <it>t </it>test; <it>P </it>= 3.7e-13), and single-exon genes in particular were found to be over-represented by 15%. To ensure that single-exon missing genes were functional and not processed pseudogenes, we analyzed each, using the human dataset, for accumulated degenerative mutations (frameshifts and premature stop codons) in their coding sequence and found none. In addition, we identified sequence alignments between single-exon genes and ESTs (sequence similarity > 96% for at least 150 bp) for 95% of them.</p>
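         <p>The selection of genes with a '1:1:1:1:0' phyletic pattern can be sketched as follows. This is an illustration only and not the procedure run against Ensembl in this study; the data layout, the function names <it>phyletic_pattern</it> and <it>select_missing_genes</it>, and the gene identifiers are hypothetical.</p>

```python
# Hypothetical sketch: select genes with a '1:1:1:1:0' phyletic pattern.
# The dictionary layout and gene identifiers are illustrative assumptions.
REFERENCE_SPECIES = ("human", "chimpanzee", "mouse", "rat")
TARGET_SPECIES = "dog"


def phyletic_pattern(ortholog_counts):
    """Return a pattern string such as '1:1:1:1:0' for one gene family."""
    species = REFERENCE_SPECIES + (TARGET_SPECIES,)
    # A species absent from the dictionary is counted as having no ortholog.
    return ":".join(str(ortholog_counts.get(sp, 0)) for sp in species)


def select_missing_genes(families):
    """Keep families with one copy in each reference species and none in dog."""
    return sorted(
        name
        for name, counts in families.items()
        if phyletic_pattern(counts) == "1:1:1:1:0"
    )


families = {
    "geneA": {"human": 1, "chimpanzee": 1, "mouse": 1, "rat": 1, "dog": 1},
    "geneB": {"human": 1, "chimpanzee": 1, "mouse": 1, "rat": 1},
    "geneC": {"human": 1, "chimpanzee": 1, "mouse": 2, "rat": 1, "dog": 0},
}
print(select_missing_genes(families))
```

         <p>In this toy input only the hypothetical family geneB is retained: geneA has a dog ortholog, and geneC is duplicated in mouse and therefore does not satisfy the one-to-one requirement.</p>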
         <p>To test the underlying assumption that missing genes may be implicated in particular biological pathways, we examined their functional annotation in the context of Gene Ontology (GO) using the program GO Tree Machine <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Using the human sequence as a reference, the results demonstrate that the missing gene set is enriched for genes implicated in physiological pathways of immunity and organism responses to pathogens (12 genes), olfaction (16) and regulation of transcription (63). This classification comprises functional pathways that play an important role in the adaptation of organisms to their environment. Interestingly, these biological functions are often linked to large protein families that are attractive targets for lineage-specific functions and lineage-specific loss and gain of genes <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Constructing synteny maps with 1:1 orthologs</p>
            </st>
            <p>We extracted pairwise sets of 14,997; 14,798; 14,667 and 14,065 one-to-one (1:1) orthologous protein-coding genes (Ensembl v42) between human and dog (H-D), chimpanzee-dog (C-D), mouse-dog (M-D) and rat-dog (R-D), respectively. Using those 1:1 orthologs as comparative anchors, we built four fine-scale whole-genome pairwise synteny maps (Additional data file <supplr sid="S1">1</supplr>) with the program AutoGRAPH, which we recently developed <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. We identified 218, 229, 326 and 325 CSOs, i.e. chromosomal segments for which markers are in the same linear order on the chromosome across species <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, between H-D, C-D, M-D and R-D, respectively. The mean distance between two consecutive genes was ~180 kb. In all synteny maps, CSOs cover almost the entire genome, while breakpoint regions, the areas delimiting CSOs, cover only ~5% of a genome and may contain single-gene segments or very short synteny blocks <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> (Additional data file <supplr sid="S2">2</supplr>).</p>
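            <p>The notion of a segment in which markers keep the same linear order can be illustrated with a simplified sketch. It is not the AutoGRAPH algorithm; it assumes that each 1:1 anchor has already been reduced to a dog chromosome and a rank along that chromosome, and the function name <it>collinear_segments</it> and the example anchors are hypothetical.</p>

```python
# Hypothetical sketch: split anchors into runs that keep the same linear order.
# This simplification is NOT the AutoGRAPH implementation.


def collinear_segments(anchors):
    """Group anchors into collinear runs.

    anchors: list of (dog_chromosome, dog_rank) tuples, already sorted by
    position on the reference genome. A run continues while the dog
    chromosome is unchanged and the rank steps by +1 or -1 in one direction.
    """
    segments = []
    current = []
    direction = 0  # 0 means the orientation of the run is not fixed yet
    for anchor in anchors:
        if current:
            prev_chrom, prev_rank = current[-1]
            step = anchor[1] - prev_rank
            same_chrom = anchor[0] == prev_chrom
            consistent = step in (1, -1) and direction in (0, step)
            if same_chrom and consistent:
                current.append(anchor)
                direction = step
                continue
            segments.append(current)
        current = [anchor]
        direction = 0
    if current:
        segments.append(current)
    return segments


anchors = [
    ("chr1", 1), ("chr1", 2), ("chr1", 3),  # forward run
    ("chr5", 9), ("chr5", 8),               # inverted run on another chromosome
    ("chr1", 4),                            # isolated single-gene segment
]
segments = collinear_segments(anchors)
print([len(seg) for seg in segments])
```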
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Human-dog synteny map: Example of human chromosome 5.</b> An example of the synteny map built between human chromosome 5 and the dog genome.</p>
               </text>
               <file name="1471-2164-10-62-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Synteny maps characteristics.</b> The data indicates the main characteristics of the synteny maps.</p>
               </text>
               <file name="1471-2164-10-62-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>In each pairwise synteny map, we localized the missing gene orthologs on the reference sequence (Figure <figr fid="F1">1</figr>). Of the 412 missing genes, the vast majority (mean of 92.3%; range 92 to 94%) mapped within CSOs with only 7.7% mapping within breakpoints. In all reference species the missing genes spanned all chromosomes, although their distribution varied greatly, i.e. one to 44 per human (HSA) chromosome in the case of the human-dog synteny map.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Consensus Ortholog IntervaL identification</p>
               </caption>
               <text>
                  <p><b>Consensus Ortholog IntervaL identification</b>. The figure illustrates the 4-step method used to infer the targeted interval for gene prediction. (1) The first step builds the pairwise synteny map (here a schematic human-dog synteny map) using 1:1 orthologs that are connected through colored lines. (2) A 1:0 gene ('missing gene' in the dog) is positioned on the reference species of the synteny map. (3) The flanking 1:1 orthologs are identified and used to define an orthologous interval on the canine chromosome, as indicated by red arrows. (4) The last step integrates the four orthologous intervals using all pairwise synteny maps (human-dog, chimpanzee-dog, mouse-dog and rat-dog) to define a Consensus Ortholog IntervaL (COIL), as shown on the right of the figure.</p>
               </text>
               <graphic file="1471-2164-10-62-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Targeting genomic intervals</p>
            </st>
            <p>We used the multiple pairwise synteny maps described above to identify short, targeted, orthologous genomic intervals. On each reference genome, these intervals are delimited by the closest flanking 1:1 orthologs on either side of each missing gene, which in turn define orthologous intervals on the canine genome, as shown in Figure <figr fid="F1">1</figr>. The use of multiple pairwise maps enabled us to identify the shortest consensus interval on the canine genome in which to search for genes, which we refer to as Consensus Ortholog IntervaLs (COILs) (Figure <figr fid="F1">1</figr>). From the 412 missing genes, we delimited 383 COILs (92.9%) having a mean size of 347 kb (Additional data file <supplr sid="S3">3</supplr>). No COIL could be determined for 17 missing genes (4.1%) localized in common breakpoint regions (i.e. overlapping between at least two species) <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B34">34</abbr></abbrgrp>, nor for a further 12 missing genes because of the absence of a consensus interval.</p>
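            <p>One way to picture the consensus step is as the overlap shared by the canine intervals obtained from each pairwise map. The sketch below makes that assumption explicit; the exact rule used to derive COILs may differ, and the function name <it>consensus_interval</it> and all coordinates are hypothetical.</p>

```python
# Hypothetical sketch: consensus interval taken as the overlap shared by the
# dog intervals inferred from each pairwise synteny map. Treating the
# consensus as a plain intersection is an assumption made for illustration.


def consensus_interval(intervals):
    """Return the shared (chromosome, start, end) or None if there is none.

    intervals: list of (chromosome, start, end) tuples on the dog genome,
    one per pairwise map (for example H-D, C-D, M-D and R-D).
    """
    chromosomes = {chrom for chrom, _, _ in intervals}
    if len(chromosomes) != 1:
        return None  # no input, or intervals on different dog chromosomes
    start = max(s for _, s, _ in intervals)
    end = min(e for _, _, e in intervals)
    if start >= end:
        return None  # intervals do not all overlap
    return (chromosomes.pop(), start, end)


dog_intervals = [
    ("chr3", 1_000_000, 1_600_000),  # from the human-dog map
    ("chr3", 1_100_000, 1_700_000),  # from the chimpanzee-dog map
    ("chr3", 950_000, 1_450_000),    # from the mouse-dog map
    ("chr3", 1_050_000, 1_500_000),  # from the rat-dog map
]
print(consensus_interval(dog_intervals))
```

            <p>Under this assumption, a missing gene whose four intervals fall on different canine chromosomes, or do not all overlap, yields no consensus interval.</p>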
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Characterization of Consensus Orthologous IntervaLs (COILs) containing missing genes.</b> This data file lists the characteristics of the Consensus Orthologous Intervals.</p>
               </text>
               <file name="1471-2164-10-62-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Targeted gene prediction</p>
            </st>
            <p>Within each canine COIL, we used the GeneWise program <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> to splice and align the protein sequence of each reference species in order to most accurately predict the structure of the dog gene. We retained gene predictions produced by at least two reference species protein templates. This produced 231 gene structure predictions with amino acid identity > 40% (Figure <figr fid="F2">2</figr>). Fifty-three genes were predicted using only rodent protein sequences as templates, thus illustrating the complementary contribution of multispecies analysis. We post-processed GeneWise results to detect potential gene features and found a coding start site for 53.1% of the gene predictions. In addition, amongst the 231 predicted genes, 75% of the predictions with a multi-exonic structure exhibit at least one canonical splice site (GT/AG).</p>
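            <p>The retention rule can be sketched as a simple filter. This is a minimal illustration that assumes the identity threshold is applied per template, which the text does not state explicitly; the function name <it>retained_predictions</it>, the input layout and the identity values are hypothetical and unrelated to the GeneWise output format.</p>

```python
# Hypothetical sketch: keep predictions supported by at least two reference
# protein templates. Applying the 40% identity threshold to each template
# separately is an assumption made for illustration.


def retained_predictions(predictions, min_templates=2, min_identity=40.0):
    """Return the sorted names of predictions with enough supporting templates.

    predictions: {gene_name: {template_species: percent_identity}}
    """
    kept = []
    for gene, by_template in predictions.items():
        supporting = [
            species
            for species, identity in by_template.items()
            if identity > min_identity
        ]
        if len(supporting) >= min_templates:
            kept.append(gene)
    return sorted(kept)


predictions = {
    "pred1": {"human": 82.0, "chimpanzee": 81.5, "mouse": 70.2, "rat": 69.8},
    "pred2": {"mouse": 55.0, "rat": 52.3},  # supported by rodents only
    "pred3": {"human": 45.0, "mouse": 31.0},  # one template above threshold
}
print(retained_predictions(predictions))
```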
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Flowchart of the computational analysis</p>
               </caption>
               <text>
                  <p><b>Flowchart of the computational analysis</b>. The left pipeline indicates all steps in the computational analysis of gene predictions and the right pipeline shows a detailed account of the process of undetected genes and pseudogenes. Gray boxes summarize the three main categories (1) new gene predictions, (2) putative artifacts, * indicates pseudogenes identified with low confidence (group I), and (3) gene losses, (**) indicates pseudogenes identified with accumulated mutations (group II) and higher <it>dN/dS </it>support. See text for details.</p>
               </text>
               <graphic file="1471-2164-10-62-2"/>
            </fig>
            <p>To address the question of whether COIL delimitation is too restrictive for gene prediction, we aligned the human transcript sequences corresponding to the 383 missing genes for which we defined a COIL against the assembly of the canine genome sequence (CanFam 2.0) with the Exonerate program <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. We repeated the analysis with chimpanzee, mouse and rat transcript sequences. We considered the five best matching sequences to relax the limitations of conventional best-match methods <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We then defined concordance between the COIL approach and the whole-genome sequence analysis as the case in which matching sequences from the Exonerate-based analysis for at least two species were totally embedded in COILs. Based on this criterion, concordance was obtained for 342 (89.2%) genes. Of the 41 instances with no agreement between the expected syntenic location and the whole-genome sequence analysis, 36 showed a weak match (identity &lt; 20%) within the canine genome assembly, suggesting unspecific alignment, while five showed a significant match from at least two species, suggesting that these genes may have acquired a new location in the dog. Of the latter five instances, we identified only one gene prediction (<it>PLA2G4C</it>) that met conservative criteria, indicating a gene relocated to a non-syntenic genomic area.</p>
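            <p>The concordance criterion can be expressed compactly. The following sketch assumes that a species supports a COIL when any of its retained matches lies entirely within the COIL; the function names <it>is_embedded</it> and <it>is_concordant</it> and all coordinates are hypothetical, and the code does not parse Exonerate output.</p>

```python
# Hypothetical sketch of the concordance test between whole-genome matches
# and a COIL. Counting a species as supporting when any one of its retained
# matches is embedded in the COIL is an assumption made for illustration.


def is_embedded(hit, coil):
    """True when the hit lies entirely inside the COIL on the same chromosome."""
    chrom, start, end = hit
    coil_chrom, coil_start, coil_end = coil
    return chrom == coil_chrom and start >= coil_start and coil_end >= end


def is_concordant(hits_by_species, coil, min_species=2):
    """True when at least min_species species have a hit embedded in the COIL.

    hits_by_species: {species: [(chromosome, start, end), ...]} holding the
    best matches kept for each species.
    """
    supporting = [
        species
        for species, hits in hits_by_species.items()
        if any(is_embedded(hit, coil) for hit in hits)
    ]
    return len(supporting) >= min_species


coil = ("chr3", 1_100_000, 1_450_000)
hits_by_species = {
    "human": [("chr3", 1_200_000, 1_230_000), ("chr9", 500, 900)],
    "mouse": [("chr3", 1_205_000, 1_228_000)],
    "rat": [("chr12", 40_000, 42_000)],
}
print(is_concordant(hits_by_species, coil))
```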
            <p>In this study, we applied the GeneWise program with a sequence similarity-based method that explicitly models the conservation of gene structure and a high degree of conservation. As such a model is known to show a marked decrease in performance for less similar genes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, we further investigated the undetected subset of genes using a probabilistic pair hidden Markov model (HMM) that shows a weaker dependence on percent identity and performs better at picking out distant homologs. The GeneWise HMM-based analysis allowed us to predict 36 additional genes (Figure <figr fid="F2">2</figr>). Both prediction sets were merged into a single set (n = 268) for further analysis.</p>
            <p>Sequence alignments were next generated between gene predictions and canine transcript sequences (UniGene, April 08 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>). We identified significant alignments (sequence similarity > 96% for at least 150 bp) in 53% of cases, with an average of 7.5 ESTs/mRNAs per gene prediction (range 1&#8211;99). Using InterProScan <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, protein motifs from the InterPro database were found for 80.5% of the gene predictions, providing additional evidence for dog gene identification.</p>
            <p>As a further validation step, we investigated the construction of three-dimensional models of the predicted canine proteins based on the homologous structure of the human ortholog or paralog (>40% identity), which was used as a template. For the subset of genes for which the 3D structure is solved (n = 21), canine-human comparative modelling was performed using the SWISS-MODEL server <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. In 16 instances of canine-human comparative modelling, the mean identity obtained between sequences was 70%. The homology-based 3D model for each canine prediction was validated using Verify 3D graphs <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> (data not shown), which distinguish between homology models of higher and lower accuracy.</p>
            <p>To test for possible overlap between the gene predictions obtained in this study and all canine genes annotated in Ensembl (v42), we performed sequence alignment between these two sets of predictions. A total of 232 (88%) predicted genes did not overlap any Ensembl annotated protein-coding gene. Therefore, these were classified as "definite" gene identifications, together with the delineation of new orthologous relationships with the four reference species (Additional data file <supplr sid="S4">4</supplr>). The remaining 36 gene predictions overlapped an annotated gene (protein identity > 80%), indicating that these gene predictions correspond to sequences already defined as genes, but with undetected or spurious orthologous relationships (Figure <figr fid="F2">2</figr>) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>List of the 232 new predicted canine genes.</b> This table lists the 232 new gene predictions using the human gene identifiers from Ensembl.</p>
               </text>
               <file name="1471-2164-10-62-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Gene prediction assessment from <it>dN</it>/<it>dS </it>analysis</p>
            </st>
            <p>To assess the validity of gene predictions through the strength and direction of selective constraints, we used a functionality test based on the ratio of replacement to silent nucleotide substitution rates (<it>d</it><sub>N</sub>/<it>d</it><sub>S</sub>). The ratio <it>d</it><sub>N</sub>/<it>d</it><sub>S</sub>, where <it>d</it><sub>N </sub>is the number of non-synonymous nucleotide substitutions per non-synonymous site and <it>d</it><sub>S </sub>the number of synonymous nucleotide substitutions per synonymous site, is used as a proxy for the evolutionary constraints acting on nucleotide substitution <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The calculation of the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio requires comparison to a homologous reference sequence. First, we constructed a benchmark set of true orthologous genes using all 1:1 orthologous genes between human and dog (n = 14,994) to obtain a representative <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>value. From this benchmark set, we calculated a median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio of 0.15 using all <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>values extracted from the pairwise alignments of transcripts (Figure <figr fid="F3">3</figr>). To assess the 232 gene predictions identified in this study with the functionality test, we determined the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio for each of the gene predictions in comparison to its functional human ortholog from pairwise transcript alignments. We calculated a median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>of 0.19, a value highly similar to that of the benchmark set (0.15). 
To further assess the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>comparison, <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>values were analyzed through their distributions (as log <it>d</it><sub>N</sub>/<it>d</it><sub>S</sub>) between the benchmark and predicted gene sets (Figure <figr fid="F4">4</figr>), and we did not detect statistically significant differences (Mann-Whitney test; <it>P </it>= 0.16). The similar <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distributions are therefore indicative of similarly high selective constraints and little or no positive selection on both the benchmark and predicted gene sets, suggesting that the functional properties of the products of the canine gene predictions are conserved.</p>
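            <p>The summary statistics used in this comparison can be sketched as follows. The code only derives per-gene ratios, their median and their log-transformed values from precomputed <it>d</it><sub>N </sub>and <it>d</it><sub>S </sub>estimates; it does not estimate substitution rates from alignments or perform the Mann-Whitney test. The function names, the use of base-10 logarithms and the numerical values are hypothetical.</p>

```python
# Hypothetical sketch: summarise dN/dS ratios for a gene set from precomputed
# (dN, dS) pairs. Values are invented; the base-10 logarithm is an assumption.
import math
import statistics


def dn_ds_ratio(dn, ds):
    """Return dN/dS, or None when dS is not positive and the ratio is undefined."""
    if not ds > 0:
        return None
    return dn / ds


def median_ratio(pairs):
    """Median dN/dS over the (dN, dS) pairs with a defined ratio."""
    ratios = [dn_ds_ratio(dn, ds) for dn, ds in pairs]
    return statistics.median(r for r in ratios if r is not None)


def log_ratios(pairs):
    """log10(dN/dS) for each pair with a defined, strictly positive ratio."""
    ratios = [dn_ds_ratio(dn, ds) for dn, ds in pairs]
    return [math.log10(r) for r in ratios if r is not None and r > 0]


gene_set = [
    (0.0625, 0.5),  # ratio 0.125
    (0.125, 0.5),   # ratio 0.25
    (0.25, 0.5),    # ratio 0.5
    (0.1, 0.0),     # undefined ratio, ignored
]
print(median_ratio(gene_set))
print(log_ratios(gene_set))
```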
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p><it>D</it><sub>N</sub>/<it>d</it><sub>S </sub>cumulative frequency distribution of references, gene predictions and pseudogene predictions sets</p>
               </caption>
               <text>
                  <p><b><it>D</it><sub>N</sub>/<it>d</it><sub>S </sub>cumulative frequency distribution of references, gene predictions and pseudogene predictions sets</b>. The benchmark, predicted gene, pseudogene (with one mutation) and pseudogene (with accumulated mutations) sets exhibit a median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>of 0.15, 0.18, 0.22 and 0.47, respectively, compared to their functional human orthologues. While the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distribution of the set of pseudogenes with accumulated mutations is clearly shifted upwards towards the theoretical value of 0.57 (average between 1.0 for no selection and 0.15 for selection from the benchmark set), the pseudogene set with one mutation is not significantly shifted, suggesting that this set may contain spurious pseudogene predictions. The predicted and benchmark gene sets have a similar <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>cumulative frequency distribution, indicating a comparable level of selective constraints.</p>
               </text>
               <graphic file="1471-2164-10-62-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p><it>D</it><sub>N</sub>/<it>d</it><sub>S </sub>distributions of benchmark and test sets</p>
               </caption>
               <text>
                  <p><b><it>D</it><sub>N</sub>/<it>d</it><sub>S </sub>distributions of benchmark and test sets</b>. A. The <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distribution (as log <it>d</it><sub>N</sub>/<it>d</it><sub>S</sub>) of the test set (new predicted genes) is represented in purple and that of the benchmark set (human-dog 1:1 orthologs) in blue. The test set exhibits a <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distribution similar to that of the benchmark set (Mann-Whitney; <it>P </it>= 0.16), suggesting comparable selective constraints for both sets. B. In contrast, the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distribution of the pseudogene (with accumulated mutations) set (red) is significantly shifted upwards (Mann-Whitney; <it>P </it>= 5.17e-6) in comparison to the benchmark set, indicating relaxation of selective constraints on the predicted pseudogenes.</p>
               </text>
               <graphic file="1471-2164-10-62-4"/>
            </fig>
            <p>To analyze the evolutionary rate of the new canine predicted gene sequences in a phylogenetic context, we used the 232 mouse genes in addition to the human genes and dog predicted genes to assess the level of selective constraint on each lineage in comparison to the rest of the tree. In this way, differences or similarities in selective constraints can be assessed on all lineages within the phylogeny. For each of the 232 genes, we inferred the <it>d</it><sub>N </sub>and <it>d</it><sub>S </sub>values and calculated the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio. The median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>for the dog lineage fell between those of human and mouse (Table <tblr tid="T1">1</tblr>), a result in agreement with those determined for 13,816 human, mouse and dog genes with 1:1:1 orthologs <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, with similar differences found across the three lineages.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Median and mean <it>dS </it>and <it>dN/dS </it>values of pseudogenes, predicted genes and reference set of human-canine orthologues</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>value</p>
                     </c>
                     <c ca="left">
                        <p>Pseudogenes with one mutation</p>
                     </c>
                     <c ca="left">
                        <p>Pseudogenes with several mutations</p>
                     </c>
                     <c ca="left">
                        <p>Predicted genes</p>
                     </c>
                     <c ca="left">
                        <p>Benchmark set 1:1 dog-human orthologs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>dS </it>median</p>
                     </c>
                     <c ca="left">
                        <p>0.45</p>
                     </c>
                     <c ca="left">
                        <p>0.44</p>
                     </c>
                     <c ca="left">
                        <p>0.39</p>
                     </c>
                     <c ca="left">
                        <p>0.39</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>dS </it>mean</p>
                     </c>
                     <c ca="left">
                        <p>0.48</p>
                     </c>
                     <c ca="left">
                        <p>0.46</p>
                     </c>
                     <c ca="left">
                        <p>0.40</p>
                     </c>
                     <c ca="left">
                        <p>0.38</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>dN/dS </it>median</p>
                     </c>
                     <c ca="left">
                        <p>0.18</p>
                     </c>
                     <c ca="left">
                        <p>0.50</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>dN/dS </it>mean</p>
                     </c>
                     <c ca="left">
                        <p>0.28</p>
                     </c>
                     <c ca="left">
                        <p>0.50</p>
                     </c>
                     <c ca="left">
                        <p>0.26</p>
                     </c>
                     <c ca="left">
                        <p>0.20</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Pseudogene predictions</p>
            </st>
            <p>Of the 412 missing genes, a subset of 55 predictions containing ORF-disrupting mutations led to pseudogene identification. Among these pseudogenes, we determined whether the protein sequences carried different numbers of in-frame stop codons and/or frameshift disruptions. Using these quantitative measures, two mutation levels were apparent. One set of inactivated genes (n = 21) was predicted with accumulated mutations (mean = 4.2; range 2&#8211;11) and a second set (n = 34) was predicted with one mutation (Figure <figr fid="F3">3</figr>). Because proteins of similar length are expected to carry similar numbers of stop codons or frameshifts, we normalized the mutation rate by the coding sequence length. We therefore examined the number of ORF-disrupting mutations per 100 AA in both groups of pseudogenes. A mutation rate of 0.28 was determined for the group of pseudogenes with one mutation and a significantly higher rate of 1.21 (Mann-Whitney test; <it>P </it>= 8.052e-7) was found for the set of pseudogenes with accumulated mutations.</p>
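            <p>The length normalization described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names and the example records are hypothetical, each record is assumed to hold the number of ORF-disrupting mutations (in-frame stops plus frameshifts) and the protein length in amino acids, and the group value is assumed to be the mean of the per-gene rates.</p>

```python
from statistics import mean


def disruptions_per_100_aa(n_disruptions, protein_length_aa):
    """Return the number of ORF-disrupting mutations per 100 amino acids."""
    if not protein_length_aa > 0:
        raise ValueError("protein length must be positive")
    return 100.0 * n_disruptions / protein_length_aa


def group_mean_rate(records):
    """Average the per-gene rate over (n_disruptions, length_aa) records."""
    return mean(disruptions_per_100_aa(n, length) for n, length in records)


# Hypothetical records, for illustration only (not data from the study).
single_mutation_group = [(1, 400), (1, 250)]
accumulated_group = [(4, 400), (11, 500)]

print(group_mean_rate(single_mutation_group))
print(group_mean_rate(accumulated_group))
```

            <p>The two lists of per-gene rates obtained in this way are the kind of input that a Mann-Whitney test would then compare.</p>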
            <p>Although transcribed pseudogenes have been experimentally identified <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, a large proportion of pseudogenes are thought to be transcriptionally silent in comparison to protein-coding genes. We thus searched for sequence alignments with canine transcript sequences (UniGene, April 2008 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>) to assess the transcriptional activity of the pseudogene predictions with two or more mutations. We obtained alignments for 14% of them, a result in agreement with a recent report <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> showing that 19% of pseudogenes are the sources of novel RNA transcripts. These data indicate that the predicted pseudogenes are mostly undetected as expressed sequences in comparison to gene predictions with an intact ORF (53%) and therefore largely correspond to untranscribed pseudogenes <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Detecting nonfunctionality from <it>dN</it>/<it>dS </it>analysis</p>
            </st>
            <p>To assess the validity of the pseudogene predictions independently of the presence of stop codons or frameshifts, we used a functionality test based on the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio. Assuming a constant mutation rate, the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio between dog pseudogenes, for which a loss of function occurred, and their functional human orthologs should theoretically relax towards 0.57 (the average of 1.0, expected in the absence of selection, and 0.15, the value under negative selection that we calculated from the benchmark set) <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. We therefore calculated the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio for each candidate pseudogene prediction against its functional human ortholog from pairwise transcript alignments. For the pseudogene set with accumulated mutations, we calculated a median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>of 0.50, indicating a considerable relaxation of selective constraints on the canine pseudogenes in comparison to their functional human orthologs (Figure <figr fid="F3">3</figr> and Table <tblr tid="T1">1</tblr>). Furthermore, the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>distribution obtained was shifted upwards in comparison to the benchmark set (Figure <figr fid="F4">4</figr>), a shift that is significant according to a Mann-Whitney test (<it>P </it>= 5.17e-6), indicating relaxation of evolutionary constraints on the predicted pseudogenes. For the pseudogene set with one mutation, a median <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>of 0.18 was observed, suggesting no detectable difference in selective constraints between the predicted pseudogenes from the canine sequence and their functional human counterparts. In addition, we analyzed whether the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio differs before and after the stop codon among the predicted pseudogenes. 
In 26/28 instances, no significant difference was detected when comparing the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>ratio for the two parts of each gene. In two cases, the <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>value before the stop was indicative of strong selective constraints (&lt;0.1), in comparison to the value detected after the stop (>0.9), which suggests that the biological function may have been preserved.</p>
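            <p>The expected value towards which the ratio relaxes is described in the text as a simple average of the neutral and constrained values. The short sketch below reproduces that arithmetic only; the function name is hypothetical and nothing here replaces the likelihood-based estimation of the ratios themselves.</p>

```python
def expected_relaxed_ratio(constrained_ratio, neutral_ratio=1.0):
    """Arithmetic mean of the constrained and neutral dN/dS values."""
    return (constrained_ratio + neutral_ratio) / 2.0


benchmark_median = 0.15  # median dN/dS of the benchmark set (Table 1)
print(round(expected_relaxed_ratio(benchmark_median), 3))
```

            <p>The mean of 1.0 and 0.15 is 0.575, reported as 0.57 in the text; the observed median of 0.50 for pseudogenes with accumulated mutations lies close to this expectation.</p>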
            <p>We next examined whether the canine predicted pseudogenes showed any deviation from the expected rate of evolution in a phylogenetic context that includes human and mouse gene sequences. Such variation in rate may reflect relaxation of constraints in the dog lineage. For the predicted pseudogenes with multiple mutations, the dog lineage clearly deviates from the human and mouse lineages (<it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>= 0.41 for dog, 0.19 for mouse and 0.26 for human; Kruskal-Wallis test: <it>P </it>= 1.04e-2), while no significant deviation (<it>P </it>= 0.36) was observed for the set of pseudogenes with one mutation (Table <tblr tid="T2">2</tblr>). We therefore retained as gene loss candidates the 21 pseudogene predictions that show both a higher <it>d</it><sub>N</sub>/<it>d</it><sub>S </sub>value, as characterized by the pairwise and phylogenetic approaches, and a high mutation rate.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Evolutionary constraints (<it>dS </it>and <it>dN/dS</it>) for 1:1:1 orthologs among human, mouse and dog</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p><it>dN/dS </it>median</p>
                     </c>
                     <c ca="left">
                        <p>Predicted genes</p>
                     </c>
                     <c ca="left">
                        <p>Pseudogenes with several mutations</p>
                     </c>
                     <c ca="left">
                        <p>Pseudogenes with one mutation</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="left">
                        <p>0.21</p>
                     </c>
                     <c ca="left">
                        <p>0.26</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dog</p>
                     </c>
                     <c ca="left">
                        <p>0.17</p>
                     </c>
                     <c ca="left">
                        <p>0.41</p>
                     </c>
                     <c ca="left">
                        <p>0.16</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                     <c ca="left">
                        <p>0.19</p>
                     </c>
                     <c ca="left">
                        <p>0.13</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Gene loss identification</p>
            </st>
            <p>In addition to pseudogene identification, 11 gene predictions could not be detected with sufficient protein identity (average = 21.7%), either in the targeted genomic region (COIL) or in the whole canine sequence. For these predictions with no readily identifiable counterpart in dog, we searched for sequence alignments with canine expressed sequences (UniGene, April 2008) to test the underlying assumption that genes embedded in highly degraded sequence are not transcribed. We identified a sequence alignment in only three cases. These results showed that the gene predictions with poor sequence similarity were largely undetected as expressed sequences in comparison to gene predictions with an intact ORF.</p>
            <p>For the last subset of 49 canine genes that remained undetected in this study, we addressed the possibility that gene prediction had been prevented by a gap in the canine sequence assembly. We searched for gap content in the COILs that lack canine orthologous genes. For 12 COILs, the gap content was found to account for >10% of the total size of the COIL, seven-fold more than in a random expectation set (n = 1000, gap = 1.32%), and manual inspection of the sequence content identified multiple sequence gaps. The 12 missing genes in those short targeted regions were therefore not retained in further analysis. Based on these results, a total of 37 undetected genes were considered and merged with the 11 gene predictions that could not be detected with sufficient protein identity and the 21 pseudogenes into a single set (n = 69) of gene loss candidates for further analyses (Figure <figr fid="F2">2</figr> and Additional data file <supplr sid="S5">5</supplr>).</p>
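            <p>The gap-content screen can be illustrated with a short sketch. It assumes, as is common in draft assemblies but not stated in the text, that assembly gaps are represented as runs of the letter N in the COIL sequence; the function names, the example sequences and the use of a strict "greater than" comparison are hypothetical choices.</p>

```python
def gap_fraction(sequence):
    """Fraction of positions in a sequence that are gap characters (N)."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("N") / len(sequence)


def gap_rich_coils(coils, threshold=0.10):
    """Names of COILs whose gap fraction exceeds the threshold (10% here)."""
    return sorted(name for name, seq in coils.items()
                  if gap_fraction(seq) > threshold)


def fold_over_expectation(fraction, expected=0.0132):
    """Ratio of an observed gap fraction to the random expectation (1.32%)."""
    return fraction / expected


# Toy sequences, for illustration only.
coils = {"coil_a": "ACGTACGTAN", "coil_b": "ACNNNNGTAC"}
print(gap_rich_coils(coils))
```

            <p>COILs flagged in this way would be set aside, since a missing prediction there may reflect an assembly gap rather than a gene loss.</p>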
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>List of the 69 candidate gene losses.</b> This table lists the gene losses using the human gene identifiers from Ensembl.</p>
               </text>
               <file name="1471-2164-10-62-S5.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Evolutionary scenarios of the canine gene losses</p>
            </st>
            <p>Do we detect losses of genes that occur specifically in the dog or do such losses occur in other mammalian lineages as well? If so, do such losses correspond to the time the dog branch diverged from the Euarchontoglires (rodent/primate) lineage? One way to analyze these possibilities is to determine the phyletic pattern of the genes using ten species from chicken to human and to define the amount of time between gene origin and the present. The timing of gene origin was defined by searching for 1:1 orthologs between human and nine species. In addition to the human, chimp, mouse and rat genome sequence assemblies, we used scaffold assemblies of elephant, tenrec and armadillo from the Afrotheria and Xenarthra superorders and two non-placental genome assemblies of opossum and platypus. We also included the chicken sequence to infer gene origins that occurred as long as 310 million years ago (MYA) (Figure <figr fid="F5">5</figr>).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Gene origin timing</p>
               </caption>
               <text>
                  <p><b>Gene origin timing</b>. Timing of gene origin is assessed by determining the one-to-one orthologs between human and nine species listed on the left side of the figure. The species belong to Euarchontoglires (primates and rodents), Xenarthra (armadillo), Afrotheria (elephant and tenrec), Marsupial and Monotreme (opossum and platypus). Time of species divergence from the lineage leading to human is shown in MYA (million years ago). Filled squares represent the presence of the ortholog in the species. Numbers at the bottom of the figure denote the number of genes that display the presence/absence pattern across species.</p>
               </text>
               <graphic file="1471-2164-10-62-5"/>
            </fig>
            <p>Orthologous genes were detected between human and all species (except dog) for 11 genes. These genes therefore originated before the separation of the mammal and bird lineages and have been functional for 310 million years (My). In addition, 17 genes were identified in all species of the opossum/platypus, elephant-tenrec-armadillo and Euarchontoglires branches, a period of 170 My, 17 in all species of the elephant-tenrec-armadillo and Euarchontoglires branches (100 My), and 10 in Euarchontoglires only (87 My) (Figure <figr fid="F5">5</figr>) <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>.</p>
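            <p>The dating of gene origin from a phyletic pattern can be sketched as follows. The clade groupings and ages are taken from the text; the function name is hypothetical, and the rule that a gene must be present in every species of a clade and of all younger clades is an assumption that mirrors the wording "in all species of". Patchy patterns are therefore dated conservatively.</p>

```python
# Clades ordered from oldest to youngest divergence, with ages in My.
CLADES = [
    (310, {"chicken"}),
    (170, {"opossum", "platypus"}),
    (100, {"elephant", "tenrec", "armadillo"}),
    (87, {"chimp", "mouse", "rat"}),
]


def minimum_gene_age(present_species):
    """Oldest age (My) supported by an unbroken presence pattern, or None."""
    present = set(present_species)
    required = set()
    age = None
    # Walk from the youngest clade to the oldest, accumulating requirements.
    for clade_age, species in reversed(CLADES):
        required |= species
        if not required.issubset(present):
            break
        age = clade_age
    return age


print(minimum_gene_age({"chimp", "mouse", "rat", "elephant",
                        "tenrec", "armadillo"}))
```

            <p>A gene with a human ortholog in only some species of a clade, which may indicate independent losses in several lineages, would need to be examined separately.</p>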
            <p>Overall, 28 canine gene losses could be characterized as being functional in other species for more than 170 My and 10 genes were not detected before 87 My and therefore specifically arose in rodent and primate lineages. For these genes, postulating that they arose through duplication events of a parental gene, we searched for paralogs among all human genes. For seven genes (<it>ZNF426, WFDC12, ZIK1</it>, <it>HLA-SX-alpha</it>, <it>PNMA5</it>, <it>PNMA3, ZNF251</it>) we identified at least one paralog (sequence identity >30%) in the close vicinity of the parental gene (mean of 71 kb; range: 16&#8211;128 kb).</p>
            <p>We further used the Ensembl reconciliation tree method <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> to check for possible duplication events specific to the primate and rodent lineages. Indeed, assuming that all homologous genes are known, the reconciliation of the gene tree with the species tree makes it possible to distinguish duplication from speciation events and therefore orthologous from paralogous genes. Five genes (<it>ZNF426, ZIK1, HLA-SX-alpha, PNMA5, PNMA3</it>) have in-paralogs in the reference species, suggesting a pattern of duplication events (Additional data file <supplr sid="S6">6</supplr>).</p>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p><b>Gene/species tree reconciliation.</b> These data provide the gene/species tree reconciliation that shows the possible duplication events specific to the primate and rodent lineages.</p>
               </text>
               <file name="1471-2164-10-62-S6.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>These results suggest that tandem duplication events have occurred and led to specific in-paralogs in the branch leading to the human species. Another contribution of this analysis is that it permits the identification of 10 losses that occur in several lineages, indicating multiple and independent gene loss events <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Functional characteristics of gene losses</p>
            </st>
            <p>For the 28 canine-specific gene losses that have been functional for more than 170 My, we determined the functional annotation of the human genes using WebGestalt, a Web-based gene set analysis toolkit <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. The classification using the GOTree sub-module includes seven genes that belong to the biological process of response to stimulus, including <it>PROZ</it>, a vitamin K-dependent protein Z precursor involved in the blood coagulation pathway, and <it>SERPINA10</it>, a protein Z-dependent protease inhibitor that regulates factor Xa involved in blood coagulation. Moreover, it includes five genes involved in response to stimulus pathways that play a role in sensory function, such as <it>UGT2A</it>, which encodes an enzyme with transferase activity that may catalyze inactivation and facilitate elimination of odorants, <it>OR1Q1</it>, <it>OR1B1</it> and <it>ORN1</it>, which are three olfactory receptors, and Noggin, a secreted polypeptide encoded by the <it>NOG </it>gene that appears to have pleiotropic effects, both early in development and in sensory perception of sound. Other genes of interest belong to families with at least six members, such as <it>TBX22</it>, a transcription factor involved in the regulation of various aspects of embryonic development, in particular cell type specification and regulation of morphogenetic movements <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, and <it>MS4A3</it>, which belongs to a subset of the superfamily of tetraspan transmembrane protein encoding genes. Several genes were classified with functions involved in DNA repair, apoptosis and tumor formation, such as <it>BOK</it>, which encodes a Bcl-2 related protein, and <it>PDE1B</it>, which may play a role in apoptosis. 
To address the question of which tissue might be significantly affected by gene loss, we characterized gene-expression profiles per tissue based on the occurrence frequency of the EST profiles of the human genes corresponding to the gene loss set, using the tissue expression profile sub-module of WebGestalt. Testis-expressed gene expression profiles showed a significant over- or under-representation and, to a lesser extent, so did expression profiles related to placenta and kidney tissues (Additional data file <supplr sid="S7">7</supplr>).</p>
            <suppl id="S7">
               <title>
                  <p>Additional file 7</p>
               </title>
               <text>
                  <p><b>Gene-expression profile characterization per tissue with significant over and under representation.</b> The data provided show gene-expression profile characterization per tissue.</p>
               </text>
               <file name="1471-2164-10-62-S7.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>This study describes a multispecies comparative genomics approach that provides a methodology for improving gene prediction and detecting putative gene losses. When coupled to a strategy of phyletic pattern analysis, the approach allows differentiation of species-specific gene loss from multiple independent gene losses. Here, focusing on genes that were not detected in the whole-genome assembly of the dog but are annotated in four rodent and primate species, we identified 232 new genes and predicted 69 canine gene loss candidates, of which 21 are identified as pseudogenes.</p>
         <sec>
            <st>
               <p>Targeted gene prediction: strengths and limitations</p>
            </st>
            <p>A basic application of gene order-based approaches is the capacity to detect short conserved genomic context based on robust orthologous gene pair annotation. Therefore, results are limited by the source of gene annotation. In this study, we used the Ensembl annotation because of its good gene prediction coverage of the four species used as reference genomes. Since annotation of mammalian genome is a continuous process, our gene order-based approach may be improved over the course of time.</p>
            <p>The use of short orthologous genomic intervals filtering has been well documented <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. First, it reduces the cost of detecting false-positives as it filters out paralogs, with the exception of those caused by tandem gene duplication, and alignments to processed pseudogenes. Second, it allows a balance between sequence alignment sensitivity versus accuracy <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Alternatively, for more divergent sequences, alignment criteria may be relaxed in short pre-defined space where the background noise is significantly reduced compared to a genome scale search.</p>
            <p>In our analysis, the predictions may not provide an exhaustive list of genes, as inaccuracies may be generated by sequence artifacts that typically exist in draft sequence assemblies. Another issue related to prediction accuracy is the unexpected and unknown level of high divergence at the nucleotide level. While scenarios of functional sequences with different evolutionary rates in different species exist <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, we postulated that using protein-coding genes with a comparable evolutionary rate amongst four reference species reduces the possibility that a gene evolves independently in the dog species.</p>
         </sec>
         <sec>
            <st>
               <p>Computational prediction of gene loss</p>
            </st>
            <p>A corollary to targeted gene prediction is that the absence of a prediction strongly suggests either gene relocation to a different region or chromosome, or a gene loss event. Gene losses arise through retrotransposition or segmental or tandem duplication events followed by inactivation of one copy, or by degenerative mutations. We used a computational analysis to identify genes lost as pseudogenes based on various detrimental sequence mutations, such as in-frame stop codons and frameshifts, causing or resulting from loss of function. In this study, pseudogenes were separated into two groups: one with a single mutation (showing a low mutation rate) and a second with an elevated mutation rate (>4 mutations, on average). Pseudogene predictions with one mutation could be overstated due to sequence artifacts that exist in the assembly. Indeed, stop codons and frameshifts are accommodated by algorithms such as GeneWise. Other programs specifically designed for aligning pseudogenes, such as GeneMapper <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, may be useful for addressing this problem. Another hypothesis is that the predicted pseudogenes have existed as pseudogenes (i.e. inactivated) for different amounts of time in the carnivore lineage. The formation of pseudogenes present in the canine genome could have been initiated by different or multiple events rather than have resulted from a continuous process over the course of time. Pseudogene characterization through the ratio of replacement to silent nucleotide substitution rates (<it>dN/dS</it>) may be a good indicator of changes in selective constraint that tend to be recent <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. It is clear from our analysis that the <it>dN/dS </it>approach is useful to assess the evolutionary constraints acting on nucleotide substitutions. However, inferences of selection need to be treated with extreme caution.</p>
         </sec>
         <sec>
            <st>
               <p>Functional impact of gene loss</p>
            </st>
            <p>We identified 28 gene losses that have been functional for more than 170 million years, a time period that extends from platypus to human (Figure <figr fid="F5">5</figr>). Gene losses in a given species can be considered adaptive events that may confer selective advantages to an organism <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Like neutral losses, adaptive losses occurring ~95 MYA (for the lineage leading to canids) are expected to leave genomic signatures in the form of accumulated ORF-disrupting sequence mutations due to sequence degeneration. Here, the losses identified are based on ORF-disrupting sequence mutations, absence of EST validation and absence of significant similarity at the protein level. Although highly speculative, one hypothesis is that species-specific gene loss may confer a selective advantage in dog. Among the gene losses we identified was <it>PROZ</it>, a vitamin K-dependent protein Z precursor gene involved in response to stimulus that plays a role in blood coagulation. Mammalian blood coagulation is initiated and regulated by a complex network of interactions involved in normal hemostasis. Interestingly, Lindberg <it>et al</it>. describe a decrease in the expression of heme and globin related genes that correlates with the tameness trait in silver foxes, suggesting that differences in behavior have a genetic basis <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. A second hypothesis is that gene loss may be a direct reflection of the loss of redundancy, where functionally overlapping genes cover for the loss of function, as for genes involved in sensory functions <abbrgrp><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Among mammals, one-to-one orthologous correspondence can be defined for a large part of gene repertoires. Complex homologous relationships such as one-to-zero and many-to-many ones remain to be deciphered within gene families, for genes with divergent sequence as well as for species-specific genes that have emerged or have been lost through evolution. The combination of multispecies comparative genomics with in-depth gene prediction, accurate consideration of phylogenetic relationship, and timing of gene origin events can predict both gene structure and gene losses in newly sequenced genomes. This, in turn, enhances the integrity of reference genomes. The end result is a higher quality product for all sequenced genomes, regardless of the depth of sequence. We aim to see this approach applied to many other model organisms, thus enhancing the utility of the new sequencing resources throughout the comparative genomics community.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Gene datasets</p>
            </st>
            <p>Biomart <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> version 0.5 (Ensembl v.42) was used to collect orthologous protein-coding genes from the five genomes of interest: human (NCBI 36), chimp (Chimp 2.1), mouse (NCBI m36), rat (RGSC 3.4) and dog (CanFam 2.0). Ensembl Gene Id, orthologous relationships, locations in base pair for each species were downloaded and deposited into a MySQL database (v.4.1.12). The set of 412 protein-coding genes not annotated on the dog genome assembly with a 1:1:1:1:0 Human:Chimp:Mouse:Rat:Dog match was then extracted from the MySQL database.</p>
         </sec>
         <sec>
            <st>
               <p>Synteny maps</p>
            </st>
            <p>We used the program AutoGRAPH <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> to construct pairwise synteny maps between the reference genomes and the tested genome. AutoGRAPH has been designed to construct synteny maps using genomic coordinates of ortholog pairs. The program transposes genomic coordinates into a sequence of ordinal numbers and positions genes on an ordinal scale in relation to others on their respective chromosomes. Conserved segments ordered (CSO) can then be identified with respect to the ranking order. We only considered CSO containing a minimum of three genes. AutoGRAPH inferred the collinearity rate within each CSO, corresponding to the length of the longest increasing gene order sequence between the two species divided by the total number of orthologs. We discarded CSO that had a collinearity rate less than 0.5. All synteny maps (n = 88) built in this work are presented in Additional data file <supplr sid="S1">1</supplr> and can be downloaded.</p>
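            <p>The collinearity rate defined above can be illustrated with a short sketch. This is not the AutoGRAPH implementation: the function names are hypothetical, the input is assumed to be the ordinal ranks of the orthologs in the tested genome listed in reference-genome order, and only increasing order is scored, so an inverted segment would need its ranks reversed beforehand.</p>

```python
from bisect import bisect_left


def longest_increasing_run(ranks):
    """Length of the longest strictly increasing subsequence of ranks."""
    tails = []  # tails[i] = smallest tail of an increasing run of length i+1
    for rank in ranks:
        i = bisect_left(tails, rank)
        if i == len(tails):
            tails.append(rank)
        else:
            tails[i] = rank
    return len(tails)


def collinearity_rate(ranks):
    """Longest increasing gene order sequence divided by ortholog count."""
    if not ranks:
        raise ValueError("a segment needs at least one ortholog")
    return longest_increasing_run(ranks) / len(ranks)


def keep_cso(ranks, min_genes=3, min_rate=0.5):
    """Apply the two filters from the text: three genes, rate of 0.5."""
    return len(ranks) >= min_genes and collinearity_rate(ranks) >= min_rate


# Ranks in the tested genome for five orthologs in reference order (toy data).
print(collinearity_rate([1, 2, 5, 3, 4]), keep_cso([1, 2, 5, 3, 4]))
```

            <p>In this toy segment, four of the five orthologs keep their relative order, so the segment passes both filters.</p>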
         </sec>
         <sec>
            <st>
               <p>Gene structure prediction</p>
            </st>
            <p>The GeneWise program <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (wise2-2-0) was used with default parameters to align each reference protein on the forward and reverse strands (option -both) of the dog COIL sequence. Predictions were post-processed to select the highest-scoring GeneWise prediction, to compute sequence identity/similarity against the reference proteins and to analyze splice site conservation. Only predictions exhibiting at least 40% identity with the reference proteins were retained. GeneWise was also used with the Hidden Markov model that uses HMM profiles generated with the HMMER package <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. HMM-based prediction considers exons, introns and UTR regions as different states of gene structure that occupy subsequences of a sequence. A gene structure can be considered as an ordered set of state/sub-sequence pairs. An HMM-based prediction is considered as a predicted gene structure if the probability of generating that gene structure is maximal over all possible states. The optimal parse, i.e. the best sequence of states, was computed by dynamic programming <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> with the HMMER package.</p>
         </sec>
         <sec>
            <st>
               <p>Homology searches</p>
            </st>
            <p>Reference transcript sequences were collated from Ensembl (v.42) and aligned against the canine sequence assembly (CanFam2) with the program Exonerate v1.2 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Exonerate includes various models for aligning splice sites, combining speed and accuracy. We used the est2genome model, with a minimum perfect match of 18 bases to trigger alignments (dnawordlen 18). For each reference transcript, we retained the best five matching sequences.</p>
            <p>Canine proteins inferred from the gene predictions were aligned against all canine transcripts with Exonerate using the coding2coding model. Canine predicted proteins were aligned against canine dbEST (est.fa 05/19/07 from UCSC) and UNIGENE (April 2008) using Exonerate with the protein2genome model.</p>
            <p>A three-dimensional protein structure was available for 21 human genes. The sequences were retrieved from the Protein Data Bank. The amino acid sequences of the corresponding canine predictions were obtained from the GeneWise predictions. Canine-human comparative modelling was performed using the SWISS-MODEL server <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The primary structures of the human and canine sequences were aligned, and the three-dimensional model was then constructed through the process implemented in the SWISS-MODEL server.</p>
         </sec>
         <sec>
            <st>
               <p><it>dN</it>/<it>dS</it> analysis</p>
            </st>
            <p><it>dN</it>/<it>dS</it> analyses were conducted using the maximum-likelihood-based CODEML program (model = 0; PAML package) <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. The whole coding region of each human orthologous sequence was aligned with the corresponding canine prediction using ClustalW. <it>dS</it> values were calculated from pairwise alignments using all transcripts. To filter out possible inconsistencies among orthologous transcripts, we selected the transcript with the smallest phylogenetic distance, as measured by the smallest <it>dS</it>. For each dataset, we set a <it>dS</it> threshold equal to twice the median <it>dS</it>; <it>dS</it> values above this threshold were excluded from the <it>dN</it>/<it>dS</it> calculation. <it>dN</it>/<it>dS</it> values of the benchmark set were extracted from Ensembl. <it>dN</it>/<it>dS</it> ratios in the phylogenetic context were calculated with CODEML using the branch model (model = 1, run mode = 0). For this analysis, the whole coding regions of the human, mouse and canine prediction orthologous sequences were aligned with ClustalW.</p>
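            <p>The transcript selection and the median-based dS filter can be sketched as follows. This is an illustrative Python fragment, not the scripts used in the study: the tuple layout (identifier, dN, dS), the function names and the guard against a dS of zero are assumptions added for the example, and the dN and dS values themselves are assumed to come from a CODEML run.</p>

```python
from statistics import median


def best_transcript(transcripts):
    """Pick the transcript with the smallest dS among (transcript_id, dN, dS) tuples."""
    return min(transcripts, key=lambda t: t[2])


def ds_threshold(ds_values, fold=2.0):
    """Threshold on dS: by default twice the median dS of the dataset."""
    return fold * median(ds_values)


def filter_by_ds(genes, fold=2.0):
    """Return (gene_id, dN/dS) for genes whose dS does not exceed the threshold.

    genes is a list of (gene_id, dN, dS) tuples. Entries with dS equal to zero
    are skipped to avoid division by zero (an assumption of this sketch).
    """
    cutoff = ds_threshold([ds for _, _, ds in genes], fold)
    return [(gene_id, dn / ds)
            for gene_id, dn, ds in genes
            if cutoff >= ds and ds > 0]
```

            <p>With dS values of 0.5, 0.4, 0.6 and 3.0, the median is 0.55 and the threshold 1.1, so only the gene with a dS of 3.0 would be left out of the dN/dS calculation.</p>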
         </sec>
         <sec>
            <st>
               <p>Gene Ontology annotation</p>
            </st>
            <p>The Gene Ontology Tree Machine (GOTM) and WebGestalt programs <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B48">48</abbr></abbrgrp> were used to retrieve the GO terms associated with each Ensembl gene ID. A hypergeometric test was used to compute the statistical significance of the over-representation of each GO term relative to a complete reference list of genes. Only GO terms that were significantly over-represented (<it>P</it> &lt; 1.0e-3) were considered.</p>
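            <p>As an illustration of the hypergeometric test, the Python sketch below computes the upper-tail probability of observing at least k genes carrying a given GO term in a list of n genes, when K of the N reference genes carry that term. It is a generic textbook formulation written for this example; it is not the GOTM or WebGestalt implementation, and the function and parameter names are ours.</p>

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a draw of n genes, without replacement, from N reference
    genes of which K are annotated with the GO term."""
    top = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / total


def is_over_represented(k, n, K, N, alpha=1.0e-3):
    """True when the upper-tail P value falls below the significance cutoff."""
    return alpha > hypergeom_upper_tail(k, n, K, N)
```

            <p>The default cutoff mirrors the P value of 1.0e-3 used in the text; any correction for multiple testing would have to be applied separately.</p>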
         </sec>
         <sec>
            <st>
               <p>Determining gene origin</p>
            </st>
            <p>For each of the 69 candidate gene losses, we searched for one-to-one orthologs between human and nine species using the complete collection of orthologous protein-coding genes (Ensembl). Genome sequence assemblies were used for human, chimp, mouse, rat, monodelphis, platypus and chicken, and scaffold assemblies for elephant, tenrec and armadillo. The timing of gene origin was inferred by determining the longest series of one-to-one orthologs between human and each of the nine species.</p>
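            <p>One possible reading of the "longest series" criterion is sketched below in Python. It is an interpretation, not the procedure used in the study: the species ordering (from the closest to the most distant relative of human), the dictionary input and the function name are all assumptions made for illustration. The sketch walks outward from human and reports the most distant species reached by an unbroken run of one-to-one orthologs.</p>

```python
# Assumed ordering, from the closest to the most distant relative of human.
SPECIES_BY_DISTANCE = [
    "chimp", "mouse", "rat", "elephant", "tenrec",
    "armadillo", "monodelphis", "platypus", "chicken",
]


def deepest_ortholog(has_ortholog):
    """Return the most distant species reached by an unbroken series of
    one-to-one orthologs starting from the closest species, or None.

    has_ortholog maps a species name to True when a one-to-one ortholog of
    the human gene was found in that species.
    """
    deepest = None
    for species in SPECIES_BY_DISTANCE:
        if not has_ortholog.get(species, False):
            break
        deepest = species
    return deepest
```

            <p>Under this reading, a gene with orthologs in chimp, mouse and rat but not in elephant would be assigned an origin at least as old as the human-rodent ancestor, regardless of hits in more distant species.</p>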
         </sec>
         <sec>
            <st>
               <p>P value calculation</p>
            </st>
            <p>We used R (R Development Core Team 2006. R: A language and environment for statistical computing. <url>http://www.R-project.org</url>) to test the statistical significance of differences between distributions at each step of the method (Mann-Whitney, Kruskal-Wallis and Student's t-tests).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>EST: Expressed Sequence Tag; dbEST: database of ESTs; ORF: Open Reading Frame.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>TD prepared the data, carried out the comparative data analysis and contributed to the writing of the manuscript; JT worked on the gene prediction analysis; AV carried out the dN/dS analysis; CA participated in the study design; EAO provided feedback throughout, suggested various analyses and worked on all drafts of the paper; FG participated in the data interpretation and contributed to the writing of the manuscript; CH conceived of the study, participated in the data analysis and interpretation, and contributed to the writing of the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful to Roderic Guigo and to the reviewers for providing useful suggestions and helpful comments. We thank the OUEST-genopole bioinformatics platform for technical help and assistance. We acknowledge the support of the Centre National de la Recherche Scientifique (JT, AV, CA, FG and CH), the Conseil R&#233;gional de Bretagne, which supported TD with a fellowship, and the Intramural Program of the National Institutes of Health (EAO).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <issue>6915</issue>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12466850</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Genome sequence, comparative analysis and haplotype structure of the domestic dog</p>
            </title>
            <aug>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wade</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Mikkelsen</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Jaffe</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>438</volume>
            <issue>7069</issue>
            <fpage>803</fpage>
            <lpage>819</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16341006</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Initial sequencing and analysis of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Linton</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Devon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dewar</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <issue>6822</issue>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11237011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome sequence of the Brown Norway rat yields insights into mammalian evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Metzker</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Sodergren</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Steffen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Worley</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Burch</snm>
                  <fnm>PE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>428</volume>
            <issue>6982</issue>
            <fpage>493</fpage>
            <lpage>521</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15057822</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Steady progress and recent breakthroughs in the accuracy of automated genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <issue>1</issue>
            <fpage>62</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18087260</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>GeneWise and Genomewise</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>5</issue>
            <fpage>988</fpage>
            <lpage>995</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479130</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123596</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Integrating genomic homology into gene structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S140</fpage>
            <lpage>S148</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comparative gene prediction in human and mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Wiehe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>108</fpage>
            <lpage>117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430976</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529313</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases</p>
            </title>
            <aug>
               <au>
                  <snm>Dufayard</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Penel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rechenmann</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Perriere</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>11</issue>
            <fpage>2596</fpage>
            <lpage>2603</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15713731</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human</p>
            </title>
            <aug>
               <au>
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>9</issue>
            <fpage>e133</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1584324</pubid>
                  <pubid idtype="pmpid" link="fulltext">17009864</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Genome-wide identification of human functional DNA using a neutral indel model</p>
            </title>
            <aug>
               <au>
                  <snm>Lunter</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Hein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>1</issue>
            <fpage>e5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1326222</pubid>
                  <pubid idtype="pmpid" link="fulltext">16410828</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>An RNA gene expressed during cortical development evolved rapidly in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Pollard</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Salama</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Lambert</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lambot</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Coppens</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Katzman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Onodera</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <issue>7108</issue>
            <fpage>167</fpage>
            <lpage>172</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16915236</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps</p>
            </title>
            <aug>
               <au>
                  <snm>Derrien</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Andre</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hitte</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>4</issue>
            <fpage>498</fpage>
            <lpage>499</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17145741</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The fragile breakage versus random breakage models of chromosome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Peng</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>2</issue>
            <fpage>e14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1378107</pubid>
                  <pubid idtype="pmpid" link="fulltext">16501665</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>GRIMM: genome rearrangements web server</p>
            </title>
            <aug>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>3</issue>
            <fpage>492</fpage>
            <lpage>493</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11934753</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Distinguishing protein-coding and noncoding genes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fry</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <issue>49</issue>
            <fpage>19428</fpage>
            <lpage>19433</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2148306</pubid>
                  <pubid idtype="pmpid" link="fulltext">18040051</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes</p>
            </title>
            <aug>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Keibler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lyle</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ucla</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>3</issue>
            <fpage>1140</fpage>
            <lpage>1145</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">298740</pubid>
                  <pubid idtype="pmpid" link="fulltext">12552088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Targeted discovery of novel human exons by comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brejova</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Langton</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Comstock</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Oommen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>12</issue>
            <fpage>1763</fpage>
            <lpage>1773</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2099585</pubid>
                  <pubid idtype="pmpid" link="fulltext">17989246</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Evolution of genes and genomes on the Drosophila phylogeny</p>
            </title>
            <aug>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Markow</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>VN</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>450</volume>
            <issue>7167</issue>
            <fpage>203</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17994087</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Heger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>12</issue>
            <fpage>1837</fpage>
            <lpage>1849</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2099592</pubid>
                  <pubid idtype="pmpid" link="fulltext">17989258</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Lin</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wan</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Gramates</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>St Pierre</snm>
                  <fnm>SE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>12</issue>
            <fpage>1823</fpage>
            <lpage>1836</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2099591</pubid>
                  <pubid idtype="pmpid" link="fulltext">17989253</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>InParanoid 6: eukaryotic ortholog clusters with inparalogs</p>
            </title>
            <aug>
               <au>
                  <snm>Berglund</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Sjolund</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ostlund</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <issue>Database issue</issue>
            <fpage>D263</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238924</pubid>
                  <pubid idtype="pmpid" link="fulltext">18055500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Orthology, paralogy and proposed classification for paralog subtypes</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>12</issue>
            <fpage>619</fpage>
            <lpage>620</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12446146</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps</p>
            </title>
            <aug>
               <au>
                  <snm>Murphy</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Larkin</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Everts-van der Wind</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bourque</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Auvil</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Beever</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Chowdhary</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gatzke</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <issue>5734</issue>
            <fpage>613</fpage>
            <lpage>617</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16040707</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>The promise of comparative genomics in mammals</p>
            </title>
            <aug>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Menotti-Raymond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Wienberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stanyon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Copeland</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Jenkins</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Womack</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Marshall Graves</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <issue>5439</issue>
            <fpage>458</fpage>
            <lpage>462</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10521336</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The evolutionary dynamics of eukaryotic gene order</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Pal</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>4</issue>
            <fpage>299</fpage>
            <lpage>310</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15131653</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Genome-scale analysis of positionally relocated genes</p>
            </title>
            <aug>
               <au>
                  <snm>Bhutkar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>12</issue>
            <fpage>1880</fpage>
            <lpage>1887</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2099595</pubid>
                  <pubid idtype="pmpid" link="fulltext">17989252</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Using native and syntenically mapped cDNA alignments to improve de novo gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Stanke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <issue>5</issue>
            <fpage>637</fpage>
            <lpage>644</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18218656</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Iterative gene prediction and pseudogene removal improves genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>van Baren</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>5</issue>
            <fpage>678</fpage>
            <lpage>685</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457044</pubid>
                  <pubid idtype="pmpid" link="fulltext">16651666</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Ensembl 2008</p>
            </title>
            <aug>
               <au>
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aken</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Beal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ballester</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Coates</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <issue>Database issue</issue>
            <fpage>D707</fpage>
            <lpage>714</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238821</pubid>
                  <pubid idtype="pmpid" link="fulltext">18000006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schmoyer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kirov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Snoddy</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373441</pubid>
                  <pubid idtype="pmpid" link="fulltext">14975175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>An analysis of the gene complement of a marsupial, Monodelphis domestica: evolution of lineage-specific genes and giant chromosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Heger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Webber</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>7</issue>
            <fpage>969</fpage>
            <lpage>981</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1899124</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495010</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Genome rearrangements in mammalian evolution: lessons from human and mouse genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>37</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430962</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Are there rearrangement hotspots in the human genome?</p>
            </title>
            <aug>
               <au>
                  <snm>Alekseyev</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>11</issue>
            <fpage>e209</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2065889</pubid>
                  <pubid idtype="pmpid" link="fulltext">17997591</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Automated generation of heuristics for biological sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Slater</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>31</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">553969</pubid>
                  <pubid idtype="pmpid" link="fulltext">15713233</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Gene structure conservation aids similarity based gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>2</issue>
            <fpage>776</fpage>
            <lpage>783</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373336</pubid>
                  <pubid idtype="pmpid" link="fulltext">14764925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Database resources of the National Center for Biotechnology</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lash</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Pontius</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Schriml</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Sequeira</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>28</fpage>
            <lpage>33</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165480</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519941</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>InterPro and InterProScan: tools for protein sequence classification and comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Mulder</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>396</volume>
            <fpage>59</fpage>
            <lpage>70</lpage>
            <xrefbib>
               <pubid idtype="pmpid">18025686</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling</p>
            </title>
            <aug>
               <au>
                  <snm>Arnold</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bordoli</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kopp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schwede</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>2</issue>
            <fpage>195</fpage>
            <lpage>201</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16301204</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Assessment of protein models with three-dimensional profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Luthy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bowie</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1992</pubdate>
            <volume>356</volume>
            <issue>6364</issue>
            <fpage>83</fpage>
            <lpage>85</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1538787</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Nested genes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>86</volume>
            <issue>4</issue>
            <fpage>414</fpage>
            <lpage>422</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16084061</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>A genome-wide survey of human pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Torrents</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Suyama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>12</issue>
            <fpage>2559</fpage>
            <lpage>2567</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403797</pubid>
                  <pubid idtype="pmpid" link="fulltext">14656963</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Systematic identification of pseudogenes through whole genome expression evidence profiling</p>
            </title>
            <aug>
               <au>
                  <snm>Yao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Charlab</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>16</issue>
            <fpage>4477</fpage>
            <lpage>4485</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1636364</pubid>
                  <pubid idtype="pmpid" link="fulltext">16945953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Frankish</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Choo</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Denoeud</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>6</issue>
            <fpage>839</fpage>
            <lpage>851</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1891343</pubid>
                  <pubid idtype="pmpid" link="fulltext">17568002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Resolution among major placental mammal interordinal relationships with genome data imply that speciation influenced their earliest radiations</p>
            </title>
            <aug>
               <au>
                  <snm>Hallstrom</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Janke</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2008</pubdate>
            <volume>8</volume>
            <fpage>162</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2435553</pubid>
                  <pubid idtype="pmpid" link="fulltext">18505555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem</p>
            </title>
            <aug>
               <au>
                  <snm>Page</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Charleston</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>1997</pubdate>
            <volume>7</volume>
            <issue>2</issue>
            <fpage>231</fpage>
            <lpage>240</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9126565</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Krylov</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>10</issue>
            <fpage>2229</fpage>
            <lpage>2235</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403683</pubid>
                  <pubid idtype="pmpid" link="fulltext">14525925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>WebGestalt: an integrated system for exploring gene sets in various biological contexts</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kirov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Snoddy</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>Web Server issue</issue>
            <fpage>W741</fpage>
            <lpage>748</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1160236</pubid>
                  <pubid idtype="pmpid" link="fulltext">15980575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>T-targets: clues to understanding the functions of T-box proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Tada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Dev Growth Differ</source>
            <pubdate>2001</pubdate>
            <volume>43</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>11</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11148447</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The Ensembl automatic gene annotation system</p>
            </title>
            <aug>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>5</issue>
            <fpage>942</fpage>
            <lpage>950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479124</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123590</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Fast-evolving noncoding sequences in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Bird</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Stranger</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Ingle</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Beazley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hurles</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>6</issue>
            <fpage>R118</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2394770</pubid>
                  <pubid idtype="pmpid" link="fulltext">17578567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Reference based annotation with GeneMapper</p>
            </title>
            <aug>
               <au>
                  <snm>Chatterji</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>4</issue>
            <fpage>R29</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1557983</pubid>
                  <pubid idtype="pmpid" link="fulltext">16600017</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Comparisons of dN/dS are time dependent for closely related bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>2006</pubdate>
            <volume>239</volume>
            <issue>2</issue>
            <fpage>226</fpage>
            <lpage>235</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16239014</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Comparative genomics search for losses of long-established genes on the human lineage</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sanborn</snm>
                  <fnm>JZ</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Pringle</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>12</issue>
            <fpage>e247</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2134963</pubid>
                  <pubid idtype="pmpid" link="fulltext">18085818</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Selection for tameness modulates the expression of heme related genes in silver foxes</p>
            </title>
            <aug>
               <au>
                  <snm>Lindberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bjornerfeldt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bakken</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vila</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jazin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Saetre</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Behav Brain Funct</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>18</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1858698</pubid>
                  <pubid idtype="pmpid" link="fulltext">17439650</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss</p>
            </title>
            <aug>
               <au>
                  <snm>Ihmels</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Schuldiner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Mol Syst Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>86</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1847942</pubid>
                  <pubid idtype="pmpid" link="fulltext">17389874</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>The pattern of evolution of smaller-scale gene duplicates in mammalian genomes is more consistent with neo- than subfunctionalisation</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liberles</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2007</pubdate>
            <volume>65</volume>
            <issue>5</issue>
            <fpage>574</fpage>
            <lpage>588</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17957399</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>EnsMart: a generic system for fast and flexible access to biological data</p>
            </title>
            <aug>
               <au>
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Keefe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Smedley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>London</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Spooner</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Melsopp</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hammond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rocca-Serra</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>1</issue>
            <fpage>160</fpage>
            <lpage>169</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314293</pubid>
                  <pubid idtype="pmpid" link="fulltext">14707178</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Maximum discrimination hidden Markov models of sequence consensus</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Mitchison</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1995</pubdate>
            <volume>2</volume>
            <issue>1</issue>
            <fpage>9</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7497123</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <issue>5</issue>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
