<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-5-6</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl rating="0">
         <title>
            <p>Benchmarking tools for the alignment of functional noncoding DNA</p>
         </title>
         <aug>
            <au id="A1" ca="no" ce="no" pa="no" da="no">
               <snm>Pollard</snm>
               <mi>A</mi>
               <fnm>Daniel</fnm>
               <insr iid="I1"/>
               <email>dpollard@socrates.berkeley.edu</email>
            </au>
            <au id="A2" ca="yes" ce="no" pa="no" da="no">
               <snm>Bergman</snm>
               <mi>M</mi>
               <fnm>Casey</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I6"/>
               <email>cbergman@gen.cam.ac.uk</email>
            </au>
            <au id="A3" ca="no" ce="no" pa="no" da="no">
               <snm>Stoye</snm>
               <fnm>Jens</fnm>
               <insr iid="I4"/>
               <email>stoye@techfak.uni-bielefeld.de</email>
            </au>
            <au id="A4" ca="no" ce="no" pa="no" da="no">
               <snm>Celniker</snm>
               <mi>E</mi>
               <fnm>Susan</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>celniker@fruitfly.org</email>
            </au>
            <au id="A5" ca="no" ce="no" pa="no" da="no">
               <snm>Eisen</snm>
               <mi>B</mi>
               <fnm>Michael</fnm>
               <insr iid="I2"/>
               <insr iid="I5"/>
               <email>mbeisen@lbl.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Biophysics Graduate Group, University of California, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I3">
               <p>Berkeley <it>Drosophila </it>Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I4">
               <p>Technische Fakult&#228;t, Universit&#228;t Bielefeld, 33594 Bielefeld, Germany</p>
            </ins>
            <ins id="I5">
               <p>Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I6">
               <p>Department of Genetics, University of Cambridge, Cambridge, UK CB2 3EH</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>6</fpage>
         <url>http://www.biomedcentral.com/1471-2105/5/6</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi" link="fulltext">10.1186/1471-2105-5-6</pubid>
               <pubid idtype="pmpid" link="fulltext">14736341</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>05</day>
               <month>11</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>21</day>
               <month>1</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>21</day>
               <month>1</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Pollard et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using rates of noncoding sequence evolution estimated from the genus <it>Drosophila</it>, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in <it>cis</it>-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25&#8211;3.0 substitutions per site.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>For species with genomic properties similar to <it>Drosophila</it>, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The increasing availability of genome sequences of related organisms offers myriad opportunities to address questions in gene function, genome organization and evolution, but also presents new challenges for sequence analysis. Many classical tools for sequence analysis are obsolete, and there has been active effort in recent years to develop tools that work efficiently with whole genome data. Aligning long genomic sequences &#8211; the first step in many analyses &#8211; is substantially more complex and computationally taxing than aligning short sequences, and many methods have been developed in recent years to address this challenge (reviewed in <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>). Nevertheless, comparative genomic researchers are still faced with the task of making decisions such as which alignment tools to use and which genomes to compare for their particular application. Benchmarking studies that address both the selection of alignment methods and the choice of species can provide the needed framework for informed application of genomic alignment tools and biological discovery in the field of comparative genomics.</p>
         <p>Research in alignment benchmarking has focused on the alignment of protein-coding sequences <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, where independent evidence (either the three-dimensional structure of a protein sequence <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> or cDNA sequence <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>) is available to use as a "gold standard" to assess the relative performance of different alignment tools. In contrast, little is known about the relative performance of tools to align noncoding sequences, which comprise the vast majority of metazoan genomes and contain many functional sequences including <it>cis</it>-regulatory elements that control gene regulation. For noncoding sequences, little external evidence is available to evaluate alignment tool performance. Benchmarking, however, can be achieved by simulating sequence divergence <it>in silico </it>where it is possible to generate sequences that are related by a known, "correct" alignment <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Simulation experiments have been used extensively to assess the performance of different methods for phylogenetic reconstruction <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Yet only a few studies to date have exploited simulated data to benchmark alignment tools <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, and currently none have done so explicitly for the purposes of functional noncoding sequence alignment.</p>
         <p>Here we present results of a simulation-based benchmarking study designed to assess the performance of eight tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) for the pairwise alignment of noncoding sequences. We have chosen to address the question of pairwise alignment since pairwise alignment methods often are used in the construction of multiple alignments, since the evaluation of pairwise alignment performance is more tractable than that of multiple alignment, and since pairwise alignment performance is an important part of a general assessment of noncoding alignment strategies. We have chosen to model noncoding sequence evolution in the genus <it>Drosophila </it>as a biological system for methodological evaluation, because of the high quality sequence and annotations available for <it>D. melanogaster </it><abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, and the recent availability of the genome sequence for the related species, <it>D. pseudoobscura </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. In addition, because of the high rate of deletion as well as the relatively low density of repetitive DNA as compared with mammalian genomes <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, <it>Drosophila </it>noncoding regions are likely to be enriched for sequences under functional constraint. Previous results indicate that <it>Drosophila </it>noncoding regions contain an abundance of short blocks of highly conserved sequences, but that the detection of these sequences is dependent on the alignment method used <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Optimizing strategies for the accurate identification of functionally constrained noncoding sequences will play a critical role in the annotation of <it>cis</it>-regulatory elements and other important noncoding sequences in <it>Drosophila </it>as well as other metazoan genomes.</p>
         <p>In this study, we use empirically derived estimates to parameterize simulations of noncoding sequence evolution over a range of divergences that includes those between species commonly used in comparative genomics such as <it>H. sapiens-M. musculus </it><abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, <it>C. elegans-C. briggsae </it><abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> and <it>D. melanogaster</it>-<it>D. pseudoobscura </it><abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. Alignments of simulated descendant sequences produced by the tools under consideration were compared to correct alignments and various performance measures were calculated. In general, we find that global tools (Avid, ClustalW, DiAlign-G, Lagan, and Needle), which align the entirety of input sequences, tend to have the highest accuracy over entire sequences as well as within interspersed blocks of constrained sequences, although both measures decrease with divergence. Local tools (BlastZ, Chaos, DiAlign-L, and WABA), which align subsets of input sequences, tend to have the highest accuracy for the portion of the sequences they align, but the proportion of sequences included in their alignments decreases quickly with increasing divergence distance. For intermediate to high divergences, local tools also show a high specificity for only aligning interspersed blocks of constrained sequences. Despite these general trends, we find that some tools can systematically out-perform others over a wide range of divergence distances. These results should prove useful for comparative genomics researchers and algorithm developers alike.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Properties of noncoding DNA in <it>Drosophila</it></p>
            </st>
            <p>To make our simulation results as biologically meaningful as possible, we estimated properties of noncoding regions in <it>D. melanogaster </it>using Release 3 euchromatic genome sequences and annotations <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. As described in the Methods, we masked all annotated coding exons and known transposable elements to derive a data set of unique sequences representative of noncoding regions in the <it>D. melanogaster </it>genome. In total, we obtained 55,325 noncoding regions ranging in size from 1 to 156,299 bp with two modes at approximately 70 and 500 bp (Figure <figr fid="F1">1</figr>). More than 95% of noncoding sequences in the <it>D. melanogaster </it>genome are less than 10 Kb in length; we therefore used 10 Kb as the sequence length for our simulations. Nucleotide frequencies derived from this set of noncoding regions were used to parameterize both our model of noncoding DNA and the substitution model used in our simulations.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution of noncoding sequence lengths in the <it>D. melanogaster </it>Release 3 genome sequence</p>
               </caption>
               <text>
                  <p><b>Distribution of noncoding sequence lengths in the <it>D. melanogaster </it>Release 3 genome sequence. </b>Sequences between coding exons were extracted from the <it>D. melanogaster </it>Release 3 euchromatic genome sequence and annotations, and transposable element sequences were subsequently subtracted to produce the "pre-integration" distribution of noncoding sequence lengths (see Methods for details).</p>
               </text>
               <graphic file="1471-2105-5-6-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Estimates of divergence between taxa used in comparative genomics</p>
            </st>
            <p>To link our simulations to species commonly used in comparative genomic analyses of noncoding DNA, we estimated silent site divergence (K<sub>s</sub>) between <it>H. sapiens </it>vs. <it>M. musculus</it>, <it>C. elegans </it>vs. <it>C. briggsae</it>, and <it>D. melanogaster </it>vs. <it>D. pseudoobscura </it>(see Methods). Since estimates of K<sub>s </sub>are highly dependent on methodology, we sought to generate estimates between these three species pairs using a single method. We estimate the mean (and median) of K<sub>s</sub>, measured in expected number of substitutions per silent site, for these species pairs to be: <it>H. sapiens </it>vs. <it>M. musculus</it>, 0.64 (0.56); <it>C. elegans </it>vs. <it>C. briggsae</it>, 1.39 (1.26); and <it>D. melanogaster </it>vs. <it>D. pseudoobscura</it>, 2.40 (2.24). We note that these divergence estimates do not underlie our simulation, but rather are intended to frame the interpretation of our simulation results in a biological context.</p>
         </sec>
         <sec>
            <st>
               <p>Simulating noncoding sequence evolution</p>
            </st>
            <p>Using a model of noncoding DNA, parameterized with <it>D. melanogaster </it>nucleotide frequencies (see Methods for details), we generated 10 Kb sequences which were used as "ancestral" inputs to the ROSE sequence evolution simulation program <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B32">32</abbr></abbrgrp> to create pairs of "derived" output sequences. It is important to note that ROSE provides both pairs of derived sequences and their correct alignment, and that the modifications to ROSE implemented here allow ancestral constraints to be mapped onto derived sequences. Sequence evolution in ROSE occurred under four simulation regimes: A) without insertion/deletion (indel) evolution and without constrained blocks; B) with indel evolution and without constrained blocks; C) without indel evolution and with constrained blocks; and D) with indel evolution and with constrained blocks. Regime D is the most realistic and relevant for the interpretation of real biological data. Other regimes were used to calibrate the outputs of our simulations and address the effects of different models of evolution on noncoding sequence alignment. Under each regime, 1,000 replicate pairs of sequences were evolved to each of eleven divergence distances ranging from 0.25 to 5.0 substitutions per site. Levels of constraint as well as relative evolutionary rates of constrained to unconstrained sites and of indels to point substitution were chosen based on previously reported estimates from the literature (see Table <tblr tid="T1">1</tblr> and Methods).</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Summary of parameters used in simulations of noncoding sequence evolution.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Parameter</p>
                     </c>
                     <c ca="center">
                        <p>Value</p>
                     </c>
                     <c ca="center">
                        <p>Source</p>
                     </c>
                     <c ca="center">
                        <p>Refs.</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Sequence length</p>
                     </c>
                     <c ca="center">
                        <p>10 Kb</p>
                     </c>
                     <c ca="center">
                        <p>D. mel</p>
                     </c>
                     <c ca="center">
                        <p>this work (Fig. <figr fid="F1">1</figr>)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AT : GC</p>
                     </c>
                     <c ca="center">
                        <p>60 : 40</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Drosophila spp.</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>this work, <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B55">55</abbr></abbrgrp></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Transition / Transversion Bias</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Drosophila spp.</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B25">25</abbr>
                              <abbr bid="B56">56</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Substitution model</p>
                     </c>
                     <c ca="center">
                        <p>HKY85</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B54">54</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Point substitutions : Indels</p>
                     </c>
                     <c ca="center">
                        <p>10 : 1</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Drosophila spp.</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B22">22</abbr>
                              <abbr bid="B23">23</abbr>
                              <abbr bid="B25">25</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Indel spectrum</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>D. mel</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B57">57</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Median constrained block length</p>
                     </c>
                     <c ca="center">
                        <p>18 bp</p>
                     </c>
                     <c ca="center">
                        <p><it>D. mel </it>vs. <it>D. vir</it></p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B25">25</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Mean density of constrained blocks</p>
                     </c>
                     <c ca="center">
                        <p>0.2</p>
                     </c>
                     <c ca="center">
                        <p><it>D. mel </it>vs. <it>D. vir</it></p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B25">25</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Characterization of simulation outputs</p>
            </st>
            <p>To characterize simulation outputs, derived pairs of sequences in alignments provided by ROSE were analyzed for the following measures: estimated overall divergence, estimated divergence in constrained blocks, estimated divergence in unconstrained blocks, overall identity, identity in constrained blocks, identity in unconstrained blocks, fraction of ancestral sequence remaining, fraction of sequences constrained, and differences in length. These simulation statistics are summarized in Figure <figr fid="F2">2</figr> and demonstrate that the expected outputs of our simulations are observed. In the absence of constrained blocks, estimated overall divergences correspond well with the input distance parameters up to 3.0&#8211;4.0 substitutions per site (Figure <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>, black boxes). In the presence of constrained blocks, estimated overall divergences (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>, black boxes) are less than the input distance parameters because these sequences are made up of both unconstrained sites evolving at the rate set by the input parameter (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>, brown triangles) as well as blocks of constrained sites evolving ten times more slowly (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>, grey circles). The more pronounced deviation of the estimated overall divergences from the input distance parameters in the regime with indel evolution (Figure <figr fid="F2">2C</figr> vs. <figr fid="F2">2D</figr>) is due to preferential deletion of sequence under no constraint which enriches for constrained sites and leads to a decrease in estimated divergences.</p>
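            <p>As a rough check on the relationship described above, the expected overall divergence in the regime with constrained blocks and without indel evolution can be sketched as a site-weighted mean of the two rates. This weighting is a simplifying assumption made here for illustration, not a formula given in the text; f denotes the fraction of constrained sites (0.2, Table 1) and d the input distance parameter. Divergences estimated from a mixture of fast and slow sites need not match this value exactly.</p>

```latex
\[
d_{\mathrm{overall}} \approx (1 - f)\,d + f\,\frac{d}{10}
  = 0.8\,d + 0.02\,d
  = 0.82\,d, \qquad f = 0.2
\]
```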
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Simulation statistics</p>
               </caption>
               <text>
                  <p><b>Simulation statistics. </b>Pairwise alignments were simulated for a range of divergence distances, using a modified version of the ROSE simulation platform under four different regimes: A) without constrained blocks and without insertion/deletion evolution; B) without constrained blocks and with insertion/deletion evolution; C) with constrained blocks and without insertion/deletion evolution; D) with constrained blocks and with insertion/deletion evolution. For each divergence distance, 1,000 replicates were used to calculate the mean and standard error for the following statistics: estimated overall divergence (black boxes), estimated divergence in constrained blocks of sites (grey circles), estimated divergence in unconstrained blocks of sites (brown triangles), identity (red crosses), identity in constrained blocks (yellow x's), identity in unconstrained blocks (green diamonds), fraction of ancestral sequence remaining in derived sequences (green triangles), and fraction of constraint (light blue checked boxes). Note that the divergence scale in this and following figures is discontinuous.</p>
               </text>
               <graphic file="1471-2105-5-6-2"/>
            </fig>
            <p>Overall identity between derived pairs in the regimes without constrained blocks decreases to the random background of 0.26 (the sum of the squares of the mononucleotide frequencies) by 5.0 substitutions per site with and without indel evolution (Figure <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>, red crosses). In the regimes with constrained blocks, unconstrained sites have the same level of identity as entire sequences in the regimes without constrained blocks (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>, green diamonds), whereas the identity in the constrained blocks is much greater (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>, yellow x's). In the regimes with indel evolution, the fraction of the ancestral sequence remaining diminishes most quickly in the absence of constrained blocks (Figure <figr fid="F2">2B</figr>, green triangles). In regime C (with constrained blocks and without indel evolution), the fraction of constrained sites in derived sequences matches the input parameter of 0.2 (Figure <figr fid="F2">2C</figr>, blue checked-boxes). However, in regime D (with constrained blocks and indel evolution), the fraction of constrained sites in derived sequences decreases below the input parameter of 0.2 at large divergence distances (Figure <figr fid="F2">2D</figr>, blue checked-boxes). This is because the derived sequences are on average longer than ancestral sequences in regime D, differing by 300&#8211;400 bp at 1 substitution per site, 400&#8211;500 bp at 2 substitutions per site and 700&#8211;800 bp at 5 substitutions per site. In our simulation, the input rates of insertion and deletion are equal; however, deletions that would extend into constrained blocks are omitted, creating a net excess of insertions over deletions. This phenomenon was recently proposed as a possible explanation for differences in observed insertion:deletion ratios in unconstrained dead-on-arrival retrotransposon pseudogenes versus noncoding sequences flanking genes <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
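            <p>The random background identity of 0.26 quoted above can be reproduced with a short calculation. The sketch below assumes the 60 : 40 AT : GC ratio from Table 1 is split evenly within each class (A = T = 0.3, G = C = 0.2); the even split and the function name are assumptions of this illustration rather than details given in the text.</p>

```python
# Expected identity between two unrelated sequences that share the same
# base composition: the sum of squared mononucleotide frequencies.
# Frequencies assume the 60:40 AT:GC ratio from Table 1, split evenly
# within each class (the even split is an assumption of this sketch).
freqs = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}


def background_identity(frequencies):
    """Return the probability that two independently drawn bases match."""
    return sum(p * p for p in frequencies.values())


identity = background_identity(freqs)
print(round(identity, 2))  # prints 0.26
```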
         </sec>
         <sec>
            <st>
               <p>Comparative analysis of genomic alignment tools</p>
            </st>
            <p>Unaligned pairs of derived sequences generated by ROSE were used as input to each of the eight genomic alignment tools (see Methods) and resulting alignments were compared to the simulated alignments produced by ROSE. Our objective was to test the off-the-shelf performance of these tools over a wide range of different divergences, so each tool was run using default parameter settings. In addition, BlastZ and Chaos were run using author-suggested settings (BlastZ-A and Chaos-A), as described in the Methods. We note that the output of DiAlign can be treated as either a global or a local alignment, so we analyzed both (DiAlign-G and DiAlign-L). Alignments produced by each tool were scored for overall coverage and overall sensitivity in all regimes (A&#8211;D), and were also scored for constraint coverage, constraint sensitivity, constraint specificity, and local constraint sensitivity in the regimes with constrained blocks (C and D) (see Methods for details).</p>
            <sec>
               <st>
                  <p>Coverage</p>
               </st>
               <p>Overall coverage was measured to determine the proportion of ungapped, orthologous pairs of sites in the simulated alignment that were aligned by local tools under various evolutionary scenarios. Under all four simulation regimes, coverage is a decreasing function of divergence for local (but not global) tools (Figure <figr fid="F3">3</figr>). In the absence of constrained blocks, local tools tend to align most or all of the sequences for only small divergence distances (0.25&#8211;1.0 substitutions per site), but little or none of the sequences for intermediate to large divergence distances (Figure <figr fid="F3">3A</figr> and <figr fid="F3">3B</figr>). [For convenience, for the remainder of this report we shall refer to 0.25&#8211;1.0 substitutions per site as small distances, 1.25&#8211;3.0 substitutions per site as intermediate distances, and 4.0&#8211;5.0 substitutions per site as large distances.] One exception is Chaos, which has negligible coverage past 0.25 substitutions per site. In the presence of constrained blocks, the coverage of local tools improves substantially at all but the most extreme divergence distances. WABA, which was typical of local tools in the absence of constrained blocks, maintains high coverage out to more than twice the divergence distance of the rest of the local tools in the presence of constrained blocks. WABA also appears to be relatively unaffected by indel evolution, while the other local tools lose coverage about 0.5 substitutions per site earlier in regimes with indel evolution (Figure <figr fid="F3">3A</figr> vs. <figr fid="F3">3B</figr>, <figr fid="F3">3C</figr> vs. <figr fid="F3">3D</figr>).</p>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Overall alignment coverage</p>
                  </caption>
                  <text>
                     <p><b>Overall alignment coverage </b>For each divergence distance and each tool, 1,000 replicates were used to calculate the mean and standard error of overall alignment coverage, which was defined as the fraction of ungapped, orthologous pairs of sites in the simulated alignment that were included in an alignment produced by a tool (see Methods for details). A) overall coverage without constrained blocks and without insertion/deletion evolution; B) overall coverage without constrained blocks and with insertion/deletion evolution; C) overall coverage with constrained blocks and without insertion/deletion evolution; D) overall coverage with constrained blocks and with insertion/deletion evolution.</p>
                  </text>
                  <graphic file="1471-2105-5-6-3"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Sensitivity</p>
               </st>
               <p>Overall sensitivity was measured to understand how accurately each tool aligns all orthologous nucleotide sites under various evolutionary scenarios. The sensitivity of each tool under the four simulation regimes is a decreasing function of divergence for both local and global tools (Figure <figr fid="F4">4</figr>). It is important to note that the maximum sensitivity a tool can attain is limited by its coverage. Thus for most divergence distances, global tools (which by definition have complete coverage) have greater potential for high sensitivity relative to local tools, which have incomplete coverage (see above, Figure <figr fid="F3">3</figr>). Nevertheless, with the exception of WABA, the sensitivity of local tools tends to remain very close to the maximum set by their coverage. This implies that although local tools have diminishing coverage with divergence, the portion of the sequence they do align is aligned quite accurately (see below). Despite the trend of high sensitivity in aligned regions for local tools, the sensitivity of the top global tools tends to be as good as or better than the sensitivity of the top local tools (Figure <figr fid="F4">4</figr>). This is particularly true for intermediate to large divergence distances in the absence of indel evolution. In each of the four regimes, at least one global tool has a higher sensitivity than the next best local tool for intermediate to large divergence distances. In the most biologically relevant regime D, the sensitivity of the highest performing tools (such as Lagan and DiAlign) plateaus over the range of 1.25&#8211;3.0 substitutions per site at higher than 0.35, implying that sites other than those in constrained blocks are being accurately aligned (Figure <figr fid="F4">4D</figr>). In contrast, in the absence of constraint but with indels (regime B), the sensitivity of all alignment tools is practically nil for divergences greater than 1 substitution per site (Figure <figr fid="F4">4B</figr>).</p>
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Overall alignment sensitivity</p>
                  </caption>
                  <text>
                     <p><b>Overall alignment sensitivity </b>For each divergence distance and each tool, 1,000 replicates were used to calculate the mean and standard error of overall alignment sensitivity, which was defined as the fraction of ungapped, orthologous pairs of sites in the simulated alignment that were aligned correctly in an alignment produced by a tool (see Methods for details). A) overall sensitivity without constrained blocks and without insertion/deletion evolution; B) overall sensitivity without constrained blocks and with insertion/deletion evolution; C) overall sensitivity with constrained blocks and without insertion/deletion evolution; D) overall sensitivity with constrained blocks and with insertion/deletion evolution.</p>
                  </text>
                  <graphic file="1471-2105-5-6-4"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Coverage and sensitivity in constrained sequences</p>
               </st>
               <p>Alignment coverage and sensitivity across all orthologous sites are informative for understanding the overall performance of a tool, but, for many applications (such as aligning characterized <it>cis</it>-regulatory elements), researchers may only be interested in accurately aligning functionally constrained sites. To assess the ability of each tool to align potentially functional portions of sequences, we measured the coverage and sensitivity only for orthologous nucleotide sites within constrained blocks (Figure <figr fid="F5">5</figr>). Constraint coverage is better than overall coverage for local tools, but the degree of improvement varies considerably (Figure <figr fid="F5">5A</figr> and <figr fid="F5">5B</figr>). BlastZ, BlastZ-A and WABA all have very similar overall and constraint coverage, suggesting little discrimination in attempting to align constrained versus unconstrained sites. In contrast, DiAlign-L and Chaos-A have much improved constraint coverage compared with overall coverage, suggesting a preferential alignment of constrained sites. For example, in the presence of indels, DiAlign-L accurately aligns between 64% and 86% of constrained sequences at divergences between 1.25 and 3.0 substitutions per site.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Constraint coverage and sensitivity</p>
                  </caption>
                  <text>
                     <p><b>Constraint coverage and sensitivity </b>For each divergence distance and each tool, 1,000 replicates were used to calculate the mean and standard error of constraint coverage and constraint sensitivity, which were defined as the coverage and sensitivity within interspersed constrained blocks (see Methods for details). A) constraint coverage without insertion/deletion evolution; B) constraint coverage with insertion/deletion evolution; C) constraint sensitivity without insertion/deletion evolution; D) constraint sensitivity with insertion/deletion evolution.</p>
                  </text>
                  <graphic file="1471-2105-5-6-5"/>
               </fig>
               <p>Constraint sensitivity of all tools is much better than overall sensitivity but, as with constraint coverage, the degree of improvement varies considerably across tools (Figure <figr fid="F5">5C</figr> and <figr fid="F5">5D</figr>). As with overall sensitivity, global tools tend to maintain the highest sensitivity out to large divergence distances in the presence of constrained sites. It is of note that in the presence of indel evolution (Figure <figr fid="F5">5D</figr>), constraint sensitivity of the best performing global tools (as well as the local DiAlign-L) closely parallels the decrease in identity of constrained sites (Figure <figr fid="F2">2D</figr>), suggesting that they are attaining near-maximal constraint sensitivity. Most tools show only moderate decreases in constraint sensitivity in the presence of indel evolution, but a few, such as ClustalW, Chaos-A, and BlastZ, show dramatic decreases.</p>
            </sec>
            <sec>
               <st>
                  <p>Specificity to detect constrained sequences</p>
               </st>
               <p>Constraint coverage and constraint sensitivity reveal the ability of alignment tools to detect and align <it>all </it>orthologous nucleotide sites within constrained blocks, but for some purposes (like <it>cis</it>-regulatory element prediction) researchers may want to align <it>only </it>constrained nucleotide sites and nothing else, even at the expense of missing some functionally constrained sites. To evaluate the ability of each tool to provide high quality alignments of just potential functionally constrained sites, we measured their constraint specificity and local constraint sensitivity. As shown in Figure <figr fid="F6">6</figr>, constraint specificity is an increasing function of divergence for most tools because unconstrained sequences accumulate mismatches and indels more quickly than the constrained blocks and are thus more likely to be gapped or left out of local alignments. This is particularly true for local tools, where decreasing coverage can increase constraint specificity, and less so for global tools, for which it is gap parameters that predominantly affect constraint specificity at different divergence distances. Most tools have higher constraint specificity in the presence of indel evolution, although this trend is less pronounced in the highest specificity tools, Chaos and DiAlign-L. The constraint specificity of all local tools except WABA increases quickly until it reaches 0.8&#8211;0.9, at which point it plateaus. In the presence of indel evolution, near-maximal constraint specificity is achieved between 1.25 and 3.0 substitutions per site.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Constraint specificity and local constraint sensitivity</p>
                  </caption>
                  <text>
                     <p><b>Constraint specificity and local constraint sensitivity </b>For each divergence distance and each tool, 1,000 replicates were used to calculate a mean and standard error of constraint specificity and local constraint sensitivity. Constraint specificity was defined as the fraction of unconstrained sites in the simulated alignment that were unaligned or gapped in an alignment produced by a tool. Local constraint sensitivity was defined as the constraint sensitivity for just the sites contained in an alignment produced by a tool (see Methods for details). A) constraint specificity without insertion/deletion evolution; B) constraint specificity with insertion/deletion evolution; C) local constraint sensitivity without insertion/deletion evolution; D) local constraint sensitivity with insertion/deletion evolution.</p>
                  </text>
                  <graphic file="1471-2105-5-6-6"/>
               </fig>
               <p>Local constraint sensitivity (Figure <figr fid="F6">6</figr>) is equivalent to constraint sensitivity (Figure <figr fid="F5">5</figr>) for the global tools, but for the local tools it differs in that it is a measure of their constraint sensitivity just within the subsequences they align. For BlastZ, BlastZ-A, Chaos, and DiAlign-L, local constraint sensitivity is nearly maximal (1.0) with and without indel evolution across all divergences studied. For Chaos-A and WABA, local constraint sensitivity varies with divergence distance and is less than the other local tools. Thus local tools can produce nearly perfect alignments within constraint blocks while maintaining relatively high constraint specificity, though it is important to note that this may not be meaningful if the coverage of a tool is extremely low (e.g. BlastZ, BlastZ-A, Chaos).</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In this report we investigate the performance of eight pairwise genomic alignment tools to align functional noncoding DNA such as that found in metazoan <it>cis</it>-regulatory regions. To do so, we have used a biologically-informed simulation approach to determine off-the-shelf performance over a range of divergence distances. This study provides important information regarding the ability of genomic alignment tools to identify and align constrained sequences in noncoding regions, which would not otherwise be possible. We argue that a simulation study is necessary to achieve our goal since large datasets of functionally annotated noncoding sequences are not available to use as "gold standards" of alignment accuracy. Likewise, datasets of large orthologous genomic regions spanning a range of divergence distances are only recently becoming available <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B34">34</abbr></abbrgrp>. As is common in alignment benchmarking <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B17">17</abbr><abbr bid="B35">35</abbr></abbrgrp>, we have studied performance of alignment tools using default parameters since fundamental differences in objective functions, scoring matrices, the type and values of parameters, and algorithmic design prevent a systematic exploration of parameter space.</p>
         <p>We have attempted to construct a realistic simulation of noncoding sequence evolution and test alignment performance for species with genomic properties similar to <it>Drosophila</it>. Noncoding alignment assessment for mammalian and other species with large, repeat-rich genomes would require modifications to our current simulation, such as the inclusion of ancestral repeats and lineage-specific transposition events. Moreover, as more becomes known about the substitution process in noncoding regions (especially those under weak primary sequence constraint), it will be important to implement more realistic models such as context-dependent substitution <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. It would also be instructive to assess alignment performance based on a simulation that decouples suppression of indel rates from substitution rates, given the possibility that the spacing (but not the primary sequence) between conserved noncoding segments may be constrained <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. In addition, though we have attempted to be systematic in our evaluation of tools, we could not include all available pairwise alignment tools. As new pairwise alignment tools emerge and old tools are modified or brought to our attention, we will update our results periodically on the web using the same set of simulated alignments presented here <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Moreover, assessment of tools that take advantage of the phylogenetic information and higher signal-to-noise ratio inherent in multiple alignments will be an essential extension to this work to provide a more general evaluation of strategies for noncoding alignment.</p>
         <p>From the standpoint of the most biologically relevant simulation regime studied here (D, which includes indel evolution and interspersed blocks of constrained sequences), our results indicate that global alignment tools have the highest sensitivity in general to align orthologous sites accurately in noncoding sequences, as well as blocks of constrained sites (Figures <figr fid="F4">4D</figr>, <figr fid="F5">5D</figr>). We find that constraint sensitivity of the top global tools can be quite high (>75%) and limited only by sequence identity in constrained sites at intermediate divergence distances (1.25&#8211;3.0 substitutions per site), whereas overall sensitivity is relatively low beyond such intermediate divergence distances. The improved performance of global tools over local tools is largely a consequence of incomplete coverage of both constrained and unconstrained sites in alignments produced by local tools (Figure <figr fid="F3">3</figr>). The subset of sequences aligned by the highest performing local tools, however, is accurately aligned and specifically corresponds to constrained sites (Figure <figr fid="F6">6</figr>). In fact, most local tools can effectively discriminate between constrained and unconstrained sites to greater than 80% specificity at intermediate divergence distances while the constrained portions of their alignments are nearly perfectly aligned at large divergence distances. Finally, when compared with regime C (which excludes indel evolution but includes interspersed constrained blocks), it is clear that our model of indel evolution affects alignment coverage, sensitivity and specificity, but not enough to overturn these major trends.</p>
         <p>These results have important implications for the analysis of functional noncoding sequences. First, if a researcher's goal is to align <ul>all</ul> constrained sites in a noncoding region, then a global tool like Lagan will reliably produce the best results, but will require post-processing to identify constrained sequences <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. Conversely, if one's goal is to align <ul>only</ul> constrained blocks in a noncoding region, then a local tool like Chaos will reliably produce the best results, provided that complete recovery of all constrained sequences is not required. The distinct virtues of both global and local tools are currently incorporated in the output of only one alignment tool, DiAlign. For this reason, use of the global parse of DiAlign (DiAlign-G) can provide high coverage and sensitivity across entire noncoding regions, while use of the local parse of DiAlign (DiAlign-L) will specifically provide highly accurate alignments of blocks of constrained sites. In light of these results, we recommend the further development of global alignment tools that also output a local parse of high confidence local alignments contained within, which should be possible since local anchors are often used in the construction of the global alignment (e.g. <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>).</p>
         <p>Our results also indicate that for species with structural and evolutionary constraints on noncoding sequences such as those found in <it>Drosophila</it>, DiAlign can produce alignments with high coverage and sensitivity, as well as high specificity to detect constrained sites in the range of 1.25&#8211;3.0 substitutions per site. Since the divergence between <it>D. melanogaster </it>vs. <it>D. pseudoobscura </it>and between <it>C. elegans </it>vs. <it>C. briggsae </it>falls within this range, we suggest that the use of DiAlign for detecting functionally constrained noncoding sequences will prove successful in these taxa on a genomic scale. In contrast, our results also indicate that species pairs such as <it>H. sapiens </it>and <it>M. musculus </it>may not be sufficiently diverged for a single pairwise comparison to provide the needed resolution to detect functionally constrained noncoding sequences, though differences in genome organization and evolution between flies and mammals require a more thorough evaluation of this claim. This conclusion, however, supports results based on Poisson modelling of point substitution that approximately 3 substitutions per site would be needed to detect functionally constrained sites reliably in mammalian noncoding DNA <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
         <p>Finally, the results presented here also imply that biological and technical conditions exist with which to study with accuracy the evolutionary events underlying the process of <it>cis</it>-regulatory evolution in flies and worms. Current evolutionary models of <it>cis</it>-regulatory sequence divergence posit the gain and loss of transcription factor binding sites, even under constant functional constraints <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. However, the absence of alignable binding sites in comparisons of divergent sequences may result from inaccuracies in alignment as well as the <it>bona fide </it>loss of transcription factor binding sites. We suggest that alignments of noncoding sequences using tools such as DiAlign in the range of 1.25&#8211;3.0 substitutions per site are of sufficient accuracy to measure binding site loss among divergent species pairs, such as the high levels recently reported in the genus <it>Drosophila </it><abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Our study demonstrates that recently developed alignment tools have the potential to produce biologically meaningful alignments of functional noncoding DNA on a genome scale. Continued development of alignment algorithms in conjunction with parameter optimization and continued benchmarking will be necessary to provide the highest quality genomic alignments under the wide diversity of genomic and evolutionary scenarios to be studied.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Modelling input sequences for the simulation of <it>Drosophila </it>noncoding DNA</p>
            </st>
            <p>To generate biologically relevant input sequences for our simulation, we estimated properties of noncoding sequences in the genome sequences of the fruitfly, <it>D. melanogaster</it>. First we extracted all noncoding regions from the Release 3 <it>D. melanogaster </it>genomic sequences based on annotations in the Gadfly database <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B47">47</abbr></abbrgrp>. This was accomplished by masking all DNA corresponding to coding exons, producing inter-coding-exon intervals. Subsequent to extracting noncoding regions, transposable elements were masked using annotations in Gadfly to create "pre-integration" noncoding sequences. In our analysis, we chose to treat all noncoding sequences (intergenic, intronic, untranslated region) together since many noncoding sequences cannot be unambiguously categorized because of alternative splicing or alternative promoter usage. Moreover, previous results revealed that similar evolutionary constraints act on intergenic and intronic sequences in <it>Drosophila </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Summary statistics of noncoding sequence lengths were calculated using the R statistical package (Figure <figr fid="F1">1</figr>) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
            <p>The probabilistic dependence of adjacent bases in <it>D. melanogaster </it>noncoding sequences was assessed by Markov chain analysis in order to create an accurate model of random noncoding sequences <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. TE-masked noncoding sequences were concatenated, and n-mers of size 1 to 10 were counted. Counts of reverse complementing n-mers were averaged, and used to estimate frequencies of each n-mer <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Based on these counts and frequencies, we determined the likelihood of Markov chains of orders 1 through 9 describing <it>Drosophila </it>noncoding sequences, and evaluated the likelihood of each Markov chain using the Bayesian information criterion <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B51">51</abbr></abbrgrp>. This analysis revealed that <it>D. melanogaster </it>noncoding sequences are best modeled by a 7<sup>th</sup>-order Markov chain (data not shown). We therefore created the ancestral input sequences for our evolution simulations using a 7<sup>th</sup>-order Markov chain. We note that because our evolutionary simulation models bases independently (see below), the higher order structure of these ancestral input sequences was not maintained in the more divergent derived output sequences. Nevertheless, sequences generated by a 0<sup>th</sup>-order Markov chain gave qualitatively and quantitatively similar simulation and alignment results, with correlation among performance measures for the 0<sup>th</sup>-order and 7<sup>th</sup>-order generated sequences exceeding an r<sup>2 </sup>of 0.97 (data not shown).</p>
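            <p>The order-selection step described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names <it>markov_bic</it> and <it>best_order</it> are hypothetical, the averaging of reverse-complement n-mer counts is omitted, and the parameter count assumes a four-letter alphabet with three free transition probabilities per context.</p>

```python
import math
from collections import Counter


def markov_bic(seq, order):
    """BIC of a Markov chain of the given order fitted to seq by maximum likelihood.

    Simplifying assumptions (not taken from the paper): a single strand is
    counted, and the alphabet is assumed to be A, C, G, T.
    """
    k = order
    # Count every (k+1)-mer: a k-base context followed by one emitted base.
    transitions = Counter(seq[i:i + k + 1] for i in range(len(seq) - k))
    # Total count of each k-base context.
    contexts = Counter()
    for word, count in transitions.items():
        contexts[word[:k]] += count
    # Maximised log-likelihood: sum of count * log(P(base | context)).
    log_lik = sum(
        count * math.log(count / contexts[word[:k]])
        for word, count in transitions.items()
    )
    n_obs = sum(transitions.values())
    n_params = 3 * 4 ** k  # three free probabilities per context
    return -2.0 * log_lik + n_params * math.log(n_obs)


def best_order(seq, orders):
    """Return the order with the smallest BIC among the candidate orders."""
    return min(orders, key=lambda k: markov_bic(seq, k))


if __name__ == "__main__":
    toy = "ACGT" * 50  # a perfectly periodic toy sequence
    print(best_order(toy, range(0, 4)))
```

            <p>On the periodic toy sequence each base is fully determined by the preceding base, so higher orders add parameters without improving the likelihood and the criterion penalises them.</p>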
         </sec>
         <sec>
            <st>
               <p>Divergence estimates in flies, worms and mammals</p>
            </st>
            <p>Estimates of silent site divergence (K<sub>s</sub>) between <it>H. sapiens </it>vs. <it>M. musculus</it>, <it>C. elegans </it>vs. <it>C. briggsae</it>, and <it>D. melanogaster </it>vs. <it>D. pseudoobscura </it>were obtained using the yn00 method in PAML (version 3.13) <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. The mean and median of K<sub>s </sub>were calculated for 29 fly, 193 worm, and 153 mammalian coding sequence alignments taken from references <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B28">28</abbr></abbrgrp> and <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, respectively.</p>
         </sec>
         <sec>
            <st>
               <p>Simulating noncoding sequence divergence</p>
            </st>
            <p>Noncoding sequence evolution was simulated using a modified version of the sequence simulation program ROSE <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In general, in the absence of large datasets of noncoding sequences from closely related <it>Drosophila </it>species, we have taken estimates of noncoding evolution from previous results reported in the literature. Beginning with ancestral sequences, evolution occurred on two descendant branches of equal length under the HKY model of point substitution <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, with a transition/transversion bias of 2 to reflect the nucleotide and transition biases observed in <it>Drosophila </it>noncoding sequences <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. The substitution rate was set to 0.01 such that a branch length unit was on average 0.01 substitutions per site. Total branch lengths spanned a range of divergence distances from 0.25 to 5.0 substitutions per site. Insertion/deletion evolution was based on the length distribution of polymorphic indels estimated in <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, and occurred at a 10-fold lower rate than point substitution, approximating relative rates estimated in <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>.</p>
            <p>To model the evolution of constrained blocks in noncoding sequences a modification of the ROSE sequence simulation program was developed to map constraints on ancestral sequences onto derived sequences (available for download as ROSE version 1.3 from <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>). Constraints on noncoding sequences were modelled as short blocks of highly conserved sequences typical of <it>cis</it>-regulatory sequences, and follow a lognormal distribution with parameters estimated in <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. On average, interspersed blocks of constrained sites accounted for 20% of the sites in ancestral sequences, a conservative estimate of constraint in <it>Drosophila </it>noncoding DNA <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Parameters used in our simulations are summarized in Table <tblr tid="T1">1</tblr>.</p>
            <p>Estimation of evolutionary distance for simulated alignments was performed using the F84 model of sequence evolution in the DnaDist program of the PHYLIP package <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> with a transition:transversion ratio of 1.0 (note that a transition:transversion ratio of 1.0 in PHYLIP is equivalent to a transition/transversion bias of 2 in ROSE, see discussion in <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>). Summary statistics for the simulations were calculated using the R statistical package (Figure <figr fid="F2">2</figr>) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Tools for aligning noncoding DNA</p>
            </st>
            <p>The alignment tools tested in this study were chosen based on the criteria that they are (1) publicly available, (2) run in batch mode from the command line and are able to produce (3) strictly collinear, (4) error-free, pairwise genomic alignments of sequences (5) up to 10 Kb in length. Tools like BBA <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> (5), Bl2seq <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> (3), DBA <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> (4), MUMmer <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> (3), Owen <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> (2) and SSEARCH <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> (3) were not evaluated because each fails to satisfy one of these criteria (the number in parentheses indicates the unmet criterion). We now briefly describe the tools that we tested.</p>
            <p>Avid <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> is a pairwise global alignment tool whose general strategy for aligning two sequences is to anchor and align iteratively. A set of maximal (but not necessarily unique) matches between the sequences is constructed using a suffix tree. Dynamic programming is used to order and orient the longest matches, which are then fixed. For each subsequence remaining between the fixed matches, the process is repeated until every base is aligned. When sequences are short and the matches make up less than half of the total sequence, the program defaults to the Needleman-Wunsch algorithm <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>.</p>
            <p>The Chaos/Lagan <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> suite of tools consists of a pairwise local alignment tool, Chaos, and a global alignment tool, Lagan. Chaos starts by finding all words between the two sequences of a specified length and a specified maximum number of mismatches. These words are then chained together if they are close together in both sequences. These maximal chains are then scored and all chains that are above a specified threshold are returned. Lagan starts by running Chaos with conservative parameter settings and then finds the optimal path through the maximal chains using dynamic programming. Lagan then recursively calls Chaos with increasingly more permissive parameters on the regions between each maximal chain in the optimal path. When the recursion has created a dense map of maximal chains that have been ordered with dynamic programming, Lagan runs the Needleman-Wunsch algorithm on the whole length of both sequences but puts close bounds around the maximal chains to provide the final global alignment. Chaos was run on default parameters as well as using parameters suggested by the authors: word length = 7, number of degeneracies = 1, score cut-off = 20 and extension mode on.</p>
            <p>BlastZ <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> is a pairwise local alignment tool based on the gapped BLAST algorithm, redesigned for the alignment of long genomic sequences. BlastZ first removes lineage-specific interspersed repeats from each sequence, then searches for short near-perfect matches between the two sequences. Each match is first extended using gap-free dynamic programming and, if it scores above a specified threshold, is then extended using dynamic programming with gaps; extended matches that score above a specified threshold are kept. Part of the unique implementation of BlastZ is that it can be forced to return alignments that are both unique within each sequence and collinear with respect to each other. To satisfy our strict collinearity requirement, we ran BlastZ with both of these options. BlastZ was also run using the author's suggestion of lowering the score cut-off (k) to 2000 (BlastZ-A).</p>
            <p>DiAlign (v. 2.1) <abbrgrp><abbr bid="B68">68</abbr></abbrgrp> is a segment-to-segment alignment algorithm. Like the BLAST algorithms, DiAlign looks for short ungapped segments that have a similarity that deviates from what would be expected by random chance, keeping segments with a score above a certain threshold. These high scoring segments are then aligned into a collinear global alignment using a dynamic programming algorithm. DiAlign produces a global alignment but distinguishes high confidence columns of an alignment from low confidence columns. We used DiAlign as both a global (DiAlign-G) and a local (DiAlign-L) alignment tool.</p>
            <p>ClustalW (v. 1.8) <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> was run with default settings. ClustalW is a progressive multiple alignment tool that reduces to the Needleman-Wunsch algorithm in the pairwise case, with default parameters of a match score of 1.9, a mismatch penalty of 0, a gap open penalty of 10 and a gap extension penalty of 0.1.</p>
            <p>The second implementation of the Needleman-Wunsch algorithm used in this study is the needle program in the EMBOSS suite of tools <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. needle was used with default parameter settings of a match score of 5, a mismatch penalty of 4, a gap open penalty of 10 and a gap extension penalty of 0.5.</p>
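            <p>To illustrate how the four scoring parameters enter a Needleman-Wunsch alignment, the following Python sketch computes a global alignment score with affine gap costs, using the needle defaults quoted above as default arguments. It is not the code of needle or ClustalW. The function name is hypothetical, and two simplifications are assumed: a gap of length k costs gap_open + (k - 1) * gap_extend, and end gaps are charged like internal gaps, which may differ from the behaviour of either program.</p>

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=5.0, mismatch=-4.0,
                           gap_open=10.0, gap_extend=0.5):
    """Best global alignment score of a and b under affine gap costs.

    Illustrative sketch only: a gap of length k costs
    gap_open + (k - 1) * gap_extend, and end gaps are penalised.
    """
    n, m = len(a), len(b)
    # sub[i][j]:   best score of a[:i] vs b[:j] ending with a[i-1] paired to b[j-1]
    # gap_b[i][j]: best score ending with a[i-1] opposite a gap
    # gap_a[i][j]: best score ending with b[j-1] opposite a gap
    sub = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    sub[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        gap_a[0][j] = -(gap_open + (j - 1) * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            sub[i][j] = pair + max(sub[i - 1][j - 1],
                                   gap_b[i - 1][j - 1],
                                   gap_a[i - 1][j - 1])
            # Open a new gap after a pair or after the other gap type,
            # or extend a gap of the same type.
            gap_b[i][j] = max(sub[i - 1][j] - gap_open,
                              gap_a[i - 1][j] - gap_open,
                              gap_b[i - 1][j] - gap_extend)
            gap_a[i][j] = max(sub[i][j - 1] - gap_open,
                              gap_b[i][j - 1] - gap_open,
                              gap_a[i][j - 1] - gap_extend)
    return max(sub[n][m], gap_b[n][m], gap_a[n][m])
```

            <p>Under these assumptions two identical four-base sequences score 20 (four matches of 5), and passing match=1.9, mismatch=0.0, gap_open=10.0, gap_extend=0.1 applies the ClustalW default values to the same recurrence.</p>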
            <p>The final tool tested, WABA <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>, is a three-tier alignment algorithm. The first tier partitions the first sequence into overlapping windows of 2 Kb and then defines a synteny map of high scoring 2 Kb windows of the first sequence onto the second sequence. The second tier then carefully aligns syntenic regions using a seven-state, pair Hidden Markov Model that includes separate query and database insertion/deletion states, high and low noncoding conservation states, as well as three coding states (one for each position in a codon). The final tier then attempts to assemble individual alignments together into a more global alignment.</p>
         </sec>
         <sec>
            <st>
               <p>Alignment performance measures</p>
            </st>
            <p>The performance of alignment tools was assessed using six basic measures: overall coverage, overall sensitivity, constraint coverage, constraint sensitivity, constraint specificity and local constraint sensitivity. Overall coverage and overall sensitivity were measured for all four evolutionary regimes (A-D), while the constraint measures were measured only in the two regimes that included constrained blocks (C, D). Alignments produced by each alignment tool were parsed to generate the site-count statistics defined below, which were then used to calculate each performance measure.</p>
            <p>Each site in an alignment produced by a tool (a site being a base in one strand of a column of an alignment) can have two simulated alignment states, two constraint states, three tool alignment states, and two conditional tool alignment states. The two simulated alignment states are "homolog" (h), ungapped sites in the simulated alignments, and "no homolog" (nh), gapped sites in the simulated alignments. Simulations without indel evolution have only homolog sites since there are no gaps in the simulated alignments. The two constraint states are "constrained" (c), sites in constrained blocks, and "unconstrained" (u), sites not in constrained blocks. The three tool alignment states are "aligned" (a), sites aligned in the tool alignment, "gapped" (g), sites gapped in the tool alignment, and "not aligned" (na), sites not included in a local tool alignment. The two conditional tool alignment states are "aligned correctly" (ac), sites aligned to the same site in both the tool and simulated alignments, and "aligned incorrectly" (ai), sites aligned to different sites in the tool and simulated alignments. The conditional states apply only to aligned homolog sites, so homolog sites take one of four tool states (ac, ai, g, na) and no-homolog sites one of three (a, g, na). Combined with the two constraint states, this gives fourteen possible combinations (e.g. homolog constrained aligned correctly, h_c_ac), and hence fourteen statistics to calculate for each estimated alignment. Counts for each statistic were used to calculate the following measures:</p>
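            <p>The count of fourteen statistics can be checked by enumerating the state combinations. The Python sketch below is illustrative only and is not the parsing code used in the study. The function name is hypothetical, and the assumption that no-homolog sites take the plain "aligned" (a) state is inferred from the nh_u_a term in the constraint specificity measure.</p>

```python
from itertools import product

# State labels follow the abbreviations defined in the text.
CONSTRAINT_STATES = ("c", "u")
# Aligned homolog sites split into aligned correctly (ac) and incorrectly (ai).
HOMOLOG_TOOL_STATES = ("ac", "ai", "g", "na")
# Assumption: no-homolog sites have no true partner, so an aligned
# site is recorded simply as "a".
NO_HOMOLOG_TOOL_STATES = ("a", "g", "na")


def statistic_names():
    """Return the names of all state combinations, e.g. 'h_c_ac'."""
    names = ["_".join(("h", c, t))
             for c, t in product(CONSTRAINT_STATES, HOMOLOG_TOOL_STATES)]
    names += ["_".join(("nh", c, t))
              for c, t in product(CONSTRAINT_STATES, NO_HOMOLOG_TOOL_STATES)]
    return names


if __name__ == "__main__":
    for name in statistic_names():
        print(name)
```

            <p>Eight of the names begin with h and six with nh, which together give the fourteen statistics.</p>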
            <p>Overall coverage is the fraction of ungapped sites in a simulated alignment that are included in a tool alignment. Overall Coverage = (h_c_ac + h_c_ai + h_c_g + h_u_ac + h_u_ai + h_u_g) / (h_c_ac + h_c_ai + h_c_g + h_c_na + h_u_ac + h_u_ai + h_u_g + h_u_na)</p>
            <p>Overall sensitivity is the fraction of ungapped sites in a simulated alignment that are aligned to the correct base in a tool alignment. Overall Sensitivity = (h_c_ac + h_u_ac) / (h_c_ac + h_c_ai + h_c_g + h_c_na + h_u_ac + h_u_ai + h_u_g + h_u_na)</p>
            <p>Constraint coverage is the fraction of ungapped constrained sites in a simulated alignment that are included in a tool alignment. Constraint Coverage = (h_c_ac + h_c_ai + h_c_g) / (h_c_ac + h_c_ai + h_c_g + h_c_na)</p>
            <p>Constraint sensitivity is the fraction of ungapped constrained sites in a simulated alignment that are aligned to the correct base in a tool alignment. Constraint Sensitivity = (h_c_ac) / (h_c_ac + h_c_ai + h_c_g + h_c_na)</p>
            <p>Constraint specificity is the fraction of unconstrained sites in a simulated alignment that are gapped or not included in a tool alignment. Constraint Specificity = (h_u_g + h_u_na + nh_u_g + nh_u_na) / (h_u_ac + h_u_ai + h_u_g + h_u_na + nh_u_a + nh_u_g + nh_u_na)</p>
            <p>Local constraint sensitivity is the fraction of ungapped constrained sites in a simulated alignment that are included in a tool alignment and that are also aligned to the correct base in that tool alignment. Local Constraint Sensitivity = (h_c_ac) / (h_c_ac + h_c_ai + h_c_g)</p>
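            <p>The six measures follow mechanically from the site counts. The Python sketch below shows one way to compute them from a dictionary keyed by statistic name. The function name and the dictionary layout are hypothetical, missing counts are treated as zero, and a measure whose denominator is zero is returned as NaN; both choices are assumptions rather than details taken from the study.</p>

```python
def performance_measures(counts):
    """Compute the six measures from site counts keyed by statistic name."""

    def total(*keys):
        # Assumption: a statistic absent from the dict has a count of zero.
        return sum(counts.get(k, 0) for k in keys)

    def ratio(numerator, denominator):
        # Assumption: an undefined measure is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    hc_in = total("h_c_ac", "h_c_ai", "h_c_g")   # constrained, in tool alignment
    hu_in = total("h_u_ac", "h_u_ai", "h_u_g")   # unconstrained, in tool alignment
    hc_all = hc_in + total("h_c_na")
    hu_all = hu_in + total("h_u_na")
    u_excluded = total("h_u_g", "h_u_na", "nh_u_g", "nh_u_na")
    u_all = hu_all + total("nh_u_a", "nh_u_g", "nh_u_na")

    return {
        "overall_coverage": ratio(hc_in + hu_in, hc_all + hu_all),
        "overall_sensitivity": ratio(total("h_c_ac", "h_u_ac"), hc_all + hu_all),
        "constraint_coverage": ratio(hc_in, hc_all),
        "constraint_sensitivity": ratio(total("h_c_ac"), hc_all),
        "constraint_specificity": ratio(u_excluded, u_all),
        "local_constraint_sensitivity": ratio(total("h_c_ac"), hc_in),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the study.
    example = {
        "h_c_ac": 80, "h_c_ai": 10, "h_c_g": 5, "h_c_na": 5,
        "h_u_ac": 40, "h_u_ai": 30, "h_u_g": 20, "h_u_na": 10,
        "nh_u_a": 10, "nh_u_g": 20, "nh_u_na": 10,
    }
    for name, value in performance_measures(example).items():
        print(name, round(value, 3))
```

            <p>For a global tool the na counts are zero, so both coverage measures equal 1 and local constraint sensitivity coincides with constraint sensitivity.</p>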
            <p>For each of these six measures, the mean and standard error of the mean were calculated in R over up to 1000 replicates. Local tools do not always return an alignment, and replicates that produced no alignment were not counted toward the mean.</p>
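            <p>The summary step can be sketched as follows. The study performed this calculation in R; the sketch is written in Python for consistency with the other examples here and is not the authors' script. The function name is hypothetical, and a replicate with no alignment is assumed to be represented by None.</p>

```python
import math
import statistics


def mean_and_sem(values):
    """Mean and standard error of the mean over replicates.

    Replicates that produced no alignment (None) are skipped.
    """
    kept = [v for v in values if v is not None]
    if len(kept) == 0:
        return float("nan"), float("nan")
    mean = statistics.mean(kept)
    if len(kept) == 1:
        # The sample standard deviation is undefined for a single value.
        return mean, float("nan")
    sem = statistics.stdev(kept) / math.sqrt(len(kept))
    return mean, sem


if __name__ == "__main__":
    # Made-up replicate values; None marks a replicate with no alignment.
    print(mean_and_sem([0.8, None, 0.6, 0.7]))
```

            <p>Because skipped replicates reduce the sample size, the standard error for a local tool can rest on fewer than 1000 values.</p>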
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>DAP conducted the sequence simulation, alignment accuracy experiments and analyses and drafted the manuscript. CMB conceived of the study, participated in its design and analyses, and drafted the manuscript. JS developed the simulation software. SEC and MBE provided computational infrastructure and participated in the coordination of the study. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Mark Yandell and Chris Mungall for providing scripts to extract noncoding sequences, Erwin Frise for assistance with cluster computing, and Cristian Castillo-Davis and Anton Nekrutenko for kindly providing worm and mammalian coding sequence alignments. We thank Venky Nandagopal for discussions on alignment specificity, Hunter Fraser, Emily Hare, Alan Moses, and Monty Slatkin for critical reading of the manuscript, and Alex Kondrashov and one anonymous reviewer for helpful comments on the manuscript. This work was supported by NIH grant HG00750 to G. Rubin. DAP is supported by NIH training grant T32 HG00047 to D. Rokhsar and J. Rine. CMB is supported by NIH training grant T32 HL07279 to E. Rubin and by a Royal Society USA Research Fellowship. MBE is a Pew Scholar in the Biomedical Sciences. Research was conducted at the Lawrence Berkeley National Laboratory under Department of Energy contract DE-AC0376SF00098.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1" rating="0">
            <title>
               <p>Comparison of genomic DNA sequences: solved and unsolved problems.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>391</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/17.5.391</pubid>
                  <pubid idtype="pmpid" link="fulltext">11331233</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2" rating="0">
            <title>
               <p>Cross-species sequence comparisons: a review of methods and available resources</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.222003</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3" rating="0">
            <title>
               <p>Comparative analysis of multiple protein-sequence alignment methods</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McClure</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vasi</snm>
                  <fnm>TK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <fpage>571</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8078398</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4" rating="0">
            <title>
               <p>A comprehensive comparison of multiple sequence alignment programs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>2682</fpage>
            <lpage>2690</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">148477</pubid>
                  <pubid idtype="pmpid" link="fulltext">10373585</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/27.13.2682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5" rating="0">
            <title>
               <p>Large-scale comparison of protein sequence alignment algorithms with structure alignments</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sauder</snm>
                  <fnm>JM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Arthur</snm>
                  <fnm>JW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dunbrack Jr</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2000</pubdate>
            <volume>40</volume>
            <fpage>6</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1002/(SICI)1097-0134(20000701)40:1&lt;6::AID-PROT30>3.0.CO;2-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">10813826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6" rating="0">
            <title>
               <p>Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hubbard</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>6073</fpage>
            <lpage>6078</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">27587</pubid>
                  <pubid idtype="pmpid" link="fulltext">9600919</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.95.11.6073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7" rating="0">
            <title>
               <p>AVID: A Global Alignment Program</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>97</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.789803</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529311</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8" rating="0">
            <title>
               <p>LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kim</snm>
                  <fnm>MF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Davydov</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Program</snm>
                  <fnm>NC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B9" rating="0">
            <title>
               <p>Rose: generating sequence families</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Evers</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Meyer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>157</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/14.2.157</pubid>
                  <pubid idtype="pmpid" link="fulltext">9545448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10" rating="0">
            <title>
               <p>Application and accuracy of molecular phylogenies</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hillis</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cunningham</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1994</pubdate>
            <volume>264</volume>
            <fpage>671</fpage>
            <lpage>677</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8171318</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11" rating="0">
            <title>
               <p>An evolutionary model for maximum likelihood alignment of DNA sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thorne</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1991</pubdate>
            <volume>33</volume>
            <fpage>114</fpage>
            <lpage>124</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1920447</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12" rating="0">
            <title>
               <p>Inching toward reality: an improved likelihood model of sequence evolution</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thorne</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1992</pubdate>
            <volume>34</volume>
            <fpage>3</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1556741</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13" rating="0">
            <title>
               <p>Dynamic programming alignment accuracy.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Holmes</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <fpage>493</fpage>
            <lpage>504</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9773345</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14" rating="0">
            <title>
               <p>Multiple sequence alignment with the Divide-and-Conquer method</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1998</pubdate>
            <volume>211</volume>
            <fpage>GC45</fpage>
            <lpage>56</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0378-1119(98)00097-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">9669886</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15" rating="0">
            <title>
               <p>Statistical alignment: computational properties, homology testing and goodness-of-fit</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hein</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wiuf</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Knudsen</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Moller</snm>
                  <fnm>MB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wibling</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>265</fpage>
            <lpage>279</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1006/jmbi.2000.4061</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16" rating="0">
            <title>
               <p>MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Katoh</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Misawa</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kuma</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3059</fpage>
            <lpage>3066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">135756</pubid>
                  <pubid idtype="pmpid" link="fulltext">12136088</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/gkf436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17" rating="0">
            <title>
               <p>Quality assessment of multiple alignment programs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2002</pubdate>
            <volume>529</volume>
            <fpage>126</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0014-5793(02)03189-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12354624</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18" rating="0">
            <title>
               <p>Statistical alignment based on fragment insertion and deletion models</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Metzler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>490</fpage>
            <lpage>499</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/btg026</pubid>
                  <pubid idtype="pmpid" link="fulltext">12611804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19" rating="0">
            <title>
               <p>Finishing a whole genome shotgun sequence assembly: Release 3 of the Drosophila euchromatic genome sequence.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wheeler</snm>
                  <fnm>DA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Adams</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Champe</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dugan</snm>
                  <fnm>SP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frise</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hodgson</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>George</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Laverty</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nelson</snm>
                  <fnm>CR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pacleb</snm>
                  <fnm>JM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Svirskas</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tabor</snm>
                  <fnm>PE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wan</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Scherer</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stapleton</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Venter</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Weinstock</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0079.1</fpage>
            <lpage>research0079.14</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1186/gb-2002-3-12-research0079</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20" rating="0">
            <title>
               <p>Annotation of the Drosophila euchromatic genome: a systematic review.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Misra</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Campbell</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Millburn</snm>
                  <fnm>GH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Prochnik</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smith</snm>
                  <fnm>CD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tupy</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Whitfield</snm>
                  <fnm>EJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bayraktaroglu</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Berman</snm>
                  <fnm>BP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>de Grey</snm>
                  <fnm>ADNJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Drysdale</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Harris</snm>
                  <fnm>NL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Richter</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Russo</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shu</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stapleton</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yamada</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0083.1</fpage>
            <lpage>research0083.22</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1186/gb-2002-3-12-research0083</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21" rating="0">
            <title>
               <p>Baylor College of Medicine Drosophila Genome Project</p>
            </title>
            <url>http://www.hgsc.bcm.tmc.edu/projects/drosophila/</url>
         </bibl>
         <bibl id="B22" rating="0">
            <title>
               <p>High intrinsic rate of DNA loss in Drosophila.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Petrov</snm>
                  <fnm>DA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lozovskaya</snm>
                  <fnm>ER</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1996</pubdate>
            <volume>384</volume>
            <fpage>346</fpage>
            <lpage>349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/384346a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">8934517</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23" rating="0">
            <title>
               <p>High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Petrov</snm>
                  <fnm>DA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <fpage>293</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9501496</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24" rating="0">
            <title>
               <p>The transposable elements of the Drosophila melanogaster euchromatin &#8211; a genomics perspective</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Carlson</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Svirskas</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frise</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wheeler</snm>
                  <fnm>DA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0084</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">151186</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537573</pubid>
                  <pubid idtype="doi" link="fulltext">10.1186/gb-2002-3-12-research0084</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25" rating="0">
            <title>
               <p>Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1335</fpage>
            <lpage>1345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.178701</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26" rating="0">
            <title>
               <p>The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>198</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">155263</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779845</pubid>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.200901</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27" rating="0">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Attwood</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bailey</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barlow</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Beck</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Berry</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Birren</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bloom</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Botcherby</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brown</snm>
                  <fnm>DG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brown</snm>
                  <fnm>SD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Burton</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Butler</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Campbell</snm>
                  <fnm>RD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cawley</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chinwalla</snm>
                  <fnm>AT</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cook</snm>
                  <fnm>LL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Coulson</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Couronne</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Daly</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>David</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Davies</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Delehaunty</snm>
                  <fnm>KD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Deri</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dewey</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dickens</snm>
                  <fnm>NJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dodge</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dunn</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Emes</snm>
                  <fnm>RD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Felsenfeld</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fewell</snm>
                  <fnm>GA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Foley</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frankel</snm>
                  <fnm>WN</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fulton</snm>
                  <fnm>LA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fulton</snm>
                  <fnm>RS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gage</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Glusman</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gnerre</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Graves</snm>
                  <fnm>TA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gregory</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Guyer</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hlavina</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Holzer</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hsu</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hua</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hunt</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jackson</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jaffe</snm>
                  <fnm>DB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Johnson</snm>
                  <fnm>LS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jones</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jones</snm>
                  <fnm>TA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Joy</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Karlsson</snm>
                  <fnm>EK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Keibler</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kells</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kirby</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kolbe</snm>
                  <fnm>DL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kucherlapati</snm>
                  <fnm>RS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Landers</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Leger</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Leonard</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Levine</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lloyd</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lucas</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ma</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Matthews</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mauceli</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mayer</snm>
                  <fnm>JH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McCarthy</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McCombie</snm>
                  <fnm>WR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McLaren</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McLay</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McPherson</snm>
                  <fnm>JD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Meldrim</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Meredith</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miner</snm>
                  <fnm>TL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Montgomery</snm>
                  <fnm>KT</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgan</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mott</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nash</snm>
                  <fnm>WE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nelson</snm>
                  <fnm>JO</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nhan</snm>
                  <fnm>MN</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nicol</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ning</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nusbaum</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>O'Connor</snm>
                  <fnm>MJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Okazaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Oliver</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Overton-Larty</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pepin</snm>
                  <fnm>KH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Plumb</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pohl</snm>
                  <fnm>CS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Poliakov</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ponce</snm>
                  <fnm>TC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Potter</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Quail</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Roe</snm>
                  <fnm>BA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rust</snm>
                  <fnm>AG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sapojnikov</snm>
                  <fnm>V</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schultz</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>MS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Scott</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Seaman</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Searle</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sharpe</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sheridan</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shownkeen</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sims</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Singer</snm>
                  <fnm>JB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Slater</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smith</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Spencer</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stabenau</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stange-Thomann</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sugnet</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Suyama</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tesler</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thompson</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Torrents</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Trevaskis</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tromp</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ucla</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ureta-Vidal</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vinson</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Von Niederhausern</snm>
                  <fnm>AC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wade</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wall</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Weiss</snm>
                  <fnm>RB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wendl</snm>
                  <fnm>MC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>West</snm>
                  <fnm>AP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wetterstrand</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wheeler</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Whelan</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wierzbowski</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Willey</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Williams</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wilson</snm>
                  <fnm>RK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Winter</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Worley</snm>
                  <fnm>KC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wyman</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yang</snm>
                  <fnm>SP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28" rating="0">
            <title>
               <p>Genome evolution and developmental constraint in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Castillo-Davis</snm>
                  <fnm>CI</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>728</fpage>
            <lpage>735</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11961106</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29" rating="0">
            <title>
               <p>The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bao</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blasiar</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chen</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chinwalla</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Coghlan</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Coulson</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>D'Eustachio</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fitch</snm>
                  <fnm>DH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fulton</snm>
                  <fnm>LA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fulton</snm>
                  <fnm>RE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Harris</snm>
                  <fnm>TW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kamath</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kuwabara</snm>
                  <fnm>PE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miner</snm>
                  <fnm>TL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Minx</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mullikin</snm>
                  <fnm>JC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Plumb</snm>
                  <fnm>RW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schein</snm>
                  <fnm>JE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sohrmann</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wei</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Willey</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wilson</snm>
                  <fnm>RK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <fpage>E45</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">261899</pubid>
                  <pubid idtype="pmpid" link="fulltext">14624247</pubid>
                  <pubid idtype="doi" link="fulltext">10.1371/journal.pbio.0000045</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30" rating="0">
            <title>
               <p>The molecular clock revisited: the rate of synonymous vs. replacement change in Drosophila</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zeng</snm>
                  <fnm>LW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Comeron</snm>
                  <fnm>JM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chen</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetica</source>
            <pubdate>1998</pubdate>
            <volume>102-103</volume>
            <fpage>369</fpage>
            <lpage>382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1023/A:1017035109224</pubid>
                  <pubid idtype="pmpid" link="fulltext">9720289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31" rating="0">
            <title>
               <p>Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rinc&#243;n-Limas</snm>
                  <fnm>DE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gnirke</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wang</snm>
                  <fnm>AM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pacleb</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stapleton</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wan</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>George</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>de Jong</snm>
                  <fnm>PJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Botas</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0086.1</fpage>
            <lpage>research0086.20</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1186/gb-2002-3-12-research0086</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32" rating="0">
            <title>
               <p>Generating benchmarks for multiple sequence alignments and phylogenetic reconstructions</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Evers</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Meyer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>303</fpage>
            <lpage>306</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9322053</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33" rating="0">
            <title>
               <p>How intron splicing affects the deletion and insertion profile in Drosophila melanogaster</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ptak</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Petrov</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2002</pubdate>
            <volume>162</volume>
            <fpage>1233</fpage>
            <lpage>1244</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12454069</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B34" rating="0">
            <title>
               <p>Comparative analyses of multi-species sequences from targeted genomic regions</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thomas</snm>
                  <fnm>JW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Touchman</snm>
                  <fnm>JW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blakesley</snm>
                  <fnm>RW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bouffard</snm>
                  <fnm>GG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Beckstrom-Sternberg</snm>
                  <fnm>SM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Siepel</snm>
                  <fnm>AC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thomas</snm>
                  <fnm>PJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McDowell</snm>
                  <fnm>JC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Maskeri</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hansen</snm>
                  <fnm>NF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>MS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bruen</snm>
                  <fnm>TC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bevan</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cutler</snm>
                  <fnm>DJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Idol</snm>
                  <fnm>JR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Prasad</snm>
                  <fnm>AB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lee-Lin</snm>
                  <fnm>SQ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Maduro</snm>
                  <fnm>VV</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Summers</snm>
                  <fnm>TJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Portnoy</snm>
                  <fnm>ME</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dietrich</snm>
                  <fnm>NL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Akhter</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ayele</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Benjamin</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cariaga</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brinkley</snm>
                  <fnm>CP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brooks</snm>
                  <fnm>SY</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Granite</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Guan</snm>
                  <fnm>X</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gupta</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haghighi</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ho</snm>
                  <fnm>SL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Huang</snm>
                  <fnm>MC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Karlins</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Laric</snm>
                  <fnm>PL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Legaspi</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lim</snm>
                  <fnm>MJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Maduro</snm>
                  <fnm>QL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Masiello</snm>
                  <fnm>CA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mastrian</snm>
                  <fnm>SD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McCloskey</snm>
                  <fnm>JC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pearson</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stantripop</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tiongson</snm>
                  <fnm>EE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tran</snm>
                  <fnm>JT</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tsurgeon</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vogt</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Walker</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wetherby</snm>
                  <fnm>KD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wiggins</snm>
                  <fnm>LS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Young</snm>
                  <fnm>AC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>LH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Osoegawa</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhu</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhao</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shu</snm>
                  <fnm>CL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>De Jong</snm>
                  <fnm>PJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smit</snm>
                  <fnm>AF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chakravarti</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>424</volume>
            <fpage>788</fpage>
            <lpage>793</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/nature01858</pubid>
                  <pubid idtype="pmpid" link="fulltext">12917688</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35" rating="0">
            <title>
               <p>DIALIGN: finding local similarities by multiple sequence alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frech</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dress</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Werner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>290</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/14.3.290</pubid>
                  <pubid idtype="pmpid" link="fulltext">9614273</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36" rating="0">
            <title>
               <p>Evidence for a high frequency of simultaneous double-nucleotide substitutions</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Averof</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rokas</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <fpage>1283</fpage>
            <lpage>1286</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.287.5456.1283</pubid>
                  <pubid idtype="pmpid" link="fulltext">10678838</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37" rating="0">
            <title>
               <p>DNA sequence evolution with neighbor-dependent mutation</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Arndt</snm>
                  <fnm>PF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hwa</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>313</fpage>
            <lpage>322</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1089/10665270360688039</pubid>
                  <pubid idtype="pmpid" link="fulltext">12935330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38" rating="0">
            <title>
               <p>Phylogenetic estimation of context-dependent substitution rates by maximum likelihood</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B39" rating="0">
            <title>
               <p>AlignmentBenchmarking</p>
            </title>
            <url>http://rana.lbl.gov/AlignmentBenchmarking</url>
         </bibl>
         <bibl id="B40" rating="0">
            <title>
               <p>Phylogenetic shadowing of primate sequences to find functional regions of the human genome</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Boffelli</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McAuliffe</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ovcharenko</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lewis</snm>
                  <fnm>KD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ovcharenko</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <fpage>1391</fpage>
            <lpage>1394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.1081331</pubid>
                  <pubid idtype="pmpid" link="fulltext">12610304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41" rating="0">
            <title>
               <p>Distinguishing regulatory DNA from neutral sites</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kolbe</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>O'Connor</snm>
                  <fnm>MJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>64</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.817703</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529307</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42" rating="0">
            <title>
               <p>Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>813</fpage>
            <lpage>820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.1064503</pubid>
                  <pubid idtype="pmpid" link="fulltext">12727901</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43" rating="0">
            <title>
               <p>Evidence for stabilizing selection in a eukaryotic cis-regulatory element</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ludwig</snm>
                  <fnm>MZ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bergman</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Patel</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>564</fpage>
            <lpage>567</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/35000615</pubid>
                  <pubid idtype="pmpid" link="fulltext">10676967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44" rating="0">
            <title>
               <p>Species-specific organization of CpG island promoters at mammalian homologous genes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cuadrado</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sacristan</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Antequera</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>586</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11454739</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45" rating="0">
            <title>
               <p>Turnover of binding sites for transcription factors involved in early Drosophila development</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Costas</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Casares</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vieira</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>310</volume>
            <fpage>215</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0378-1119(03)00556-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12801649</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46" rating="0">
            <title>
               <p>Conservation of regulatory elements between two species of Drosophila</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Emberly</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rajewsky</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>57</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">302112</pubid>
                  <pubid idtype="pmpid" link="fulltext">14629780</pubid>
                  <pubid idtype="doi" link="fulltext">10.1186/1471-2105-4-57</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47" rating="0">
            <title>
               <p>An integrated computational pipeline and database to support whole genome sequence annotation</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Misra</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Berman</snm>
                  <fnm>BP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Carlson</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frise</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Harris</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Marshall</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shu</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Prochnik</snm>
                  <fnm>SE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smith</snm>
                  <fnm>CD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smith</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tupy</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wiel</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>research0081.1</fpage>
            <lpage>research0081.11</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1186/gb-2002-3-12-research0081</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48" rating="0">
            <title>
               <p>Comprehensive R Archive Network</p>
            </title>
            <url>http://cran.r-project.org/</url>
         </bibl>
         <bibl id="B49" rating="0">
            <title>
               <p>Genetic Data Analysis II</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Weir</snm>
                  <fnm>BS</fnm>
               </au>
            </aug>
            <publisher>Sunderland, MA, Sinauer Associates, Inc.</publisher>
            <pubdate>1996</pubdate>
            <fpage>445</fpage>
         </bibl>
         <bibl id="B50" rating="0">
            <title>
               <p>Over- and under-representation of short oligonucleotides in DNA sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Campbell</snm>
                  <fnm>AM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <fpage>1358</fpage>
            <lpage>1362</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">48449</pubid>
                  <pubid idtype="pmpid" link="fulltext">1741388</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51" rating="0">
            <title>
               <p>On some criteria for estimating the order of a Markov chain</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Katz</snm>
                  <fnm>RW</fnm>
               </au>
            </aug>
            <source>Technometrics</source>
            <pubdate>1981</pubdate>
            <volume>23</volume>
            <fpage>243</fpage>
            <lpage>249</lpage>
         </bibl>
         <bibl id="B52" rating="0">
            <title>
               <p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>32</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666704</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53" rating="0">
            <title>
               <p>PAML (version 3.13)</p>
            </title>
            <url>http://abacus.gene.ucl.ac.uk/software/paml.html</url>
         </bibl>
         <bibl id="B54" rating="0">
            <title>
               <p>Dating of the human-ape splitting by a molecular clock of mitochondrial DNA</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Yano</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1985</pubdate>
            <volume>22</volume>
            <fpage>160</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3934395</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B55" rating="0">
            <title>
               <p>Codon usage bias and base composition of nuclear genes in Drosophila</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Moriyama</snm>
                  <fnm>EN</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1993</pubdate>
            <volume>134</volume>
            <fpage>847</fpage>
            <lpage>858</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8349115</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56" rating="0">
            <title>
               <p>Intraspecific nuclear DNA variation in Drosophila</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Moriyama</snm>
                  <fnm>EN</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Powell</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>261</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8583899</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B57" rating="0">
            <title>
               <p>The correlation between intron length and recombination in Drosophila. Dynamic equilibrium between mutational and selective forces</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Comeron</snm>
                  <fnm>JM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>156</volume>
            <fpage>1175</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11063693</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B58" rating="0">
            <title>
               <p>ROSE (version 1.3)</p>
            </title>
            <url>http://bibiserv.techfak.uni-bielefeld.de/rose/</url>
         </bibl>
         <bibl id="B59" rating="0">
            <title>
               <p>PHYLIP (version 3.5c)</p>
            </title>
            <publisher>Seattle</publisher>
            <url>http://evolution.genetics.washington.edu/phylip.html</url>
         </bibl>
         <bibl id="B60" rating="0">
            <title>
               <p>Bayesian adaptive sequence alignment algorithms</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>25</fpage>
            <lpage>39</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/14.1.25</pubid>
                  <pubid idtype="pmpid" link="fulltext">9520499</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61" rating="0">
            <title>
               <p>BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>1999</pubdate>
            <volume>174</volume>
            <fpage>247</fpage>
            <lpage>250</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0378-1097(99)00149-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">10339815</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62" rating="0">
            <title>
               <p>Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jareborg</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>815</fpage>
            <lpage>824</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.9.9.815</pubid>
                  <pubid idtype="pmpid" link="fulltext">10508839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63" rating="0">
            <title>
               <p>Fast algorithms for large-scale genome alignment and comparison</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Phillippy</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Carlton</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>2478</fpage>
            <lpage>2483</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">117189</pubid>
                  <pubid idtype="pmpid" link="fulltext">12034836</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/30.11.2478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64" rating="0">
            <title>
               <p>OWEN: aligning long collinear regions of genomes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ogurtsov</snm>
                  <fnm>AY</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Roytberg</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shabalina</snm>
                  <fnm>SA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>1703</fpage>
            <lpage>1704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/18.12.1703</pubid>
                  <pubid idtype="pmpid" link="fulltext">12490463</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65" rating="0">
            <title>
               <p>Improved tools for biological sequence comparison</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid" link="fulltext">3162770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66" rating="0">
            <title>
               <p>A general method applicable to the search for similarities in the amino acid sequence of two proteins</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Needleman</snm>
                  <fnm>SB</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wunsch</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1970</pubdate>
            <volume>48</volume>
            <fpage>443</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">5420325</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B67" rating="0">
            <title>
               <p>Human-mouse alignments with BLASTZ</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>103</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.809403</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529312</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68" rating="0">
            <title>
               <p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>211</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/15.3.211</pubid>
                  <pubid idtype="pmpid" link="fulltext">10222408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69" rating="0">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70" rating="0">
            <title>
               <p>EMBOSS: the European Molecular Biology Open Software Suite</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Longden</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bleasby</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>276</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0168-9525(00)02024-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10827456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71" rating="0">
            <title>
               <p>Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zahler</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1115</fpage>
            <lpage>1125</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.10.8.1115</pubid>
                  <pubid idtype="pmpid" link="fulltext">10958630</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
