<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-376</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Pollard</snm>
               <mi>A</mi>
               <fnm>Daniel</fnm>
               <insr iid="I1"/>
               <email>dpollard@berkeley.edu</email>
            </au>
            <au id="A2">
               <snm>Moses</snm>
               <mi>M</mi>
               <fnm>Alan</fnm>
               <insr iid="I1"/>
               <email>amoses@gmail.com</email>
            </au>
            <au id="A3">
               <snm>Iyer</snm>
               <mi>N</mi>
               <fnm>Venky</fnm>
               <insr iid="I2"/>
               <email>venky@berkeley.edu</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Eisen</snm>
               <mi>B</mi>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>mbeisen@berkeley.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I4">
               <p>Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>376</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/376</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16904011</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-376</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>14</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>14</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Pollard et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree, being highest for terminal branches connecting sister taxa and lowest for internal branches connecting sub-alignments.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Annotation of <it>cis</it>-regulatory sequences, non-coding RNAs and other functional noncoding sequences is a major challenge in molecular genetics today. Whole genome sequences of closely related species, such as those now available in mammals, flies, worms, yeast and bacteria, provide an opportunity for evolutionary analyses to greatly aid in this effort, but also present new challenges for sequence analysis <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>The first step in studying the evolution of noncoding sequences is alignment. New tools have been developed for fast and accurate alignment of long stretches of genomic sequence (reviewed in <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>) and benchmarking studies have begun to address the accuracy of these pairwise <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> and multiple <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> alignment tools under various evolutionary scenarios. Knowing the nucleotide-level accuracy of alignment tools greatly informs decisions about which tools to use and which species to compare, but the impact of alignment error on evolutionary studies of noncoding sequences is only just beginning to be explored <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
          <p>Sophisticated molecular evolution models and tests have been developed over the last few decades to identify various forms of selection and sequence features, yet their application nearly always assumes a perfect alignment <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. It is commonly appreciated that highly diverged species align poorly and are therefore unsuitable for many alignment-based evolutionary inferences. Cautious researchers thus tend to study recently diverged species that align trivially but may be less informative than more diverged species. Ideally, one would use the set of species that maximizes information for an acceptable amount of error in an estimate.</p>
          <p>Because evolutionary studies are inferential, no experiment in extant taxa can reveal the true orthology of sequences, so simulations offer a tractable alternative. Molecular evolution simulations have been used to assess evolutionary analysis methods, including divergence estimation <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp> and phylogeny reconstruction <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, as well as protein <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> and noncoding alignment accuracy <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>.</p>
         <p>Here we present the results from a simulation-based study assessing the accuracy of multiple alignments and the effect of alignment accuracy on two fundamental evolutionary inferences: transcription factor binding site conservation and divergence distance estimation.</p>
         <p>The most frequent noncoding targets of comparative analyses are <it>cis</it>-regulatory DNAs that contain functional binding sites for transcription factors and thereby control gene expression <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Although transcription-factor binding sites are generally more conserved than surrounding sequences <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, they have also been observed to be gained and lost through evolution <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. Precise measurements of binding site conservation, therefore, are essential for studying their evolutionary dynamics as well as identifying regulatory regions.</p>
         <p>Divergence estimates inform nearly all evolutionary analyses. Accurate measurements of noncoding divergences are used for many purposes including differentiating functional from non-functional sequences based on constraint <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>, showing lineage specific rate changes <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp> and as a baseline for comparing other kinds of rates, like binding site gain and loss <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>.</p>
          <p>Below we first examine multiple alignment accuracy across tools, sequence types, trees and divergences. We show that multiple alignment accuracy is primarily determined by the pairwise divergence of the two most diverged species. We next look at the alignment accuracy of transcription factor binding sites. We show that although they align better than their surrounding noncoding DNA, they are misaligned often enough that precise studies of gain and loss events could easily be confounded by alignment errors. Finally, we look at the impact of multiple alignment accuracy on divergence distance estimation. We show that divergences tend to be overestimated at short distances and cease to increase at a tool-specific maximum divergence, corresponding to the point at which alignment accuracy reaches its minimum. We also show that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary across branches in a tree such that terminal branches are aligned better than internal branches. Implications for method development and evolutionary analysis are discussed.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>CisEvolver</p>
            </st>
            <p>For the purposes of this study we developed a molecular evolution simulator, CisEvolver, that incorporates several known characteristics of noncoding sequences. CisEvolver takes an ancestral DNA sequence and evolves it along a mutation guide tree, producing sequences for which we know the true alignment. The utility of such a simulation is that the sequences can be re-aligned using standard alignment tools and the accuracy of the tool alignment as well as the accuracy of any inference from the tool alignment can be measured by comparison with the true alignment. In cases where the error in an inference is due to both alignment error and error in the inference method itself, the contribution of alignment error to the total inference error can be directly measured by comparison of inference from the tool alignment and inference from the true alignment.</p>
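             <p>To make the idea of a simulation with a known true alignment concrete, the following is a minimal Python sketch under deliberately simplified assumptions. It is not CisEvolver's model: the function names evolve_branch and true_pairs are hypothetical, substitutions are uniform among bases, and indels are single bases. Each simulated base carries the index of the ancestral base it descends from, so homologous positions in two leaves can be read off directly, which is what makes scoring a tool alignment against the truth possible.</p>
```python
import random

BASES = "ACGT"


def evolve_branch(seq, sub_prob, indel_prob, rng):
    """Evolve a sequence along one branch under a toy (hypothetical) model.

    seq is a list of (base, ancestral_index) pairs; inserted bases carry None.
    Each site is deleted with probability indel_prob / 2; otherwise it is
    substituted with probability sub_prob and followed by a one-base insertion
    with probability indel_prob / 2. Surviving bases keep their ancestral
    index, so the true alignment is never lost.
    """
    out = []
    for base, anc in seq:
        if indel_prob / 2 > rng.random():
            continue  # deletion: drop this base
        if sub_prob > rng.random():
            base = rng.choice([b for b in BASES if b != base])
        out.append((base, anc))
        if indel_prob / 2 > rng.random():
            out.append((rng.choice(BASES), None))  # insertion, no ancestral homolog
    return out


def true_pairs(a, b):
    """Positions (i, j) in a and b whose bases descend from the same ancestral base."""
    pos_b = {anc: j for j, (_, anc) in enumerate(b) if anc is not None}
    return [(i, pos_b[anc]) for i, (_, anc) in enumerate(a)
            if anc is not None and anc in pos_b]


# Two leaves evolved independently from one root (a two-species tree).
rng = random.Random(0)
root = [(rng.choice(BASES), i) for i in range(1000)]
seq1 = evolve_branch(root, 0.1, 0.02, rng)
seq2 = evolve_branch(root, 0.1, 0.02, rng)
pairs = true_pairs(seq1, seq2)  # the true pairwise alignment of the leaves
```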
             <p>We implemented CisEvolver with two types of sequences, background genomic sequence and transcription factor binding sites. Background genomic sequences are evolved according to the Hasegawa-Kishino-Yano 1985 (HKY85) substitution model <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, a Poisson insertion/deletion (indel) event model and an empirical indel length frequency distribution <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Transcription factor binding sites are evolved according to the Halpern-Bruno 1998 (HB98) model of position-specific substitution rates <abbrgrp><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr></abbrgrp>, under which the less degenerate positions in a binding site evolve more slowly and more specifically, as dictated by a position-specific weight matrix <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> (see Methods for more details).</p>
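             <p>The two substitution models can be written down compactly. The Python sketch below illustrates them under stated assumptions and is not the CisEvolver code: it builds an HKY85 rate matrix from base frequencies pi and a transition/transversion ratio kappa, normalized to one expected substitution per unit time, and then applies the Halpern-Bruno form, in which the background rate from base a to base b is multiplied by ln(x)/(1 - 1/x) with x = f_b mu_ba / (f_a mu_ab), where f holds one binding site position's weight matrix frequencies. The frequencies in the usage lines are illustrative and not taken from the study.</p>
```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def hky85_rates(pi, kappa):
    """HKY85 rate matrix (dict of dicts) scaled to one expected substitution per unit time."""
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                # A-G and C-T changes are transitions and are weighted by kappa.
                transition = (a in PURINES) == (b in PURINES)
                q[a][b] = pi[b] * (kappa if transition else 1.0)
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    scale = -sum(pi[a] * q[a][a] for a in BASES)
    return {a: {b: q[a][b] / scale for b in BASES} for a in BASES}


def hb98_rates(mu, f):
    """Halpern-Bruno rates for one binding site position.

    mu: background rate matrix; f: that position's base frequencies from a
    weight matrix (assumed strictly positive, e.g. after pseudocounts).
    The result is not rescaled, so conserved positions evolve more slowly.
    """
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a == b:
                continue
            x = (f[b] * mu[b][a]) / (f[a] * mu[a][b])
            # Multiplier ln(x) / (1 - 1/x); its limit is 1 as x approaches 1.
            factor = math.log(x) / (1.0 - 1.0 / x) if abs(x - 1.0) > 1e-12 else 1.0
            q[a][b] = mu[a][b] * factor
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    return q


pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
mu = hky85_rates(pi, kappa=2.0)
site = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}  # illustrative weight matrix column
q = hb98_rates(mu, site)
```

With this column, rates into the preferred base A are raised above background and rates away from it are lowered, which is how a weight matrix slows the evolution of its less degenerate positions.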
            <p>CisEvolver is freely available <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Simulations &amp; alignments</p>
            </st>
             <p>Using CisEvolver we simulated a large set of alignments on which downstream analyses were performed. Sequences were simulated over a range of total divergence distances on two, three and four species trees with fixed topologies and fixed branch length proportions, as depicted in figure <figr fid="F1">1</figr>. The relative branch lengths in these three topologies were chosen to allow direct comparisons between branches within a tree, as discussed below (see Alignment accuracy). We simulated two basic classes of sequences: 10 kb background genomic sequences and variable length enhancer sequences. Background genomic sequences were simulated with uniform substitution and indel rates. Enhancer sequences were evolved from 36 experimentally characterized regulatory regions from <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B26">26</abbr><abbr bid="B60">60</abbr></abbrgrp> containing the binding sites for eight transcription factors with known binding specificity: Bicoid, Caudal, Giant, Hunchback, Knirps, Kruppel, Tailless and Torso-Response Element <abbrgrp><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp>. Binding sites within the enhancers were evolved using CisEvolver's binding site evolution model with no gain or loss events, and surrounding sequences were evolved as genomic background with substitutions and indels (see Methods for more details). For each divergence and tree topology, we generated 100 replicates of the background genomic sequence and 25 replicates of each of the 36 enhancers.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Mutation Guide Trees</p>
               </caption>
               <text>
                  <p><b>Mutation Guide Trees</b>. Simulations were performed on two, three and four species trees. Numbers on the branches indicate the fraction of the total tree divergence distance on each branch.</p>
               </text>
               <graphic file="1471-2105-7-376-1"/>
            </fig>
             <p>All alignments were performed using default parameter settings for Clustalw <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>, Mavid <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>, Mlagan <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and Blastz/Tba <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B66">66</abbr><abbr bid="B67">67</abbr></abbrgrp> (see Methods for details). These tools were chosen based on their usage, availability, speed and ability to produce collinear multiple alignments of large genomic regions, and are intended to be representative of commonly used alignment algorithms and parameter settings. We note that Blastz/Tba is a local alignment tool and therefore, unlike the global alignment tools, does not always return an alignment. Finally, although we present the relative performance of these specific tools, our focus in this study is on how their accuracy relates to evolutionary scenarios and to the inferences that can be made from their alignments.</p>
         </sec>
         <sec>
            <st>
               <p>Alignment accuracy</p>
            </st>
            <p>Using simulated true alignments and tool alignments we characterized the variation in alignment accuracy across alignment tools, divergences and trees. Alignment accuracy was defined as the fraction of ungapped columns in a true alignment that were aligned identically in a tool alignment (see Methods &amp; "sensitivity" in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>). We examined many aspects of pairwise and multiple alignment accuracy and our major observations were:</p>
            <p>i. Alignment accuracy varies across tools and divergences (figure <figr fid="F2">2A</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Multiple Alignment Accuracy</p>
               </caption>
               <text>
                   <p><b>Multiple Alignment Accuracy</b>. A: Alignment accuracy varies across tools and divergences. Mean four species alignment accuracy for each tool was measured as a function of total divergence distance. B: Alignment accuracy improves with the presence of transcription factor binding sites. Mean improvement in alignment accuracy of enhancers over background sequences for four species alignments was measured as a function of total divergence distance. C: Dividing a fixed total divergence up with more species improves alignment accuracy. Mean Mlagan alignment accuracy for two, three and four species trees was measured as a function of total divergence distance. D: Adding in-group species to a pair of species has no effect on the alignment accuracy of the pair. Mean improvement in alignment accuracy of three species alignments over two species alignments, where the divergence distance between Seq1 and Seq3 in the three species alignment was the same as the divergence distance between Seq1 and Seq2 in the two species alignment, was measured as a function of divergence distance. E &amp; F: Alignment accuracy varies across branches in a tree and is best for leaf-to-leaf alignments and worst for node-to-node alignments, with the exception of highly diverged enhancers. Mean Clustalw alignment accuracy along branches in three and four species trees subtracted from mean two species alignment accuracy, where divergence along each branch is the same as the two species divergence, was measured in background sequences (E) and enhancers (F) as a function of divergence distance.</p>
               </text>
               <graphic file="1471-2105-7-376-2"/>
            </fig>
            <p>ii. The presence of transcription factor binding sites leads to higher alignment accuracy (figure <figr fid="F2">2B</figr>).</p>
             <p>iii. Adding species improves accuracy when comparing trees of equal total divergence but different numbers of leaves (figure <figr fid="F2">2C</figr>).</p>
             <p>iv. The improvement from adding a fourth species is smaller than that from adding a third when comparing trees of equal total divergence but different numbers of leaves (figure <figr fid="F2">2C</figr>).</p>
            <p>v. Adding in-group species or out-group species to a pair of species has an insignificant effect on the alignment accuracy of the pair (figures <figr fid="F2">2D, 2E</figr> &amp;<figr fid="F2">2F</figr>).</p>
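             <p>The accuracy measure defined above can be sketched in a few lines of Python. In this hypothetical helper, each alignment is a list of gapped rows over the same ungapped sequences, and each column is identified by the tuple of residue indices it contains (None for a gap); accuracy is then the fraction of fully ungapped true columns that occur identically in the tool alignment.</p>
```python
def column_signatures(rows):
    """Map each alignment column to a tuple of residue indices (None = gap)."""
    counters = [0] * len(rows)
    cols = []
    for j in range(len(rows[0])):
        sig = []
        for k, row in enumerate(rows):
            if row[j] == "-":
                sig.append(None)
            else:
                sig.append(counters[k])
                counters[k] += 1
        cols.append(tuple(sig))
    return cols


def alignment_accuracy(true_rows, tool_rows):
    """Fraction of ungapped true columns reproduced exactly by the tool alignment."""
    true_cols = [c for c in column_signatures(true_rows) if None not in c]
    tool_cols = set(column_signatures(tool_rows))
    if not true_cols:
        return float("nan")
    return sum(c in tool_cols for c in true_cols) / len(true_cols)


true_rows = ["ACGT-A", "AC-TTA"]
tool_rows = ["ACGTA-", "ACT-TA"]  # same sequences, different gap placement
accuracy = alignment_accuracy(true_rows, tool_rows)
```

Here the first two true columns are recovered and the last two are not, so the accuracy is 0.5.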
             <p>In addition to these investigations into alignment accuracy across all species in alignments, we also examined the alignment accuracy for subsets of species within multiple alignments, attempting to relate the accuracy to the tree topology. We measured what we call leaf-to-leaf accuracy, node-to-leaf accuracy and node-to-node accuracy (see Methods). Leaf-to-leaf accuracy refers to the accuracy of the alignment of sister taxa (i.e. seq3 to seq4 in the four species alignments in figure <figr fid="F1">1</figr>), conditioned on the columns being ungapped across all the sequences. Node-to-leaf accuracy refers to the accuracy of the three species alignments, conditioned on the columns containing correct alignments of seq1 to seq2; it therefore depends only on the alignment accuracy of node1 to seq3. Similarly, node-to-node accuracy refers to the accuracy of the four species alignments, conditioned on the columns containing correct alignments of seq1 to seq2 and seq3 to seq4; it therefore depends only on the alignment accuracy of node1 to node2. Using these measures, we also found that:</p>
            <p>vi. Leaf-to-leaf alignments are more accurate than node-to-leaf alignments, which are more accurate than node-to-node alignments, with the exception of highly diverged enhancers (figures <figr fid="F2">2E</figr> &amp;<figr fid="F2">2F</figr>).</p>
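             <p>One possible reading of these conditional measures, written as a hypothetical Python sketch (the Methods may define them differently), takes columns as tuples of residue indices, keeps the fully ungapped true columns whose sub-alignment projections (for example seq1 to seq2, and seq3 to seq4) are reproduced by the tool, and asks how many of those are also joined correctly. Groups of single rows give leaf-to-leaf accuracy, ((0, 1), (2,)) gives node-to-leaf accuracy and ((0, 1), (2, 3)) gives node-to-node accuracy.</p>
```python
def conditional_accuracy(true_cols, tool_cols, groups):
    """Accuracy of joining sub-alignments, given that each sub-alignment is correct.

    true_cols, tool_cols: columns as tuples of residue indices (None = gap).
    groups: tuples of row indices, one tuple per sub-alignment being joined.
    """
    def project(col, rows):
        return tuple(col[r] for r in rows)

    joined_rows = tuple(r for g in groups for r in g)
    tool_groups = [{project(c, g) for c in tool_cols} for g in groups]
    tool_joined = {project(c, joined_rows) for c in tool_cols}
    # Condition: column ungapped everywhere and every sub-alignment reproduced.
    eligible = [
        c for c in true_cols
        if None not in c
        and all(project(c, g) in seen for g, seen in zip(groups, tool_groups))
    ]
    if not eligible:
        return float("nan")
    return sum(project(c, joined_rows) in tool_joined for c in eligible) / len(eligible)


# Toy four-species example: seq1/seq2 and seq3/seq4 are each aligned correctly,
# but residue 1 of the two sub-alignments ends up in separate columns.
true_cols = [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2)]
tool_cols = [(0, 0, 0, 0), (1, 1, None, None), (None, None, 1, 1), (2, 2, 2, 2)]
leaf_to_leaf = conditional_accuracy(true_cols, tool_cols, ((2,), (3,)))
node_to_node = conditional_accuracy(true_cols, tool_cols, ((0, 1), (2, 3)))
```

In this toy case both cherries are aligned perfectly, so leaf-to-leaf accuracy is 1, yet one column is split when the sub-alignments are joined, so node-to-node accuracy is 2/3.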
             <p>Observations i and ii were consistent with our expectations. Although all four tools in this study use some form of the Needleman-Wunsch algorithm, they each utilize unique algorithmic features and scoring schemes, leading to variation in their alignments and therefore in alignment accuracy under different evolutionary conditions (figure <figr fid="F2">2A</figr>). Both the decrease in alignment accuracy with greater divergence distance (figure <figr fid="F2">2A</figr>) and the increase in alignment accuracy with the addition of transcription factor binding sites (figure <figr fid="F2">2B</figr>) are expected, since higher similarity and fewer indels lead to higher alignment accuracy (as we have previously reported for pairwise alignments <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>).</p>
             <p>Our results on the relationship of alignment accuracy to the number of species aligned (observations iii, iv and v) are consistent with the hypothesis that the pairwise distance between the two most diverged species in a tree effectively determines alignment accuracy. Across tools and divergences, adding ingroup or outgroup species to a pair of species of fixed divergence had an insignificant effect on alignment accuracy (t-test, p > 0.05) (figure <figr fid="F2">2D</figr> and leaf-to-leaf accuracy in <figr fid="F2">2E</figr> &amp; <figr fid="F2">2F</figr>). Brudno et al. found that Mlagan alignments of human and fugu exons improved by 3% with the addition of mouse as an in-group <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>, which is consistent with the trend we observed of Mlagan alignments improving with in-group addition, although this trend did not reach significance at any divergence. Observations iii and iv, that dividing a fixed total divergence up with more species improves accuracy incrementally (figure <figr fid="F2">2C</figr>), may appear to conflict with this hypothesis but are in fact consistent with it. The increase in alignment accuracy with additional species dividing up a fixed total divergence is due to a decrease in the pairwise divergence between the two most diverged species, not to the addition of species per se (figures <figr fid="F2">2D, 2E</figr> &amp; <figr fid="F2">2F</figr>). Thus the span of the two most diverged species, not the number of species in the alignment, appears to be the primary determinant of alignment accuracy.</p>
             <p>Finally, observation vi, that alignment accuracy varies across branches in a tree, is quite unexpected. The progressive alignment steps that these four tools use appear to be biased toward aligning leaf sequences better than internal nodes, where sub-alignments must be aligned (figure <figr fid="F2">2E</figr>). This bias did not hold for enhancer sequences at high divergences, where node-to-node and node-to-leaf alignment accuracy was actually better than leaf-to-leaf accuracy (figure <figr fid="F2">2F</figr>). This variation is surprising given that the accuracy of the alignment of a node to another node or sequence is conditioned on the sequences below that node (in the tree) having been aligned correctly (see Methods). These results suggest that aligning sub-alignments is harder than aligning sequences, consistent with the idea that progressive alignment heuristics often lead to sub-optimal alignments <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>. Variation of alignment accuracy across branches in a tree has profound implications for phylogenetic analysis.</p>
            <p>To understand the relationship of the observed variation in alignment accuracy with phylogenetic analyses performed using automated alignments, we explored the following two evolutionary inferences.</p>
         </sec>
         <sec>
            <st>
               <p>Transcription factor binding site alignment</p>
            </st>
             <p>Using simulated true alignments and tool alignments of enhancers containing conserved transcription factor binding sites, we examined the accuracy of binding site alignment and its relationship with overall alignment accuracy. We used two definitions of binding site alignment. Aligned sites were classified as either perfectly aligned, meaning every base in the binding site was aligned correctly across all species, or overlapping, meaning the binding sites across the species overlapped in at least one alignment column (similar to definitions in <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>).</p>
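             <p>The two binding site scores can be sketched as follows. This hypothetical Python helper assumes that each site has the same length in every species (as when sites evolve without indels, gain or loss) and that its 0-based start is known in each ungapped sequence; a site counts as perfectly aligned if it occupies identical alignment columns in all species, and as overlapping if all species share at least one of those columns.</p>
```python
def residue_columns(row):
    """Alignment column index of each residue in a gapped row."""
    return [j for j, ch in enumerate(row) if ch != "-"]


def classify_site(rows, starts, length):
    """Return 'perfect', 'overlap' or 'misaligned' for one binding site.

    rows: gapped aligned sequences; starts: 0-based site start in each
    ungapped sequence; length: site length, assumed equal in all species.
    """
    cols = []
    for row, s in zip(rows, starts):
        rc = residue_columns(row)
        cols.append(rc[s:s + length])
    if all(c == cols[0] for c in cols):
        return "perfect"
    shared = set(cols[0]).intersection(*cols[1:])
    return "overlap" if shared else "misaligned"


# The site TGCA starts at position 2 of the ungapped sequence AATGCAT in both species.
perfect = classify_site(["AATGCAT", "AATGCAT"], [2, 2], 4)
shifted = classify_site(["AATGCA-T", "A-ATGCAT"], [2, 2], 4)
```

In the second call the site is shifted by one column, so it is counted as overlapping but not perfectly aligned, which is the kind of subtle error the gap between the two scores captures.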
             <p>We first asked whether binding site alignment accuracy varies across tools and divergences. Indeed, for all tools, binding site alignment accuracy decreases with divergence distance. Figure <figr fid="F3">3A</figr> shows the fraction of sites overlapping in four species enhancer alignments.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Transcription Factor Binding Site Alignment Accuracy</p>
               </caption>
               <text>
                  <p><b>Transcription Factor Binding Site Alignment Accuracy</b>. A: Binding site alignment accuracy varies across tools and divergences. Fraction of binding sites overlapping in four species alignments was measured as a function of total divergence distance. B: Binding sites are often still overlapping in alignments even when they are not perfectly aligned. Fraction of binding sites perfectly aligned in four species alignments subtracted from the fraction of binding sites overlapping in four species alignments was measured as a function of total divergence distance. C: Binding site alignment accuracy is highly correlated with overall alignment accuracy and is consistently higher. Fraction of binding sites overlapping in four species alignments was measured as a function of overall alignment accuracy. D: Binding site alignment accuracy varies across branches in a tree and is best for leaf-to-leaf alignments and worst for node-to-node alignments. Fraction of binding sites overlapping along branches in three and four species trees subtracted from the fraction of binding sites overlapping in two species Clustalw alignments, where the divergence along each branch is the same, was measured as a function of divergence distance. E: Binding site alignment accuracy is positively correlated with binding site density in an enhancer. Fraction of binding sites overlapping in replicate four species Mlagan alignments of each of the 36 enhancers was measured as a function of the density of binding sites in the enhancer. F: Binding site alignment accuracy is positively correlated with binding site length. Fraction of binding sites overlapping in four species Mlagan alignments for each of the eight transcription factors was measured as a function of the length of the transcription factors' binding sites.</p>
               </text>
               <graphic file="1471-2105-7-376-3"/>
            </fig>
             <p>We next compared our two binding site alignment scores. Because conserved binding sites should make good anchors, we expected large indels in flanking sequences to cause most alignment errors, so we were somewhat surprised by how different the two scores are. Instead, binding sites are often still overlapping in an alignment even when they are not perfectly aligned. Figure <figr fid="F3">3B</figr> shows the difference between our two scores in four species alignments. The large difference between the two scores suggests that evolved binding sites might not be strong anchors, and therefore that alignment errors in regulatory regions may often be subtle.</p>
             <p>We next looked at how binding site alignment accuracy is related to overall alignment accuracy. Across tools, divergence distances and trees, binding site alignment accuracy is highly correlated with overall alignment accuracy; however, binding site alignment accuracy is consistently higher. Figure <figr fid="F3">3C</figr> shows overlap binding site accuracy as a function of overall alignment accuracy for four species alignments. As with the overall alignment accuracy of enhancers (figure <figr fid="F2">2F</figr>), binding site alignment accuracy also varies across branches in trees (figure <figr fid="F3">3D</figr>).</p>
             <p>Lastly, we looked at properties of enhancers and binding sites to see how they are related to binding site alignment accuracy. We expected that enhancers with a greater density of binding sites would align more easily. Indeed, across tools, divergence distances and trees, binding site alignment accuracy is strongly and significantly correlated with the density of binding sites in an enhancer (figure <figr fid="F3">3E</figr>, Spearman's <it>rho </it>= 0.92, p &lt; 10<sup>-10</sup>). We also looked at the length and average information content of binding sites to see if longer or more highly specified sites tend to align better. Across tools, divergence distances and trees, binding site alignment accuracy is correlated with binding site length (figure <figr fid="F3">3F</figr>, Spearman's <it>rho </it>= 0.44, p &lt; 0.3) and average information content (Spearman's <it>rho </it>= 0.40, p &lt; 0.35), but neither correlation is significant, likely because of the small number of factors used in this study. Thus sites in enhancers with a higher density of binding sites are more likely to be aligned correctly, and longer, more highly specified sites may be as well.</p>
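             <p>Spearman's <it>rho</it> is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. A minimal pure-Python sketch follows; in practice a library routine such as scipy.stats.spearmanr, which also returns a p-value, would normally be used. The values in the usage line are illustrative only and are not data from this study.</p>
```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while len(order) - 1 > j and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Illustrative values, e.g. site density versus fraction of sites overlapping.
rho = spearman_rho([0.1, 0.4, 0.2, 0.8], [0.5, 0.6, 0.55, 0.9])
```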
         </sec>
         <sec>
            <st>
               <p>Divergence estimation</p>
            </st>
            <p>Using simulated true alignments and tool alignments of 10 kb background noncoding sequences, we investigated the effects of alignment errors on divergence estimation. Divergence distances were estimated from alignments using the Baseml program from the PAML package <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> (see Methods for run parameters). We used divergence estimation error, instead of accuracy, to capture the direction of errors (overestimates or underestimates). We defined it as the fractional difference between the Baseml estimate and the true divergence used in the simulation: (Estimate &#8211; True)/True.</p>
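            <p>As a small illustration of this definition, the Python sketch below computes the fractional error for one estimate and its mean over replicate simulations. Negative values indicate underestimates and positive values overestimates. The function names are hypothetical, and the numbers in the note that follows are made up for illustration rather than taken from our results.</p>

```python
from statistics import mean


def divergence_estimation_error(estimate: float, true: float) -> float:
    """Fractional error (Estimate - True) / True; negative means underestimate."""
    if true == 0:
        raise ValueError("true divergence must be non-zero")
    return (estimate - true) / true


def mean_estimation_error(estimates: list[float], true: float) -> float:
    """Mean fractional error over replicate estimates of one true divergence."""
    return mean(divergence_estimation_error(e, true) for e in estimates)
```

            <p>For a hypothetical true divergence of 1.5 substitutions per site estimated as 1.2, the error is (1.2 &#8211; 1.5)/1.5 = -0.2, that is, a 20% underestimate. Keeping the sign, rather than taking an absolute value, is what lets overestimates at short distances and underestimates at long distances be distinguished.</p>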
            <p>We first checked whether divergence estimates from the simulated alignments are accurate. Indeed, out to high divergence distances, Baseml estimates are very close to the input divergences (figure <figr fid="F4">4</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Divergence Distance Estimation</p>
               </caption>
               <text>
                  <p><b>Divergence Distance Estimation</b>. Divergences estimated from tool alignments are overestimated at short divergence distances and underestimated at large divergence distances, while divergences estimated from true simulated alignments are accurate to large divergence distances. A: Mean divergence distance estimated from simulated alignments and tool alignments for four species trees was measured as a function of total true divergence distance. B: Mean divergence estimation error ((Estimate &#8211; True)/True) for four species trees was measured as a function of total true divergence distance. C: Divergence estimation error from tool alignments is not correlated with alignment error. Mean divergence estimation error for four species trees was measured as a function of mean alignment error. D: Divergence estimation accuracy varies across branches in a tree; it is highest for leaf-to-leaf alignments and lowest for node-to-node alignments. Mean divergence estimation error along branches of equal true divergence from two, three and four species Mlagan alignments was measured as a function of true divergence distance.</p>
               </text>
               <graphic file="1471-2105-7-376-4"/>
            </fig>
            <p>We next examined whether and how divergence estimation accuracy varies across tools and divergences. We expected divergence estimation accuracy to decrease steadily with divergence distance at a tool-specific rate, as alignment accuracy does. Instead, we found that estimated divergences tend to be accurate (or somewhat overestimated) at short divergence distances but are always underestimated at long divergence distances. Figure <figr fid="F4">4A</figr> shows divergence estimates from four species alignments across tools and divergences. Figure <figr fid="F4">4B</figr> shows the same data presented as divergence estimation error, as a function of true divergence distance. Perhaps most striking is the asymptotic approach of estimates to tool-specific maxima. This result is consistent with Shabalina and Kondrashov's finding that the alignment of random sequences results in a percent identity much greater than the random expectation given by the sum of the squared base frequencies <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. If diverging sequences evolve to an identity lower than that of aligned random sequences, then alignment tools treat them as random and produce an alignment with a fixed divergence. Indeed, aligned random sequences produce divergences similar to the maximum divergences observed in our simulations (data not shown). Interestingly, the two tools with the highest maximum divergence (Clustalw and Mlagan) both overestimate divergences at short divergence distances, while the other two tools do not. Finally, Tba, the only local alignment tool in our analysis, stops returning alignments before it reaches its maximum divergence, indicating that the algorithm can avoid aligning random sequences but, as a consequence, cannot return weakly informative alignments at large divergence distances.</p>
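            <p>The random expectation mentioned above can be worked through for the base composition used in our simulations (60/40 AT/CG; see Methods). Assuming this splits evenly into 0.30 for each of A and T and 0.20 for each of C and G, two unrelated ungapped sequences match at a position with probability 2(0.30<sup>2</sup>) + 2(0.20<sup>2</sup>) = 0.26. The Python sketch below computes this value and checks it by simulation. It covers only ungapped comparisons; Shabalina and Kondrashov's point is precisely that gapped alignment tools exceed this identity. Function names are hypothetical.</p>

```python
import random


def expected_random_identity(freqs: dict[str, float]) -> float:
    """Expected identity of two unrelated, ungapped sequences: sum of p_i squared."""
    return sum(p * p for p in freqs.values())


def simulated_random_identity(freqs: dict[str, float], length: int, seed: int = 0) -> float:
    """Fraction of identical positions between two random sequences of equal length."""
    rng = random.Random(seed)
    bases, weights = zip(*freqs.items())
    seq1 = rng.choices(bases, weights=weights, k=length)
    seq2 = rng.choices(bases, weights=weights, k=length)
    return sum(a == b for a, b in zip(seq1, seq2)) / length


# Assumed even split of the 60/40 AT/CG composition described in Methods.
DMEL_FREQS = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}
```

            <p>With these frequencies, <it>expected_random_identity(DMEL_FREQS)</it> gives 0.26 (up to floating-point rounding), and <it>simulated_random_identity(DMEL_FREQS, 100000)</it> should land close to it. An aligner that inserts gaps to maximize its score will report a higher identity than this for the same unrelated sequences.</p>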
            <p>Because divergence estimation accuracy appears to vary differently from alignment accuracy, we examined their relationship directly. Figure <figr fid="F4">4C</figr> shows four species divergence estimation error as a function of alignment error. With the exception of Tba, which stops returning alignments while alignment error is still small, tools reach the point at which divergence estimates cease to increase close to the point at which alignment accuracy reaches its minimum. The accuracy of divergence estimates from Mavid may reflect the fact that Mavid requires a tree with branch lengths, and we provided the true divergences. The accuracy of divergence estimates from the other three tools is remarkable given the poor quality of their alignments at long divergence distances.</p>
            <p>Finally, we asked whether divergence estimation accuracy varies across branches in trees, as alignment accuracy does. Across tools, divergence estimates were most accurate for leaf-to-leaf branches, less accurate for node-to-leaf branches and least accurate for node-to-node branches. Figure <figr fid="F4">4D</figr> shows the error in divergence estimates from Mlagan alignments of leaf-to-leaf, node-to-leaf and node-to-node branches in two, three and four species trees. Mlagan's tendency to overestimate divergence distances at short divergence distances and to underestimate them at long divergence distances is least pronounced in leaf-to-leaf alignments and most pronounced in node-to-node alignments. The point at which divergence estimates cease to increase also appears to occur at a shorter divergence distance for node-to-node branches than for leaf-to-leaf branches, reflecting the lower alignment accuracy of those branches. This variation in divergence estimation accuracy across branches in a tree has significant implications for phylogenetic analysis of DNA sequences.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Molecular evolutionary studies of noncoding DNA have either relied on the intuition that closely related species can be aligned well or have ignored alignment error altogether <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B9">9</abbr></abbrgrp>. To gain perspective on how alignment might affect evolutionary analysis, we investigated multiple alignment accuracy and its relationship with two fundamental evolutionary inferences: transcription factor binding site conservation and divergence estimation.</p>
         <p>Because gold standards for base-level noncoding and regulatory DNA alignment accuracy do not exist, we developed a simulation platform called CisEvolver that can evolve background noncoding DNA, transcription factor binding site DNA or a mixture of the two (enhancers). We implemented CisEvolver with features of background and regulatory sequence evolution that are well modeled and are present in most comparative systems. More complicated evolutionary phenomena have certainly been observed and, where they can be modeled successfully, ought to be the subject of future studies. For instance, substitution rates have been shown to vary across sequences and have been modeled in various ways, most commonly using a gamma distribution <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. In our study we modeled both substitution and indel rate variation using interspersed transcription factor binding sites, but rates may also vary for reasons other than regulatory constraints, in which case adding a gamma distribution to our background model may be appropriate. Interestingly, a recent study showed that using a gamma distribution in simulations has no effect on Clustalw alignment accuracy when comparing sequences with the same overall identity <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, suggesting that our results are likely robust to rate variation. Compensatory substitutions (like those observed in structural noncoding RNAs) <abbrgrp><abbr bid="B72">72</abbr><abbr bid="B73">73</abbr><abbr bid="B74">74</abbr></abbrgrp>, ancient and lineage-specific repetitive sequences (like those common in mammals), and inversions and rearrangements <abbrgrp><abbr bid="B75">75</abbr><abbr bid="B76">76</abbr></abbrgrp> could all be incorporated into the CisEvolver platform for alignment analysis. 
As models of the <it>cis</it>-regulatory code <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> and binding site evolution <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B57">57</abbr></abbrgrp> are developed, they too should be tested for effects on alignment accuracy. Additionally, the trees we chose to study are idealized, in that they are ultrametric (all leaves are equidistant from the root), and they contain relatively few species compared to many real datasets. Trees with rate changes across many lineages would likely present additional problems that should be examined in future studies. A comprehensive analysis of the influence of tree shape on alignment would be an interesting future direction (see <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> for an initial analysis). Despite the absence of these more complicated or unexplored aspects of noncoding evolution from the current study, our results suggest that even under these simple and ideal circumstances, alignment error gives rise to numerous issues that ought to be qualitatively informative for all systems.</p>
         <p>Using the true alignments generated by CisEvolver as a reference, we tested the accuracy of alignments produced by four commonly used genomic alignment tools. All tools were run using their default parameter values (see Methods). It is quite possible that the accuracy of the alignments generated by some of these tools is highly sensitive to parameter settings and scoring schemes. In this study we focused on behavior that was consistent across tools and on how variation in accuracy influenced inferences, and were therefore not concerned with the relative performance of each tool. For users to make the best use of current tools, and for designers of alignment tools to understand which algorithmic innovations actually improve alignment accuracy (beyond parameter settings), a systematic analysis of parameters is needed. In this study, using just default parameters, we found that the primary determinant of multiple alignment accuracy is the pairwise divergence distance between the two most diverged species in the alignment (figure <figr fid="F2">2D</figr>). Although dividing a given divergence distance among more species improves accuracy (figure <figr fid="F2">2C</figr>), this appears to be simply due to the decrease in pairwise divergence separating the two most diverged species. We found that adding species (either in-groups or out-groups) to two species separated by a fixed divergence distance had an insignificant and inconsistent (across tools) impact on alignment accuracy (figure <figr fid="F2">2D</figr>); however, a concurrent study found that Clustalw alignments are most improved when an additional species is added at a distance equal to one third of the pairwise distance separating two other species <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> (which we note is the topology we used in this study; see figure <figr fid="F1">1</figr>). Brudno et al. also found that adding mouse to human-fish alignments improved Mlagan alignments by 3% <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. 
If adding an in-group has an effect, our results suggest that it is weak and not robust to the choice of alignment tool. Perhaps our most striking finding is that alignment accuracy varies across branches in a tree: alignments are most accurate between sister taxa and least accurate between internal nodes that align sub-alignments. As we discuss below, this variation in accuracy causes variation in inferences across the tree, which could easily be construed as lineage-specific biological variation. Future development of tools that minimize this distortion in accuracy across branches in a tree will be extremely valuable.</p>
         <p>The first evolutionary inference we examined was the measurement of the conservation of transcription factor binding sites in regulatory regions. Studies have used conservation of binding sites either to distinguish functional from spurious predictions <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp> or to understand their rates of change, or turnover <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. Here we wanted to understand up to what divergence distances such estimates might be accurate, what approaches might be taken to improve them, and which features of regulatory regions might affect them. We found that binding sites are usually aligned better than their surrounding sequences (figures <figr fid="F2">2B</figr> &amp; <figr fid="F3">3C</figr>) but are nonetheless misaligned starting at very short divergence distances (figure <figr fid="F3">3A</figr>). For instance, given that <it>Drosophila pseudoobscura </it>has diverged from <it>Drosophila melanogaster </it>by approximately 1.79 substitutions per site <abbrgrp><abbr bid="B78">78</abbr></abbrgrp>, our results suggest that only about 40% of truly conserved binding sites would even overlap in alignments. Unless the rate of binding site turnover is high enough that the number of sites that have turned over is much larger than the 60% of truly conserved sites that have been misaligned, it is unlikely that such a comparison would be useful for studying binding site evolution. 
If 40% binding site conservation is nonetheless higher than what might be expected in non-functional regions, then comparing these species might still be useful for predicting functional regulatory regions. Our finding that binding sites often still overlap in alignments even when they are not perfectly aligned (figure <figr fid="F3">3B</figr>) suggests that binding sites are not always strong alignment anchors, that small indels can lead to small alignment errors, and that methods for identifying conserved binding sites that do not rely on perfect alignments would have greater sensitivity <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B28">28</abbr><abbr bid="B79">79</abbr></abbrgrp> (although the specificity of such methods would need to be explored to understand their predictive power). Finally, we found that the higher the density of sites in an enhancer, the higher the alignment accuracy of the binding sites within it, presumably due to higher overall constraint and suppression of indels. Bacterial and yeast regulatory regions, for instance, are not thought to contain the high-density arrays of binding sites found in metazoans <abbrgrp><abbr bid="B80">80</abbr><abbr bid="B81">81</abbr></abbrgrp> and would therefore be expected to align more poorly, all else being equal. Although we also found that longer and more highly specified binding sites tend to align better, these correlations were not significant and require further investigation with a larger panel of transcription factors. The variance in alignment accuracy introduced by such regulatory sequence properties should be considered when estimating the expected error from simulations and when interpreting an evolutionary comparison across regulatory regions.</p>
         <p>The second inference we considered was divergence distance estimation. We were impressed that our estimates using PAML's Baseml program on the true alignments generated in our simulations were highly accurate out to rather high divergences, suggesting that saturation does not lead to inaccuracies at short divergence distances, at least when the right model is used (figures <figr fid="F4">4A</figr> &amp; <figr fid="F4">4B</figr>). Because the divergence inference step itself was accurate, we were able to look directly at the contribution of alignment error to divergence estimation. Although the tendency of two of the tools to overestimate divergences at short divergence distances is noteworthy (as was observed for Clustalw in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>), most striking is that each tool reaches a tool-specific divergence distance at which divergence estimates cease to increase (figures <figr fid="F4">4A</figr> &amp; <figr fid="F4">4B</figr>) (this underestimate was also observed for Clustalw in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>). This point of maximum divergence corresponded to the point at which tools reached their minimum alignment accuracy, where sequences were essentially randomly aligned (figure <figr fid="F4">4C</figr>). Shabalina and Kondrashov previously reported that unrelated sequences produce alignments with a greater percent identity than would be theoretically predicted from base composition, suggesting that alignment tools add gaps to create more matches and fewer mismatches in order to maximize their scores <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. The "twilight zone" (the point where alignments become random) <abbrgrp><abbr bid="B82">82</abbr></abbrgrp> therefore lies not at 25% identity but at a much shorter divergence (or higher identity) that varies across alignment tools. 
For instance, pairwise alignments generated by Mavid reach the point where divergence estimates cease to increase at about 0.5 substitutions per site, which is approximately the divergence estimated for human and mouse, suggesting that fast-evolving human or mouse sequences would on average not be detected as such from Mavid alignments. It is worth noting that Tba stops returning alignments before it reaches the point where divergence estimates cease to increase, suggesting that the scoring scheme Tba uses to filter its alignments can avoid producing random alignments but might also fail to return an alignment that retains some phylogenetic signal.</p>
         <p>Our findings that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree have profound implications for phylogenetic research on noncoding DNA. All four of the tools we examined exhibit systematic biases toward higher accuracy along branches connecting sister taxa relative to branches connecting internal nodes (figures <figr fid="F2">2E, 2F, 3D</figr> &amp; <figr fid="F4">4D</figr>). Consider the example of studying binding site turnover rates relative to substitution rates in human, mouse and rat alignments. Even if these rates were constant across the tree, binding site turnover might appear higher along the human branch because of increased alignment error along the longer node-to-leaf branch, and substitution rates might be underestimated along the human branch if it is longer than an alignment tool's maximum divergence. Combined, these two biases would distort the number of turnover events per substitution along the human branch even further. These results strongly suggest that new alignment tools that minimize this bias, or new phylogenetic methods that control for it, need to be developed.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Errors in the alignment of noncoding DNA are systematic phenomena that affect evolutionary inferences, decreasing accuracy and biasing results. In order to use the rich diversity of variation in more diverged sequences, new alignment and phylogenetic methods need to be developed to reduce and control for errors in automated alignment.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>CisEvolver</p>
            </st>
            <p>CisEvolver was written in Perl. It is available for download <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Trees</p>
            </st>
            <p>For both the divergence estimation and binding site conservation estimation simulations, each divergence distance tested was transformed into a Newick formatted tree. Figure <figr fid="F1">1</figr> shows how divergences were distributed across trees.</p>
         </sec>
         <sec>
            <st>
               <p>Divergence simulations</p>
            </st>
            <p>For the divergence estimation simulations, 100 simulations were run for each divergence distance. For each simulation, a 10 kb ancestral sequence was randomly generated from the <it>D. melanogaster </it>mononucleotide base frequencies (60/40 AT/CG). The 10 kb sequences were evolved from the root node of the tree down the branches to the leaves using a substitution and indel model. Substitutions occurred according to the HKY85 substitution model <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, using the <it>D. melanogaster </it>mononucleotide base frequencies and kappa set to 2.0, as has been observed in <it>Drosophila </it><abbrgrp><abbr bid="B83">83</abbr></abbrgrp>. Indel events occurred according to a Poisson indel event model:</p>
            <p><it>p</it><sub><it>indel </it></sub>= 1 - <it>e</it><sup>-<it>Rk</it></sup></p>
            <p>where <it>R </it>is the relative rate of indels to substitutions and <it>k </it>is the length of the branch. In <it>Drosophila </it>indels have been found to occur at approximately 10% of the rate of substitutions, so we used <it>R </it>= 0.1 <abbrgrp><abbr bid="B84">84</abbr><abbr bid="B85">85</abbr></abbrgrp>. Indel lengths were drawn from a frequency distribution derived from <it>D. melanogaster </it>indel polymorphisms, with a maximum of 58 bp <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Insertions and deletions were treated identically.</p>
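            <p>The substitution and indel models above can be sketched in a few lines of Python. This is not CisEvolver's Perl implementation. It builds a standard HKY85 rate matrix, with transitions weighted by <it>kappa</it>, rescaled so that one unit of branch length corresponds to one expected substitution per site; that scaling is our assumption about how branch lengths were interpreted. It then evaluates the Poisson indel probability with <it>R </it>= 0.1. Reading <it>p</it><sub><it>indel </it></sub>as a per-site probability along a branch is also an assumption, and the function names are hypothetical.</p>

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def hky85_rate_matrix(pi: dict[str, float], kappa: float) -> dict[str, dict[str, float]]:
    """HKY85 instantaneous rate matrix, scaled to one substitution per unit time."""
    q: dict[str, dict[str, float]] = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                # Transitions (A-G, C-T) are kappa times faster than transversions.
                is_transition = (a in PURINES) == (b in PURINES)
                q[a][b] = pi[b] * (kappa if is_transition else 1.0)
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)  # each row sums to zero
    # Mean rate at equilibrium; dividing by it makes branch lengths substitutions/site.
    mean_rate = -sum(pi[a] * q[a][a] for a in BASES)
    return {a: {b: q[a][b] / mean_rate for b in BASES} for a in BASES}


def indel_probability(branch_length: float, relative_rate: float = 0.1) -> float:
    """Poisson model: p_indel = 1 - exp(-R * k) for branch length k."""
    return 1.0 - math.exp(-relative_rate * branch_length)


# Assumed even split of the 60/40 AT/CG composition; kappa = 2.0 as in the text.
DMEL_PI = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}
Q = hky85_rate_matrix(DMEL_PI, kappa=2.0)
```

            <p>For example, <it>indel_probability(0.5)</it> is 1 &#8211; e<sup>-0.05</sup>, about 0.049. The exponential form keeps the probability below 1 on long branches, whereas a linear <it>Rk</it> would not.</p>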
         </sec>
         <sec>
            <st>
               <p>Cis-regulatory sequences</p>
            </st>
            <p>Thirty-six experimentally characterized <it>cis</it>-regulatory regions that have been found to drive expression patterns in reporter assays recapitulating some or all of the expression pattern of an adjacent gene were collected from two recent papers on anterior/posterior patterning in <it>D. melanogaster </it><abbrgrp><abbr bid="B26">26</abbr><abbr bid="B60">60</abbr></abbrgrp>. The sequences were mapped to release 4.0 of <it>D. melanogaster </it>using BLAT <abbrgrp><abbr bid="B86">86</abbr></abbrgrp>. A GFF file with the enhancer coordinates is available in <supplr sid="S1">additional file 1</supplr>: Enhancers.gff.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>This file, in GFF2 format <abbrgrp><abbr bid="B91">91</abbr></abbrgrp>, contains the coordinates of the 36 enhancers used in this study in <it>Drosophila melanogaster </it>release 4 coordinates <abbrgrp><abbr bid="B92">92</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-7-376-S1.gff">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Transcription factor binding sites</p>
            </st>
            <p>The 36 <it>cis</it>-regulatory regions used in the study have been reported to be bound or predicted to be bound by some combination of the following factors: Bicoid <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, Caudal <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, Giant <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, Hunchback <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, Knirps <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, Kruppel <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, Tailless <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> and Torso-response element <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Position weight matrices (PWMs) were either taken from published resources <abbrgrp><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr></abbrgrp> or were generated from published footprints <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> using MEME <abbrgrp><abbr bid="B87">87</abbr></abbrgrp> (described at <abbrgrp><abbr bid="B88">88</abbr></abbrgrp>). Matrices are available in <supplr sid="S2">additional file 2</supplr>: Matrices.txt.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>This text file contains horizontal counts matrices and vertical frequency matrices for each of the eight transcription factors used in this study.</p>
               </text>
               <file name="1471-2105-7-376-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>For each of the 36 <it>cis</it>-regulatory regions, PASTER <abbrgrp><abbr bid="B89">89</abbr></abbrgrp> was used to find sites with a p-value less than 10<sup>-3 </sup>for each of the eight PWMs. Where sites overlapped, one was chosen at random and the others were discarded.</p>
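            <p>The overlap rule above does not specify how chains of three or more mutually overlapping hits are handled. One plausible reading, sketched below in Python as an assumption rather than as the procedure actually used, is to visit hits in random order and keep each one only if it does not overlap a hit already kept. Sites are represented here as hypothetical (start, end, factor) tuples with half-open coordinates; the PASTER scan itself is not reproduced.</p>

```python
import random
from typing import Optional

Site = tuple[int, int, str]  # (start, end, factor), half-open interval [start, end)


def resolve_overlaps(sites: list[Site], seed: Optional[int] = None) -> list[Site]:
    """Visit sites in random order, keeping each one that overlaps no kept site."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    kept: list[Site] = []
    for site in shuffled:
        start, end, _ = site
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(k_end > start and end > k_start for k_start, k_end, _ in kept)
        if not overlaps:
            kept.append(site)
    return sorted(kept)
```

            <p>For two overlapping hits this reduces to picking one of them at random. For a chain in which A overlaps B and B overlaps C but A and C do not overlap, it may keep both A and C, which is one of several defensible choices.</p>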
         </sec>
         <sec>
            <st>
               <p>Transcription factor binding site conservation simulations</p>
            </st>
            <p>For the binding site conservation simulations, 25 replicates of each of the 36 <it>cis</it>-regulatory regions were evolved to each of the divergence distances. Sequences were evolved from the root down the branches of each tree using either a background or a binding site mutation model. Non-binding site sequences in the enhancers were evolved according to the HKY85 and indel models described above. Binding sites were evolved according to the HB98 substitution model <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. We have previously shown that there is position-specific variation in substitution rates across functional binding sites and that the HB98 substitution model accurately predicts the relationship between the degeneracy of positions in a PWM and the position-specific substitution rate across binding sites <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B57">57</abbr></abbrgrp>. The rate of change from base <it>a </it>to base <it>b </it>at position <it>i </it>in the binding site is given by:</p>
            <p>
               <m:math name="1471-2105-7-376-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>R</m:mi>
                        <m:msub>
                           <m:mrow>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>i</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>a</m:mi>
                              <m:mi>b</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:msub>
                           <m:mi>Q</m:mi>
                           <m:mrow>
                              <m:mi>a</m:mi>
                              <m:mi>b</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>log</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>f</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>b</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>Q</m:mi>
                                             <m:mrow>
                                                <m:mi>b</m:mi>
                                                <m:mi>a</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>f</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>a</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>Q</m:mi>
                                             <m:mrow>
                                                <m:mi>a</m:mi>
                                                <m:mi>b</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>1</m:mn>
                              <m:mo>&#8722;</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>f</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>a</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:msub>
                                       <m:mi>Q</m:mi>
                                       <m:mrow>
                                          <m:mi>a</m:mi>
                                          <m:mi>b</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>f</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>b</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:msub>
                                       <m:mi>Q</m:mi>
                                       <m:mrow>
                                          <m:mi>b</m:mi>
                                          <m:mi>a</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                        </m:mfrac>
                        <m:mo>,</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGsbGucqGGOaakcqWGPbqAcqGGPaqkdaWgaaWcbaGaemyyaeMaemOyaigabeaakiabg2da9iabdgfarnaaBaaaleaacqWGHbqycqWGIbGyaeqaaOWaaSaaaeaacyGGSbaBcqGGVbWBcqGGNbWzdaqadiqaamaalaaabaGaemOzay2aaSbaaSqaaiabdMgaPjabdkgaIbqabaGccqWGrbqudaWgaaWcbaGaemOyaiMaemyyaegabeaaaOqaaiabdAgaMnaaBaaaleaacqWGPbqAcqWGHbqyaeqaaOGaemyuae1aaSbaaSqaaiabdggaHjabdkgaIbqabaaaaaGccaGLOaGaayzkaaaabaGaeGymaeJaeyOeI0YaaSaaaeaacqWGMbGzdaWgaaWcbaGaemyAaKMaemyyaegabeaakiabdgfarnaaBaaaleaacqWGHbqycqWGIbGyaeqaaaGcbaGaemOzay2aaSbaaSqaaiabdMgaPjabdkgaIbqabaGccqWGrbqudaWgaaWcbaGaemOyaiMaemyyaegabeaaaaaaaOGaeiilaWcaaa@61F5@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
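            <p>where <it>Q</it><sub><it>ab</it></sub> is the background (HKY85) substitution rate from base <it>a</it> to base <it>b</it> and <it>f</it><sub><it>ia</it></sub> is the frequency of base <it>a</it> at position <it>i</it> in the PWM for the factor. Indel events were not permitted within binding sites, and deletions in background sequence were not allowed to extend into binding sites.</p>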
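            <p>The binding-site rate formula can be illustrated with a short Python sketch. It is not the CisEvolver implementation. It assumes an unscaled HKY85 matrix built from background base frequencies (<it>pi</it>) and a transition/transversion ratio (<it>kappa</it>), and it represents one PWM column as a dictionary of non-zero base frequencies. All function names are hypothetical.</p>
            <p>
```python
import math

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rates(pi, kappa):
    """Off-diagonal HKY85 rates Q[(a, b)]: kappa * pi[b] for transitions, pi[b] otherwise.

    Assumption for illustration: the matrix is not rescaled to one expected
    substitution per unit time, and diagonal entries are left out.
    """
    return {
        (a, b): pi[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
        for a in BASES
        for b in BASES
        if a != b
    }


def site_rate(q, f_col, a, b):
    """Rate from base a to base b at one binding-site position.

    Implements Q_ab * log(f_b Q_ba / (f_a Q_ab)) / (1 - f_a Q_ab / (f_b Q_ba)),
    where f_col maps each base to its PWM frequency at this position
    (hypothetical input format; all frequencies must be non-zero).
    """
    if a == b:
        raise ValueError("only off-diagonal rates are defined by the formula")
    x = (f_col[b] * q[(b, a)]) / (f_col[a] * q[(a, b)])
    if math.isclose(x, 1.0):
        # log(x) / (1 - 1/x) tends to 1 as x approaches 1, leaving the background rate.
        return q[(a, b)]
    return q[(a, b)] * math.log(x) / (1.0 - 1.0 / x)


def site_rate_matrix(q, f_col):
    """Rate matrix for one position; diagonal entries make each row sum to zero."""
    r = {(a, b): site_rate(q, f_col, a, b) for a in BASES for b in BASES if a != b}
    for a in BASES:
        r[(a, a)] = -sum(r[(a, b)] for b in BASES if b != a)
    return r


uniform = {base: 0.25 for base in BASES}
q = hky85_rates(uniform, kappa=2.0)
a_rich = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}  # a position that favors A
r = site_rate_matrix(q, a_rich)
print(r[("C", "A")], r[("A", "C")])
```
            </p>
            <p>With a uniform PWM column, the background rates are returned unchanged. With a column that favors A, substitutions towards A are accelerated and substitutions away from A are slowed. The resulting rates also satisfy detailed balance with the PWM column (<it>f</it><sub><it>ia</it></sub><it>R</it>(<it>i</it>)<sub><it>ab</it></sub> = <it>f</it><sub><it>ib</it></sub><it>R</it>(<it>i</it>)<sub><it>ba</it></sub>), which follows algebraically from the formula.</p>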
         </sec>
         <sec>
            <st>
               <p>Alignments</p>
            </st>
            <p>Alignments were performed using default parameter values for each of the following tools: Clustalw <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>, Mavid v0.9 <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>, Mlagan v1.2 <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and Blastz/Tba <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B66">66</abbr><abbr bid="B67">67</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Alignment accuracy</p>
            </st>
            <p>Alignment accuracy was defined as</p>
            <p>
               <m:math name="1471-2105-7-376-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>A</m:mi>
                        <m:mi>c</m:mi>
                        <m:mi>c</m:mi>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>C</m:mi>
                                 <m:mrow>
                                    <m:mi>T</m:mi>
                                    <m:mi>S</m:mi>
                                    <m:mi>U</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>C</m:mi>
                                 <m:mrow>
                                    <m:mi>S</m:mi>
                                    <m:mi>U</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:mfrac>
                        <m:mo>,</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGbbqqcqWGJbWycqWGJbWycqGH9aqpdaWcaaqaaiabdoeadnaaBaaaleaacqWGubavcqWGtbWucqWGvbqvaeqaaaGcbaGaem4qam0aaSbaaSqaaiabdofatjabdwfavbqabaaaaOGaeiilaWcaaa@3ACA@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>C</it><sub><it>SU </it></sub>is the number of ungapped columns in the simulated alignment and <it>C</it><sub><it>TSU </it></sub>is the number of those columns that are aligned identically in the tool alignment. This measure is the same as the "sensitivity" defined in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
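            <p>A minimal Python sketch of how <it>Acc</it> could be computed follows. It is not the authors' Perl script. Each gapped alignment is converted into columns of residue indices, and the code counts the ungapped simulated columns that the tool reproduces exactly. It assumes that both alignments list the same sequences in the same order and use '-' for gaps. The function names are hypothetical.</p>
            <p>
```python
def alignment_columns(rows):
    """Turn gapped rows ('-' = gap) into columns of per-sequence residue indices.

    Each column is a tuple with one entry per sequence: the 0-based index of the
    residue in its ungapped sequence, or None where that sequence has a gap.
    """
    next_index = [0] * len(rows)
    columns = []
    for j in range(len(rows[0])):
        column = []
        for s, row in enumerate(rows):
            if row[j] == "-":
                column.append(None)
            else:
                column.append(next_index[s])
                next_index[s] += 1
        columns.append(tuple(column))
    return columns


def alignment_accuracy(simulated_rows, tool_rows):
    """Acc = C_TSU / C_SU over the ungapped columns of the simulated alignment."""
    true_ungapped = [c for c in alignment_columns(simulated_rows) if None not in c]
    if not true_ungapped:
        raise ValueError("the simulated alignment has no ungapped columns")
    tool_columns = set(alignment_columns(tool_rows))
    c_tsu = sum(1 for c in true_ungapped if c in tool_columns)
    return c_tsu / len(true_ungapped)


print(alignment_accuracy(["ACGT", "A-GT"], ["ACGT", "AG-T"]))  # 2 of 3 columns match
```
            </p>
            <p>Comparing columns as tuples of residue indices rather than as characters means that a column counts as correct only when the same residues are placed together, not merely identical letters.</p>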
            <p>Branch-specific alignment accuracy was calculated in the same way, except that <it>C</it><sub><it>SU </it></sub>was the number of ungapped columns at which the alignment step joined either single sequences or correctly aligned sub-alignments, and <it>C</it><sub><it>TSU </it></sub>was the number of such columns in the simulated alignment that were aligned identically in the tool alignment. For instance, in a four-species alignment, the node-to-node alignment accuracy was based only on the columns in which Seq1 and Seq2 were correctly aligned to each other and Seq3 and Seq4 were correctly aligned to each other (figure <figr fid="F1">1</figr>). Similarly, in a three-species alignment, the node-to-leaf alignment accuracy was based only on the columns in which Seq1 and Seq2 were correctly aligned to each other. The motivation was to isolate the contribution of a given branch to alignment accuracy.</p>
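            <p>The branch-specific restriction can be sketched in Python as follows. The column helper is repeated so that the sketch runs on its own. Each sub-alignment being joined is given as a tuple of sequence indices. A simulated column enters <it>C</it><sub><it>SU </it></sub>only if the tool places that column's residues together within every sub-alignment. This reading of "correctly aligned sub-alignments", the function names and the input format are assumptions.</p>
            <p>
```python
def alignment_columns(rows):
    """Columns of per-sequence residue indices (None = gap) from gapped rows."""
    next_index = [0] * len(rows)
    columns = []
    for j in range(len(rows[0])):
        column = []
        for s, row in enumerate(rows):
            if row[j] == "-":
                column.append(None)
            else:
                column.append(next_index[s])
                next_index[s] += 1
        columns.append(tuple(column))
    return columns


def branch_accuracy(simulated_rows, tool_rows, groups):
    """Branch-specific accuracy for the join of the sub-alignments in `groups`.

    groups: one tuple of sequence indices per sub-alignment being joined, e.g.
    [(0, 1), (2, 3)] for the node-to-node join of a four-species tree or
    [(0, 1), (2,)] for the node-to-leaf join of a three-species tree.
    """
    true_ungapped = [c for c in alignment_columns(simulated_rows) if None not in c]
    tool_columns = alignment_columns(tool_rows)
    tool_set = set(tool_columns)

    def projections(indices):
        # Residue tuples that the tool alignment places together for these sequences.
        return {tuple(c[i] for i in indices) for c in tool_columns
                if all(c[i] is not None for i in indices)}

    group_sets = [(g, projections(g)) for g in groups]
    # C_SU: true columns whose sub-alignments were each reproduced by the tool.
    eligible = [c for c in true_ungapped
                if all(tuple(c[i] for i in g) in proj for g, proj in group_sets)]
    if not eligible:
        return float("nan")
    # C_TSU: the eligible columns that the tool also joins correctly.
    return sum(1 for c in eligible if c in tool_set) / len(eligible)


simulated = ["ACGT"] * 4
tool = ["ACGT-", "ACG-T", "ACGT-", "ACGT-"]  # Seq2 misaligned in the last column
print(branch_accuracy(simulated, tool, [(0, 1), (2, 3)]))
```
            </p>
            <p>In this example the overall accuracy defined above would be 3/4. Because Seq1 and Seq2 are misaligned in the last column, that column is excluded from <it>C</it><sub><it>SU </it></sub>, and the node-to-node join scores 1.0.</p>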
            <p>A Perl script that calculates these measures is available for download <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Binding site alignment measures</p>
            </st>
            <p>Binding site alignment was evaluated with two measures. Sites whose start and stop positions fell in the same alignment columns in every sequence were considered perfectly aligned. Sites that overlapped by at least one base in every sequence of an alignment were considered overlapping. The fraction of sites that were perfectly aligned and the fraction that overlapped, pooled across all <it>cis</it>-regulatory regions and all replicates, are reported. Pearson correlations of each measure with the density of binding sites in <it>cis</it>-regulatory regions, and with the binding site length for each factor, were calculated using the R statistics package <abbrgrp><abbr bid="B90">90</abbr></abbrgrp>.</p>
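            <p>Both measures can be sketched in Python by mapping each site's ungapped coordinates onto alignment columns. Sites are given as one (start, end) pair per sequence, in 0-based inclusive ungapped coordinates; this input format and the function names are hypothetical. "Overlapping" is interpreted here as the column intervals of the site in all sequences sharing at least one column, which is an assumption about the exact criterion.</p>
            <p>
```python
def residue_to_column(row):
    """Map each 0-based ungapped residue index of a gapped row to its column."""
    return [j for j, ch in enumerate(row) if ch != "-"]


def classify_site(rows, site):
    """Return (perfectly_aligned, overlapping) for one binding site."""
    spans = []
    for row, (start, end) in zip(rows, site):
        cols = residue_to_column(row)
        spans.append((cols[start], cols[end]))
    perfect = len(set(spans)) == 1
    # Assumed criterion: the column intervals have a non-empty common intersection.
    overlapping = min(e for _, e in spans) >= max(s for s, _ in spans)
    return perfect, overlapping


def site_fractions(rows, sites):
    """Fractions of sites that are perfectly aligned and that overlap."""
    results = [classify_site(rows, site) for site in sites]
    n = len(results)
    return sum(p for p, _ in results) / n, sum(o for _, o in results) / n


rows = ["AC-GTA", "ACCGTA"]
sites = [[(2, 4), (3, 5)], [(0, 1), (1, 2)], [(0, 0), (2, 2)]]
print(site_fractions(rows, sites))
```
            </p>
            <p>Every perfectly aligned site is also counted as overlapping, so the second fraction is always at least as large as the first.</p>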
         </sec>
         <sec>
            <st>
               <p>Divergence estimation</p>
            </st>
            <p>Divergence estimates were calculated using the baseml program from the PAML package v3.14 <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>. Baseml was run under the HKY85 model with kappa estimated from an initial value of 2, alpha fixed at infinity (no rate variation among sites), no molecular clock, and equilibrium base frequencies estimated from the observed averages.</p>
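            <p>The settings above can be expressed as a baseml control file. The Python sketch below writes one, using option codes as documented in the baseml.ctl template distributed with PAML: model = 4 for HKY85, alpha = 0 meaning infinity, and nhomo = 0 for frequencies taken from observed averages. The file names and runmode = 0 (branch lengths estimated on a user-supplied tree) are assumptions. Options not listed are assumed to keep their defaults and should be checked against the manual for the PAML version in use.</p>
            <p>
```python
def baseml_ctl(seqfile, treefile, outfile):
    """Return the text of a baseml control file matching the settings described above."""
    settings = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", outfile),
        ("runmode", "0"),    # 0: user tree (assumption for this sketch)
        ("model", "4"),      # 4: HKY85
        ("fix_kappa", "0"),  # 0: estimate kappa ...
        ("kappa", "2"),      # ... starting from an initial value of 2
        ("fix_alpha", "1"),  # 1: alpha is fixed ...
        ("alpha", "0"),      # ... and 0 means infinity (no gamma rate variation)
        ("clock", "0"),      # 0: no molecular clock
        ("nhomo", "0"),      # 0: homogeneous model, frequencies from observed averages
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"


print(baseml_ctl("pair.phy", "pair.tre", "pair.out"))  # hypothetical file names
```
            </p>
            <p>The generated text could be saved, for example as baseml.ctl, and passed to baseml as its command-line argument.</p>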
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
          <p>DAP designed and performed the research, analyzed the data and wrote the paper. AMM contributed to the development of the CisEvolver program. All authors contributed to the research design and the writing of the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Michael Brudno and Shyam Prabhakar for discussions on alignment accuracy. We thank Casey Bergman for prepublication access to the flyreg.org database as well as for comments on the manuscript. This work was funded by NIH R01-HG002779-02 to MBE.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Trade-offs in detecting evolutionarily constrained sequence by comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Stone</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>143</fpage>
            <lpage>164</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.6.080604.162146</pubid>
                  <pubid idtype="pmpid" link="fulltext">16124857</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Comparison of genomic DNA sequences: solved and unsolved problems</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>5</issue>
            <fpage>391</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.5.391</pubid>
                  <pubid idtype="pmpid" link="fulltext">11331233</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>15</fpage>
            <lpage>56</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.5.061903.180057</pubid>
                  <pubid idtype="pmpid" link="fulltext">15485342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The many faces of sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>6</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/6.1.6</pubid>
                  <pubid idtype="pmpid" link="fulltext">15826353</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Benchmarking tools for the alignment of functional noncoding DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Pollard</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>1</issue>
            <fpage>6</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">344529</pubid>
                  <pubid idtype="pmpid" link="fulltext">14736341</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-6</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Evolutionary distance estimation and fidelity of pair wise sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Rosenberg</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>102</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1087827</pubid>
                  <pubid idtype="pmpid" link="fulltext">15840174</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Aligning multiple genomic sequences with the threaded blockset aligner</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>4</issue>
            <fpage>708</fpage>
            <lpage>715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383317</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060014</pubid>
                  <pubid idtype="doi">10.1101/gr.1933104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Multiple sequence alignment accuracy and evolutionary distance estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Rosenberg</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>278</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1318491</pubid>
                  <pubid idtype="pmpid" link="fulltext">16305750</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-278</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A model of the statistical power of comparative genome sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <issue>1</issue>
            <fpage>e10</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539325</pubid>
                  <pubid idtype="pmpid" link="fulltext">15660152</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030010</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Estimation of evolutionary distances between nucleotide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Zharkikh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <issue>3</issue>
            <fpage>315</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00160155</pubid>
                  <pubid idtype="pmpid">7932793</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Performance of a divergence time estimation method under a probabilistic model of rate evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Thorne</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <issue>3</issue>
            <fpage>352</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11230536</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Phylogenies from molecular sequences: inference and reliability</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>1988</pubdate>
            <volume>22</volume>
            <fpage>521</fpage>
            <lpage>565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.ge.22.120188.002513</pubid>
                  <pubid idtype="pmpid" link="fulltext">3071258</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Relative efficiencies of the maximum-parsimony and distance-matrix methods of phylogeny construction for restriction data</p>
            </title>
            <aug>
               <au>
                  <snm>Lin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1991</pubdate>
            <volume>8</volume>
            <issue>3</issue>
            <fpage>356</fpage>
            <lpage>365</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1677154</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site</p>
            </title>
            <aug>
               <au>
                  <snm>Tateno</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Takezaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <issue>2</issue>
            <fpage>261</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8170367</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Application and accuracy of molecular phylogenies</p>
            </title>
            <aug>
               <au>
                  <snm>Hillis</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1994</pubdate>
            <volume>264</volume>
            <issue>5159</issue>
            <fpage>671</fpage>
            <lpage>677</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.8171318</pubid>
                  <pubid idtype="pmpid" link="fulltext">8171318</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Comparative analysis of multiple protein-sequence alignment methods</p>
            </title>
            <aug>
               <au>
                  <snm>McClure</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Vasi</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <issue>4</issue>
            <fpage>571</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8078398</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A comprehensive comparison of multiple sequence alignment programs</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <issue>13</issue>
            <fpage>2682</fpage>
            <lpage>2690</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148477</pubid>
                  <pubid idtype="pmpid" link="fulltext">10373585</pubid>
                  <pubid idtype="doi">10.1093/nar/27.13.2682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Accurate anchoring alignment of divergent sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Umbach</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B19">
            <title>
               <p>MCALIGN: stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>3</issue>
            <fpage>442</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">353231</pubid>
                  <pubid idtype="pmpid" link="fulltext">14993209</pubid>
                  <pubid idtype="doi">10.1101/gr.1571904</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genomic Regulatory Systems</p>
            </title>
            <aug>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
             <publisher>San Diego, CA, Academic Press</publisher>
            <pubdate>2001</pubdate>
            <fpage>261</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11264933</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Human-mouse genome comparisons to locate regulatory sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Palumbo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>26</volume>
            <issue>2</issue>
            <fpage>225</fpage>
            <lpage>228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/79965</pubid>
                  <pubid idtype="pmpid" link="fulltext">11017083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Factors influencing the identification of transcription factor binding sites by cross-species comparison</p>
            </title>
            <aug>
               <au>
                  <snm>McCue</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Carmack</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>10</issue>
            <fpage>1523</fpage>
            <lpage>1532</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187528</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368244</pubid>
                  <pubid idtype="doi">10.1101/gr.323602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Embryonic enhancers in the dpp disk region regulate a second round of Dpp signaling from the dorsal ectoderm to the mesoderm that represses Zfh-1 expression in a subset of pericardial cells</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Newfeld</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2003</pubdate>
            <volume>262</volume>
            <issue>1</issue>
            <fpage>137</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0012-1606(03)00350-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">14512024</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Combining phylogenetic data with co-regulated genes to identify regulatory motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>18</issue>
            <fpage>2369</fpage>
            <lpage>2380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg329</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in <it>Drosophila melanogaster</it> and <it>D.pseudoobscura</it></p>
            </title>
            <aug>
               <au>
                  <snm>Grad</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
               <au>
                  <snm>Halfon</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>16</issue>
            <fpage>2738</fpage>
            <lpage>2750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth320</pubid>
                  <pubid idtype="pmpid" link="fulltext">15145800</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Laverty</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>9</issue>
            <fpage>R61</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522868</pubid>
                  <pubid idtype="pmpid" link="fulltext">15345045</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-9-r61</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Unnerstall</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Gaul</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>129</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">521067</pubid>
                  <pubid idtype="pmpid" link="fulltext">15357878</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model</p>
            </title>
            <aug>
               <au>
                  <snm>Moses</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chiang</snm>
                  <fnm>DY</fnm>
               </au>
               <au>
                  <snm>Pollard</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>12</issue>
            <fpage>R98</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545801</pubid>
                  <pubid idtype="pmpid" link="fulltext">15575972</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-12-r98</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Identification of functional transcription factor binding sites using closely related Saccharomyces species</p>
            </title>
            <aug>
               <au>
                  <snm>Doniger</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Huh</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fay</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>5</issue>
            <fpage>701</fpage>
            <lpage>709</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088298</pubid>
                  <pubid idtype="pmpid" link="fulltext">15837806</pubid>
                  <pubid idtype="doi">10.1101/gr.3578205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Discovery, validation, and genetic dissection of transcription factor binding sites by comparative and functional genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Gertz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Riles</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Turnbaugh</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>BA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>8</issue>
            <fpage>1145</fpage>
            <lpage>1152</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182227</pubid>
                  <pubid idtype="pmpid" link="fulltext">16077013</pubid>
                  <pubid idtype="doi">10.1101/gr.3859605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Computational screening of conserved genomic DNA in search of functional noncoding elements</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <issue>7</issue>
            <fpage>535</fpage>
            <lpage>545</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth0705-535</pubid>
                  <pubid idtype="pmpid" link="fulltext">16170870</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>De novo discovery of a tissue-specific gene regulatory module in a chordate</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Yagi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Satoh</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>10</issue>
            <fpage>1315</fpage>
            <lpage>1324</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1240073</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169925</pubid>
                  <pubid idtype="doi">10.1101/gr.4062605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Identifying the conserved network of cis-regulatory sites of a eukaryotic genome</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <issue>48</issue>
            <fpage>17400</fpage>
            <lpage>17405</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1297658</pubid>
                  <pubid idtype="pmpid" link="fulltext">16301543</pubid>
                  <pubid idtype="doi">10.1073/pnas.0505147102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Conservation of regulatory elements between two species of Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Emberly</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rajewsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>1</issue>
            <fpage>57</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">302112</pubid>
                  <pubid idtype="pmpid" link="fulltext">14629780</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-57</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Evidence for stabilizing selection in a eukaryotic enhancer element</p>
            </title>
            <aug>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>MZ</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <issue>6769</issue>
            <fpage>564</fpage>
            <lpage>567</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35000615</pubid>
                  <pubid idtype="pmpid" link="fulltext">10676967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Functional evolution of noncoding DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>MZ</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>6</issue>
            <fpage>634</fpage>
            <lpage>639</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00355-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12433575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover</p>
            </title>
            <aug>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <issue>7</issue>
            <fpage>1114</fpage>
            <lpage>1121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12082130</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Turnover of binding sites for transcription factors involved in early Drosophila development</p>
            </title>
            <aug>
               <au>
                  <snm>Costas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Casares</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Vieira</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>310</volume>
            <fpage>215</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00556-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12801649</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <issue>5</issue>
            <fpage>703</fpage>
            <lpage>714</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg077</pubid>
                  <pubid idtype="pmpid" link="fulltext">12679540</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Dynamics and function of intron sequences of the wingless gene during the evolution of the Drosophila genus</p>
            </title>
            <aug>
               <au>
                  <snm>Costas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Vieira</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Pinho</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vieira</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Casares</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Evol Dev</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <issue>5</issue>
            <fpage>325</fpage>
            <lpage>335</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1525-142X.2004.04040.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15330865</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Expected rates and modes of evolution of enhancer sequences</p>
            </title>
            <aug>
               <au>
                  <snm>MacArthur</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brookfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Sequence turnover and tandem repeats in cis-regulatory modules in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <issue>4</issue>
            <fpage>874</fpage>
            <lpage>885</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi090</pubid>
                  <pubid idtype="pmpid" link="fulltext">15659554</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Conserved noncoding sequences are reliable guides to regulatory elements</p>
            </title>
            <aug>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>9</issue>
            <fpage>369</fpage>
            <lpage>372</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02081-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>5</issue>
            <fpage>813</fpage>
            <lpage>820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430923</pubid>
                  <pubid idtype="pmpid" link="fulltext">12727901</pubid>
                  <pubid idtype="doi">10.1101/gr.1064503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents</p>
            </title>
            <aug>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Gaffney</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>23</issue>
            <fpage>13402</fpage>
            <lpage>13406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">263826</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597721</pubid>
                  <pubid idtype="doi">10.1073/pnas.2233252100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The share of human genomic DNA under selection estimated from human-mouse genomic alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Cold Spring Harb Symp Quant Biol</source>
            <pubdate>2003</pubdate>
            <volume>68</volume>
            <fpage>245</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/sqb.2003.68.245</pubid>
                  <pubid idtype="pmpid">15338624</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Distinguishing regulatory DNA from neutral sites</p>
            </title>
            <aug>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kolbe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>O'Connor</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>64</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430974</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529307</pubid>
                  <pubid idtype="doi">10.1101/gr.817703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Halligan</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Andolfatto</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>2</issue>
            <fpage>273</fpage>
            <lpage>279</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">327102</pubid>
                  <pubid idtype="pmpid" link="fulltext">14762063</pubid>
                  <pubid idtype="doi">10.1101/gr.1329204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat</p>
            </title>
            <aug>
               <au>
                  <snm>Kolbe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>4</issue>
            <fpage>700</fpage>
            <lpage>707</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383316</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060013</pubid>
                  <pubid idtype="doi">10.1101/gr.1976004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Evolutionary constraints in conserved nongenic sequences of mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Kryukov</snm>
                  <fnm>GV</fnm>
               </au>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Halligan</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Gaffney</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>10</issue>
            <fpage>1373</fpage>
            <lpage>1378</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1240079</pubid>
                  <pubid idtype="pmpid" link="fulltext">16204190</pubid>
                  <pubid idtype="doi">10.1101/gr.3942005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>King</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>8</issue>
            <fpage>1051</fpage>
            <lpage>1060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182217</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024817</pubid>
                  <pubid idtype="doi">10.1101/gr.3642605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Generation time and genomic evolution in primates</p>
            </title>
            <aug>
               <au>
                  <snm>Sarich</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1973</pubdate>
            <volume>179</volume>
            <issue>4078</issue>
            <fpage>1144</fpage>
            <lpage>1147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.179.4078.1144</pubid>
                  <pubid idtype="pmpid" link="fulltext">4120260</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Divergence of conserved non-coding sequences: rate estimates and relative rate tests</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Fried</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Prohaska</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <issue>11</issue>
            <fpage>2116</fpage>
            <lpage>2121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh221</pubid>
                  <pubid idtype="pmpid" link="fulltext">15282332</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Dating of the human-ape splitting by a molecular clock of mitochondrial DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yano</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1985</pubdate>
            <volume>22</volume>
            <issue>2</issue>
            <fpage>160</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02101694</pubid>
                  <pubid idtype="pmpid">3934395</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>The correlation between intron length and recombination in Drosophila. Dynamic equilibrium between mutational and selective forces</p>
            </title>
            <aug>
               <au>
                  <snm>Comeron</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>156</volume>
            <issue>3</issue>
            <fpage>1175</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461334</pubid>
                  <pubid idtype="pmpid" link="fulltext">11063693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies</p>
            </title>
            <aug>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <issue>7</issue>
            <fpage>910</fpage>
            <lpage>917</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">9656490</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Position specific variation in the rate of evolution in transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Moses</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chiang</snm>
                  <fnm>DY</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <issue>1</issue>
            <fpage>19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">212491</pubid>
                  <pubid idtype="pmpid" link="fulltext">12946282</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-3-19</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Information content of binding sites on nucleotide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ehrenfeucht</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1986</pubdate>
            <volume>188</volume>
            <issue>3</issue>
            <fpage>415</fpage>
            <lpage>431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(86)90165-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">3525846</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>CisEvolver</p>
            </title>
            <url>http://rana.lbl.gov/CisEvolver</url>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Transcriptional control in the segmentation gene network of Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Pearce</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fak</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Unnerstall</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Emberly</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rajewsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Gaul</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>9</issue>
            <fpage>E271</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514885</pubid>
                  <pubid idtype="pmpid" link="fulltext">15340490</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020271</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers</p>
            </title>
            <aug>
               <au>
                  <snm>Papatsenko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Lifanov</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Regnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nazina</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Desplan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>3</issue>
            <fpage>470</fpage>
            <lpage>481</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155290</pubid>
                  <pubid idtype="pmpid" link="fulltext">11875036</pubid>
                  <pubid idtype="doi">10.1101/gr.212502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>8</issue>
            <fpage>1747</fpage>
            <lpage>1749</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti173</pubid>
                  <pubid idtype="pmpid" link="fulltext">15572468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <issue>22</issue>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid">7984417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>MAVID: constrained ancestral alignment of multiple sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>4</issue>
            <fpage>693</fpage>
            <lpage>699</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383315</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060012</pubid>
                  <pubid idtype="doi">10.1101/gr.1960404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Davydov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>721</fpage>
            <lpage>731</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430158</pubid>
                  <pubid idtype="pmpid" link="fulltext">12654723</pubid>
                  <pubid idtype="doi">10.1101/gr.926603</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>PipMaker--a web server for aligning two genomic DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bouck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>4</issue>
            <fpage>577</fpage>
            <lpage>586</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779500</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.577</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Human-mouse alignments with BLASTZ</p>
            </title>
            <aug>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>103</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430961</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529312</pubid>
                  <pubid idtype="doi">10.1101/gr.809403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Aligning Alignments Exactly</p>
            </title>
            <aug>
               <au>
                  <snm>Kececioglu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Starrett</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>RECOMB</source>
            <publisher>San Diego, California, USA; ACM Press, New York, NY, USA</publisher>
            <pubdate>2004</pubdate>
            <fpage>85</fpage>
            <lpage>96</lpage>
         </bibl>
         <bibl id="B69">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <issue>5</issue>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">9367129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Pattern of selective constraint in C. elegans and C. briggsae genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Shabalina</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Genet Res</source>
            <pubdate>1999</pubdate>
            <volume>74</volume>
            <issue>1</issue>
            <fpage>23</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1017/S0016672399003821</pubid>
                  <pubid idtype="pmpid">10505405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <issue>3</issue>
            <fpage>306</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00160154</pubid>
                  <pubid idtype="pmpid">7932792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Biological sequence analysis: Probabilistic models of proteins and nucleic acids</p>
            </title>
            <aug>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <publisher>Cambridge University Press</publisher>
            <pubdate>1998</pubdate>
            <fpage>356</fpage>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Noncoding RNA gene detection using comparative sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Rivas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">64605</pubid>
                  <pubid idtype="pmpid" link="fulltext">11801179</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-2-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Identification and classification of conserved RNA secondary structures in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>e33</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1440920</pubid>
                  <pubid idtype="pmpid" link="fulltext">16628248</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020033</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Chromosome evolution in eukaryotes: a multi-kingdom perspective</p>
            </title>
            <aug>
               <au>
                  <snm>Coghlan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eichler</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Paterson</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>12</issue>
            <fpage>673</fpage>
            <lpage>682</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.09.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">16242204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>Conservation of regulatory sequences and gene expression patterns in the disintegrating Drosophila Hox gene complex</p>
            </title>
            <aug>
               <au>
                  <snm>Negre</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Casillas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suzanne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sanchez-Herrero</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Akam</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nefedov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barbadilla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>5</issue>
            <fpage>692</fpage>
            <lpage>700</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088297</pubid>
                  <pubid idtype="pmpid" link="fulltext">15867430</pubid>
                  <pubid idtype="doi">10.1101/gr.3468605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Decoding cis-regulatory DNAs in the Drosophila genome</p>
            </title>
            <aug>
               <au>
                  <snm>Markstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>5</issue>
            <fpage>601</fpage>
            <lpage>606</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00345-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12200166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bettencourt</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Letovsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hubisz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meisel</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Couronne</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Hua</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bussemaker</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>van Batenburg</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Howells</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Sodergren</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Ortiz-Barrientos</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rives</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Metzker</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Steffen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Worley</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Havlak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Egan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hume</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Morgan</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Miner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hamilton</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Waldron</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Verduzco</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Clerc-Blankenburg</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Noor</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Schaeffer</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540289</pubid>
                  <pubid idtype="pmpid" link="fulltext">15632085</pubid>
                  <pubid idtype="doi">10.1101/gr.3059305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>rVista for comparative sequence-based discovery of functional transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Loots</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Ovcharenko</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>5</issue>
            <fpage>832</fpage>
            <lpage>839</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186580</pubid>
                  <pubid idtype="pmpid" link="fulltext">11997350</pubid>
                  <pubid idtype="doi">10.1101/gr.225502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>Transcriptional regulatory code of a eukaryotic genome</p>
            </title>
            <aug>
               <au>
                  <snm>Harbison</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Rinaldi</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Macisaac</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Danford</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Hannett</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Tagne</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Reynolds</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Zeitlinger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pokholok</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rolfe</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Takusagawa</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Gifford</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Fraenkel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <issue>7004</issue>
            <fpage>99</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02800</pubid>
                  <pubid idtype="pmpid" link="fulltext">15343339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>Chromosomal organization is shaped by the transcription regulatory network</p>
            </title>
            <aug>
               <au>
                  <snm>Hershberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yeger-Lotem</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>3</issue>
            <fpage>138</fpage>
            <lpage>142</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.01.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15734572</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>Twilight zone of protein sequence alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1999</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>85</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/12.2.85</pubid>
                  <pubid idtype="pmpid" link="fulltext">10195279</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <issue>8</issue>
            <fpage>1335</fpage>
            <lpage>1345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.178701</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>High intrinsic rate of DNA loss in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Petrov</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Lozovskaya</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1996</pubdate>
            <volume>384</volume>
            <issue>6607</issue>
            <fpage>346</fpage>
            <lpage>349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/384346a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">8934517</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups</p>
            </title>
            <aug>
               <au>
                  <snm>Petrov</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <issue>3</issue>
            <fpage>293</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9501496</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p>BLAT--the BLAST-like alignment tool</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>4</issue>
            <fpage>656</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187518</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932250</pubid>
                  <pubid idtype="doi">10.1101/gr.229202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B87">
            <title>
               <p>The value of prior knowledge in discovering motifs with MEME</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Elkan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>3</volume>
            <fpage>21</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7584439</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B88">
            <title>
               <p>Matrices</p>
            </title>
            <url>http://rana.lbl.gov/~dan/matrices.html</url>
         </bibl>
         <bibl id="B89">
            <title>
               <p>Identification of consensus patterns in unaligned DNA sequences known to be functionally related</p>
            </title>
            <aug>
               <au>
                  <snm>Hertz</snm>
                  <fnm>GZ</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1990</pubdate>
            <volume>6</volume>
            <issue>2</issue>
            <fpage>81</fpage>
            <lpage>92</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2193692</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B90">
            <title>
               <p>R: a language for data analysis and graphics</p>
            </title>
            <aug>
               <au>
                  <snm>Ihaka</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Journal of Computational and Graphical Statistics</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <fpage>299</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/1390807</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B91">
            <title>
               <p>Sanger Center GFF2 Format Specification</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml</url>
         </bibl>
         <bibl id="B92">
            <title>
               <p>FlyBase</p>
            </title>
            <url>http://flybase.net</url>
         </bibl>
      </refgrp>
   </bm>
</art>
