<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-46</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Pavesi</snm>
               <fnm>Giulio</fnm>
               <insr iid="I1"/>
               <email>giulio.pavesi@unimi.it</email>
            </au>
            <au id="A2">
               <snm>Zambelli</snm>
               <fnm>Federico</fnm>
               <insr iid="I1"/>
               <email>federico.zambelli@unimi.it</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Pesole</snm>
               <fnm>Graziano</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>graziano.pesole@biologia.uniba.it</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, Milan, Italy</p>
            </ins>
            <ins id="I2">
               <p>Dipartimento di Biochimica e Biologia Molecolare "E. Quagliariello", University of Bari, Bari, Italy</p>
            </ins>
            <ins id="I3">
               <p>Istituto Tecnologie Biomediche &#8211; Consiglio Nazionale delle Ricerche, Bari, Italy</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>46</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/46</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17286865</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-46</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Pavesi et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes. This approach is becoming increasingly common given the ever-growing amount of available genomic data.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs. It adapts a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, our approach neither requires nor computes an alignment of the sequences investigated, and it does not rely on descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, based on the assumption that true functional elements should be more conserved than the rest of the sequence surrounding them. We present tests in which we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions such as enhancers.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The test results show that the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, and that it performs better than other available methods for the same task. We also show examples of how the algorithm can be used successfully when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Genome sequencing projects have told researchers <it>where </it>genes are located, in human and an ever-increasing number of other species, and microarrays and other sources of information can tell <it>when </it>genes are activated. However, a complete understanding of <it>how </it>gene expression is regulated at the transcriptional and post-transcriptional levels, and the characterization of all the elements involved in the process, remain open questions in molecular biology. Transcription is a fundamental step in the regulation of gene expression. It is modulated by the interaction of transcription factors (TFs) with their corresponding binding sites (TFBSs) on the DNA <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. These sites are located mostly near the transcription start site (TSS) of the gene, or farther away, organized in cis-regulatory modules (CRMs) such as enhancers and silencers.</p>
         <p>Computational methods for the discovery of conserved TFBSs (<it>motifs</it>) can be split into two broad categories: the 'single species, many genes' approach <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, and the 'single gene, many species' one <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In the former, a set of regions (e.g., promoters) from co-regulated genes is analyzed in search of over-represented motifs, that is, the TFBSs responsible for the co-regulation of the genes. In the latter, known as <it>phylogenetic footprinting </it>(a term first introduced in <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>), a single gene is investigated, and the non-coding regions flanking it are compared to their homologs in other species. Non-coding sequence elements that have been conserved through evolution are likely to be involved in regulating the expression of the gene. The two approaches can also be combined by comparing each of a set of co-regulated genes both to its homologs and to the other genes in the set <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, and this analysis can also be performed genome-wide <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Given the ever-increasing number of annotated genomic sequences available, phylogenetic footprinting has become more and more widely used. It avoids the need to assemble a set of co-regulated genes, which in turn requires building reliable datasets, and it allows single genes to be investigated on their own.</p>
         <p>The comparison of homologous non-coding sequences can be used both to identify single TFBSs, for example in promoter regions, and, on a larger scale, to discover distal CRMs. The latter approach has succeeded in several cases ever since the first human-mouse comparisons became possible (see <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> for a review). The most commonly used methods first build an alignment (either local or global) of the sequences investigated <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, or take advantage of the pre-computed whole-genome alignments now available <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. One simple way to identify conserved functional elements is then to use descriptors of the binding specificity of TFs, such as the <it>position specific weight matrices </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp> provided, for example, in the TRANSFAC database <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and to look for conserved aligned regions that fit the descriptor. This approach can be used to detect single TFBSs (see for example <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>) as well as clusters of TFBSs likely to form conserved CRMs (among many others, <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>).</p>
         <p>Methods of this kind face two issues. The first is the need for reliable descriptors of the binding specificity of the different TFs. PWMs usually yield a large number of false positive matches <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Requiring a match to be conserved across different sequences reduces them, but the problem of deciding whether a match is significant in all the species considered remains. The second, and most important, is the need for a reliable alignment of the sequences investigated. TFBSs tend to be quite short (6&#8211;15 nucleotides) compared to a typical region analyzed (a promoter of 500&#8211;1000 bp). If the aligned sequences are too divergent, conserved TFBSs can be missed simply because they are not correctly aligned. The reverse strategy, using TFBS matches as anchors for the alignment, has indeed been proposed to overcome this problem and improve alignment reliability <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The Footer algorithm <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> avoids sequence alignments altogether. It performs human-mouse comparisons using distinct descriptors for the TFBSs of the two species, and looks for matches of homologous TFs that fall at similar positions with respect to the genes studied.</p>
         <p>When known TFBS descriptors are not used, the natural idea is to identify elements or regions that can be considered "significantly conserved", and hence likely to have a regulatory function. The simplest strategy is to single out the most conserved parts of the alignments according to percent identity. A highly (or "ultra") conserved non-coding region can reasonably be suspected to have a functional role <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>, but the problem is often to define how conserved a region must be to be considered significant. Conserved TFBSs and CRMs can be qualitatively described as "islands of conservation in a sea of much less conserved DNA" <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, yet suitable measures are needed to quantify this concept <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Recent research has indeed focused on this point. MBA is an algorithm that looks for blocks of highly constrained alignments, weights them according to phylogenetic distance, and estimates their significance according to neutral substitution rates <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. A regulatory potential (RP) score is defined in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> by looking for patterns of conservation frequently found in conserved regulatory regions. PhastCons <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> combines evolutionary models and hidden Markov models to define a measure of significance for the conservation of a multiple genomic alignment.</p>
         <p>All these methods can perform well in the identification of fairly large conserved regulatory regions such as CRMs or whole promoters <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, but they are less powerful for the identification of single TFBSs. In fact, TFBSs are often too short, or not conserved enough, to constitute "significant" parts of the alignments (or significant local alignments) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Another important issue is defining how conserved a region must be to be worth further investigation. Different homologous genomic regions, for example in a human-rodent comparison, show varying degrees of conservation, that is, they seem to evolve at different rates <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Thus, if a single significance threshold is used, in some cases a whole promoter region can be considered "significantly conserved" (missing the locations of single TFBSs), while in others no significant sequence element is reported.</p>
         <p>Some methods that do not compute a global or local sequence alignment have already been proposed. Several motif discovery algorithms exist for the detection of conserved sequence elements in co-regulated genes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B32">32</abbr></abbrgrp>, and some have been successfully applied to homologous sequences as well <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Algorithms of this kind, however, assume that the input sequences are not related by evolution, and thus look for subtle similarities. As a result, a human-mouse comparison can report a deluge of conserved motifs, regardless of the algorithmic strategy and significance evaluation employed.</p>
         <p>Footprinter <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, instead, is a motif discovery algorithm explicitly devised for phylogenetic footprinting. It looks for conserved sequence motifs by making use of the phylogenetic relationships among the sequences, and evaluates motif conservation primarily according to parsimony scores. This approach bypasses the need to pre-align the sequences. However, parsimony scores alone do not provide a fine-grained ranking of the motifs found, especially when only a few sequences are investigated, e.g., in a typical human-rodent comparison. The results of the algorithm can be evaluated statistically by comparing the motifs found with a "random" dataset of simulated orthologous sequences. The problem, again, is that the degree of conservation in the promoters of different genes can vary significantly even when the same species are compared. Establishing a single significance threshold therefore yields too many significant motifs in some cases and too few in others.</p>
         <p>The aim of this work is to introduce a novel strategy to identify significantly conserved motifs and regions in homologous sequences. Given a reference sequence and one or more homologs, the algorithm we propose is based on the idea that conserved functional elements should be conserved both in sequence and in position with respect to the genes they regulate. Starting from this consideration, we took a statistical measure we previously used for the discovery of TFBSs in sequences from co-regulated genes and extended it with positional conservation. Moreover, as discussed above, defining absolute measures of significance for conservation is hard, since sequence conservation varies greatly according to the species and the genes considered. We tackle this problem by measuring conservation in relative rather than absolute terms, with respect to the average degree of conservation of the whole sequences compared, on the idea that functional elements should be more conserved than the rest. In other words, the algorithm does not evaluate significant conservation, but rather significant variation of conservation within the same sequences. The motifs and regions selected by the algorithm can then be further processed by matching them against descriptors of known TFBSs, or compared to regions extracted from other co-regulated genes.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>In this section we first present the algorithm, and then assess its performance on tests performed on collections of known functional elements.</p>
         <sec>
            <st>
               <p>Algorithm</p>
            </st>
            <p>The WeederH algorithm takes as input a reference sequence <it>S</it> and any number <it>k </it>&#8805; 1 of homologous sequences <it>H</it><sub>1 </sub>... <it>H</it><sub><it>k</it></sub>. It also assumes that all the sequences have been extracted from their respective genomes with respect to the same reference points. For example, all sequences lie upstream of the TSS or of the ATG codon of homologous genes, or are intergenic regions between two genes and between their homologs in other species, and so on. Conserved motifs are identified in the reference sequence by comparing it to the homologs. The steps performed by the algorithm can be summarized as follows:</p>
            <p>1. Each oligo of suitable size of the reference sequence is matched against the homologous sequences;</p>
            <p>2. Matches not exceeding a given substitution threshold are scored with a measure taking into account sequence and position conservation, and the highest scoring match is kept;</p>
            <p>3. Oligo scores are transformed into relative scores, according to the average scores obtained by oligos of the same size;</p>
            <p>4. High scoring oligos are merged, whenever possible, in order to obtain longer motifs and regions.</p>
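            <p>Step 1 can be illustrated with a brute-force scan in Python. This is a minimal sketch under stated assumptions: only substitutions (no insertions or deletions) are allowed, as in the matching step described below; only the given strand is scanned; and the helper names <it>hamming</it> and <it>find_matches</it> are hypothetical, not part of WeederH. The scan costs time proportional to |<it>S</it>| &#215; |<it>H</it><sub><it>k</it></sub>| &#215; <it>m</it>, and the actual implementation may index the sequences differently.</p>
            <p>
```python
def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_matches(reference: str, homolog: str, m: int, e: int) -> dict[int, list[tuple[int, int]]]:
    """For every m-mer s_i of the reference, list the pairs (j, d) such that
    the m-mer at position j of the homolog differs from s_i by d substitutions,
    keeping only matches with d at most e (hypothetical helper)."""
    matches = {}
    for i in range(len(reference) - m + 1):
        oligo = reference[i:i + m]
        hits = []
        for j in range(len(homolog) - m + 1):
            d = hamming(oligo, homolog[j:j + m])
            if e >= d:  # keep occurrences with at most e substitutions
                hits.append((j, d))
        matches[i] = hits
    return matches


# Example: every 4-mer of a toy reference against one toy homolog, with e = 1.
hits = find_matches("ACGTAC", "ACGTTC", m=4, e=1)
# {0: [(0, 0)], 1: [(1, 1)], 2: [(2, 1)]}
```
            </p>
            <p>Each reference oligo keeps all of its acceptable occurrences, because the choice of the best one depends on the positional term of the score defined in the next subsection.</p>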
            <p>Note that we model conserved sites with the oligos themselves, rather than with more involved representations of TFBSs such as profiles and position weight matrices <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The latter are clearly more powerful than oligos and consensi for modeling the binding specificity of a given TF. For the ab initio discovery of novel motifs and sites, however, which is essentially based on the detection of similar oligos and their statistical evaluation <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, recent results have shown no clear advantage for either representation, and consensus- (or oligo-) based models have often yielded better results <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <sec>
               <st>
                  <p>Finding conserved motifs</p>
               </st>
               <p>In the first step, each oligo of a given length <it>m </it>of the reference sequence <it>S </it>is matched against each of the homologous sequences, and all its occurrences with at most <it>e </it>substitutions are collected. Given <it>s</it><sub><it>i</it></sub>, the <it>m</it>-mer at position <it>i </it>of the reference sequence, a match at position <it>j </it>of the <it>k</it>-th homologous sequence <it>H</it><sub><it>k </it></sub>with <it>d </it>&#8804; <it>e </it>substitutions is scored by taking into account both sequence and positional conservation:</p>
               <p><it>B</it><sub><it>k </it></sub>(<it>i</it>, <it>j</it>, <it>m</it>) = - log (<it>E </it>(<it>s</it><sub><it>i</it></sub>, <it>d</it>, <it>k</it>)) - log (&#916; (<it>i</it>, <it>j</it>) + 1)</p>
               <p>where <it>E(s</it><sub><it>i</it></sub><it>, d, k) </it>is the expected frequency of <it>s</it><sub><it>i </it></sub>with at most <it>d </it>substitutions in the species of sequence <it>k </it>(see Methods), and <it>&#916; (i,j) </it>is the distance between the two positions, measured with respect to the reference points defined above (e.g., <it>i </it>and <it>j </it>bp upstream of the TSS of the respective genes). This function is similar to the one we employed in the original Weeder algorithm <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>, which, however, made no assumption about the positional conservation of the motifs.</p>
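               <p>The per-match score <it>B</it><sub><it>k</it></sub>(<it>i</it>, <it>j</it>, <it>m</it>) can be written down directly. In the Python sketch below, <it>E</it> is replaced by a hypothetical uniform background model in which each base has probability 1/4 and positions are independent. This is an assumption for illustration only, since WeederH uses species-specific expected frequencies described in Methods. The use of natural logarithms is also an assumption, and <it>i</it> and <it>j</it> are taken to be offsets already measured from the shared reference point, so that &#916;(<it>i</it>, <it>j</it>) is simply their absolute difference.</p>
               <p>
```python
import math


def expected_frequency_uniform(m: int, d: int) -> float:
    """Hypothetical stand-in for E(s_i, d, k): the probability that a random
    m-mer from a uniform, independent background (1/4 per base) lies within
    d substitutions of a fixed oligo. Not the species-specific model of WeederH."""
    within = sum(math.comb(m, t) * 3 ** t for t in range(d + 1))
    return within / 4 ** m


def match_score(i: int, j: int, expected: float) -> float:
    """B_k(i, j, m) = -log E - log(Delta(i, j) + 1), where i and j are
    offsets from the shared reference point (e.g. bp upstream of the TSS)."""
    delta = abs(i - j)
    return -math.log(expected) - math.log(delta + 1)


# Example: an exact 2-mer match (E = 1/16) at the same offset and 15 bp away.
e_exact = expected_frequency_uniform(2, 0)
same_offset = match_score(0, 0, e_exact)   # log 16, about 2.77
shifted = match_score(0, 15, e_exact)      # zero, up to rounding
```
               </p>
               <p>The example shows how the two terms trade off: the score remains positive only while &#916;(<it>i</it>, <it>j</it>) + 1 stays below 1/<it>E</it>, so rarer oligos tolerate larger positional shifts before their matches stop contributing.</p>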
               <p>The score of oligo <it>s</it><sub><it>i </it></sub>with respect to the <it>k</it>-th homolog <it>H</it><sub><it>k </it></sub>is given by the maximum among the matching positions:</p>
               <p>
                  <m:math name="1471-2105-8-46-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:munder>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mi>j</m:mi>
                           </m:munder>
                           <m:msub>
                              <m:mi>B</m:mi>
                              <m:mi>k</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>j</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </p>
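               <p>The per-homolog oligo score and its standardization can be sketched as follows. For each reference position <it>i</it>, the sketch takes the list of <it>B</it><sub><it>k</it></sub>(<it>i</it>, <it>j</it>, <it>m</it>) values of its matches in one homolog. It applies the maximum with the zero floor described in the paragraph that follows, then computes &#956;(<it>k</it>, <it>m</it>), &#963;<sup>2</sup>(<it>k</it>, <it>m</it>) and the &#967;<sup>2</sup> relative score as defined below, dividing by the number of <it>m</it>-mers |<it>S</it>| - <it>m</it> + 1. The function names, and returning all zeros when the variance is zero, are assumptions of this sketch.</p>
               <p>
```python
def oligo_score(match_scores: list[float]) -> float:
    """B_k(i, m): the best B_k(i, j, m) over all matches of s_i in H_k,
    set to zero when there is no match or every match scores negatively."""
    return max([0.0, *match_scores])


def relative_scores(oligo_scores: list[float]) -> list[float]:
    """chi^2_k(i, m) = (B_k(i, m) - mu(k, m))^2 / sigma^2(k, m) for every
    m-mer of the reference; oligo_scores holds the |S| - m + 1 values B_k(i, m)."""
    n = len(oligo_scores)  # |S| - m + 1
    mu = sum(oligo_scores) / n
    var = sum((b - mu) ** 2 for b in oligo_scores) / n
    if var == 0:
        # Assumption of this sketch: identical scores carry no relative signal.
        return [0.0] * n
    return [(b - mu) ** 2 / var for b in oligo_scores]


# Example: one homolog, four reference m-mers and the scores of their matches.
per_position = [[], [-0.4], [-1.0, 0.0], [2.5, 4.0]]
b_scores = [oligo_score(s) for s in per_position]  # [0.0, 0.0, 0.0, 4.0]
chi2 = relative_scores(b_scores)                   # about [0.33, 0.33, 0.33, 3.0]
```
               </p>
               <p>In this toy example the only oligo with a positive match stands out with a relative score of 3, while the three oligos whose matches are absent or negative each receive 1/3.</p>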
               <p>If no match is found for <it>s</it><sub><it>i </it></sub>in sequence <it>H</it><sub><it>k</it></sub>, or if all matches yield negative scores (that is, for every match the positional term outweighs the sequence term, because &#916; (<it>i</it>, <it>j</it>) + 1 exceeds 1/<it>E</it>(<it>s</it><sub><it>i</it></sub>, <it>d</it>, <it>k</it>)), this score is set to zero. At this point, the overall score associated with <it>s</it><sub><it>i </it></sub>could be defined as the sum of its scores over all the homologous sequences. However, the <it>B</it><sub><it>k</it></sub><it>(i, m) </it>values can vary greatly according to the overall conservation of the sequences compared (e.g., a human-mouse comparison will yield higher scores than a human-chicken comparison). Regardless of the species considered, the idea underlying the algorithm is that functional elements should be more conserved than the rest of the sequences. Thus, instead of using the <it>B</it><sub><it>k</it></sub><it>(i, m) </it>scores directly, the algorithm first transforms them into <it>relative </it>scores. Let</p>
               <p>
                  <m:math name="1471-2105-8-46-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>k</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:msub>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>B</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>|</m:mo>
                                    <m:mi>S</m:mi>
                                    <m:mo>|</m:mo>
                                 </m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </p>
               <p>and</p>
               <p>
                  <m:math name="1471-2105-8-46-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mi>&#963;</m:mi>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>k</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:msub>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>B</m:mi>
                                                <m:mi>k</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#956;</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>|</m:mo>
                                    <m:mi>S</m:mi>
                                    <m:mo>|</m:mo>
                                 </m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </p>
               <p>be the mean and the variance of the scores obtained by the <it>m</it>-mers of the reference sequence when matched against sequence <it>H</it><sub><it>k</it></sub>, where |<it>S</it>| denotes the length of the reference sequence, so that |<it>S</it>| &#8722; <it>m</it> + 1 is the number of <it>m</it>-mers it contains. The score of each <it>m</it>-mer is then standardized into a &#967;<sup>2 </sup>relative score:</p>
               <p>
                  <m:math name="1471-2105-8-46-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>&#967;</m:mi>
                              <m:mi>k</m:mi>
                              <m:mn>2</m:mn>
                           </m:msubsup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>B</m:mi>
                                          <m:mi>k</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#956;</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>k</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                              </m:mrow>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>&#963;</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFhpWydaqhaaWcbaGaem4AaSgabaGaeGOmaidaaOGaeiikaGIaemyAaKMaeiilaWIaemyBa0MaeiykaKIaeyypa0ZaaSaaaeaacqGGOaakcqWGcbGqdaWgaaWcbaGaem4AaSgabeaakiabcIcaOiabdMgaPjabcYcaSiabd2gaTjabcMcaPiabgkHiTiab=X7aTjabcIcaOiabdUgaRjabcYcaSiabd2gaTjabcMcaPiabcMcaPmaaCaaaleqabaGaeGOmaidaaaGcbaGae83Wdm3aaWbaaSqabeaacqaIYaGmaaGccqGGOaakcqWGRbWAcqGGSaalcqWGTbqBcqGGPaqkaaaaaa@5252@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>The overall relative score for the <it>m</it>-mer at position <it>i </it>of the reference sequence is finally defined as the sum of the relative score contributions of each homologous sequence:</p>
               <p>
                  <m:math name="1471-2105-8-46-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mi>&#967;</m:mi>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mi>k</m:mi>
                              </m:munder>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>&#967;</m:mi>
                                    <m:mi>k</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msubsup>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFhpWydaahaaWcbeqaaiabikdaYaaakiabcIcaOiabdMgaPjabcYcaSiabd2gaTjabcMcaPiabg2da9maaqafabaGae83Xdm2aa0baaSqaaiabdUgaRbqaaiabikdaYaaakiabcIcaOiabdMgaPjabcYcaSiabd2gaTjabcMcaPaWcbaGaem4AaSgabeqdcqGHris5aaaa@42F5@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>The &#967;<sup>2 </sup>score of the <it>m</it>-mer at position <it>i</it> is computed only when <it>B</it><sub><it>k </it></sub><it>(i, m) </it>> &#956;<it>(k, m) </it>holds for every homologous sequence <it>H</it><sub><it>k</it></sub>; otherwise it is set to zero. As for the motif length <it>m</it>, in the experiments presented in this article we ran the algorithm with <it>m </it>= 8 and <it>m </it>= 12, computing mean and variance values separately for each length; longer motifs or regions are detected by merging shorter ones, as explained in the following section.</p>
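               <p>The scoring scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the WeederH implementation: it assumes that the <it>B</it><sub><it>k</it></sub><it>(i, m)</it> scores have already been computed for one motif length <it>m</it> (how they are obtained is not reproduced here), it computes mean and variance over the |<it>S</it>| &#8722; <it>m</it> + 1 <it>m</it>-mers of the reference, and it reads the zero rule as requiring <it>B</it><sub><it>k</it></sub><it>(i, m)</it> &gt; &#956;<it>(k, m)</it> in every homolog. The function name position_scores is hypothetical.</p>

```python
def position_scores(B):
    """Overall chi-square score chi2(i, m) for every m-mer position i.

    B[k][i] is B_k(i, m): the score of the m-mer starting at position i of the
    reference sequence when matched against homolog H_k (computed elsewhere).
    Every row has |S| - m + 1 entries, one per m-mer of the reference.
    """
    stats = []  # (mean, variance) of the scores against each homolog
    for row in B:
        n = len(row)
        mean = sum(row) / n
        var = sum((b - mean) ** 2 for b in row) / n
        stats.append((mean, var))

    scores = []
    for i in range(len(B[0])):
        # Assumption: the m-mer must score above the mean in every homolog,
        # otherwise its overall score is set to zero.
        if all(var > 0 and B[k][i] > mean for k, (mean, var) in enumerate(stats)):
            chi2 = sum((B[k][i] - mean) ** 2 / var
                       for k, (mean, var) in enumerate(stats))
        else:
            chi2 = 0.0
        scores.append(chi2)
    return scores
```

               <p>Each homolog contributes one standardized term per position, so adding a species simply adds one row to B and one term to every non-zero sum.</p>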
            </sec>
            <sec>
               <st>
                  <p>Merging motifs</p>
               </st>
               <p>The transcription of a eukaryotic gene is very often regulated by the simultaneous action of different TFs. Binding sites for cooperative or competitive factors are often adjacent to or overlapping each other, so that regions longer than a single site are often found to be conserved across species. To detect these regions explicitly, in the second step the algorithm merges motifs that are adjacent in the reference sequence (e.g., motif <it>m</it><sub>1 </sub>of length <it>l </it>at position <it>i </it>and motif <it>m</it><sub>2 </sub>at position <it>i + l</it>), provided that their best occurrences (those used to compute their scores) are adjacent in all the homologs as well.</p>
               <p>Since the occurrences of the two motifs do not overlap and are independent, the sum of the original <it>B</it><sub><it>i </it></sub>scores of motifs <it>m</it><sub>1 </sub>and <it>m</it><sub>2 </sub>(denoted here for simplicity as <it>B</it><sub><it>i</it></sub><it>(m</it><sub>1</sub><it>) </it>and <it>B</it><sub><it>i</it></sub><it>(m</it><sub>2</sub><it>)</it>) has mean <it>&#956;</it><sub>1 <it>i </it></sub>+ <it>&#956;</it><sub>2 <it>i </it></sub>and variance &#963;<sup>2</sup><sub>1 <it>i</it></sub>+&#963;<sup>2</sup><sub>2 <it>i</it></sub>, where <it>&#956;</it><sub>1 <it>i</it></sub>, &#963;<sup>2</sup><sub>1 <it>i</it></sub> and <it>&#956;</it><sub>2 <it>i</it></sub>, &#963;<sup>2</sup><sub>2 <it>i</it></sub> are the mean and variance of the <it>B</it><sub><it>i </it></sub>scores of the first and second motif, respectively, computed according to the motif length for each homologous sequence <it>H</it><sub><it>i</it></sub>. The &#967;<sup>2 </sup>score of the merged motif is then defined as:</p>
               <p>
                  <m:math name="1471-2105-8-46-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msup>
                              <m:mi>&#967;</m:mi>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>m</m:mi>
                              <m:mn>1</m:mn>
                           </m:msub>
                           <m:msub>
                              <m:mi>m</m:mi>
                              <m:mn>2</m:mn>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mi>i</m:mi>
                              </m:munder>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>B</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mn>1</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>+</m:mo>
                                             <m:msub>
                                                <m:mi>B</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mn>2</m:mn>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mrow>
                                                   <m:mn>1</m:mn>
                                                   <m:mi>i</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mi>&#956;</m:mi>
                                                <m:mrow>
                                                   <m:mn>2</m:mn>
                                                   <m:mi>i</m:mi>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msup>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msubsup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mrow>
                                             <m:mn>1</m:mn>
                                             <m:mi>i</m:mi>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                       <m:mo>+</m:mo>
                                       <m:msubsup>
                                          <m:mi>&#963;</m:mi>
                                          <m:mrow>
                                             <m:mn>2</m:mn>
                                             <m:mi>i</m:mi>
                                          </m:mrow>
                                          <m:mn>2</m:mn>
                                       </m:msubsup>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFhpWydaahaaWcbeqaaiabikdaYaaakiabcIcaOiabd2gaTnaaBaaaleaacqaIXaqmaeqaaOGaemyBa02aaSbaaSqaaiabikdaYaqabaGccqGGPaqkcqGH9aqpdaaeqbqaamaalaaabaGaeiikaGIaemOqai0aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqWGTbqBdaWgaaWcbaGaeGymaedabeaakiabcMcaPiabgUcaRiabdkeacnaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIaemyBa02aaSbaaSqaaiabikdaYaqabaGccqGGPaqkcqGHsislcqWF8oqBdaWgaaWcbaGaeGymaeJaemyAaKgabeaakiabgkHiTiab=X7aTnaaBaaaleaacqaIYaGmcqWGPbqAaeqaaOGaeiykaKYaaWbaaSqabeaacqaIYaGmaaaakeaacqWFdpWCdaqhaaWcbaGaeGymaeJaemyAaKgabaGaeGOmaidaaOGaey4kaSIae83Wdm3aa0baaSqaaiabikdaYiabdMgaPbqaaiabikdaYaaaaaaabaGaemyAaKgabeqdcqGHris5aaaa@61D1@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
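               <p>The merged score is a direct application of the formula above: because the two scores are summed, their means and variances add up. A minimal sketch, assuming that the per-homolog scores, means, and variances of the two motifs are already available as parallel lists indexed by homolog (merged_chi2 is a hypothetical name):</p>

```python
def merged_chi2(B1, B2, mu1, mu2, var1, var2):
    """chi2(m1 m2) for two adjacent motifs m1 and m2.

    Entry i of every list refers to homolog H_i: B1[i] and B2[i] are the
    B_i scores of m1 and m2, mu*/var* the corresponding means and variances.
    """
    total = 0.0
    for b1, b2, u1, u2, v1, v2 in zip(B1, B2, mu1, mu2, var1, var2):
        # The two scores are summed, so their means and variances add up.
        total += (b1 + b2 - u1 - u2) ** 2 / (v1 + v2)
    return total
```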
               <p>The algorithm merges only motifs that obtained a positive &#967;<sup>2 </sup>score in the first step, that is, motifs more conserved than the average in each homologous sequence. This step is iterated for each position of the reference sequence: single motifs are first compared to the adjacent ones; then, the motifs resulting from merging are compared to the adjacent ones, and so on until no further merging is possible. In this way long conserved regions can be detected, but they must also be conserved <it>locally</it>: they must be built from fragments that, taken individually, fit our model of conserved TFBSs. Since the fragments are 8-mers and 12-mers, the merging step detects regions of length 16, 20, 24, and so on.</p>
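               <p>The iterative merging can be sketched as a fixed-point loop over motif coordinates. In this illustration, which is our reading of the procedure rather than the authors' code, each motif is described by its start and length in the reference and by the start of its best occurrence in each homolog; merged motifs are only extended to the right with adjacent single motifs, which still reaches every chain of adjacent fragments, and the scoring of merged regions is left out. The Motif type and the function names are hypothetical.</p>

```python
from typing import List, NamedTuple, Tuple


class Motif(NamedTuple):
    start: int              # position in the reference sequence
    length: int
    occ: Tuple[int, ...]    # start of the best occurrence in each homolog


def adjacent(a: Motif, b: Motif) -> bool:
    """True if b starts where a ends, in the reference and in every homolog."""
    if a.start + a.length != b.start:
        return False
    return all(oa + a.length == ob for oa, ob in zip(a.occ, b.occ))


def merge_all(base: List[Motif]) -> List[Motif]:
    """Merge positive-scoring base motifs until no further merging is possible."""
    found: List[Motif] = []
    frontier = list(base)          # first round: single motifs vs. single motifs
    while frontier:
        nxt = []
        for a in frontier:
            for b in base:
                if adjacent(a, b):
                    nxt.append(Motif(a.start, a.length + b.length, a.occ))
        found.extend(nxt)
        frontier = nxt             # next round: newly merged motifs vs. singles
    return found
```

               <p>The loop terminates because each round strictly extends the right end of every merged motif, and no base motif starts beyond the last one.</p>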
            </sec>
            <sec>
               <st>
                  <p>Input parameters</p>
               </st>
               <p>Other than the species of origin of the input sequence, the only parameters needed by the algorithm are the motif size and the maximum number of substitutions allowed when collecting occurrences. As mentioned above, in our experiments we used oligos of length <it>m </it>= 8 and 12, with <it>e </it>= 2 and 3 substitutions, respectively, that is, at most 25% of the motif positions in both cases. This choice was based first of all on the parameters used in the original Weeder algorithm, and also on studies showing that a variation of 25% of the sequence seems to be a critical value, at least for human-rodent comparisons. Since merged regions are built from such fragments, longer regions also present at most 25% of substitutions in their occurrences in the homologs. In fact, TFBSs presenting 30% or more mutations in their homologs are usually much less likely to preserve their function <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
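               <p>Collecting occurrences with at most <it>e</it> substitutions amounts to a Hamming-distance test. The sketch below is a naive scan over all windows of a homolog, given for illustration only (it is not the search strategy used by WeederH, and the function names are hypothetical); it also checks that both parameter pairs correspond to the same 25% divergence.</p>

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(motif, seq, e):
    """Start positions of the windows of seq differing from motif in at most e positions."""
    m = len(motif)
    return [j for j in range(len(seq) - m + 1)
            if e >= hamming(motif, seq[j:j + m])]


if __name__ == "__main__":
    # Both parameter pairs used in the experiments allow 25% of substitutions.
    for m, e in [(8, 2), (12, 3)]:
        print(m, e, e / m)   # 0.25 in both cases
```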
            </sec>
            <sec>
               <st>
                  <p>Output</p>
               </st>
               <p>In a typical application, such as a human-rodent comparison, several overlapping motifs (before and after merging) with positive &#967;<sup>2 </sup>scores are found. To reduce the size of the output, the algorithm uses a simple top-down greedy procedure to avoid reporting motifs that overlap by more than 2 bp for 8-mers, 3 bp for 12-mers, and 4 bp for longer motifs: if a motif overlaps a higher-scoring motif by more than the allowed number of nucleotides, it is removed from the ranked output.</p>
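               <p>The greedy filtering can be sketched as follows. This is an illustration under stated assumptions: motifs are (start, length, score) tuples on the reference sequence, each candidate is compared only with the motifs already kept, and the overlap tolerance is taken from the length of the lower-scoring candidate, a detail the description above leaves open. All names are hypothetical.</p>

```python
def max_overlap(length):
    """Allowed overlap (bp) for a motif of the given length."""
    if length == 8:
        return 2
    if length == 12:
        return 3
    return 4  # longer (merged) motifs


def overlap(a, b):
    """Shared positions between motifs a and b, each a (start, length, score) tuple."""
    return max(0, min(a[0] + a[1], b[0] + b[1]) - max(a[0], b[0]))


def filter_motifs(motifs):
    """Top-down greedy pass: keep a motif unless it overlaps a kept one too much."""
    kept = []
    for cand in sorted(motifs, key=lambda t: t[2], reverse=True):
        # Assumption: the tolerance depends on the candidate's own length.
        limit = max_overlap(cand[1])
        if all(limit >= overlap(cand, k) for k in kept):
            kept.append(cand)
    return kept
```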
               <p>Figure <figr fid="F1">1</figr> shows an example of typical output of the algorithm for the 500 bp upstream region and non-coding first exon of the p53 gene of human, mouse, and rat. The highest scoring motifs are displayed within a UCSC genome browser window, together with the known TFBSs annotated for the human gene in the TRANSFAC database. A rather long region has been reported, onto which several adjacent binding sites map, while other shorter motifs are scattered along the sequence; the distal ones fall within a region not deemed conserved according to genomic alignments ("Conservation" track). We also show the location on the human sequence of the motifs predicted by MEME <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> (run in one occurrence per sequence mode), which cover most of the sequence: since algorithms of this kind are mainly aimed at the detection of very subtle similarities <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, the high level of overall conservation and the few sequences available lead to the prediction of several long, significantly conserved motifs.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>An example of the output of WeederH</p>
                  </caption>
                  <text>
                     <p><b>An example of the output of WeederH</b>. The highest scoring motifs found by WeederH in the core promoter and first non-coding exon of the human p53 gene, compared to the mouse and rat homologs, displayed within the UCSC genome browser. Sites annotated in the TRANSFAC database are shown in green. The longest (and highest scoring) motif encompasses several adjacent sites. Regulatory potential (RP) score [27] and phastCons [45] tracks are also displayed (see Results), together with the motifs output by MEME [38] on the same dataset (run in "oops" mode).</p>
                  </text>
                  <graphic file="1471-2105-8-46-1"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Experimental setup</p>
            </st>
            <p>To test the algorithm, we used data taken from the ABS database <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, a collection of experimentally validated transcription factor binding sites conserved in at least two species, together with the homologous promoters containing them. We retrieved from the database 99 sets of homologous sequences 500 bp long, containing a total of 302 experimentally validated binding sites: 7 sets contained human-mouse-rat sequences, 66 a human-mouse pair, 17 a human-rat pair, and 9 a mouse-rat pair. We used these data first to build simulated datasets, and then to test the algorithm on real orthologous sequences.</p>
         </sec>
         <sec>
            <st>
               <p>Results on simulated sequences</p>
            </st>
            <p>We built a dataset of simulated sequences as follows. For each human sequence retrieved from the database, we generated simulated mouse and rat sequences with the Dawg program <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, which simulates sequence evolution including insertions and deletions. We set different substitution rates, yielding different sequence identity percentages, while the gap and other parameters needed were estimated by the Dawg algorithm from the alignment of the sequences retrieved from the database. Since our algorithm selects motifs according to maximum substitution rates, rather than also defining substitution rates for the evolution of the binding sites, we chose to use the original sites annotated in the ABS database. Thus, once a simulated rodent sequence set was generated, we selected a single site annotated in the human sequence and planted its annotated homologs in the rodent sequences, together with the five nucleotides flanking them on each side. Sites were planted at their original positions, since WeederH scores motifs according to their positional conservation. In this way, we obtained 254 sequence sets, each containing a single planted site: 18 composed of human-mouse-rat sequences, 185 of a human-mouse pair, and 51 of a human-rat pair.</p>
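            <p>Planting a site with its flanks at the original coordinates can be illustrated with a simple string splice. The sketch assumes 0-based, end-exclusive site coordinates shared by the original and the simulated rodent sequence; the function name is hypothetical, and the actual datasets may have been assembled differently in detail.</p>

```python
FLANK = 5  # nucleotides copied on each side of the annotated site


def plant_site(simulated, original, start, end):
    """Copy the site original[start:end] and its flanks into simulated.

    start/end: 0-based, end-exclusive coordinates of the annotated rodent site;
    the site is planted at the same coordinates in the simulated sequence.
    """
    lo = max(0, start - FLANK)           # clamp the left flank to the sequence
    hi = min(len(original), end + FLANK)  # clamp the right flank
    return simulated[:lo] + original[lo:hi] + simulated[hi:]
```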
            <p>We assessed the performance of the algorithm with different measures. First, at the nucleotide level, we calculated the percentage of the nucleotides of the true sites planted in the sequences that were predicted as part of a conserved motif by the algorithm (nucleotide coverage, Nc). This measure is equivalent to sensitivity (the ratio of true positives to the overall length of the sites). Then, at the site level, we defined a site as correctly predicted if at least eight of its nucleotides or at least 75% of them overlapped a predicted motif (site coverage, Sc). To estimate the false positive predictions of the algorithm, we also computed the overall length of the motifs predicted in each sequence set, that is, the percentage of the reference sequence covered by motifs (%pred), and from it the specificity (the ratio of true negatives to the overall length of the part of the reference sequence not containing a site).</p>
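            <p>The measures above can be computed from position sets, as in the following sketch for a single reference sequence. Intervals are assumed to be 0-based and end-exclusive, the function name is hypothetical, and pooling of the counts over the sequence sets is left out.</p>

```python
def evaluate(ref_len, sites, motifs):
    """Return (Nc, Sc, %pred, Sp) for one reference sequence.

    sites: annotated binding sites; motifs: predicted motifs. Both are lists of
    (start, end) intervals on the reference, 0-based and end-exclusive.
    """
    site_pos = set()
    for s, e in sites:
        site_pos.update(range(s, e))
    pred_pos = set()
    for s, e in motifs:
        pred_pos.update(range(s, e))

    # Nucleotide coverage (sensitivity): fraction of site nucleotides predicted.
    nc = len(site_pos.intersection(pred_pos)) / len(site_pos)

    # Site coverage: at least 8 nucleotides or at least 75% of the site overlapped.
    hits = 0
    for s, e in sites:
        covered = len(pred_pos.intersection(range(s, e)))
        if covered >= 8 or covered >= 0.75 * (e - s):
            hits += 1
    sc = hits / len(sites)

    # Percentage of the reference covered by motifs, and specificity.
    pct_pred = 100 * len(pred_pos) / ref_len
    negatives = ref_len - len(site_pos)
    false_pos = len(pred_pos - site_pos)
    sp = (negatives - false_pos) / negatives
    return nc, sc, pct_pred, sp
```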
            <p>Table <tblr tid="T1">1</tblr> shows the results of the algorithm applied to simulated sequences with an average identity of 50%, which has been shown to be average identity percentage on 2,000 bp regions in non-coding human-rodent homologous sequences <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B41">41</abbr></abbrgrp>. The performance varies according to the &#967;<sup>2 </sup>score threshold used. At threshold value 2 we obtain 85% and 93% of nucleotide and site coverage, respectively, but 44% of the reference sequence is covered by motifs, with a specificity of .54. Increasing the score threshold significantly lowers the number of motifs reported, with the percentage of motifs correctly predicted remains at satisfactory values. With threshold 7.5, the algorithm identifies more than 85% of the planted sites, with only 15% of the reference sequence covered by motifs (specificity .85).</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Performance of WeederH on simulated promoter sets</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>&#967;<sup><b>2 </b></sup>Score Threshold</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>%pred</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Nc</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sc</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sp</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>82.98</p>
                     </c>
                     <c ca="center">
                        <p>94.99</p>
                     </c>
                     <c ca="center">
                        <p>99.21</p>
                     </c>
                     <c ca="center">
                        <p>0.12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>57.97</p>
                     </c>
                     <c ca="center">
                        <p>88.51</p>
                     </c>
                     <c ca="center">
                        <p>96.45</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>44.50</p>
                     </c>
                     <c ca="center">
                        <p>85.74</p>
                     </c>
                     <c ca="center">
                        <p>93.30</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>34.98</p>
                     </c>
                     <c ca="center">
                        <p>82.89</p>
                     </c>
                     <c ca="center">
                        <p>91.73</p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>24.00</p>
                     </c>
                     <c ca="center">
                        <p>78.87</p>
                     </c>
                     <c ca="center">
                        <p>89.37</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>7.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>15.59</p>
                     </c>
                     <c ca="center">
                        <p>74.23</p>
                     </c>
                     <c ca="center">
                        <p>85.82</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>10</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>10.90</p>
                     </c>
                     <c ca="center">
                        <p>67.44</p>
                     </c>
                     <c ca="center">
                        <p>77.95</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>12</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>8.33</p>
                     </c>
                     <c ca="center">
                        <p>64.30</p>
                     </c>
                     <c ca="center">
                        <p>73.62</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>15</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6.15</p>
                     </c>
                     <c ca="center">
                        <p>60.23</p>
                     </c>
                     <c ca="center">
                        <p>69.29</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Footprinter</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>10.54</p>
                     </c>
                     <c ca="center">
                        <p>61.48</p>
                     </c>
                     <c ca="center">
                        <p>72.44</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Performance of WeederH at different &#967;<sup>2 </sup>score thresholds on the 254 simulated sequence sets. %pred is the percentage of nucleotides of the reference sequence belonging to a motif output by the algorithm; Nc is the nucleotide coverage, the percentage of nucleotides in annotated sites matched by a predicted motif (equivalent to sensitivity); Sc is the site coverage, the percentage of annotated sites matched by a motif predicted by WeederH (on at least eight nucleotides or at least 75% of their length); Sp is the specificity. The bottom row reports the performance of the Footprinter algorithm on the same set.</p>
               </tblfn>
            </tbl>
            <p>For comparison, the core of the Footprinter algorithm is based on the same idea: finding matching oligos in the sequences examined and computing a parsimony score according to the number of mismatches in the motif instances and the phylogenetic relationships among the species investigated. When no substitution was allowed in motif instances, Footprinter yielded a nucleotide coverage (sensitivity) of 61% and a site coverage of 72%, with motifs predicted on only a small fraction of the reference sequence (10%, specificity 0.90). At threshold 10, WeederH reached the same specificity, but with a higher sensitivity (0.67) and site coverage (about 78%).</p>
            <p>Besides yielding lower accuracy, the parsimony score employed by Footprinter makes a fine-grained ranking of motifs more difficult, especially when few (two or three) homologous sequences are examined, since in this case the score can take only a few distinct values and many motifs obtain the same score. Allowing one or two substitutions when searching for conserved motifs (8-mers with two substitutions are exactly the parameters employed by WeederH) improved coverage, up to 99% with two substitutions (as for WeederH with no score threshold), but also significantly increased the percentage of the reference sequence covered by motifs: 72% with one substitution and 99% with two (data not shown). Figure <figr fid="F2">2</figr> shows the ROC curve obtained by WeederH at different &#967;<sup>2 </sup>thresholds, plotting sensitivity and site coverage against (1 &#8722; specificity).</p>
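            <p>The curves in Figure <figr fid="F2">2</figr> plot pairs of values from Table <tblr tid="T1">1</tblr>. As an illustration, the following snippet turns the WeederH rows of the table into (1 &#8722; specificity, sensitivity) and (1 &#8722; specificity, site coverage) points; it uses only the values reported in the table, and the names are hypothetical.</p>

```python
# (threshold, Nc, Sc, Sp) rows copied from Table 1 (WeederH only)
TABLE1 = [
    (0, 94.99, 99.21, 0.12),
    (1, 88.51, 96.45, 0.39),
    (2, 85.74, 93.30, 0.54),
    (3, 82.89, 91.73, 0.64),
    (5, 78.87, 89.37, 0.76),
    (7.5, 74.23, 85.82, 0.85),
    (10, 67.44, 77.95, 0.90),
    (12, 64.30, 73.62, 0.93),
    (15, 60.23, 69.29, 0.95),
]


def roc_points(rows):
    """Return (1 - Sp, sensitivity) and (1 - Sp, Sc) point lists, sorted by 1 - Sp."""
    sens = sorted((round(1 - sp, 2), nc / 100) for _, nc, _, sp in rows)
    site = sorted((round(1 - sp, 2), sc / 100) for _, _, sc, sp in rows)
    return sens, site
```

            <p>Raising the threshold moves each point towards the origin: fewer motifs are reported, so both 1 &#8722; specificity and coverage decrease.</p>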
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Performance of WeederH on the simulated promoter set</p>
               </caption>
               <text>
                  <p><b>Performance of WeederH on the simulated promoter set</b>. Performance of WeederH at different &#967;<sup>2 </sup>score thresholds on 254 simulated promoter sequence sets. The plot shows the ROC curve (sensitivity vs. 1-specificity), blue line, and the site coverage Sc versus 1-specificity, red line. Blue and red boxes indicate the performance of Footprinter (sensitivity and Sc).</p>
               </text>
               <graphic file="1471-2105-8-46-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Results on homologous human-rodent promoters</p>
            </st>
            <p>As a further test, we applied the algorithm to the original datasets retrieved from the ABS database (sequences are available as Additional file <supplr sid="S1">1</supplr>, and the location of the conserved TFBSs as Additional file <supplr sid="S2">2</supplr>), composed of 90 sets of human-rodent homologous promoters and 9 mouse-rat sequence sets. The results at different &#967;<sup>2 </sup>values are summarized in Table <tblr tid="T2">2</tblr> and Figure <figr fid="F3">3</figr> (see Additional file <supplr sid="S3">3</supplr> for the full WeederH output, and Additional file <supplr sid="S4">4</supplr> for a more detailed performance analysis).</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Orthologous sequence sets. Orthologous sequence sets taken from the ABS database used for testing the algorithm.</p>
               </text>
               <file name="1471-2105-8-46-S1.gz">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Motif solutions. Motifs annotated in the ABS database in the test sequences.</p>
               </text>
               <file name="1471-2105-8-46-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Performance of WeederH on the ABS promoter set</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="center">
                        <p>
                           <b>&#967;<sup><b>2 </b></sup>Score Threshold</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>%pred</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Nc</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sc</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sp</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>None</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>98.37</p>
                     </c>
                     <c ca="center">
                        <p>99.70</p>
                     </c>
                     <c ca="center">
                        <p>100.0</p>
                     </c>
                     <c ca="center">
                        <p>0.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>78.90</p>
                     </c>
                     <c ca="center">
                        <p>95.77</p>
                     </c>
                     <c ca="center">
                        <p>98.68</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>51.04</p>
                     </c>
                     <c ca="center">
                        <p>83.91</p>
                     </c>
                     <c ca="center">
                        <p>91.72</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>1.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>45.79</p>
                     </c>
                     <c ca="center">
                        <p>81.42</p>
                     </c>
                     <c ca="center">
                        <p>89.07</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>41.93</p>
                     </c>
                     <c ca="center">
                        <p>78.26</p>
                     </c>
                     <c ca="center">
                        <p>86.75</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>2.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>39.53</p>
                     </c>
                     <c ca="center">
                        <p>76.35</p>
                     </c>
                     <c ca="center">
                        <p>85.10</p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>37.55</p>
                     </c>
                     <c ca="center">
                        <p>75.24</p>
                     </c>
                     <c ca="center">
                        <p>84.11</p>
                     </c>
                     <c ca="center">
                        <p>0.66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>4</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>34.55</p>
                     </c>
                     <c ca="center">
                        <p>73.96</p>
                     </c>
                     <c ca="center">
                        <p>82.45</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>4.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>33.26</p>
                     </c>
                     <c ca="center">
                        <p>72.42</p>
                     </c>
                     <c ca="center">
                        <p>80.79</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>32.30</p>
                     </c>
                     <c ca="center">
                        <p>71.56</p>
                     </c>
                     <c ca="center">
                        <p>79.14</p>
                     </c>
                     <c ca="center">
                        <p>0.71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>5.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>31.44</p>
                     </c>
                     <c ca="center">
                        <p>70.77</p>
                     </c>
                     <c ca="center">
                        <p>78.48</p>
                     </c>
                     <c ca="center">
                        <p>0.72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>6</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>30.58</p>
                     </c>
                     <c ca="center">
                        <p>68.96</p>
                     </c>
                     <c ca="center">
                        <p>77.48</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>27.51</p>
                     </c>
                     <c ca="center">
                        <p>64.81</p>
                     </c>
                     <c ca="center">
                        <p>72.18</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>FP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>37.18</p>
                     </c>
                     <c ca="center">
                        <p>70.32</p>
                     </c>
                     <c ca="center">
                        <p>79.47</p>
                     </c>
                     <c ca="center">
                        <p>0.65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>FPSIG</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>31.55</p>
                     </c>
                     <c ca="center">
                        <p>62.61</p>
                     </c>
                     <c ca="center">
                        <p>70.19</p>
                     </c>
                     <c ca="center">
                        <p>0.71</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>phastCons</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>25.97</p>
                     </c>
                     <c ca="center">
                        <p>50.02</p>
                     </c>
                     <c ca="center">
                        <p>55.51</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Performance of WeederH at different &#967;<sup>2 </sup>score thresholds on the 99 orthologous promoter sets retrieved from the ABS database. %pred indicates the percentage of nucleotides of the reference sequence belonging to a conserved motif output by the algorithm. Nc indicates the nucleotide coverage, the percentage of nucleotides in annotated sites matched by a predicted motif; Sc indicates the site coverage, the percentage of annotated sites matched over at least 75% of their length by a motif predicted by WeederH; Sp is specificity. The next two rows report the performance of the Footprinter algorithm on the same set, using the parsimony score alone (FP) and with the addition of significance estimation (FPSIG). Finally, the bottom row shows the same data for the phastCons most conserved tracks available at the UCSC genome browser (genome assemblies: human and mouse, March 2006; rat, November 2004).</p>
               </tblfn>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Performance of WeederH on the ABS promoter set</p>
               </caption>
               <text>
                  <p><b>Performance of WeederH on the ABS promoter set</b>. Performance of WeederH at different &#967;<sup>2 </sup>score thresholds on the 99 promoter sequence sets retrieved from the ABS database. The plot shows the ROC curve (sensitivity vs. 1-specificity), blue line, and the site coverage Sc versus 1-specificity, red line. Blue and red boxes indicate the performance of Footprinter (sensitivity and Sc), blue and red triangles that of Footprinter with the significance measure, and blue and red diamonds the results of the phastCons most conserved regions.</p>
               </text>
               <graphic file="1471-2105-8-46-3"/>
            </fig>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Output of WeederH on orthologous sequence sets. Full output of WeederH on the orthologous sequence sets taken from the ABS database used for testing the algorithm.</p>
               </text>
               <file name="1471-2105-8-46-S3.gz">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>WeederH performance on orthologous promoters. Detailed results of WeederH applied to the ABS database data set, broken down by sequence set and site.</p>
               </text>
               <file name="1471-2105-8-46-S4.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The first observation is that virtually all of the reference sequence is covered by a motif satisfying the substitution threshold, as shown by the first row of Table <tblr tid="T2">2</tblr>, which reports the motifs found regardless of the &#967;<sup>2 </sup>score. However, the annotated sites do have a positive &#967;<sup>2 </sup>score, since a threshold of 0 yields a coverage of about 99% at the site level and 96% at the nucleotide level. In this case, however, the motifs output still cover about 79% of the reference sequence. Raising the &#967;<sup>2 </sup>score threshold to values &#8805; 2 lowers the fraction of the reference sequence covered by a conserved motif to about 30&#8211;40%, while the site coverage falls less sharply, with about 80&#8211;85% of the annotated sites correctly predicted. Note that here the best results are obtained at lower threshold values than on the simulated sets, but with lower specificity. This is because the sequence sets analyzed are in general more conserved than the artificial sets we generated (around 60&#8211;65% identity), so motifs stand out less clearly from the rest of the sequences. A single sequence can also contain more than one annotated site. Moreover, rather than substitutions being spread randomly along the sequences as in the simulated sets, the real sequences are more likely to contain conserved blocks, yielding a higher number of oligos that satisfy the substitution thresholds employed by the algorithm.</p>
            <p>The number of sites correctly predicted by the algorithm is again quite satisfactory, but in this case assessing how many false positives are reported is far from straightforward. First, one would need an estimate of how many functional sites are expected inside the 500 bp promoter of a mammalian gene. Second, depending on the species included in the analysis, one would need to know how many of these sites are expected to be conserved, a fraction that for human-mouse comparisons has been estimated at 60 to 72% <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. In other words, the region analyzed might also contain other functional sites that are not conserved in the other species. Finally, the conservation of functional sites often spans a region longer than the sites themselves, increasing the number of predicted nucleotides.</p>
            <p>We also ran Footprinter on this sequence set. Again, the best performance (79% at the site level, 70% at the nucleotide level) was obtained with a parsimony score threshold of 0 (no substitution allowed in motif occurrences), with motifs covering more than 37% of the reference sequence. With a &#967;<sup>2 </sup>score threshold of 3.0, WeederH motifs covered the same percentage of the reference sequence, with 84% success at the site level and 75% at the nucleotide level (see Table <tblr tid="T2">2</tblr>). Thus, at about the same specificity, WeederH yields a higher sensitivity. Adding significance evaluation of motifs to Footprinter (using the different significance settings available and trying different combinations) did not improve the results: it increased the specificity but also reduced the percentage of correctly identified sites to 70%. WeederH reached the same specificity with a &#967;<sup>2 </sup>score threshold of 5.0, but with 79% of the annotated sites correctly predicted.</p>
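            <p>The comparison above amounts to a simple lookup in Table 2: among the WeederH thresholds whose specificity is at least that of a Footprinter run, take the one with the highest site coverage. The sketch below hard-codes the WeederH rows of Table 2 (leaving out the no-threshold row) and recovers the two thresholds discussed in the text; the function name best_at_specificity is hypothetical.</p>

```python
# (chi2 threshold, %pred, Nc, Sc, Sp) rows copied from Table 2 (WeederH, ABS set).
WEEDERH_ABS = [
    (0.0, 78.90, 95.77, 98.68, 0.23),
    (1.0, 51.04, 83.91, 91.72, 0.52),
    (1.5, 45.79, 81.42, 89.07, 0.57),
    (2.0, 41.93, 78.26, 86.75, 0.61),
    (2.5, 39.53, 76.35, 85.10, 0.64),
    (3.0, 37.55, 75.24, 84.11, 0.66),
    (4.0, 34.55, 73.96, 82.45, 0.69),
    (4.5, 33.26, 72.42, 80.79, 0.70),
    (5.0, 32.30, 71.56, 79.14, 0.71),
    (5.5, 31.44, 70.77, 78.48, 0.72),
    (6.0, 30.58, 68.96, 77.48, 0.73),
    (8.0, 27.51, 64.81, 72.18, 0.75),
]


def best_at_specificity(rows, min_sp):
    """Row with the highest site coverage (Sc) among rows with Sp of at least min_sp."""
    eligible = [row for row in rows if row[4] >= min_sp]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[3])


print(best_at_specificity(WEEDERH_ABS, 0.65))  # FP, Sp = 0.65 -> (3.0, 37.55, 75.24, 84.11, 0.66)
print(best_at_specificity(WEEDERH_ABS, 0.71))  # FPSIG, Sp = 0.71 -> (5.0, 32.3, 71.56, 79.14, 0.71)
```

            <p>In both cases the selected WeederH row has a higher Sc than the corresponding Footprinter row (79.47 and 70.19).</p>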
            <p>We also compared our results with the phastCons annotations <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B45">45</abbr></abbrgrp> available at the UCSC genome browser <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Overall, about 26% of the reference sequence nucleotides used in this test are annotated as "most conserved", and about 55% of the sites annotated in the database are covered (50% at the nucleotide level). Although the tracks are generated by comparing human to all the available vertebrate genomic sequences rather than to rodents alone (and vice versa), this result highlights that methods like phastCons are better suited to identifying large conserved regions than single sites. Examples are shown in Figures <figr fid="F4">4</figr> and <figr fid="F5">5</figr>, where the results of WeederH are compared with the conserved regions predicted by phastCons and RP <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> available at the UCSC genome browser. These examples show typical situations in which the two methods either detect no conserved element in the core promoter, because single TFBSs are too short to reach a significant level of conservation (Figure <figr fid="F4">4</figr>), or single out only large conserved regions (Figure <figr fid="F5">5</figr>; see also the p53 example in Figure <figr fid="F1">1</figr>). This drawback partly derives from the use of a single, global significance threshold. By contrast, WeederH works at a much finer level of detail, correctly identifying conserved TFBSs in regions with either little or very high overall sequence conservation.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>WeederH output on the 500 bp promoter of the ADH1B gene (as defined in the ABS database)</p>
               </caption>
               <text>
                  <p><b>WeederH output on the 500 bp promoter of the ADH1B gene (as defined in the ABS database)</b>. Top three motifs output by WeederH (top track), ABS annotated sites (in green), sequence of the ABS database (obtained by BLAT search), and predictions according to the RP score ([27], light blue) and phastCons ([45], bottom track). Note that neither of the latter two methods reports anything significantly conserved in the core promoter itself, where the motifs are located.</p>
               </text>
               <graphic file="1471-2105-8-46-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>WeederH output on the promoter and 5'UTR of the MYB gene (as defined in the ABS database)</p>
               </caption>
               <text>
                  <p><b>WeederH output on the promoter and 5'UTR of the MYB gene (as defined in the ABS database)</b>. Highest scoring motif output by WeederH (top track), ABS annotated site (in green), sequence of the ABS database (S66422 &#8211; aligned with BLAT), and conserved regions predicted by the RP score ([27], light blue) and phastCons ([45], bottom track). Note that both of the latter methods predict long conserved regions spanning most of the promoter itself, which makes it difficult to identify single conserved TFBSs.</p>
               </text>
               <graphic file="1471-2105-8-46-5"/>
            </fig>
            <p>Given the difference in performance at the same threshold values on the artificial and real datasets, it might seem that the problem of defining a significance threshold remains, only recast at a relative level, that is, dependent on the overall degree of conservation of the sequences investigated. However, useful information can be gathered by examining the ranking, in the lists output by the algorithm, of the motifs matching a planted or annotated site. The overall distribution is shown in Figure <figr fid="F6">6</figr>. In the artificial sequences, more than 60% of the planted sites matched a top-scoring motif, and nearly 80% matched one of the first three. In the real promoters, too, motifs corresponding to a functional site tend to appear within the first five positions (blue bars in the histogram of Figure <figr fid="F6">6</figr>), further evidence that the measure used by the algorithm highlights conserved functional TFBSs, which, as we stated in the introduction, are "more conserved than the rest". One third of the 302 sites are matched by the highest ranking motif, 17% by a second-ranking motif and 13% by a third-ranking one; 75% of the annotated sites are matched by a motif ranking among the first five, and 90% among the first ten. These percentages increase if we remove from the ranking the higher-scoring motifs that match an annotated site, in other words, if the rank of a motif counts only the preceding motifs that do not match any annotated site ("putative false positives"). In this case, almost half of the sites are matched either by the highest ranking motif or by a motif preceded only by other motifs matching an annotated site (red bars in Figure <figr fid="F6">6</figr>).
Conversely, in 62 sequence sets out of 99 the highest scoring motif matched an annotated site (or more than one, since the algorithm can report regions longer than a single site, which can therefore contain more than one site), and in 91 out of 99 one of the first three motifs matched an annotated site. A quick inspection of the highest scoring motifs not matching a site annotated in the ABS database, however, revealed that in all cases but one they matched at least one site annotated in TRANSFAC, or a signal such as a TATA or CAAT box, which, given its constrained position, is an ideal candidate to be picked up by the algorithm.</p>
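            <p>The two rankings described above (the raw rank, and the rank that counts only preceding motifs matching no annotated site) can both be computed in a single pass over the ranked output. The sketch below assumes that each output motif has already been mapped to the set of annotated sites it overlaps, with an empty set marking a putative false positive; the function names and the toy data are hypothetical.</p>

```python
def site_ranks(ranked_hits):
    """ranked_hits: one set per output motif, best-scoring first, holding the ids
    of the annotated sites that motif matches (empty set = putative false positive).

    Returns {site_id: (raw_rank, adjusted_rank)} for the best motif matching each
    site; adjusted_rank counts only preceding motifs that match no annotated site.
    """
    ranks = {}
    false_positives_so_far = 0
    for raw_rank, hits in enumerate(ranked_hits, start=1):
        for site in hits:
            if site not in ranks:
                ranks[site] = (raw_rank, false_positives_so_far + 1)
        if not hits:
            false_positives_so_far += 1
    return ranks


def fraction_within(ranks, n_sites, k, adjusted=False):
    """Fraction of all n_sites annotated sites matched by a motif of rank at most k."""
    idx = 1 if adjusted else 0
    return sum(1 for r in ranks.values() if k >= r[idx]) / n_sites


# Toy ranked output for one promoter set with five annotated sites (E is never matched).
output = [{"A"}, set(), {"B", "C"}, set(), {"A", "D"}]
ranks = site_ranks(output)
print(ranks["B"], ranks["D"])                         # (3, 2) (5, 3)
print(fraction_within(ranks, 5, k=1))                 # 0.2
print(fraction_within(ranks, 5, k=2, adjusted=True))  # 0.6
```

            <p>Sites that no motif matches (E in the toy example) are simply absent from the result, so they lower every cumulative fraction.</p>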
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>WeederH ranking of the sites annotated in the simulated and ABS promoter sets</p>
               </caption>
               <text>
                  <p><b>WeederH ranking of the sites annotated in the simulated and ABS promoter sets</b>. Ranking of the motifs output by WeederH matching a planted site in the simulated promoter set (green bars) or an annotated site in the ABS promoter set (red and blue bars, see text for explanation).</p>
               </text>
               <graphic file="1471-2105-8-46-6"/>
            </fig>
            <p>The scoring function employed by the algorithm thus appears to be reliable, and scanning the output list from top to bottom is very likely to produce satisfactory results. Interestingly, in the mouse-rat comparisons, where the sequences present a much higher degree of similarity, the percentage of annotated sites correctly identified remains at 90% even at high score thresholds, showing that true sites score higher even when overall sequence conservation is high.</p>
         </sec>
         <sec>
            <st>
               <p>Finding conserved distal motifs and regions</p>
            </st>
            <p>As we have shown in the previous section, the scoring function employed by the algorithm can successfully discriminate motifs and regions corresponding to true functional sites. The dataset we used for the test, however, was composed of carefully selected sequences, that is, truly orthologous promoters. In this setting, the positional conservation term in the scoring function quite naturally yields the best results. Very often, however, selecting the input sequences is far from straightforward: even in extensively annotated genomes like human and mouse, genes have several different transcription start sites, alternative promoters, and so on, making it difficult to choose which "upstream" region should be considered. Moreover, for other recently sequenced genomes like dog, transcripts are often unavailable, and genes are annotated only starting from the ATG codon <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. To assess whether the scoring scheme is also effective on longer, less carefully selected sequences, we performed further tests.</p>
            <p>The Actin cardiac alpha chain gene (<it>ACTC</it>) has its ATG codon located within the second exon, with a fully non-coding first exon. WeederH successfully identified all 7 sites contained in the 500 bp promoter region retrieved from the ABS database (ABS 3 in Additional file <supplr sid="S1">1</supplr>). We repeated the experiment, this time retrieving the 10,000 base pairs upstream of the ATG codon of the mouse and human genes. The results are shown in Figure <figr fid="F7">7</figr>, displayed within a UCSC genome browser window. The topmost track (WeederH motifs) shows the location of the highest scoring motifs. They are clustered around the TSS of the gene, falling within the 500 bp promoter of the ABS database (indicated by the "Your sequence from BLAT search" track), and the motifs in this area cover all the ABS annotated sites. Interestingly, other clusters of motifs are visible, at around -2000, -6000, and -8000 from the TSS. Indeed, three distal enhancers are annotated for the <it>ACTC </it>gene, driving developmental and cardiac muscle-specific expression of the gene <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
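            <p>Retrieving a fixed-length region upstream of the ATG codon, as done here for 10,000 bp, requires taking the gene's strand into account. The following sketch is a generic illustration, not the retrieval procedure used for this paper: it assumes a chromosome held as a plain string and 0-based, half-open plus-strand coordinates for the ATG codon, and it returns the upstream region oriented 5' to 3' on the gene's strand. The function names are hypothetical.</p>

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_of_atg(chrom, codon_start, codon_end, strand, length=10000):
    """Return up to `length` bases upstream of an ATG codon, oriented 5'->3'.

    codon_start, codon_end: 0-based, half-open plus-strand coordinates of the
    codon (which reads CAT on the plus strand for a minus-strand gene).
    """
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return chrom[max(0, codon_start - length):codon_start]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher plus-strand coordinates.
        return reverse_complement(chrom[codon_end:codon_end + length])
    raise ValueError("strand must be '+' or '-'")


print(upstream_of_atg("GGGCCCATGTTT", 6, 9, "+", length=3))  # CCC
print(upstream_of_atg("TTTCATAGC", 3, 6, "-", length=3))     # GCT
```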
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>WeederH identifies the promoter and the three annotated enhancers upstream of the mouse actin, alpha cardiac gene</p>
               </caption>
               <text>
                  <p><b>WeederH identifies the promoter and the three annotated enhancers upstream of the mouse actin, alpha cardiac gene</b>. Highest scoring motifs predicted by WeederH in the 10,000 bp region upstream of the ATG codon of the mouse actin, alpha cardiac gene, displayed within the UCSC genome browser. Track "Weeder H motifs" shows the location of the motifs; track "Weeder-H 500" shows 500 bp regions in which the average 12-mer &#967;<sup>2 </sup>score is greater than 1. Track "Your sequence from Blat Search" shows the location of the original promoter retrieved from the ABS database. Apart from the region just upstream of the TSS (the promoter), the three regions selected match three experimentally known enhancers of the gene.</p>
               </text>
               <graphic file="1471-2105-8-46-7"/>
            </fig>
            <p>To ease the identification of clusters of conserved motifs, we extended the basic algorithm to compute the average motif &#967;<sup>2 </sup>score (for 12-mers) over regions 500 bp long. Figure <figr fid="F7">7</figr> shows the regions with an average 12-mer &#967;<sup>2 </sup>score greater than 1, which match the experimentally annotated enhancers.</p>
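            <p>The region-level summary can be sketched as a sliding-window average. The text does not specify how windows are placed or how positions without a reported motif are scored, so the sketch below assumes a per-position array holding the &#967;<sup>2 </sup>score of the 12-mer starting there (0 where no conserved 12-mer is reported), slides a 500 bp window one base at a time, and merges overlapping windows whose average exceeds 1. The function name dense_regions is hypothetical.</p>

```python
def dense_regions(kmer_scores, window=500, min_avg=1.0):
    """Merge all windows of `window` bp whose mean 12-mer chi2 score exceeds min_avg.

    kmer_scores[i] is assumed to be the chi2 score of the 12-mer starting at i
    (0.0 where no conserved 12-mer is reported). Returns half-open (start, end) regions.
    """
    prefix = [0.0]
    for s in kmer_scores:
        prefix.append(prefix[-1] + s)  # prefix sums give each window mean in O(1)

    merged = []
    for start in range(len(kmer_scores) - window + 1):
        mean = (prefix[start + window] - prefix[start]) / window
        if mean > min_avg:
            end = start + window
            if merged and merged[-1][1] >= start:
                merged[-1][1] = end  # windows advance by 1, so the end only grows
            else:
                merged.append([start, end])
    return [tuple(region) for region in merged]


# Toy profile: a 300 bp block of conserved 12-mers scoring 2.0 within 2 kb.
scores = [0.0] * 2000
for i in range(600, 900):
    scores[i] = 2.0
print(dense_regions(scores))  # [(351, 1149)]
```

            <p>A window qualifies here only when more than 250 of its 500 positions fall in the block, so the merged region extends somewhat beyond the block on both sides, which is the expected behavior of any fixed-width smoothing.</p>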
            <p>Another example is the Actin, skeletal muscle gene (<it>ACTA1</it>, ABS 4 in Additional file <supplr sid="S1">1</supplr>). Here, we retrieved the whole intergenic region (about 7,000 bp) upstream of the gene in human, mouse, and rat. Two regions were selected as densely populated with significant motifs (see Figure <figr fid="F8">8</figr>): again the core promoter, and a region at about -1,500 bp matching an experimentally validated enhancer <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>WeederH identifies the promoter and the annotated enhancer upstream of the human skeletal actin gene</p>
               </caption>
               <text>
                  <p><b>WeederH identifies the promoter and the annotated enhancer upstream of the human skeletal actin gene</b>. Highest scoring motifs predicted by WeederH in the intergenic region upstream of the ATG codon of the human skeletal actin gene, displayed within the UCSC genome browser. Track "Weeder H motifs" shows the location of the motifs; track "Weeder-H 500" shows the 500 bp regions in which the average 12-mer &#967;<sup>2 </sup>score is greater than 1. The two regions selected are the promoter and an annotated enhancer located about 1,500 bp upstream of the TSS [47]. Track "Your sequence from Blat Search" shows the location of the original promoter retrieved from the ABS database.</p>
               </text>
               <graphic file="1471-2105-8-46-8"/>
            </fig>
            <p>These examples, as well as other tests we performed, show that the conservation principle underlying the algorithm also works on input sequences much longer than a typical core promoter, where positional conservation is looser. Although it was not explicitly devised for the identification of distal enhancers, WeederH can nevertheless be applied when the exact location of the TSS of a gene in different species is not available or, more generally, to identify conserved motifs located several hundred base pairs away from the selected reference point, cases in which methods that do not compute global sequence alignments cannot be applied.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The ever-increasing availability of annotated genomic sequences, together with the observation that several non-coding regulatory elements are highly conserved across species, has made phylogenetic footprinting one of the most widely used approaches to the identification of cis-acting sequence elements that regulate gene expression. The algorithm presented in this work aims at overcoming some drawbacks of the methods currently used, namely, the need for reliable genomic alignments and/or descriptors of the binding specificity of transcription factors. Moreover, the introduction of a relative scoring strategy bypasses the problem of defining global significance thresholds. The algorithm is also efficient, requiring less than one minute for a typical promoter analysis and a few minutes for sequences a few kbp long. Our tests show that the algorithm can reliably predict conserved TFBSs in homologous promoters, with better performance than existing methods and annotations, and can also identify conserved sites and regions in longer sequences. Clearly, other methods are better suited to the discovery of distal regions and enhancers, which can be located thousands or even millions of base pairs from the gene they regulate; nevertheless, WeederH can provide significantly more information than traditional motif-finding algorithms.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Computing motifs expected frequency values</p>
            </st>
            <p>The scoring function employed by WeederH is based on the comparison of observed oligo frequencies with expected values. The term <it>E(s,d,k)</it>, denoting the expected frequency of a given oligo <it>s</it> with up to <it>d </it>substitutions in the species of origin of sequence <it>H</it><sub><it>k</it></sub>, is computed from the frequencies observed in the intergenic regions of that species for the oligos within <it>d </it>substitutions of <it>s</it>:</p>
            <p>
               <m:math name="1471-2105-8-46-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>E</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>s</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>d</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>k</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munder>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>s</m:mi>
                                    <m:mo>&#8242;</m:mo>
                                 </m:msup>
                                 <m:mo>&#8712;</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>s</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>d</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:munder>
                           <m:mrow>
                              <m:mi>E</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msup>
                                 <m:mi>s</m:mi>
                                 <m:mo>&#8242;</m:mo>
                              </m:msup>
                              <m:mo>,</m:mo>
                              <m:mi>k</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mstyle>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGfbqrcqGGOaakcqWGZbWCcqGGSaalcqWGKbazcqGGSaalcqWGRbWAcqGGPaqkcqGH9aqpdaaeqbqaaiabdweafjabcIcaOiqbdohaZzaafaGaeiilaWIaem4AaSMaeiykaKcaleaacuWGZbWCgaqbaiabgIGiolabd6eaojabcIcaOiabdohaZjabcYcaSiabdsgaKjabcMcaPaqab0GaeyyeIuoaaaa@486E@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>N(s,d) </it>is the set of oligos differing from s in no more than <it>d </it>positions. Frequency values for eightmers (<it>E(s,k)</it>) were retrieved from the RSAT Tools database <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, while the expected frequency of longer oligos is computed starting from the eightmer frequencies with a seventh-order Markov model.</p>
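            <p>A minimal Python sketch of the sum defining <it>E(s,d,k)</it> follows. It assumes that the expected frequencies <it>E(s',k)</it> for one species are available as a dictionary keyed by oligo (a hypothetical input format; oligos missing from the table count as zero). The helper <it>hamming_ball</it> enumerates <it>N(s,d)</it> without repetitions; for an 8-mer and <it>d</it> = 2 this set contains 1 + 8&#215;3 + 28&#215;9 = 277 oligos.</p>

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming_ball(s, d):
    """Yield each oligo over ACGT differing from s in at most d positions, exactly once."""
    yield s
    for m in range(1, d + 1):
        for positions in combinations(range(len(s)), m):
            # At each chosen position, use only nucleotides different from s.
            choices = [[c for c in ALPHABET if c != s[p]] for p in positions]
            for subs in product(*choices):
                chars = list(s)
                for p, c in zip(positions, subs):
                    chars[p] = c
                yield "".join(chars)


def expected_with_substitutions(s, d, expected):
    """E(s, d, k) = sum of E(s', k) over all s' in N(s, d).

    expected: dict mapping oligos of length len(s) to their expected frequency
    in the species of origin of sequence H_k (hypothetical input format).
    """
    return sum(expected.get(t, 0.0) for t in hamming_ball(s, d))


if __name__ == "__main__":
    # Uniform 4-mer frequencies: N("ACGT", 1) has 1 + 4 * 3 = 13 members.
    uniform = {"".join(t): 1 / 256 for t in product(ALPHABET, repeat=4)}
    print(expected_with_substitutions("ACGT", 1, uniform))
```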
            <p>More precisely, let <it>p = p</it><sub>1 </sub>... <it>p</it><sub><it>n </it></sub>be an <it>n</it>-mer with <it>n </it>greater than 8. Under the seventh-order Markov chain, its expected frequency is:</p>
            <p>
               <m:math name="1471-2105-8-46-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>E</m:mi>
                        <m:mi>x</m:mi>
                        <m:mi>p</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>p</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mi>E</m:mi>
                        <m:mi>x</m:mi>
                        <m:mi>p</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>p</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mn>...</m:mn>
                        <m:msub>
                           <m:mi>p</m:mi>
                           <m:mn>8</m:mn>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8719;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>9</m:mn>
                              </m:mrow>
                              <m:mi>n</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mi>P</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo>|</m:mo>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>7</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:mn>...</m:mn>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mstyle>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGfbqrcqWG4baEcqWGWbaCcqGGOaakcqWGWbaCcqGGPaqkcqGH9aqpcqWGfbqrcqWG4baEcqWGWbaCcqGGOaakcqWGWbaCdaWgaaWcbaGaeGymaedabeaakiabc6caUiabc6caUiabc6caUiabdchaWnaaBaaaleaacqaI4aaoaeqaaOGaeiykaKYaaebCaeaacqWGqbaucqGGOaakcqWGWbaCdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabdchaWnaaBaaaleaacqWGPbqAcqGHsislcqaI3aWnaeqaaOGaeiOla4IaeiOla4IaeiOla4IaemiCaa3aaSbaaSqaaiabdMgaPjabgkHiTiabigdaXaqabaGccqGGPaqkaSqaaiabdMgaPjabg2da9iabiMda5aqaaiabd6gaUbqdcqGHpis1aaaa@5CF3@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>P </it>(<it>p</it><sub><it>i </it></sub>| <it>p</it><sub><it>i-7 </it></sub>... <it>p</it><sub><it>i-1</it></sub>) is the conditional probability of observing nucleotide <it>p</it><sub><it>i </it></sub>given that it is preceded by nucleotides <it>p</it><sub><it>i-7 </it></sub>... <it>p</it><sub><it>i-1</it></sub>, computed from the expected frequencies of 8-mers:</p>
            <p>
               <m:math name="1471-2105-8-46-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>P</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>p</m:mi>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo>|</m:mo>
                        <m:msub>
                           <m:mi>p</m:mi>
                           <m:mrow>
                              <m:mi>i</m:mi>
                              <m:mo>&#8722;</m:mo>
                               <m:mn>7</m:mn>
                            </m:mrow>
                         </m:msub>
                         <m:mn>...</m:mn>
                         <m:msub>
                            <m:mi>p</m:mi>
                            <m:mrow>
                               <m:mi>i</m:mi>
                               <m:mo>&#8722;</m:mo>
                               <m:mn>1</m:mn>
                           </m:mrow>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>E</m:mi>
                              <m:mi>x</m:mi>
                              <m:mi>p</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>7</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>6</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:mn>...</m:mn>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                              </m:msub>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munder>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8712;</m:mo>
                                       <m:mo>{</m:mo>
                                       <m:mi>A</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>C</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>G</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:munder>
                                 <m:mrow>
                                    <m:mi>E</m:mi>
                                    <m:mi>x</m:mi>
                                    <m:mi>p</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>p</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>7</m:mn>
                                       </m:mrow>
                                    </m:msub>
                                    <m:msub>
                                       <m:mi>p</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>6</m:mn>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mn>...</m:mn>
                                    <m:msub>
                                       <m:mi>p</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mi>n</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mfrac>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaucqGGOaakcqWGWbaCdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabdchaWnaaBaaaleaacqWGPbqAcqGHsislcqaIXaqmaeqaaOGaeiOla4IaeiOla4IaeiOla4IaemiCaa3aaSbaaSqaaiabdMgaPjabgkHiTiabiEda3aqabaGccqGGPaqkcqGH9aqpdaWcaaqaaiabdweafjabdIha4jabdchaWjabcIcaOiabdchaWnaaBaaaleaacqWGPbqAcqGHsislcqaI3aWnaeqaaOGaemiCaa3aaSbaaSqaaiabdMgaPjabgkHiTiabiAda2aqabaGccqGGUaGlcqGGUaGlcqGGUaGlcqWGWbaCdaWgaaWcbaGaemyAaKMaeyOeI0IaeGymaedabeaakiabdchaWnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKcabaWaaabuaeaacqWGfbqrcqWG4baEcqWGWbaCcqGGOaakcqWGWbaCdaWgaaWcbaGaemyAaKMaeyOeI0IaeG4naCdabeaakiabdchaWnaaBaaaleaacqWGPbqAcqGHsislcqaI2aGnaeqaaOGaeiOla4IaeiOla4IaeiOla4IaemiCaa3aaSbaaSqaaiabdMgaPjabgkHiTiabigdaXaqabaGccqWGUbGBcqGGPaqkaSqaaiabd6gaUjabgIGiolabcUha7jabdgeabjabcYcaSiabdoeadjabcYcaSiabdEeahjabcYcaSiabdsfaujabc2ha9bqab0GaeyyeIuoaaaaaaa@82C6@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
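            <p>The two formulas above can be combined into a short Python sketch. It assumes a hypothetical dictionary <it>exp8</it> mapping every 8-mer to its expected frequency (for instance, values derived from the RSAT tables); the function names are ours. Positions are 0-based in the code, so the loop index <it>j</it> corresponds to <it>i</it> = <it>j</it> + 1 in the formulas.</p>

```python
from itertools import product


def conditional_prob(context, nucleotide, exp8):
    """P(p_i | p_{i-7} ... p_{i-1}) computed from expected 8-mer frequencies.

    context: the 7 nucleotides preceding p_i; exp8: hypothetical dict mapping
    every 8-mer over ACGT to its expected frequency.
    """
    numerator = exp8[context + nucleotide]
    denominator = sum(exp8[context + n] for n in "ACGT")
    return numerator / denominator


def expected_frequency(p, exp8):
    """Exp(p) for an n-mer p (n at least 8) under a seventh-order Markov chain."""
    if 8 > len(p):
        raise ValueError("p must be at least 8 nucleotides long")
    value = exp8[p[:8]]  # Exp(p_1 ... p_8)
    for j in range(8, len(p)):
        # 0-based j is position i = j + 1 in the text; its context is p[j-7:j].
        value *= conditional_prob(p[j - 7:j], p[j], exp8)
    return value


if __name__ == "__main__":
    # Uniform 8-mer frequencies: every conditional probability is 1/4.
    uniform = {"".join(t): 4.0 ** -8 for t in product("ACGT", repeat=8)}
    print(expected_frequency("ACGTACGTAC", uniform))
```

            <p>With uniform 8-mer frequencies every conditional probability equals 1/4, so the expected frequency of any 10-mer reduces to 4<sup>&#8722;10</sup>, a quick sanity check for the sketch.</p>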
            <p>We chose a seventh-order model because high-order background models (at least third or fourth order) have been shown in several experiments to significantly improve the reliability of motif discovery methods (see for example <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>). Moreover, we computed the <it>Exp(p) </it>values of <it>n</it>-mers directly from <it>n</it>-mer counts, up to the maximum length for which every oligo appeared at least once in the regulatory sequences of the organisms we examined, thus avoiding the introduction of pseudo-counts to compensate for missing oligos. The <it>Exp(p) </it>values of oligos longer than 8 nucleotides were then computed from the 8-mer counts.</p>
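            <p>The criterion used to decide up to which length expected frequencies can be taken directly from counts (every oligo of that length must occur at least once) can be checked with a small Python sketch. The function name and the cap <it>k_max</it> are hypothetical, and windows containing symbols other than A, C, G, T (such as N) are simply skipped; the text does not say how such symbols were handled.</p>

```python
def max_fully_observed_length(sequences, k_max=12):
    """Largest k such that every one of the 4**k oligos of length k occurs at least once.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    Returns 0 if not even all four nucleotides are observed.
    """
    best = 0
    for k in range(1, k_max + 1):
        seen = set()
        for seq in sequences:
            seq = seq.upper()
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if set(kmer).issubset("ACGT"):
                    seen.add(kmer)
        if len(seen) == 4 ** k:
            best = k
        else:
            # If some k-mer is missing, every (k+1)-mer starting with it is missing too.
            break
    return best


if __name__ == "__main__":
    # A string containing all 16 dinucleotides but only a few trinucleotides.
    print(max_fully_observed_length(["AACAGATCCGCTGGTTA"]))
```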
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; <b>Project name</b>: WeederH</p>
         <p>&#8226; <b>Project home page</b>: Part of the Motif Discovery Tools web server, <url>http://www.beacon.unimi.it/modtools/</url> or <url>http://www.pesolelab.it/modtools/</url>.</p>
         <p>&#8226; <b>Operating systems</b>: web interface that can be accessed from any OS.</p>
         <p>&#8226; <b>Programming language</b>: C/C++, Java (web interface).</p>
         <p>&#8226; <b>Restrictions to use by non-academics</b>: none.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>TF, transcription factor; TFBS, transcription factor binding site; TSS, transcription start site; bp, base pair.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GiP conceived the core idea of the algorithm, designed it together with GrP, and implemented it; FZ extensively tested the algorithm during its development and implemented parts of the algorithm and of the Web interface. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank David Horner for his help in generating the simulated datasets. This work has been supported by the Italian Ministry of University and Research, under the project FIRB 2003 "Laboratorio Italiano di Bioinformatica", and by EU grant "Transcode".</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Transcription regulation and animal diversity</p>
            </title>
            <aug>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tjian</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>424</volume>
            <fpage>147</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01763</pubid>
                  <pubid idtype="pmpid" link="fulltext">12853946</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>In silico representation and discovery of transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mauri</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>217</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/5.3.217</pubid>
                  <pubid idtype="pmpid" link="fulltext">15383209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Computational prediction of transcription-factor binding site locations</p>
            </title>
            <aug>
               <au>
                  <snm>Bulyk</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <fpage>201</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395725</pubid>
                  <pubid idtype="pmpid" link="fulltext">14709165</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-5-1-201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints</p>
            </title>
            <aug>
               <au>
                  <snm>Tagle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Koop</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Slightom</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>203</volume>
            <fpage>439</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90011-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">3199442</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Combining phylogenetic data with co-regulated genes to identify regulatory motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2369</fpage>
            <lpage>2380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg329</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>170</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534098</pubid>
                  <pubid idtype="pmpid" link="fulltext">15511292</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-170</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Transcriptional regulatory code of a eukaryotic genome</p>
            </title>
            <aug>
               <au>
                  <snm>Harbison</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Rinaldi</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>MacIsaac</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Danford</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Hannett</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Tagne</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Reynolds</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Zeitlinger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pokholok</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rolfe</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Takusagawa</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Gifford</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Fraenkel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>99</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02800</pubid>
                  <pubid idtype="pmpid" link="fulltext">15343339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach</p>
            </title>
            <aug>
               <au>
                  <snm>Elemento</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R18</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">551538</pubid>
                  <pubid idtype="pmpid" link="fulltext">15693947</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-2-r18</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Whole-genome discovery of transcription factor binding sites by network-level conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Pritsker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>YC</fnm>
               </au>
               <au>
                  <snm>Beer</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Tavazoie</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>99</fpage>
            <lpage>108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314286</pubid>
                  <pubid idtype="pmpid" link="fulltext">14672978</pubid>
                  <pubid idtype="doi">10.1101/gr.1739204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Xie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>434</volume>
            <fpage>338</fpage>
            <lpage>345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03441</pubid>
                  <pubid idtype="pmpid" link="fulltext">15735639</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Discovery of regulatory elements in vertebrates through comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Prakash</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1249</fpage>
            <lpage>1256</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1140</pubid>
                  <pubid idtype="pmpid" link="fulltext">16211068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Galaxy: a platform for interactive large-scale genome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Giardine</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Burhans</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Blankenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1451</fpage>
            <lpage>1455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1240089</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169926</pubid>
                  <pubid idtype="doi">10.1101/gr.4086505</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The UCSC Genome Browser Database: update 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hillman-Jackson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Raney</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Sultan-Qurraie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Trumbower</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Weirauch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zweig</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D590</fpage>
            <lpage>D598</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347506</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381938</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>DNA binding sites: representation and discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>16</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.1.16</pubid>
                  <pubid idtype="pmpid" link="fulltext">10812473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Matys</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kel-Margoulis</snm>
                  <fnm>OV</fnm>
               </au>
               <au>
                  <snm>Fricke</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Liebich</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Land</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barre-Dirrie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Reuter</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Chekmenev</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Krull</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hornischer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Voss</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Stegmaier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lewicki-Potapov</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Saxel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kel</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D108</fpage>
            <lpage>D110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347505</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381825</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Human-mouse genome comparisons to locate regulatory sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Palumbo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>26</volume>
            <fpage>225</fpage>
            <lpage>228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/79965</pubid>
                  <pubid idtype="pmpid" link="fulltext">11017083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bataille</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Poitras</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Laganiere</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lefebvre</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Deblois</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Giguere</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ferretti</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bergeron</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Coulombe</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>656</fpage>
            <lpage>668</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457048</pubid>
                  <pubid idtype="pmpid" link="fulltext">16606704</pubid>
                  <pubid idtype="doi">10.1101/gr.4866006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model</p>
            </title>
            <aug>
               <au>
                  <snm>Moses</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chiang</snm>
                  <fnm>DY</fnm>
               </au>
               <au>
                  <snm>Pollard</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R98</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545801</pubid>
                  <pubid idtype="pmpid" link="fulltext">15575972</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-12-r98</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>On the power of profiles for transcription factor binding site detection</p>
            </title>
            <aug>
               <au>
                  <snm>Rahmann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Stat Appl Genet Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <fpage>Article7</fpage>
            <xrefbib>
               <pubid idtype="pmpid">16646785</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guryev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>170</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314294</pubid>
                  <pubid idtype="pmpid" link="fulltext">14672977</pubid>
                  <pubid idtype="doi">10.1101/gr.1642804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Footer: a quantitative comparative genomics method for efficient recognition of cis-regulatory elements</p>
            </title>
            <aug>
               <au>
                  <snm>Corcoran</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Feingold</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dominick</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Harnaha</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Trucco</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Giannoukakis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Benos</snm>
                  <fnm>PV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>840</fpage>
            <lpage>847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1142474</pubid>
                  <pubid idtype="pmpid" link="fulltext">15930494</pubid>
                  <pubid idtype="doi">10.1101/gr.2952005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Transcriptional regulation of the stem cell leukemia gene (SCL)--comparative analysis of five vertebrate SCL loci</p>
            </title>
            <aug>
               <au>
                  <snm>Gottgens</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Sinclair</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Knudsen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>749</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186570</pubid>
                  <pubid idtype="pmpid" link="fulltext">11997341</pubid>
                  <pubid idtype="doi">10.1101/gr.45502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Ultraconserved elements in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makunin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stephen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>1321</fpage>
            <lpage>1325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1098119</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131266</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Genomic strategies to identify mammalian regulatory sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Pennacchio</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>100</fpage>
            <lpage>109</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35052548</pubid>
                  <pubid idtype="pmpid" link="fulltext">11253049</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Sequence first. Ask questions later</p>
            </title>
            <aug>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2002</pubdate>
            <volume>111</volume>
            <fpage>13</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(02)01003-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12372296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Identification and characterization of multi-species conserved sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2507</fpage>
            <lpage>2518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403793</pubid>
                  <pubid idtype="pmpid" link="fulltext">14656959</pubid>
                  <pubid idtype="doi">10.1101/gr.1602203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Regulatory potential scores from genome-wide three-way alignments of human, mouse, and rat</p>
            </title>
            <aug>
               <au>
                  <snm>Kolbe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eswara</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>700</fpage>
            <lpage>707</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383316</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060013</pubid>
                  <pubid idtype="doi">10.1101/gr.1976004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Combining phylogenetic and hidden Markov models in biosequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>413</fpage>
            <lpage>428</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/1066527041410472</pubid>
                  <pubid idtype="pmpid" link="fulltext">15285899</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>King</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1051</fpage>
            <lpage>1060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182217</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024817</pubid>
                  <pubid idtype="doi">10.1101/gr.3642605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Conservation of regulatory elements between two species of Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Emberly</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rajewsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>57</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">302112</pubid>
                  <pubid idtype="pmpid" link="fulltext">14629780</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-57</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Multi-species sequence comparison: the next frontier in genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>122</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">329408</pubid>
                  <pubid idtype="pmpid" link="fulltext">14659006</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-12-122</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Assessing computational tools for the discovery of transcription factor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Favorov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Makeev</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Mironov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Regnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Simonis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vandenbogaert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>137</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1053</pubid>
                  <pubid idtype="pmpid" link="fulltext">15637633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>McCue</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Carmack</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ryan</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Derbyshire</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>774</fpage>
            <lpage>782</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">30389</pubid>
                  <pubid idtype="pmpid" link="fulltext">11160901</pubid>
                  <pubid idtype="doi">10.1093/nar/29.3.774</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>FootPrinter: A program designed for phylogenetic footprinting</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3840</fpage>
            <lpage>3842</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169012</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824433</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg606</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes</p>
            </title>
            <aug>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mereghetti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mauri</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W199</fpage>
            <lpage>W203</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441603</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215380</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh650</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Analysis of computational approaches for motif discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Algorithms Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1540429</pubid>
                  <pubid idtype="pmpid" link="fulltext">16722558</pubid>
                  <pubid idtype="doi">10.1186/1748-7188-1-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover</p>
            </title>
            <aug>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>1114</fpage>
            <lpage>1121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12082130</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>The value of prior knowledge in discovering motifs with MEME</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Elkan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>3</volume>
            <fpage>21</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">7584439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>ABS: a database of Annotated regulatory Binding Sites from orthologous promoters</p>
            </title>
            <aug>
               <au>
                  <snm>Blanco</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Farre</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Alba</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Messeguer</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D63</fpage>
            <lpage>D67</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347478</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381947</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj116</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>DNA assembly with gaps (Dawg): simulating sequence evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Cartwright</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21 Suppl 3</volume>
            <fpage>iii31</fpage>
            <lpage>iii38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti1200</pubid>
                  <pubid idtype="pmpid" link="fulltext">16306390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Selective constraint in intergenic regions of human and mouse genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Shabalina</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Ogurtsov</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>373</fpage>
            <lpage>376</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(01)02344-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11418197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Evaluating phylogenetic footprinting for human-rodent comparisons</p>
            </title>
            <aug>
               <au>
                  <snm>Sauer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Shelest</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>430</fpage>
            <lpage>437</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti819</pubid>
                  <pubid idtype="pmpid" link="fulltext">16332706</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Eukaryotic regulatory element conservation analysis and identification using comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XS</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>451</fpage>
            <lpage>458</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">353232</pubid>
                  <pubid idtype="pmpid" link="fulltext">14993210</pubid>
                  <pubid idtype="doi">10.1101/gr.1327604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Identification of transcription factor binding sites in the human genome sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hannenhalli</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mamm Genome</source>
            <pubdate>2002</pubdate>
            <volume>13</volume>
            <fpage>510</fpage>
            <lpage>514</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00335-002-2175-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12370781</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182216</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024819</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Characterization of a cardiac-specific enhancer, which directs &#945;-cardiac actin gene transcription in the mouse adult heart</p>
            </title>
            <aug>
               <au>
                  <snm>Lemonnier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Buckingham</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2004</pubdate>
            <volume>279</volume>
            <fpage>55651</fpage>
            <lpage>55658</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M411082200</pubid>
                  <pubid idtype="pmpid" link="fulltext">15491989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Control of cardiac-specific transcription by p300 through myocyte enhancer factor-2D</p>
            </title>
            <aug>
               <au>
                  <snm>Slepak</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Webster</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Zang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Prentice</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>O'Dowd</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hicks</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Bishopric</snm>
                  <fnm>NH</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>7575</fpage>
            <lpage>7585</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M004625200</pubid>
                  <pubid idtype="pmpid" link="fulltext">11096067</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Regulatory sequence analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3593</fpage>
            <lpage>3596</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168973</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824373</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling</p>
            </title>
            <aug>
               <au>
                  <snm>Thijs</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lescot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marchal</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>1113</fpage>
            <lpage>1122</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.12.1113</pubid>
                  <pubid idtype="pmpid" link="fulltext">11751219</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Discovery of novel transcription factor binding sites by statistical overrepresentation</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>5549</fpage>
            <lpage>5560</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140044</pubid>
                  <pubid idtype="pmpid" link="fulltext">12490723</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf669</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection</p>
            </title>
            <aug>
               <au>
                  <snm>Narasimhan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>LoCascio</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Uberbacher</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1952</fpage>
            <lpage>1963</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg266</pubid>
                  <pubid idtype="pmpid" link="fulltext">14555629</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
