<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-1-13</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A combinatorial optimization approach for diverse motif finding applications</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Zaslavsky</snm>
               <fnm>Elena</fnm>
               <insr iid="I1"/>
               <email>elenaz@cs.princeton.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Singh</snm>
               <fnm>Mona</fnm>
               <insr iid="I1"/>
               <email>msingh@cs.princeton.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science &amp; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2006</pubdate>
         <volume>1</volume>
         <issue>1</issue>
         <fpage>13</fpage>
         <url>http://www.almob.org/content/1/1/13</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16916460</pubid>
               <pubid idtype="doi">10.1186/1748-7188-1-13</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>02</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>17</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>17</day>
               <month>8</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Zaslavsky and Singh; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We introduce a versatile combinatorial optimization framework for motif finding that couples graph pruning techniques with a novel integer linear programming formulation. Our approach is flexible and robust enough to model several variants of the motif finding problem, including those incorporating substitution matrices and phylogenetic distances. Additionally, we give an approach for determining statistical significance of uncovered motifs. In testing on numerous DNA and protein datasets, we demonstrate that our approach typically identifies statistically significant motifs corresponding to either known motifs or other motifs of high conservation. Moreover, in most cases, our approach finds provably optimal solutions to the underlying optimization problem.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results demonstrate that a combined graph theoretic and mathematical programming approach can be the basis for effective and powerful techniques for diverse motif finding applications.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Motif discovery is the problem of finding approximately repeated patterns in unaligned sequence data. It is important in uncovering transcriptional networks, as short common subsequences in genomic data may correspond to a regulatory protein's binding sites, and in protein function identification, where short blocks of conserved amino acids code for important structural or functional elements.</p>
         <p>The biological problems addressed by motif finding are complex and varied, and no single currently existing method can solve them completely (e.g., see <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>). For DNA sequences, motif finding is often applied to sets of sequences from a single genome that have been identified as possessing a common motif, either through DNA microarray studies <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, ChIP-chip experiments <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> or protein binding microarrays <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. An orthogonal approach, which attempts to identify regulatory sites among a set of orthologous genes across genomes of varying phylogenetic distance, is adopted by <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. For protein sequences, and especially in the case of divergent sequence motifs, it is particularly useful to incorporate amino acid substitution matrices <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Often, motif finding methods are either tailor-made to a specific variant of the motif finding problem, or perform very differently when presented with a diverse set of instances.</p>
         <p>Numerous approaches to motif finding have been suggested (e.g., <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, and those referenced in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>). These methods differ mainly in the choice of the motif representation, the objective function used for assessing the quality of a motif, and the search procedure used for finding an optimal (or sub-optimal but reasonable) solution. Two broad categories of motif finding algorithms can be identified <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>: stochastic-search methods based on the position-specific scoring matrix (PSSM) representation and combinatorial approaches based on variants of the consensus sequence representation. Both categories come with their own sets of scoring functions (e.g., see <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>), and most variants of the motif finding problem are NP-hard, including those optimizing either the average information content score or the sum-of-pairs score <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The performance of these two broad groups of methods seems to be complementary in many cases, with a slight performance advantage demonstrated by representative methods of the combinatorial class (e.g., Weeder <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>), as reported in a recent comprehensive study <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. However, many combinatorial methods enumerate every possible pattern, and are thus limited in the length of the motifs they can search for. 
While this may be less of an issue in eukaryotic genomes, where transcriptional regulation is mediated combinatorially with a large number of transcription factors with relatively short binding sites, substantially longer motifs are found when considering either DNA binding sites in prokaryotic genomes (e.g., for helix-turn-helix binding domains of transcription factors) or protein motifs <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>.</p>
         <p>Here, we introduce a combinatorial optimization framework for motif finding that is flexible enough to model several variants of the problem and is not limited by the motif length. Underlying our approach, we consider motif discovery as the problem of finding the best gapless local multiple sequence alignment using the sum-of-pairs (SP) scoring scheme. The SP-score is one of many reasonable schemes for assessing motif conservation <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. In the case of motif search, where the goal is to use a set of known motif instances and uncover additional instances, the SP-score has been shown empirically to be comparable to PSSM-based methods <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Additionally, unlike the PSSM models, which typically assume independence of motif positions, the SP-score can address the problem of nucleotide or amino acid dependencies in a natural way. This is an important consideration; for example in the case of nucleotides, it has been shown that there are interdependent effects between bases <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Moreover, modeling these dependencies using the SP-score leads to improved performance in representing and searching for binding sites; a similar statistically significant improvement is not observed when extending PSSMs to incorporate pairwise dependencies <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The SP-score was most recently utilized in the context of motif finding in MaMF <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <p>In this paper, we use the SP-score and recast the motif finding problem as that of finding a maximum (or minimum) weight clique in a multi-partite graph, and introduce a two-pronged approach, based on graph pruning and mathematical programming, to solve it. In particular, we first formulate the problem as an integer linear program (ILP) and then consider its linear programming relaxation. In practice, the linear programs (LPs) arising from motif finding applications can be prohibitively large, numbering in the millions of variables. Thus, to reduce the size of the LPs, we develop a number of new pruning techniques, building upon the ideas of <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. These fall into the broad category of dead-end elimination (DEE) algorithms (e.g., <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>), where sequence positions that are incompatible with the optimal alignment are discarded. In practice, such methods are very effective in reducing problem size; to handle the rare cases where the DEE techniques do not sufficiently prune the problem instance, we also develop a heuristic iterative scheme to eliminate sequence positions. The reduced linear programs are then solved by the ILOG CPLEX LP solver, and in cases where fractional solutions are found, an ILP solver is applied.</p>
         <p>Given a motif discovered by any method, it is important to be able to assess its statistical significance, as even optimal solutions for their respective objective functions may result in very poor motifs. We demonstrate how to test the statistical significance of the motifs discovered via the graph pruning/mathematical programming approach by using the background frequencies for each base or amino acid and computing the motif scores' probability distribution <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. We then assess the number of motifs of the same or better quality that are expected to occur in the data at random. In the few cases where the heuristic DEE procedure is applied, we are able to give a lower bound on the significance value of the optimal solution; this allows us to evaluate how much better an alternate undiscovered motif might be.</p>
         <p>We test our coupled mathematical programming and pruning approach, LP/DEE, in diverse settings. First, we consider the problem of finding shared motifs in protein sequences. Unlike commonly used PSSM-based methods for motif finding (e.g., <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>), our combinatorial formulation naturally incorporates amino acid substitution matrices. For all tested datasets, we find the actual protein motifs exactly, and these motifs correspond to optimal solutions according to the SP scoring scheme. Second, we consider sets of genes known to be regulated by the same <it>E. coli </it>transcription factor, and apply our approach to find the corresponding binding sites in genomic sequence data. We compare our results with those of three popular methods <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B22">22</abbr><abbr bid="B39">39</abbr></abbrgrp>, and show that our method is often able to better locate the actual binding sites. Using the same dataset, we also embed <it>E. coli </it>binding sites within sequences of increasingly biased composition, and show that our scoring scheme and motif finding procedure are effective in this scenario as well. Third, we consider the phylogenetic footprinting problem <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and find shared motifs upstream of orthologous genes. The difficulty of this problem lies in the fact that the sequences may not have had enough evolutionary time to diverge and may share sequence level similarity beyond the functionally important site; incorporation of additional information, in the form of weightings obtained from a phylogenetic tree relating the species, proves useful in this context. Finally, we demonstrate in the context of phylogenetic footprinting that our formulation can be used to find multiple solutions, corresponding to several distinct motifs. In all scenarios, we test the uncovered motifs for statistical significance. 
We show that our method works well in practice, typically recovering statistically significant motifs that correspond to either known motifs or other motifs of high conservation.</p>
         <p>Interestingly, the vast majority of motif finding instances considered are not only effectively pruned by the optimality-preserving DEE methods, but also lead to linear programs whose optimal solutions are integral. These two conditions together guarantee optimality of the final solution for the original SP-based motif finding problem. This is notable because the motif finding formulation is known to be NP-hard <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, yet our approach runs in polynomial time for many practical instances of the problem. Overall, the ability of our LP/DEE method to find optimal solutions to large problems demonstrates the power of the computational search procedures, and its performance in uncovering known motifs illustrates its utility for novel sequence motif discovery.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Broad problem formulation</p>
            </st>
            <p>Motif discovery is modeled here as the problem of finding an ungapped local multiple sequence alignment (MSA) of fixed length with the best sum-of-pairs (SP) score. That is, given <it>N </it>sequences {<it>S</it><sub>1</sub>, ..., <it>S</it><sub><it>N</it></sub>} and a block length parameter <it>l</it>, the goal is to find an <it>l</it>-long subsequence from each input sequence so that the total similarity among selected blocks is maximized. More formally, let <m:math name="1748-7188-1-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math> refer to the <it>l</it>-long subsequence in sequence <it>S</it><sub><it>i </it></sub>beginning in position <it>k </it>and let <it>sim</it>(<it>x</it>, <it>y</it>) denote a similarity score between the <it>l</it>-long subsequences <it>x</it>, <it>y</it>. The objective is then to find the set of positions {<it>k</it><sub>1</sub>, ..., <it>k</it><sub><it>N</it></sub>} in each sequence, such that the SP-score &#8721;<sub><it>i</it>&lt;<it>j </it></sub><it>sim </it>(<m:math name="1748-7188-1-13-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mrow><m:msub><m:mi>k</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaem4AaS2aaSbaaWqaaiabdMgaPbqabaaaaaaa@328A@</m:annotation></m:semantics></m:math>, <m:math name="1748-7188-1-13-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:mrow><m:msub><m:mi>k</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGaem4AaS2aaSbaaWqaaiabdQgaQbqabaaaaaaa@328E@</m:annotation></m:semantics></m:math>) is maximized.</p>
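            <p>As an illustration only, the SP-score of one candidate set of positions can be sketched in Python as follows. The function names <it>match_count</it> and <it>sp_score</it>, the 0-based start positions, and the use of a plain match count for <it>sim</it> are assumptions made for this sketch and are not taken from the implementation described in this paper.</p>
            <p><![CDATA[
```python
from itertools import combinations


def match_count(x, y):
    """Hypothetical similarity: number of aligned positions where x and y agree."""
    return sum(1 for a, b in zip(x, y) if a == b)


def sp_score(seqs, starts, l, sim=match_count):
    """Sum-of-pairs score of the l-long blocks starting at starts[i] in seqs[i].

    Start positions are 0-based (an assumption of this sketch).
    """
    blocks = [s[k:k + l] for s, k in zip(seqs, starts)]
    if any(len(b) != l for b in blocks):
        raise ValueError("a block runs past the end of its sequence")
    # Sum sim over all unordered pairs of blocks (i < j).
    return sum(sim(x, y) for x, y in combinations(blocks, 2))


seqs = ["TTACGTAA", "GACGTCCC", "CCCACGAT"]
# Selected blocks: ACGT, ACGT, ACGA
print(sp_score(seqs, [2, 1, 3], 4))
```
]]></p>
            <p>Motif finding then amounts to searching over all choices of start positions for the one with the largest such score.</p>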
            <p>This problem can be formulated in graph-theoretic terms <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Let <it>G </it>be an undirected <it>N</it>-partite graph with node set <it>V</it><sub>1 </sub>&#8746; ... &#8746; <it>V</it><sub><it>N</it></sub>, where <it>V</it><sub><it>i </it></sub>includes a node <it>u </it>for each <it>l</it>-long subsequence <m:math name="1748-7188-1-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math> in the <it>i</it>-th sequence. Note that the subsequences corresponding to two consecutive vertices overlap in <it>l </it>- 1 positions, and that the <it>V</it><sub><it>i</it></sub>'s may have varying sizes. Each pair of nodes <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and <it>v </it>&#8712; <it>V</it><sub><it>j </it></sub>(<it>i </it>&#8800; <it>j</it>), corresponding to subsequences <m:math name="1748-7188-1-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math> and <m:math name="1748-7188-1-13-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:msup><m:mi>k</m:mi><m:mo>&#8242;</m:mo></m:msup></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGafm4AaSMbauaaaaaaaa@3110@</m:annotation></m:semantics></m:math> in <it>S</it><sub><it>i </it></sub>and <it>S</it><sub><it>j </it></sub>respectively, is joined by an edge with weight of <it>w</it><sub><it>uv </it></sub>= <it>sim </it>(<m:math name="1748-7188-1-13-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>i</m:mi><m:mi>k</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemyAaKgabaGaem4AaSgaaaaa@3102@</m:annotation></m:semantics></m:math>, <m:math name="1748-7188-1-13-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>s</m:mi><m:mi>j</m:mi><m:msup><m:mi>k</m:mi><m:mo>&#8242;</m:mo></m:msup></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGZbWCdaqhaaWcbaGaemOAaOgabaGafm4AaSMbauaaaaaaaa@3110@</m:annotation></m:semantics></m:math>). By this construction <it>G </it>is a complete <it>N</it>-partite graph. The MSA is achieved by picking the highest weight <it>N</it>-partite clique in graph <it>G</it>.</p>
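            <p>The graph construction can be sketched on a toy input as follows. This is a minimal illustration, not the LP/DEE procedure of this paper: the clique is found by exhaustive enumeration, which is feasible only for very small inputs, and the names <it>build_graph</it> and <it>best_clique</it>, the node encoding (<it>i</it>, <it>k</it>) with 0-based indices, and the match-count similarity are all assumptions of the sketch.</p>
            <p><![CDATA[
```python
from itertools import combinations, product


def match_count(x, y):
    """Hypothetical similarity: number of aligned positions where x and y agree."""
    return sum(1 for a, b in zip(x, y) if a == b)


def build_graph(seqs, l, sim=match_count):
    """Return (parts, weights) for the complete N-partite graph.

    parts[i] lists the nodes (i, k) of V_i, one per l-long window of seqs[i];
    weights maps each pair (u, v) with u in V_i, v in V_j, i < j, to w_uv.
    """
    parts = [[(i, k) for k in range(len(s) - l + 1)] for i, s in enumerate(seqs)]
    weights = {}
    for part_i, part_j in combinations(parts, 2):
        for u, v in product(part_i, part_j):
            x = seqs[u[0]][u[1]:u[1] + l]
            y = seqs[v[0]][v[1]:v[1] + l]
            weights[(u, v)] = sim(x, y)
    return parts, weights


def best_clique(parts, weights):
    """Exhaustively pick one node per part so that total edge weight is maximal."""
    def total(clique):
        return sum(weights[(u, v)] for u, v in combinations(clique, 2))
    return max(product(*parts), key=total)


seqs = ["TTACGT", "ACGTCC", "CACGTA"]
parts, weights = build_graph(seqs, 4)
best = best_clique(parts, weights)
print(best)
```
]]></p>
            <p>The enumeration visits every combination of one node per part, so its cost grows exponentially with <it>N</it>; this is the cost that pruning and mathematical programming are meant to avoid.</p>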
            <p>Similarity between <it>l</it>-long subsequences can be defined using a simple scoring scheme, such as counting the number of matching bases or amino acids when the subsequences are aligned. However, for DNA sequences, the background distribution of the input sequence can vary substantially. To reward matches of more infrequent bases, instead of using 1 for a match, we assign a score of <it>log</it>(1/<it>f</it>(<it>b</it>)) for a match of base <it>b</it>, where <it>f</it>(<it>b</it>) is the zero-corrected frequency of base <it>b </it>in the background, and 0 for any mismatch. (We also experimented with a scheme that assigns a score of 1/<it>f</it>(<it>b</it>) for a base <it>b </it>match; both methods perform similarly.) In practice, we work with integral scores by scaling the floating point numbers to the desired degree of accuracy and rounding (here, we use a scale factor of 100). For protein sequences, on the other hand, compositional bias is not as major an issue; instead, to better capture the relationships between the amino acids, we score the similarity between two amino acids using substitution matrices. This assigns higher scores to more favorable substitutions and better reflects shared biochemical properties of such pairings. We experiment with both the PAM <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and BLOSUM <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> matrix families.</p>
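            <p>A minimal sketch of the frequency-weighted DNA scoring is given below. Several details are assumptions rather than statements about the original implementation: the zero correction is modeled as an additive pseudocount, the logarithm is taken to be the natural logarithm, and the function names (<it>background_freqs</it>, <it>match_scores</it>, <it>sim</it>) are hypothetical.</p>
            <p><![CDATA[
```python
import math
from collections import Counter


def background_freqs(seqs, alphabet="ACGT", pseudocount=1.0):
    """Base frequencies with an additive pseudocount (assumed zero correction)."""
    counts = Counter(ch for s in seqs for ch in s)
    total = sum(counts[b] for b in alphabet) + pseudocount * len(alphabet)
    return {b: (counts[b] + pseudocount) / total for b in alphabet}


def match_scores(freqs, scale=100):
    """Integral match score per base: round(scale * log(1 / f(b)))."""
    return {b: round(scale * math.log(1.0 / f)) for b, f in freqs.items()}


def sim(x, y, scores):
    """Similarity of two aligned blocks: sum of match scores, 0 for mismatches."""
    return sum(scores[a] for a, b in zip(x, y) if a == b)


# Illustrative AT-rich background (made-up frequencies, not from the paper).
freqs = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
scores = match_scores(freqs)
print(scores)
print(sim("ACGT", "ACCT", scores))
```
]]></p>
            <p>With this illustrative background, a match of a rare base (C or G) contributes more than twice as much as a match of a common base (A or T).</p>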
         </sec>
         <sec>
            <st>
               <p>Integer linear programming formulation</p>
            </st>
            <p>The motif finding problem can be formulated as an ILP as follows. For a graph <it>G </it>= (<it>V</it>, <it>E</it>) corresponding to the motif finding problem, where <it>V </it>= <it>V</it><sub>1 </sub>&#8746; ... &#8746; <it>V</it><sub><it>N </it></sub>and <it>E </it>= {(<it>u</it>, <it>v</it>): <it>u </it>&#8712; <it>V</it><sub><it>i</it></sub>, <it>v </it>&#8712; <it>V</it><sub><it>j</it></sub>, <it>i </it>&#8800; <it>j</it>}, we introduce a binary decision variable <it>x</it><sub><it>u </it></sub>for every vertex <it>u</it>, and a binary decision variable <it>y</it><sub><it>uv </it></sub>for every edge (<it>u</it>, <it>v</it>). Setting <it>x</it><sub><it>u </it></sub>to 1 corresponds to selecting vertex <it>u </it>for the clique and thus choosing the sequence position corresponding to <it>u </it>in the alignment. Edge variable <it>y</it><sub><it>uv </it></sub>is set to 1 if both endpoints <it>u </it>and <it>v </it>are selected for the clique. Then the following ILP solves the motif finding problem formulated above:</p>
            <p>
               <m:math name="1748-7188-1-13-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable columnalign="left">
                           <m:mtr columnalign="left">
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mtext>Maximize</m:mtext>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msub>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>u</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>v</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8712;</m:mo>
                                             <m:mi>E</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>w</m:mi>
                                             <m:mrow>
                                                <m:mi>u</m:mi>
                                                <m:mi>v</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>&#8901;</m:mo>
                                          <m:msub>
                                             <m:mi>y</m:mi>
                                             <m:mrow>
                                                <m:mi>u</m:mi>
                                                <m:mi>v</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow/>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr columnalign="left">
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mtext>subject&#160;to</m:mtext>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow/>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow/>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr columnalign="left">
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msub>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>u</m:mi>
                                             <m:mo>&#8712;</m:mo>
                                             <m:msub>
                                                <m:mi>V</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>x</m:mi>
                                             <m:mi>u</m:mi>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mtext>for&#160;</m:mtext>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8804;</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8804;</m:mo>
                                    <m:mi>N</m:mi>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mtext>node&#160;constraints</m:mtext>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr columnalign="left">
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msub>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>u</m:mi>
                                             <m:mo>&#8712;</m:mo>
                                             <m:msub>
                                                <m:mi>V</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>y</m:mi>
                                             <m:mrow>
                                                <m:mi>u</m:mi>
                                                <m:mi>v</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>=</m:mo>
                                          <m:msub>
                                             <m:mi>x</m:mi>
                                             <m:mi>v</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mtext>for&#160;</m:mtext>
                                    <m:mn>1</m:mn>
                                    <m:mo>&#8804;</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8804;</m:mo>
                                    <m:mi>N</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>v</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>V</m:mi>
                                    <m:mo>\</m:mo>
                                    <m:msub>
                                       <m:mi>V</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>e</m:mi>
                                    <m:mi>d</m:mi>
                                    <m:mi>g</m:mi>
                                    <m:mi>e</m:mi>
                                    <m:mtext>&#160;constraints</m:mtext>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr columnalign="left">
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>x</m:mi>
                                       <m:mi>u</m:mi>
                                    </m:msub>
                                    <m:mo>,</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mrow>
                                          <m:mi>u</m:mi>
                                          <m:mi>v</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mo>{</m:mo>
                                    <m:mn>0</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>}</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow>
                                    <m:mtext>for&#160;</m:mtext>
                                    <m:mi>u</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>V</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>u</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>v</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi>E</m:mi>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd columnalign="left">
                                 <m:mrow/>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeqbdaaaaeaacqqGnbqtcqqGHbqycqqG4baEcqqGPbqAcqqGTbqBcqqGPbqAcqqG6bGEcqqGLbqzaeaadaaeqaqaaiabdEha3naaBaaaleaacqWG1bqDcqWG2bGDaeqaaOGaeyyXICTaemyEaK3aaSbaaSqaaiabdwha1jabdAha2bqabaaabaGaeiikaGIaemyDauNaeiilaWIaemODayNaeiykaKIaeyicI4SaemyraueabeqdcqGHris5aaGcbaaabaGaee4CamNaeeyDauNaeeOyaiMaeeOAaOMaeeyzauMaee4yamMaeeiDaqNaeeiiaaIaeeiDaqNaee4Ba8gabaaabaaabaWaaabeaeaacqWG4baEdaWgaaWcbaGaemyDauhabeaakiabg2da9iabigdaXaWcbaGaemyDauNaeyicI4SaemOvay1aaSbaaWqaaiabdQgaQbqabaaaleqaniabggHiLdaakeaacqqGMbGzcqqGVbWBcqqGYbGCcqqGGaaicqaIXaqmcqGHKjYOcqWGQbGAcqGHKjYOcqWGobGtaeaacqGGOaakcqWGUbGBcqWGVbWBcqWGKbazcqWGLbqzcqqGGaaicqqGJbWycqqGVbWBcqqGUbGBcqqGZbWCcqqG0baDcqqGYbGCcqqGHbqycqqGPbqAcqqGUbGBcqqG0baDcqqGZbWCcqGGPaqkaeaadaaeqaqaaiabdMha5naaBaaaleaacqWG1bqDcqWG2bGDaeqaaOGaeyypa0JaemiEaG3aaSbaaSqaaiabdAha2bqabaaabaGaemyDauNaeyicI4SaemOvay1aaSbaaWqaaiabdQgaQbqabaaaleqaniabggHiLdaakeaacqqGMbGzcqqGVbWBcqqGYbGCcqqGGaaicqaIXaqmcqGHKjYOcqWGQbGAcqGHKjYOcqWGobGtcqGGSaalcqWG2bGDcqGHiiIZcqWGwbGvcqGGCbaxcqWGwbGvdaWgaaWcbaGaemOAaOgabeaaaOqaaiabcIcaOiabdwgaLjabdsgaKjabdEgaNjabdwgaLjabbccaGiabbogaJjabb+gaVjabb6gaUjabbohaZjabbsha0jabbkhaYjabbggaHjabbMgaPjabb6gaUjabbsha0jabbohaZjabcMcaPaqaaiabdIha4naaBaaaleaacqWG1bqDaeqaaOGaeiilaWIaemyEaK3aaSbaaSqaaiabdwha1jabdAha2bqabaGccqGHiiIZcqGG7bWEcqaIWaamcqGGSaalcqaIXaqmcqGG9bqFaeaacqqGMbGzcqqGVbWBcqqGYbGCcqqGGaaicqWG1bqDcqGHiiIZcqWGwbGvcqGGSaalcqGGOaakcqWG1bqDcqGGSaalcqWG2bGDcqGGPaqkcqGHiiIZcqWGfbqraeaaaaaaaa@E7E9@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The first set of constraints ensures that exactly one vertex is picked from every graph part, corresponding to one position being chosen from every input sequence. The second set of constraints relates vertex variables to edge variables, allowing the objective function to be expressed in terms of finding a maximum edge-weight clique. An edge is chosen only if it connects two chosen vertices. This formulation is similar to the one we used <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> for fixed-backbone protein design and homology modeling.</p>
            <p>ILP itself is NP-hard, but replacing the integrality constraints on the <it>x </it>and <it>y </it>variables with 0 &#8804; <it>x</it><sub><it>u</it></sub>, <it>y</it><sub><it>uv </it></sub>&#8804; 1 gives an LP that can be solved in polynomial time. If the LP solution happens to be integral, it is guaranteed to be optimal for the original ILP and motif finding problem. Non-integral solutions, on the other hand, are not feasible for the ILP and do not translate to a selection of positions for the MSA problem; in these cases, more computationally intensive ILP solvers must be invoked.</p>
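            <p>The role of the two constraint families, and the difference between an integral and a fractional LP solution, can be illustrated with a minimal sketch. The data layout (graph parts as lists of vertex names, weights keyed by unordered vertex pairs) and the names <it>check_solution</it> and <it>is_integral</it> are assumptions made for illustration only; they are not part of the method described here, and no LP solver is invoked.</p>
            <p><![CDATA[
```python
def check_solution(parts, weights, x, y, tol=1e-9):
    """Check the node and edge constraints; return (feasible, objective).

    parts   -- list of graph parts V_1..V_N, each a list of vertex names
    weights -- dict mapping frozenset({u, v}) to the edge weight w_uv
    x, y    -- candidate values for the vertex and edge variables
    """
    part_of = {u: j for j, part in enumerate(parts) for u in part}

    # Node constraints: the x values within every part sum to 1.
    for part in parts:
        if abs(sum(x[u] for u in part) - 1.0) > tol:
            return False, None

    # Edge constraints: for every part V_j and every vertex v outside it,
    # the y values on the edges between V_j and v sum to x_v.
    for j, part in enumerate(parts):
        for v, j_v in part_of.items():
            if j_v == j:
                continue
            total = sum(y.get(frozenset((u, v)), 0.0) for u in part)
            if abs(total - x[v]) > tol:
                return False, None

    objective = sum(w * y.get(e, 0.0) for e, w in weights.items())
    return True, objective


def is_integral(x, y, tol=1e-9):
    """True if every variable is (numerically) 0 or 1."""
    values = list(x.values()) + list(y.values())
    return all(min(abs(val), abs(val - 1.0)) <= tol for val in values)


# Toy instance (hypothetical): two sequences with two candidate positions each.
parts = [["a1", "a2"], ["b1", "b2"]]
weights = {
    frozenset(("a1", "b1")): 3.0,
    frozenset(("a1", "b2")): 2.0,
    frozenset(("a2", "b1")): 1.0,
    frozenset(("a2", "b2")): 5.0,
}

# An integral assignment: pick a1 and b2, and the edge between them.
x_int = {"a1": 1.0, "a2": 0.0, "b1": 0.0, "b2": 1.0}
y_int = {frozenset(("a1", "b2")): 1.0}

# A fractional point that satisfies the relaxed constraints but does not
# correspond to any selection of positions.
x_frac = {"a1": 0.5, "a2": 0.5, "b1": 0.5, "b2": 0.5}
y_frac = {frozenset(("a1", "b1")): 0.5, frozenset(("a2", "b2")): 0.5}
```
]]></p>
            <p>In this toy instance both assignments pass <it>check_solution</it>, but only the first passes <it>is_integral</it>; the second is the kind of solution for which an ILP solver would still be needed.</p>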
         </sec>
         <sec>
            <st>
               <p>Graph pruning techniques</p>
            </st>
            <p>In this section, we introduce a number of successively more powerful optimality-preserving dead-end elimination (DEE) techniques for pruning graphs corresponding to motif finding problems. The basic idea is to discard vertices and/or edges that cannot possibly be part of the optimal solution.</p>
            <sec>
               <st>
                  <p>Basic clique-bounds DEE</p>
               </st>
               <p>The idea of our first pruning technique is as follows. Suppose there exists a clique of weight <it>C</it>* in <it>G</it>. Then a vertex <it>u</it>, whose participation in any possible clique in <it>G </it>reduces the weight of that clique below <it>C</it>*, is incompatible with the optimal alignment and can be safely eliminated (similar to <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>).</p>
               <p>For <it>u </it>&#8712; <it>V</it><sub><it>i</it></sub>, define <it>star</it>(<it>u</it>) to be a selection of one vertex from every graph part other than <it>V</it><sub><it>i</it></sub>. Let <it>F</it><sub><it>u </it></sub>be the sum of the weights of the edges from <it>u </it>to the vertices of the <it>star</it>(<it>u</it>) that form the best pairwise alignments with <it>u</it>:</p>
               <p>
                  <m:math name="1748-7188-1-13-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>u</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:munder>
                                    <m:mrow>
                                       <m:mi>max</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>v</m:mi>
                                       <m:mo>&#8712;</m:mo>
                                       <m:msub>
                                          <m:mi>V</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:munder>
                                 <m:mtext>&#160;</m:mtext>
                              </m:mrow>
                           </m:mstyle>
                           <m:msub>
                              <m:mi>w</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>v</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9maaqafabaWaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdAha2jabgIGiolabdAfawnaaBaaameaacqWGQbGAaeqaaaWcbeaakiabbccaGaWcbaGaemOAaOMaeyiyIKRaemyAaKgabeqdcqGHris5aOGaem4DaC3aaSbaaSqaaiabdwha1jabdAha2bqabaGccaWLjaGaaCzcamaabmGabaGaeGymaedacaGLOaGaayzkaaaaaa@4A63@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>If <it>u </it>participates in a clique in <it>G</it>, it cannot contribute more than <it>F</it><sub><it>u </it></sub>to the weight of that clique. Similarly, let <graphic file="1748-7188-1-13-i7.gif"/> be the value of the best possible <it>star</it>(<it>u</it>) among all <it>u </it>&#8712; <it>V</it><sub><it>i</it></sub>:</p>
               <p>
                  <m:math name="1748-7188-1-13-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>F</m:mi>
                              <m:mi>i</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:munder>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                              </m:mrow>
                           </m:munder>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>u</m:mi>
                           </m:msub>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>2</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaOGaeyypa0ZaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdwha1jabgIGiolabdAfawnaaBaaameaacqWGPbqAaeqaaaWcbeaakiabdAeagnaaBaaaleaacqWG1bqDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabikdaYaGaayjkaiaawMcaaaaa@41EF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p><m:math name="1748-7188-1-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>F</m:mi><m:mi>i</m:mi><m:mo>&#8727;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@</m:annotation></m:semantics></m:math> is an upper bound on what any vertex in <it>V</it><sub><it>i </it></sub>can contribute to any alignment.</p>
               <p>Now, if <it>F</it><sub><it>z</it></sub>, the most a vertex <it>z </it>&#8712; <it>V</it><sub><it>k </it></sub>can contribute to a clique, assuming the best possible contributions from all other graph parts, is insufficient compared to the value <it>C</it>* of an existing clique, i.e. if</p>
               <p>
                  <m:math name="1748-7188-1-13-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>z</m:mi>
                           </m:msub>
                           <m:mo>&lt;</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>&#215;</m:mo>
                           <m:msup>
                              <m:mi>C</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mo>&#8722;</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>F</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8727;</m:mo>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>3</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemOEaOhabeaakiabgYda8iabikdaYiabgEna0kabdoeadnaaCaaaleqabaGaey4fIOcaaOGaeyOeI0YaaabuaeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaqaaiabdMgaPjabgcMi5kabdUgaRbqab0GaeyyeIuoakiabcYcaSiaaxMaacaWLjaWaaeWaceaacqaIZaWmaiaawIcacaGLPaaaaaa@4575@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p><it>z </it>can be discarded. The clique value <it>C</it>* appears with a factor of 2 because the star values in the above inequality count the edge between every pair of graph parts twice, once from each endpoint.</p>
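               <p>The test defined by Equations 1 to 3 can be sketched in a few lines. The representation below (graph parts as lists, weights keyed by unordered vertex pairs) and the function names <it>star_value</it> and <it>basic_dee</it> are illustrative assumptions, not the implementation used in this work; the value of <it>C</it>* is simply passed in.</p>
               <p><![CDATA[
```python
from itertools import combinations


def star_value(u, i, parts, w):
    """F_u of Equation 1: the best edge from u into every part other than V_i."""
    return sum(
        max(w[frozenset((u, v))] for v in part)
        for j, part in enumerate(parts)
        if j != i
    )


def basic_dee(parts, w, c_star):
    """Return the vertices z that satisfy the elimination test of Equation 3."""
    f = {u: star_value(u, i, parts, w)
         for i, part in enumerate(parts) for u in part}
    # Equation 2: the best star value within every graph part.
    f_best = [max(f[u] for u in part) for part in parts]
    total = sum(f_best)

    pruned = []
    for k, part in enumerate(parts):
        for z in part:
            # Sum of F_i* over all parts i other than k.
            others = total - f_best[k]
            if f[z] < 2 * c_star - others:
                pruned.append(z)
    return pruned


# Toy instance (hypothetical): a1, b1, c1 form a clique of weight 30;
# every other edge has weight 1.
parts = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
strong = {"a1", "b1", "c1"}
w = {}
for i, j in combinations(range(len(parts)), 2):
    for u in parts[i]:
        for v in parts[j]:
            w[frozenset((u, v))] = 10 if u in strong and v in strong else 1

pruned = basic_dee(parts, w, c_star=30)
```
]]></p>
               <p>With <it>C</it>* = 30, the vertices a2, b2 and c2 each have a star value of 2, which is below 2 &#215; 30 &#8722; 40 = 20, so they are eliminated; a1, b1 and c1 have a star value of exactly 20 and are kept because the inequality is strict.</p>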
               <p>In fact, the values of <m:math name="1748-7188-1-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>F</m:mi><m:mi>i</m:mi><m:mo>&#8727;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@</m:annotation></m:semantics></m:math> are further constrained by requiring a connection to <it>z </it>when <it>z </it>is under consideration. That is, when considering a node <it>z </it>&#8712; <it>V</it><sub><it>k </it></sub>to eliminate, and calculating <m:math name="1748-7188-1-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>F</m:mi><m:mi>i</m:mi><m:mo>&#8727;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@</m:annotation></m:semantics></m:math> according to Equation 2 among all possible <it>u </it>&#8712; <it>V</it><sub><it>i</it></sub>, the <it>F</it><sub><it>u </it></sub>of Equation 1 is instead computed as:</p>
               <p>
                  <m:math name="1748-7188-1-13-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>u</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:msub>
                              <m:mi>w</m:mi>
                              <m:mrow>
                                 <m:mi>z</m:mi>
                                 <m:mi>u</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>+</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>k</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:munder>
                                    <m:mrow>
                                       <m:mi>max</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>v</m:mi>
                                       <m:mo>&#8712;</m:mo>
                                       <m:msub>
                                          <m:mi>V</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:munder>
                              </m:mrow>
                           </m:mstyle>
                           <m:mtext>&#160;</m:mtext>
                           <m:msub>
                              <m:mi>w</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>v</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>4</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9iabdEha3naaBaaaleaacqWG6bGEcqWG1bqDaeqaaOGaey4kaSYaaabuaeaadaWfqaqaaiGbc2gaTjabcggaHjabcIha4bWcbaGaemODayNaeyicI4SaemOvay1aaSbaaWqaaiabdQgaQbqabaaaleqaaaqaaiabdQgaQjabgcMi5kabdMgaPjabcYcaSiabdUgaRbqab0GaeyyeIuoakiabbccaGiabdEha3naaBaaaleaacqWG1bqDcqWG2bGDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabisda0aGaayjkaiaawMcaaaaa@5212@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>The value of <it>C</it>* can be computed from any "good" alignment. We use the weight of the clique imposed by the best overall <it>star</it>.</p>
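               <p>The refinement of Equation 4 and one possible reading of "the clique imposed by the best overall <it>star</it>" can be sketched as follows. Taking the star with the largest <it>F</it><sub><it>u</it></sub> and summing all pairwise edge weights among its center and leaves is our interpretation for this example; the data layout and all function names are likewise hypothetical.</p>
               <p><![CDATA[
```python
from itertools import combinations


def edge(w, u, v):
    """Weight of the edge between u and v (weights keyed by unordered pairs)."""
    return w[frozenset((u, v))]


def best_star_clique_weight(parts, w):
    """C*: total weight of the clique spanned by the star with the largest F_u."""
    best_value, best_members = None, None
    for i, part in enumerate(parts):
        for u in part:
            members, value = [u], 0
            for j, other in enumerate(parts):
                if j == i:
                    continue
                v = max(other, key=lambda t: edge(w, u, t))
                members.append(v)
                value += edge(w, u, v)
            if best_value is None or value > best_value:
                best_value, best_members = value, members
    return sum(edge(w, a, b) for a, b in combinations(best_members, 2))


def refined_part_bound(z, k, i, parts, w):
    """F_i* of Equation 2, with F_u computed as in Equation 4 for z in V_k."""
    def f_u(u):
        rest = sum(
            max(edge(w, u, v) for v in part)
            for j, part in enumerate(parts)
            if j not in (i, k)
        )
        return edge(w, z, u) + rest
    return max(f_u(u) for u in parts[i])


def can_eliminate(z, k, parts, w, c_star):
    """Equation 3, using the refined F_i* values on the right-hand side."""
    f_z = sum(
        max(edge(w, z, v) for v in part)
        for j, part in enumerate(parts)
        if j != k
    )
    others = sum(
        refined_part_bound(z, k, i, parts, w)
        for i in range(len(parts))
        if i != k
    )
    return f_z < 2 * c_star - others


# Toy instance (hypothetical): a1, b1, c1 form a clique of weight 30;
# every other edge has weight 1.
parts = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
strong = {"a1", "b1", "c1"}
w = {}
for i, j in combinations(range(len(parts)), 2):
    for u in parts[i]:
        for v in parts[j]:
            w[frozenset((u, v))] = 10 if u in strong and v in strong else 1

c_star = best_star_clique_weight(parts, w)
```
]]></p>
               <p>Because every star counted on the right-hand side must now include its edge to <it>z</it>, the refined values are never larger than those of Equation 2, so the test can only become easier to satisfy. In the toy instance the bound for a2 drops from 20 to 11 per graph part.</p>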
            </sec>
            <sec>
               <st>
                  <p>Tighter constraints for clique-bounds DEE</p>
               </st>
               <p>For a vertex <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and every other <it>V</it><sub><it>j</it></sub>, an edge has to connect <it>u </it>to some <it>v </it>&#8712; <it>V</it><sub><it>j </it></sub>in any alignment. When calculating <it>F</it><sub><it>u</it></sub>, we can constrain its value by considering three-way alignments and requiring that the vertices in the best <it>star</it>(<it>u</it>) chosen as neighbors of <it>u </it>in graph parts other than <it>V</it><sub><it>j </it></sub>are also good matches to <it>v</it>. Performing this computation for every pair of <it>u</it>, <it>V</it><sub><it>j </it></sub>and considering every edge incident on <it>u </it>would be too costly. Therefore, we only consider such three-way alignments for every vertex <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and the next part <it>V</it><sub><it>i</it>+1 </sub>of the graph (with the last and first parts paired). Essentially, this procedure shifts the emphasis onto edges, which allows better alignments and tighter bounds, yet it still eliminates vertices, by considering the best edge incident on each vertex.</p>
               <p>For a given edge (<it>u</it>, <it>v</it>) with endpoints <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and <it>v </it>&#8712; <it>V</it><sub><it>i</it>+1 </sub>we consider an adjacent <it>double star </it>with two centers at <it>u </it>and <it>v</it>, and sharing all the endpoints <it>x</it><sub><it>j </it></sub>in the other graph parts, denoted as <it>dstar</it>(<it>u</it>, <it>v</it>); the weight of such a <it>dstar</it>(<it>u</it>, <it>v</it>) is <m:math name="1748-7188-1-13-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mn>2</m:mn><m:msub><m:mi>w</m:mi><m:mrow><m:mi>u</m:mi><m:mi>v</m:mi></m:mrow></m:msub><m:mo>+</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mtable><m:mtr><m:mtd><m:mrow><m:mi>j</m:mi><m:mo>&#8800;</m:mo><m:mi>i</m:mi></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd><m:mrow><m:mi>j</m:mi><m:mo>&#8800;</m:mo><m:mi>i</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow><m:mrow/></m:msubsup><m:mrow><m:mo stretchy="false">(</m:mo><m:msub><m:mi>w</m:mi><m:mrow><m:mi>u</m:mi><m:msub><m:mi>x</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub><m:mo>+</m:mo><m:msub><m:mi>w</m:mi><m:mrow><m:mi>v</m:mi><m:msub><m:mi>x</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub><m:mo stretchy="false">)</m:mo></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaIYaGmcqWG3bWDdaWgaaWcbaGaemyDauNaemODayhabeaakiabgUcaRmaaqadabaGaeiikaGIaem4DaC3aaSbaaSqaaiabdwha1jabdIha4naaBaaameaacqWGQbGAaeqaaaWcbeaakiabgUcaRiabdEha3naaBaaaleaacqWG2bGDcqWG4baEdaWgaaadbaGaemOAaOgabeaaaSqabaGccqGGPaqkaSqaauaabeqaceaaaeaacqWGQbGAcqGHGjsUcqWGPbqAaeaacqWGQbGAcqGHGjsUcqWGPbqAcqGHRaWkcqaIXaqmaaaabaaaniabggHiLdaaaa@4EE6@</m:annotation></m:semantics></m:math>. Now consider a clique {<it>u</it><sub>1 </sub>&#8712; <it>V</it><sub>1</sub>, ..., <it>u</it><sub><it>N </it></sub>&#8712; <it>V</it><sub><it>N</it></sub>} of some value <it>C</it>*, and the sum of its <it>double stars:</it></p>
               <p>
                  <m:math name="1748-7188-1-13-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>2</m:mn>
                                 <m:msub>
                                    <m:mi>w</m:mi>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>u</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:msub>
                                          <m:mi>u</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>+</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>&#8800;</m:mo>
                                                      <m:mi>i</m:mi>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>&#8800;</m:mo>
                                                      <m:mi>i</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mn>1</m:mn>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                    </m:munder>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>w</m:mi>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>+</m:mo>
                                       <m:msub>
                                          <m:mi>w</m:mi>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mrow>
                                                   <m:mi>i</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>=</m:mo>
                           <m:mn>2</m:mn>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>N</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>j</m:mi>
                                          <m:mo>&#8800;</m:mo>
                                          <m:mi>i</m:mi>
                                       </m:mrow>
                                    </m:munder>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>w</m:mi>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:msub>
                                                <m:mi>u</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mstyle>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>5</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWbqaaiabcIcaOiabikdaYiabdEha3naaBaaaleaacqWG1bqDdaWgaaadbaGaemyAaKgabeaaliabdwha1naaBaaameaacqWGPbqAcqGHRaWkcqaIXaqmaeqaaaWcbeaakiabgUcaRmaaqafabaGaeiikaGIaem4DaC3aaSbaaSqaaiabdwha1naaBaaameaacqWGQbGAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPbqabaaaleqaaOGaey4kaSIaem4DaC3aaSbaaSqaaiabdwha1naaBaaameaacqWGQbGAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPjabgUcaRiabigdaXaqabaaaleqaaOGaeiykaKIaeiykaKcaleaafaqabeGabaaabaGaemOAaOMaeyiyIKRaemyAaKgabaGaemOAaOMaeyiyIKRaemyAaKMaey4kaSIaeGymaedaaaqab0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd6eaobqdcqGHris5aOGaeyypa0JaeGOmaiZaaabCaeaadaaeqbqaaiabdEha3naaBaaaleaacqWG1bqDdaWgaaadbaGaemOAaOgabeaaliabdwha1naaBaaameaacqWGPbqAaeqaaaWcbeaaaeaacqWGQbGAcqGHGjsUcqWGPbqAaeqaniabggHiLdaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGobGta0GaeyyeIuoakiaaxMaacaWLjaWaaeWaceaacqaI1aqnaiaawIcacaGLPaaaaaa@7C24@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>This sum is equal to 4<it>C</it>*, as each edge (<it>u</it><sub><it>i</it></sub>, <it>u</it><sub><it>j</it></sub>) is counted four times. For an edge (<it>u</it>, <it>v</it>) with endpoints <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and <it>v </it>&#8712; <it>V</it><sub><it>i</it>+1</sub>, we define <it>F</it><sub><it>uv </it></sub>as</p>
               <p>
                  <graphic file="1748-7188-1-13-i13.gif"/>
               </p>
               <p><it>F</it><sub><it>uv </it></sub>can be viewed as the weight of the best <it>dstar </it>centered at the pair of vertices <it>u</it>, <it>v </it>(that is, at the edge (<it>u</it>, <it>v</it>)); it is the best possible contribution to any alignment that is required to contain the edge (<it>u</it>, <it>v</it>). We define <it>F</it><sub><it>u </it></sub>for <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and <m:math name="1748-7188-1-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>F</m:mi><m:mi>i</m:mi><m:mo>&#8727;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@</m:annotation></m:semantics></m:math> for part <it>i </it>similarly to the above definitions as</p>
               <p>
                  <m:math name="1748-7188-1-13-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>u</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:munder>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>v</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:munder>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mi>v</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>7</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaWgaaWcbaGaemyDauhabeaakiabg2da9maaxababaGagiyBa0MaeiyyaeMaeiiEaGhaleaacqWG2bGDcqGHiiIZcqWGwbGvdaWgaaadbaGaemyAaKMaey4kaSIaeGymaedabeaaaSqabaGccqWGgbGrdaWgaaWcbaGaemyDauNaemODayhabeaakiaaxMaacaWLjaWaaeWaceaacqaI3aWnaiaawIcacaGLPaaaaaa@446A@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>
                  <m:math name="1748-7188-1-13-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msubsup>
                              <m:mi>F</m:mi>
                              <m:mi>i</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msubsup>
                           <m:mo>=</m:mo>
                           <m:munder>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:msub>
                                    <m:mi>V</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                              </m:mrow>
                           </m:munder>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>u</m:mi>
                           </m:msub>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>8</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaOGaeyypa0ZaaCbeaeaacyGGTbqBcqGGHbqycqGG4baEaSqaaiabdwha1jabgIGiolabdAfawnaaBaaameaacqWGPbqAaeqaaaWcbeaakiabdAeagnaaBaaaleaacqWG1bqDaeqaaOGaaCzcaiaaxMaadaqadiqaaiabiIda4aGaayjkaiaawMcaaaaa@41FB@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p><it>F</it><sub><it>u </it></sub>is the value of the best <it>dstar </it>centered on vertex <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and some vertex <it>v </it>&#8712; <it>V</it><sub><it>i</it>+1</sub>, and <m:math name="1748-7188-1-13-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>F</m:mi><m:mi>i</m:mi><m:mo>&#8727;</m:mo></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGgbGrdaqhaaWcbaGaemyAaKgabaGaey4fIOcaaaaa@3038@</m:annotation></m:semantics></m:math> is the value of the best <it>dstar </it>centered on any pair of vertices <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>and <it>v </it>&#8712; <it>V</it><sub><it>i</it>+1</sub>.</p>
               <p>For any clique {<it>u</it><sub>1 </sub>&#8712; <it>V</it><sub>1</sub>, ..., <it>u</it><sub><it>N </it></sub>&#8712; <it>V</it><sub><it>N</it></sub>} of value <it>C</it>* in the graph, by Equations 5&#8211;8 we have</p>
               <p>
                  <m:math name="1748-7188-1-13-i16" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mn>4</m:mn>
                           <m:msup>
                              <m:mi>C</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mo>&#8804;</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1...</m:mn>
                                    <m:mi>N</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>F</m:mi>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>u</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:msub>
                                          <m:mi>u</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>&#8804;</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1...</m:mn>
                                    <m:mi>N</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>F</m:mi>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>u</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>&#8804;</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1...</m:mn>
                                    <m:mi>N</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>F</m:mi>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8727;</m:mo>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>9</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqaI0aancqWGdbWqdaahaaWcbeqaaiabgEHiQaaakiabgsMiJoaaqafabaGaemOray0aaSbaaSqaaiabdwha1naaBaaameaacqWGPbqAaeqaaSGaemyDau3aaSbaaWqaaiabdMgaPjabgUcaRiabigdaXaqabaaaleqaaaqaaiabdMgaPjabg2da9iabigdaXiabc6caUiabc6caUiabc6caUiabd6eaobqab0GaeyyeIuoakiabgsMiJoaaqafabaGaemOray0aaSbaaSqaaiabdwha1naaBaaameaacqWGPbqAaeqaaaWcbeaaaeaacqWGPbqAcqGH9aqpcqaIXaqmcqGGUaGlcqGGUaGlcqGGUaGlcqWGobGtaeqaniabggHiLdGccqGHKjYOdaaeqbqaaiabdAeagnaaDaaaleaacqWGPbqAaeaacqGHxiIkaaaabaGaemyAaKMaeyypa0JaeGymaeJaeiOla4IaeiOla4IaeiOla4IaemOta4eabeqdcqGHris5aOGaaCzcaiaaxMaadaqadiqaaiabiMda5aGaayjkaiaawMcaaaaa@6583@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>Then Equation 3, with 2<it>C</it>* replaced by 4<it>C</it>*, can be used to eliminate vertices in the same way as before, eliminating a vertex <it>z </it>in a particular graph part if <it>F</it><sub><it>z</it></sub>, the value of its best adjacent <it>dstar</it>, is insufficient considering best possible contributions from all other graph parts. For best pruning results the value of <it>C</it>* should be as high as possible; we choose <it>C</it>* as the clique weight induced by the best overall <it>dstar</it>.</p>
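               <p>The bound chain of Equation 9 can be checked numerically on a small instance. The following is a minimal sketch, not the implementation used in the experiments. It assumes that <it>F</it><sub><it>uv </it></sub>has the form suggested by Equation 5 (the center edge counted twice, plus the best pair of spokes into every remaining part), that part <it>N </it>is paired with part 1, and that the elimination test compares 4<it>C</it>* with <it>F</it><sub><it>z </it></sub>plus the <it>F</it>* values of all other parts. The function names and the random instance are hypothetical.</p>

```python
from itertools import product
import random


def dstar_bounds(parts, w):
    """Return (F_uv, F_u, F_star); part N is paired with part 1."""
    n = len(parts)
    f_uv, f_u, f_star = {}, {}, []
    for i in range(n):
        nxt = (i + 1) % n
        others = [j for j in range(n) if j not in (i, nxt)]
        for u in parts[i]:
            for v in parts[nxt]:
                # Assumed form of F_uv: centre edge counted twice, plus the
                # best pair of spokes into each remaining part.
                spokes = sum(max(w[z, u] + w[z, v] for z in parts[j]) for j in others)
                f_uv[u, v] = 2 * w[u, v] + spokes
            f_u[u] = max(f_uv[u, v] for v in parts[nxt])
        f_star.append(max(f_u[u] for u in parts[i]))
    return f_uv, f_u, f_star


def clique_weight(clique, w):
    n = len(clique)
    return sum(w[clique[i], clique[j]] for i in range(n) for j in range(i + 1, n))


def chain_holds(parts, w):
    """Check the chain of Equation 9 for every clique (one vertex per part)."""
    n = len(parts)
    f_uv, f_u, f_star = dstar_bounds(parts, w)
    for clique in product(*parts):
        s_uv = sum(f_uv[clique[i], clique[(i + 1) % n]] for i in range(n))
        s_u = sum(f_u[u] for u in clique)
        if not (sum(f_star) >= s_u >= s_uv >= 4 * clique_weight(clique, w)):
            return False
    return True


def prunable(i, z, f_u, f_star, c_best):
    """Assumed elimination test for vertex z of part i, given a known clique weight."""
    return 4 * c_best > f_u[z] + sum(f_star) - f_star[i]


def random_instance(n_parts=4, size=3, seed=0):
    """Hypothetical toy instance with symmetric integer edge weights."""
    rng = random.Random(seed)
    parts = [[(i, k) for k in range(size)] for i in range(n_parts)]
    w = {}
    for i in range(n_parts):
        for j in range(i + 1, n_parts):
            for a, b in product(parts[i], parts[j]):
                w[a, b] = w[b, a] = rng.randint(0, 10)
    return parts, w
```

               <p>Under these assumptions, no vertex of an optimal clique is ever marked by <it>prunable</it>, because the chain guarantees that its <it>F </it>value plus the remaining <it>F</it>* values is at least 4<it>C</it>*.</p>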
            </sec>
            <sec>
               <st>
                  <p>Graph decomposition</p>
               </st>
               <p>We also use a divide-and-conquer graph decomposition approach for pruning vertices. For every graph part <it>i </it>and vertex <it>u </it>&#8712; <it>V</it><sub><it>i </it></sub>we consider, in turn, the induced subgraph <it>G</it><sup><it>u </it></sup>= (<it>V</it><sup><it>u</it></sup>, <it>E</it><sup><it>u</it></sup>), where <it>V</it><sup><it>u </it></sup>= {<it>u</it>} &#8746; (<it>V</it>\<it>V</it><sub><it>i</it></sub>). Application of the <it>clique-bounds </it>DEE technique to graphs <it>G</it><sup><it>u </it></sup>is very effective since one of the graph parts, <m:math name="1748-7188-1-13-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>G</m:mi><m:mi>i</m:mi><m:mi>u</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGhbWrdaqhaaWcbaGaemyAaKgabaGaemyDauhaaaaa@30BE@</m:annotation></m:semantics></m:math> contains only one vertex, <it>u</it>, and all the <it>F </it>and <it>F</it>* values that need to be recomputed for the new graph <it>G</it><sup><it>u </it></sup>are greatly constrained. The process of updating the <it>F </it>and <it>F</it>* values is efficient as the changes are localized to one part in the graph. Importantly, the best known clique value <it>C</it>* remains intact, since the clique of that larger value exists in the original graph and can be used for the decomposed one, helping to eliminate vertices. For some of the vertices <it>u</it>, iterative application of the DEE criterion and re-computation of the <it>F </it>and <it>F</it>* values causes <it>G</it><sup><it>u </it></sup>to become disconnected, implying that vertex <it>u </it>cannot be part of the optimal alignment. Such a vertex <it>u </it>is marked for deletion, and that information is propagated to all subsequently considered induced subgraphs, further constraining the corresponding <it>F </it>and <it>F</it>* values and helping to eliminate other vertices in turn.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Statistical significance</p>
            </st>
            <p>Once we have found a motif of a particular SP-score, we evaluate its statistical significance by calculating the number of motifs of equal or better quality expected to occur in random data with the same characteristics. Let the score of the motif of length <it>l </it>in question be denoted by <it>s</it>, and let <it>f</it>(<it>b</it>) be the zero-corrected background frequency of nucleotide (or residue) <it>b </it>in the input sequences, and <it>sim</it>(<it>b</it><sub>1</sub>, <it>b</it><sub>2</sub>) be the integral score computed for all residue pairs as above. We compute <it>P</it><sub><it>l</it></sub>(<it>X</it>), the probability distribution of scores for a motif of length <it>l </it>in <it>N </it>sequences, in the first two steps of the following, and infer the e-value of score <it>s </it>in the last two:</p>
            <p>1. Calculate the exact probability distribution <it>P</it><sub>1</sub>(<it>X</it>) for a single column of <it>N </it>random residues. We use the multinomial distribution to compute the probability of observing every combination of bases (or residues) in the column according to the background distribution, and calculate the corresponding SP-score for the column. We then add probabilities for the same scores resulting from different base combinations. To make the computation feasible for the protein alphabet and for large numbers of sequences, we calculate the scores and probabilities in such an order that every new score and probability is computable from the previous one by a local update operation.</p>
            <p>2. Calculate the probability distribution <it>P</it><sub><it>l</it></sub>(<it>X</it>) for <it>l </it>random columns by convolution of <it>P</it><sub>1</sub>(<it>X</it>) as in <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, where we inductively construct a distribution for <it>i </it>columns based on the distribution for <it>i </it>- 1 columns, <it>P</it><sub><it>i</it>-1</sub>(<it>X</it>), and the single column distribution <it>P</it><sub>1</sub>(<it>X</it>).</p>
            <p>3. For a given score <it>s </it>of interest, we calculate the probability that an <it>l</it>-long pattern has score greater than or equal to <it>s </it>by chance alone. This probability is &#8721;<sub><it>x</it>&#8805;<it>s </it></sub><it>P</it><sub><it>l</it></sub>(<it>x</it>).</p>
            <p>4. Finally, we compute the total number of possible motifs of length <it>l </it>in the data. If the sequences have lengths <it>L</it><sub>1</sub>, ..., <it>L</it><sub><it>N</it></sub>, then the search space size <it>L </it>= &#8719;<sub><it>i </it></sub>(<it>L</it><sub><it>i </it></sub>- <it>l </it>+ 1). The expected number of alignments with score at least <it>s </it>by chance alone, or the e-value, is equal to <it>L </it>&#215; &#8721;<sub><it>x</it>&#8805;<it>s </it></sub><it>P</it><sub><it>l</it></sub>(<it>x</it>).</p>
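            <p>The four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it enumerates all column compositions directly instead of using the local update ordering described in step 1, so it is practical only for small alphabets and few sequences. The function names, and the representation of <it>f </it>and <it>sim </it>as dictionaries, are hypothetical.</p>

```python
from itertools import combinations
from math import factorial, prod


def compositions(total, bins):
    """Yield every tuple of `bins` non-negative counts summing to `total`."""
    if bins == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, bins - 1):
            yield (first,) + rest


def column_distribution(n_seqs, freqs, sim):
    """Step 1: exact SP-score distribution of one column of random residues."""
    letters = sorted(freqs)
    dist = {}
    for counts in compositions(n_seqs, len(letters)):
        # Multinomial probability of this combination of residues.
        ways = factorial(n_seqs) // prod(factorial(c) for c in counts)
        prob = ways * prod(freqs[b] ** c for b, c in zip(letters, counts))
        # SP-score: pairs of identical residues, then pairs of different ones.
        score = sum(c * (c - 1) // 2 * sim[b, b] for b, c in zip(letters, counts))
        score += sum(
            counts[i] * counts[j] * sim[letters[i], letters[j]]
            for i, j in combinations(range(len(letters)), 2)
        )
        dist[score] = dist.get(score, 0.0) + prob
    return dist


def convolve(p, q):
    """Distribution of the sum of two independent scores."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out


def e_value(score, motif_len, seq_lens, freqs, sim):
    """Steps 2 to 4: convolve columns, take the tail, scale by the search space."""
    p1 = column_distribution(len(seq_lens), freqs, sim)
    pl = p1
    for _ in range(motif_len - 1):
        pl = convolve(pl, p1)
    tail = sum(p for x, p in pl.items() if x >= score)
    space = prod(length - motif_len + 1 for length in seq_lens)
    return space * tail
```

            <p>For example, with uniform nucleotide frequencies and a match/mismatch score of 1/0, a call such as <it>e_value(2, 2, [10, 10], freqs, sim)</it> multiplies the probability that both columns of a two-sequence alignment match by the 9 &#215; 9 possible placements.</p>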
         </sec>
         <sec>
            <st>
               <p>Overview of approach</p>
            </st>
            <p>Our basic LP/DEE approach is to: (1) formulate an instance of motif finding as a graph problem; (2) apply the DEE techniques described above in order of increasing complexity so as to prune the graph; (3) use mathematical programming to find a solution to the smaller graph problem; and (4) evaluate statistical significance.</p>
            <p>While applying DEE, if the size of the graph becomes small enough (the threshold was set at 800 vertices for the described experiments), we submit the appropriate LP to the CPLEX LP solver and, if necessary, to the ILP solver. To reduce the graph to that size, we apply the DEE variants in turn, running each one until either the specified graph size has been reached or the procedure has converged and no further pruning is possible. In particular, we first attempt to prune the graph using <it>basic clique-bounds DEE</it>, then we consider tighter bound computations, and lastly we employ <it>graph decomposition </it>in conjunction with the DEE methods.</p>
            <p>In rare cases the optimality-preserving DEE procedures are unable to prune the graph, and we perform what we call speculative pruning using higher <it>C</it>* values, which do not necessarily correspond to known cliques in graph <it>G</it>. Three outcomes of such pruning are possible: (i) The graph is eliminated completely. This guarantees that a clique of value <it>C</it>* does not exist in <it>G</it>. (ii) The pruning proves once again insufficient to reduce the graph. (iii) The pruning procedure converges to a small graph. We search the space of possible <it>C</it>* values until we find one that produces outcome (iii). To identify such a value we first translate the possible clique scores into their corresponding e-values, and then perform binary search on the e-value exponent. This method converges quickly, typically locating an appropriate <it>C</it>* in fewer than 10 iterations. If the optimal solution for the final reduced graph is better than the <it>C</it>* used in pruning, then it is also optimal for the original graph. Otherwise, the e-value corresponding to <it>C</it>* provides us with a lower bound on the significance of the actual optimal solution.</p>
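            <p>The search over speculative <it>C</it>* values can be sketched as a binary search on integer e-value exponents. The sketch below is an assumption-laden illustration: <it>prune </it>is a hypothetical callback that prunes with the <it>C</it>* whose e-value is 10<sup><it>k</it></sup> and reports which of the three outcomes occurred, and the outcomes are assumed to be monotone in <it>k </it>(smaller exponents correspond to higher <it>C</it>* and therefore more aggressive pruning).</p>

```python
def find_pruning_exponent(prune, lo, hi):
    """Binary search over integer e-value exponents in [lo, hi].

    `prune(k)` is a hypothetical callback returning "empty" (outcome i),
    "too_large" (outcome ii) or "small" (outcome iii).
    """
    steps = 0
    while hi >= lo:
        mid = (lo + hi) // 2
        steps += 1
        outcome = prune(mid)
        if outcome == "small":
            return mid, steps
        if outcome == "empty":
            lo = mid + 1  # C* too ambitious: try a larger e-value
        else:
            hi = mid - 1  # pruning too weak: try a smaller e-value
    return None, steps


def toy_prune(k):
    """Toy stand-in for the real pruning run, used only for illustration."""
    if -40 > k:
        return "empty"
    if -35 > k:
        return "small"
    return "too_large"


exponent, iterations = find_pruning_exponent(toy_prune, -100, 0)
```

            <p>Because the interval is halved at every step, a range of 100 exponents needs at most seven pruning runs, which is consistent with the small iteration counts reported above.</p>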
         </sec>
         <sec>
            <st>
               <p>Extensions for other motif finding frameworks</p>
            </st>
            <sec>
               <st>
                  <p>Phylogenetic footprinting</p>
               </st>
               <p>An increasingly common way of finding regulatory sites is to look for them among upstream regions of a set of orthologous genes across species (e.g., <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>). In this case additional data, in the form of the phylogenetic tree relating the species, is available and can be exploited. This is especially important when closely related species are part of the input because, left unweighted, they contribute duplicate information and skew the alignment. We use a phylogenetic tree and branch lengths when calculating the edge weights in the graph, with highly diverged sequence pairs getting larger weights. The precise weighting scheme follows the ideas of weighted progressive alignment <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, in which weights <it>&#945;</it><sub><it>i </it></sub>are computed for every sequence <it>i</it>. The calculation sums branch lengths along the path from the tree root to the sequence at the leaf, splitting shared branches among the descendant leaves, and thereby reducing the weight for related sequences. In essence, we solve a multiple sequence alignment problem with weighted SP-score using match/mismatch scoring, where the computed weight for a pair of positions in sequences <it>i </it>and <it>j </it>is multiplied by <it>&#945;</it><sub><it>i </it></sub>&#215; <it>&#945;</it><sub><it>j</it></sub>. The rest of the algorithm operates as in the basic motif finding case above, employing the same LP formulation and DEE techniques.</p>
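               <p>The sequence weights described above can be sketched as follows. This is a minimal illustration, assuming a tree encoded as nested (branch length, payload) tuples in which the payload is either a leaf name or a list of child nodes; the encoding, the function names, and the example tree are hypothetical.</p>

```python
def leaves(node):
    """Names of all leaves below (and including) this node."""
    _, payload = node
    if isinstance(payload, str):
        return [payload]
    return [name for child in payload for name in leaves(child)]


def sequence_weights(node, weights=None):
    """Sum branch lengths from root to leaf, splitting each shared branch
    equally among the leaves that descend from it."""
    if weights is None:
        weights = {}
    length, payload = node
    below = leaves(node)
    for name in below:
        weights[name] = weights.get(name, 0.0) + length / len(below)
    if not isinstance(payload, str):
        for child in payload:
            sequence_weights(child, weights)
    return weights


def pair_weight(alpha, i, j):
    """Factor applied to the edge weight of positions in sequences i and j."""
    return alpha[i] * alpha[j]


# Hypothetical tree: two closely related species and one distant species.
tree = (0.0, [(0.2, [(0.1, "human"), (0.1, "chimp")]), (0.4, "mouse")])
alpha = sequence_weights(tree)
```

               <p>In this toy tree the two closely related leaves share the internal branch, so each receives a smaller weight than the distant leaf, and the pair made up of the two close relatives is down-weighted relative to a pair that spans the root.</p>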
            </sec>
            <sec>
               <st>
                  <p>Subtle motifs</p>
               </st>
               <p>Another widely studied formulation of motif finding is the 'subtle' motifs formulation <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, in which an unknown pattern of length <it>l </it>is implanted with <it>d </it>modifications into each of the input sequences. The graph version of the problem remains the same except that edges only exist between two vertices that correspond to subsequences whose Hamming distance is at most 2<it>d </it>(since otherwise they cannot both be implanted instances of the same pattern). Edges can either be unweighted, or weighted by the number of mismatches between the corresponding subsequences. Either is easily modeled via slight modification of the ILP given earlier (with variables corresponding to non-existent edges removed, and summations in the <it>edge </it>constraints taken only over existing edges), and the resulting ILP can be used in conjunction with the numerous graph pruning techniques previously developed for this problem (e.g., <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>).</p>
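               <p>The graph construction for the subtle motifs formulation can be sketched as follows. The sketch assumes vertices are (sequence index, offset) pairs and uses the mismatch-count weighting; the function names are hypothetical, and the quadratic enumeration of vertex pairs is meant for illustration only.</p>

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def subtle_motif_graph(seqs, l, d):
    """Vertices are (sequence index, offset); an edge joins two l-mers from
    different sequences when their Hamming distance is at most 2*d."""
    vertices = [(i, p) for i, s in enumerate(seqs) for p in range(len(s) - l + 1)]
    edges = {}
    for (i, p), (j, q) in combinations(vertices, 2):
        if i == j:
            continue  # no edges within a graph part
        dist = hamming(seqs[i][p:p + l], seqs[j][q:q + l])
        if 2 * d >= dist:
            edges[(i, p), (j, q)] = dist  # weight = number of mismatches
    return vertices, edges
```

               <p>For the unweighted variant, the stored distances can simply be ignored and only the presence of an edge used.</p>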
            </sec>
            <sec>
               <st>
                  <p>Multiple motifs</p>
               </st>
               <p>Here we give extensions to address the issue of multiple motifs existing in a set of sequences. Discovery of distinct multiple motifs, such as sets of binding sites for two different transcription factors, can be done iteratively by first locating a single optimal motif, masking it out from the problem instance, and then looking for the next one. We mask the previous motif by deleting its solution vertices from the original graph, and then reapplying the LP/DEE techniques to locate the next optimal solution and its corresponding motif.</p>
               <p>To identify multiple occurrences of a motif in some of the input sequences, it is possible to iteratively solve several ILPs in order to find multiple near-optimal solutions, corresponding to the best cliques of successively decreasing total weights. At iteration <it>t</it>, we add <it>t </it>- 1 constraints to the ILP formulation so as to exclude all previously discovered solutions:</p>
               <p>
                  <m:math name="1748-7188-1-13-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:munder>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>u</m:mi>
                                                <m:mo>&#8712;</m:mo>
                                                <m:msub>
                                                   <m:mi>S</m:mi>
                                                   <m:mi>k</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munder>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>x</m:mi>
                                                <m:mi>u</m:mi>
                                             </m:msub>
                                             <m:mo>&#8804;</m:mo>
                                             <m:mi>N</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mtext>for&#160;</m:mtext>
                                       <m:mi>k</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mn>...</m:mn>
                                       <m:mo>,</m:mo>
                                       <m:mi>t</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo>,</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mn>10</m:mn>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaWaaabuaeaacqWG4baEdaWgaaWcbaGaemyDauhabeaakiabgsMiJkabd6eaojabgkHiTiabigdaXaWcbaGaemyDauNaeyicI4Saem4uam1aaSbaaWqaaiabdUgaRbqabaaaleqaniabggHiLdaakeaacqqGMbGzcqqGVbWBcqqGYbGCcqqGGaaicqWGRbWAcqGH9aqpcqaIXaqmcqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWG0baDcqGHsislcqaIXaqmcqGGSaalaaGaaCzcaiaaxMaadaqadiqaaiabigdaXiabicdaWaGaayjkaiaawMcaaaaa@5202@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
               <p>where <it>S</it><sub><it>k </it></sub>contains the optimal set of vertices found in iteration <it>k</it>. This requires that the new solution differs from all previous ones in at least one graph part. We note that to use this type of constraint for the basic formulation of the motif finding problem, the DEE methods given above have to be modified so as not to eliminate nodes taking part in near-optimal but not necessarily optimal solutions. For the subtle motifs problem, existing DEE methods (e.g., <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>) only eliminate nodes and edges based on whether they can take part in any clique in the graph, and thus constraint 10 can be immediately applied to iteratively find cliques of successively decreasing weight.</p>
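               <p>As an illustration of constraint 10, the following Python sketch enumerates solutions of non-increasing weight on a small N-partite graph. It is not the LP/DEE implementation: exhaustive enumeration stands in for the ILP solver, and the names next_best, enumerate_solutions, parts and weight are hypothetical. Vertex labels are assumed to be unique across graph parts.</p>
               <p><![CDATA[
```python
from itertools import combinations, product


def next_best(parts, weight, previous):
    """Best one-vertex-per-part selection allowed by constraint (10).

    parts    : list of N lists of vertex labels, one list per graph part
               (labels are assumed unique across parts)
    weight   : function (u, v) -> weight of the edge joining u and v
    previous : list of sets S_k holding the solutions of earlier iterations
    """
    n = len(parts)
    best, best_score = None, float("-inf")
    # Brute-force stand-in for the ILP solver: try every selection.
    for choice in product(*parts):
        chosen = set(choice)
        # Constraint (10): the sum of x_u over u in S_k may not exceed N - 1,
        # so the new solution shares at most N - 1 vertices with each S_k.
        if any(len(chosen.intersection(s_k)) > n - 1 for s_k in previous):
            continue
        score = sum(weight(u, v) for u, v in combinations(choice, 2))
        if score > best_score:
            best, best_score = choice, score
    return best, best_score


def enumerate_solutions(parts, weight, t):
    """Return up to t (solution, score) pairs of non-increasing score."""
    previous, found = [], []
    for _ in range(t):
        best, score = next_best(parts, weight, previous)
        if best is None:  # every selection is already excluded
            break
        found.append((best, score))
        previous.append(set(best))  # adds one more constraint of type (10)
    return found
```
]]></p>
               <p>With three parts of two vertices each, requesting ten solutions returns all eight possible selections, because constraint 10 rules out only exact repeats of earlier solutions.</p>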
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Experimental results</p>
         </st>
         <p>We apply our LP/DEE approach to several motif finding problems. We attempt to discover motifs in instances arising from both DNA and protein sequence data, and compare them with known motifs and those found by other motif finding methods. We then consider the phylogenetic footprinting problem, and demonstrate the discovery of multiple motifs.</p>
         <sec>
            <st>
               <p>Protein motif finding</p>
            </st>
            <p>We study the performance of LP/DEE on a number of protein datasets with different characteristics (summarized in Table <tblr tid="T1">1</tblr>). The datasets are constructed from SwissProt <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, using the descriptions of <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> for the first two datasets, <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> for the next two, and <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> for the last one. These datasets are highly variable in the number and length of their protein sequences, as well as in the degree of motif conservation. The motif length parameters are set based on the lengths described by the above authors, and the BLOSUM62 substitution matrix is used for all reported results.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Descriptions of protein datasets. # <it>Seq</it>. gives the number of input protein sequences. <it>Length </it>gives the length of the protein motif searched for. |<it>V</it>| gives the number of vertices in the original graph constructed from the dataset. <it>DEE </it>lists the methods used to prune the graph: (1) <it>clique-bounds </it>DEE, (2) tighter constrained bounds, and (3) <it>graph decomposition</it>. |<it>V</it><sub><it>DEE</it></sub>| is the number of vertices in the graph after pruning. <it>E-value </it>lists the e-value of the motif found by the LP/DEE algorithm.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Dataset</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p><b># Seq</b>.</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Length</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>|<it>V</it>|</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>DEE</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>|<it>V</it><sub>DEE</sub>|</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>E-value</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lipocalin</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>844</p>
                     </c>
                     <c ca="left">
                        <p>(1)</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>3.80 &#215; 10<sup>-16</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Helix-Turn-Helix</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>6870</p>
                     </c>
                     <c ca="left">
                        <p>(1,2,3)</p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="left">
                        <p>3.88 &#215; 10<sup>-67</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tumor Necrosis Factor</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>2329</p>
                     </c>
                     <c ca="left">
                        <p>(1)</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>1.50 &#215; 10<sup>-40</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Zinc Metallopeptidase</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>7761</p>
                     </c>
                     <c ca="left">
                        <p>(1,2)</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>5.82 &#215; 10<sup>-23</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Immunoglobulin Fold</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>7498</p>
                     </c>
                     <c ca="left">
                        <p>(1,2,3)</p>
                     </c>
                     <c ca="center">
                        <p>187</p>
                     </c>
                     <c ca="left">
                        <p>3.04 &#215; 10<sup>-24</sup></p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>For each of the test protein datasets, our approach uncovers the optimal solution according to the SP-measure. The discovered motifs correspond to those reported by <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B36">36</abbr><abbr bid="B43">43</abbr></abbrgrp>, and their SP-scores are highly significant, with e-values less than 10<sup>-15 </sup>for all of them. As described by <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, the HTH dataset is very diverse, and the detection of the motif is a difficult task. Nonetheless, our HTH motif is identical to that of <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and agrees with the known annotations in every sequence. We likewise find the lipocalin motif; it is a weak motif with few generally conserved residues that is in perfect correspondence with the known lipocalin signature. We also precisely recover the immunoglobulin fold, TNF and zinc metallopeptidase motifs. The protein datasets demonstrate the strength of our graph pruning techniques. The five datasets are of varying difficulty to solve: some need only the basic <it>clique-bounds </it>DEE technique to prune the graphs, while others require more elaborate pruning that is constrained by three-way alignments (see Table <tblr tid="T1">1</tblr>). In each case, the reduced graph is at least an order of magnitude smaller than the original. For three of the five datasets, the pruning procedures alone are able to identify the underlying motifs.</p>
            <p>In contrast to <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, who limit sequence lengths to 500, we retain the original protein sequences, making the problem more difficult computationally. For example, the average sequence length in the zinc metallopeptidase dataset is approximately 800, and some sequences are as long as 1300 residues. The motif we recover is identical to the motif reported by <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> in nine of ten sequences (see Additional Table <supplr sid="S1">1</supplr>); yet, because of the difference in the last sequence, the motif discovered by our method is superior both in terms of sequence conservation and statistical significance (with an e-value of 5.7729 &#215; 10<sup>-23 </sup>for our motif versus 1.12155 &#215; 10<sup>-21 </sup>for that of <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>).</p>
         </sec>
         <sec>
            <st>
               <p>Detecting bacterial regulatory elements</p>
            </st>
            <p>We apply our method to identify the binding sites of 36 <it>E. coli </it>regulatory proteins. We construct our dataset from that of <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B28">28</abbr></abbrgrp>, as described in <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. We locate each binding site within the genome and extract up to 600 bp of DNA sequence upstream from the gene it regulates. We remove binding sites for sigma factors, binding sites for transcription factors with fewer than three known sites, and those that could not be unambiguously located in the genome. Motif length parameters are set as reported by <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, except for <it>crp</it>, where a length of 18 instead of 22 is used. Background nucleotide frequencies are computed using the upstream regions for each dataset individually. The final dataset consists of 36 transcription factors, each regulating between 3 and 33 genes, with binding site length ranging between 11 and 48 (see Table <tblr tid="T2">2</tblr>).</p>
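            <p>The per-dataset background frequency computation described above can be sketched as follows. This is a minimal illustration rather than the code used for the reported results; the function name background_frequencies is hypothetical, and counting a single strand while skipping ambiguous characters are assumptions not stated in the text.</p>
            <p><![CDATA[
```python
from collections import Counter


def background_frequencies(upstream_regions, alphabet="ACGT"):
    """Estimate background nucleotide frequencies for one dataset.

    upstream_regions : iterable of DNA strings, one per regulated gene
    alphabet         : symbols to count; anything else (e.g. N) is skipped,
                       which is an assumption of this sketch
    """
    counts = Counter()
    for seq in upstream_regions:
        counts.update(base for base in seq.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nucleotides from the alphabet were found")
    # Normalize the pooled counts so the frequencies sum to one.
    return {base: counts[base] / total for base in alphabet}
```
]]></p>
            <p>Calling the function once per transcription factor, with only that factor's upstream regions, gives each dataset its own background model.</p>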
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>The transcription factor datasets (columns 1, 2, and 3) and the results of motif finding by LP/DEE. <it>TF </it>is the transcription factor dataset. <it>Seq </it>is the number of input sequences. <it>Len </it>is the length of the motif searched for. The remaining measures refer to the motifs discovered by the LP/DEE algorithm: <it>IC </it>is the average per-column information content [44]; <it>RE </it>is the average per-column relative entropy; <it>E-value </it>is the e-value, computed according to our statistical significance assessment; <it>nPC </it>is the nucleotide level performance coefficient; and <it>sSn </it>is the site level sensitivity. The four starred entries indicate potentially non-optimal solutions; entries marked with &#8224; indicate use of the ILP solver.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>
                           <b>TF</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Seq</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Len</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>IC</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>RE</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>E-value</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>nPC</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>sSn</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ada</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                     <c ca="left">
                        <p>1.3000</p>
                     </c>
                     <c ca="left">
                        <p>1.0846</p>
                     </c>
                     <c ca="left">
                        <p>9.16 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.1341</p>
                     </c>
                     <c ca="left">
                        <p>0.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>araC</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="left">
                        <p>1.1437</p>
                     </c>
                     <c ca="left">
                        <p>0.9940</p>
                     </c>
                     <c ca="left">
                        <p>1.15 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.3474</p>
                     </c>
                     <c ca="left">
                        <p>0.50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>arcA</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>1.2505</p>
                     </c>
                     <c ca="left">
                        <p>1.1992</p>
                     </c>
                     <c ca="left">
                        <p>4.31 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.4224</p>
                     </c>
                     <c ca="left">
                        <p>0.73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>argR</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>1.2990</p>
                     </c>
                     <c ca="left">
                        <p>1.2149</p>
                     </c>
                     <c ca="left">
                        <p>1.30 &#215; 10<sup>-7</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.2857</p>
                     </c>
                     <c ca="left">
                        <p>0.50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cpxR</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>1.3290</p>
                     </c>
                     <c ca="left">
                        <p>1.2337</p>
                     </c>
                     <c ca="left">
                        <p>1.09 &#215; 10<sup>-5</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.5556</p>
                     </c>
                     <c ca="left">
                        <p>0.71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>crp*&#8224;</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>0.7196</p>
                     </c>
                     <c ca="left">
                        <p>0.7045</p>
                     </c>
                     <c ca="left">
                        <p>3.08 &#215; 10<sup>-9</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.5570</p>
                     </c>
                     <c ca="left">
                        <p>0.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>cytR</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>1.2317</p>
                     </c>
                     <c ca="left">
                        <p>1.1069</p>
                     </c>
                     <c ca="left">
                        <p>2.48 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.0588</p>
                     </c>
                     <c ca="left">
                        <p>0.20</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>dnaA</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>1.4535</p>
                     </c>
                     <c ca="left">
                        <p>1.3300</p>
                     </c>
                     <c ca="left">
                        <p>6.12 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.0000</p>
                     </c>
                     <c ca="left">
                        <p>1.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>fadR</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>1.3466</p>
                     </c>
                     <c ca="left">
                        <p>1.2074</p>
                     </c>
                     <c ca="left">
                        <p>1.33 &#215; 10<sup>-2</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.5455</p>
                     </c>
                     <c ca="left">
                        <p>0.80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>fis*</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>0.8927</p>
                     </c>
                     <c ca="left">
                        <p>0.8376</p>
                     </c>
                     <c ca="left">
                        <p>1.37 &#215; 10<sup>-6</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.1966</p>
                     </c>
                     <c ca="left">
                        <p>0.38</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>flhCD</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                     <c ca="left">
                        <p>1.3942</p>
                     </c>
                     <c ca="left">
                        <p>1.1656</p>
                     </c>
                     <c ca="left">
                        <p>4.79 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.0000</p>
                     </c>
                     <c ca="left">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>fnr</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>1.1025</p>
                     </c>
                     <c ca="left">
                        <p>1.0476</p>
                     </c>
                     <c ca="left">
                        <p>1.85 &#215; 10<sup>-9</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.6176</p>
                     </c>
                     <c ca="left">
                        <p>0.80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>fruR</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>1.2094</p>
                     </c>
                     <c ca="left">
                        <p>1.1491</p>
                     </c>
                     <c ca="left">
                        <p>5.52 &#215; 10<sup>-8</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.8182</p>
                     </c>
                     <c ca="left">
                        <p>0.90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>fur</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>1.3285</p>
                     </c>
                     <c ca="left">
                        <p>1.2332</p>
                     </c>
                     <c ca="left">
                        <p>1.28 &#215; 10<sup>-8</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.4237</p>
                     </c>
                     <c ca="left">
                        <p>0.71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>galR</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>1.5445</p>
                     </c>
                     <c ca="left">
                        <p>1.4347</p>
                     </c>
                     <c ca="left">
                        <p>1.52 &#215; 10<sup>-16</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.5034</p>
                     </c>
                     <c ca="left">
                        <p>0.71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>glpR</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>1.4227</p>
                     </c>
                     <c ca="left">
                        <p>1.2441</p>
                     </c>
                     <c ca="left">
                        <p>2.63 &#215; 10<sup>-2</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.5534</p>
                     </c>
                     <c ca="left">
                        <p>0.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hns</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>1.5175</p>
                     </c>
                     <c ca="left">
                        <p>1.3660</p>
                     </c>
                     <c ca="left">
                        <p>2.25</p>
                     </c>
                     <c ca="left">
                        <p>0.0000</p>
                     </c>
                     <c ca="left">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ihf*</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="left">
                        <p>0.3932</p>
                     </c>
                     <c ca="left">
                        <p>0.3859</p>
                     </c>
                     <c ca="left">
                        <p>2.26 &#215; 10<sup>+8</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.0381</p>
                     </c>
                     <c ca="left">
                        <p>0.16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>lexA</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>1.1481</p>
                     </c>
                     <c ca="left">
                        <p>1.1192</p>
                     </c>
                     <c ca="left">
                        <p>1.01 &#215; 10<sup>-40</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.7215</p>
                     </c>
                     <c ca="left">
                        <p>0.88</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>lrp</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>1.2879</p>
                     </c>
                     <c ca="left">
                        <p>1.1237</p>
                     </c>
                     <c ca="left">
                        <p>6.44 &#215; 10<sup>-2</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.0989</p>
                     </c>
                     <c ca="left">
                        <p>0.25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>malT</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>1.5071</p>
                     </c>
                     <c ca="left">
                        <p>1.3815</p>
                     </c>
                     <c ca="left">
                        <p>1.73 &#215; 10<sup>-1</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.0000</p>
                     </c>
                     <c ca="left">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>metJ</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>1.6842</p>
                     </c>
                     <c ca="left">
                        <p>1.5195</p>
                     </c>
                     <c ca="left">
                        <p>3.37 &#215; 10<sup>-12</sup></p>
                     </c>
                     <c ca="left">
   