<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-2-15</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Efficient and accurate P-value computation for Position Weight Matrices</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Touzet</snm>
               <fnm>H&#233;l&#232;ne</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>helene.touzet@lifl.fr</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Varr&#233;</snm>
               <fnm>Jean-St&#233;phane</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jean-stephane.varre@lifl.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>LIFL, UMR CNRS 8022, Universit&#233; des Sciences et Technologies de Lille, 59655 Villeneuve d'Ascq, France</p>
            </ins>
            <ins id="I2">
               <p>INRIA, 40 avenue Halley, 59650 Villeneuve d'Ascq, France</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2007</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>15</fpage>
         <url>http://www.almob.org/content/2/1/15</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18072973</pubid>
               <pubid idtype="doi">10.1186/1748-7188-2-15</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>06</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>11</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>11</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Touzet and Varr&#233;; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. Using PWMs requires, as a prerequisite, knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many PWMs, they fail to give accurate results in a reasonable amount of time.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>The contribution of this paper is twofold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improve the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in a software tool called TFM-PVALUE, which is freely available.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>A key problem in the understanding of gene regulation is the identification of transcription factor binding sites. Transcription factor binding sites are often modeled by <it>Position Weight Matrices </it>(PWMs for short), also known as <it>Position Specific Scoring Matrices </it>(PSSMs for short), or simply <it>matrices</it>. Examples can be found in the Jaspar <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> or Transfac <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> databases. Such matrices are used within global bioinformatics strategies that help to elucidate regulation mechanisms, such as comparative genomics, identification of over-represented motifs, or identification of correlations between binding sites. Similar matrix-based models also serve to represent splice sites in messenger RNAs <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> or signatures in amino acid sequences <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
          <p>Matrices are probabilistic descriptions of approximate patterns. Given a finite alphabet &#931; and a positive integer <it>m</it>, a matrix <it>M </it>defines a function from &#931;<sup><it>m </it></sup>to &#8477; that associates a score with each word of &#931;<sup><it>m</it></sup>. More precisely, it is indexed by {1,...,<it>m</it>} &#215; &#931;: each column corresponds to a position in the motif and each row to a letter of the alphabet &#931;. The coefficient <it>M </it>(<it>i</it>, <it>x</it>) gives the score at position <it>i </it>in [1, <it>m</it>] for the letter <it>x </it>in &#931;. Given a string <it>u </it>in &#931;<sup><it>m</it></sup>, the <it>score </it>of <it>M </it>on <it>u </it>is defined as the sum of the scores of the individual symbols of <it>u</it>:</p>
         <p>
            <display-formula>
               <m:math name="1748-7188-2-15-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtext>Score</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>u</m:mi>
                        <m:mo>,</m:mo>
                        <m:mi>M</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>m</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mi>M</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>i</m:mi>
                              <m:mo>,</m:mo>
                              <m:msub>
                                 <m:mi>u</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>,</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaee4uamLaee4yamMaee4Ba8MaeeOCaiNaeeyzauMaeiikaGIaemyDauNaeiilaWIaemyta0KaeiykaKIaeyypa0ZaaabCaeaacqWGnbqtcqGGOaakcqWGPbqAcqGGSaalcqWG1bqDdaWgaaWcbaGaemyAaKgabeaakiabcMcaPaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemyBa0ganiabggHiLdGccqGGSaalaaa@48DE@</m:annotation>
                  </m:semantics>
               </m:math>
            </display-formula>
         </p>
         <p>where <it>u</it><sub><it>i </it></sub>denotes the character symbol at position <it>i </it>in <it>u</it>.</p>
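          <p>The definition of Score(<it>u</it>, <it>M</it>) translates directly into code. The following is a minimal Python sketch, not the authors' implementation; it assumes a hypothetical representation in which a matrix is a list of columns, one per position, each mapping a letter of &#931; to its coefficient <it>M</it>(<it>i</it>, <it>x</it>). The toy matrix values are illustrative only.</p>
          <p>
```python
from typing import Dict, Sequence

# Hypothetical representation (an assumption of this sketch): a matrix of
# length m is a list of m columns; column i maps each letter x to M(i, x).
Matrix = Sequence[Dict[str, float]]


def score(u: str, matrix: Matrix) -> float:
    """Score(u, M): sum of the coefficients M(i, u_i) over positions 1..m."""
    if len(u) != len(matrix):
        raise ValueError("the word must have the same length m as the matrix")
    return sum(column[letter] for column, letter in zip(matrix, u))


# Toy matrix of length 2 over the DNA alphabet (illustrative values only).
toy = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 1},
]
print(score("AC", toy))  # 1 + 2 = 3
```
          </p>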
          <p>Searching for occurrences of a matrix in a sequence requires choosing an appropriate score threshold to decide whether a position is relevant or not. Let <it>&#945; </it>be such a score. We say that the matrix <it>M </it>has an <it>occurrence </it>in the sequence <it>S </it>at position <it>i </it>if Score(<it>S</it><sub><it>i </it></sub>... <it>S</it><sub><it>i</it>+<it>m</it>-1</sub>, <it>M</it>) &#8805; <it>&#945;</it>. The problem of efficiently finding occurrences of a matrix in a text has recently attracted a lot of interest <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Here we address the problem of computing the score threshold <it>&#945;</it>. The standard method for determining such a threshold is to use a P-value function, which gives the statistical significance of an occurrence according to its score. The P-value P-value(<it>M</it>, <it>&#945;</it>) is the probability that the background model can achieve a score equal to or greater than <it>&#945;</it>. In other words, the P-value is the proportion of strings (with respect to the background model) whose score is greater than or equal to the threshold <it>&#945; </it>for <it>M</it>. In <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, the authors introduce a generic approach to P-value computation for non-parametric models. In the context of matrices, the computation can be carried out using probability generating functions or dynamic programming <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. In both cases, the time complexity is proportional to the product of the length of the matrix and the number of distinct possible scores. 
If the matrix has non-negative integer coefficient values, then the number of possible different scores is bounded by <inline-formula><m:math name="1748-7188-2-15-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>m</m:mi></m:msubsup><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo><m:mo>{</m:mo><m:mi>M</m:mi><m:mo stretchy="false">(</m:mo><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>x</m:mi><m:mo stretchy="false">)</m:mo><m:mo>|</m:mo><m:mi>x</m:mi><m:mo>&#8712;</m:mo><m:mi>&#931;</m:mi><m:mo>}</m:mo></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaWaaabmaeaacyGGTbqBcqGGHbqycqGG4baEcqGG7bWEcqWGnbqtcqGGOaakcqWGPbqAcqGGSaalcqWG4baEcqGGPaqkcqGG8baFcqWG4baEcqGHiiIZcqqHJoWucqGG9bqFaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHris5aaaa@4639@</m:annotation></m:semantics></m:math></inline-formula>. It follows that known algorithms are pseudo-polynomial. In practice, however, matrices usually have real coefficients, as is the case for log-ratio matrices or entropy matrices. In this setting, the number of distinct scores that the matrix can achieve is significantly larger.</p>
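          <p>To make the pseudo-polynomial bound concrete, the following Python sketch computes P-value(<it>M</it>, <it>&#945;</it>) for a matrix with non-negative integer coefficients by dynamic programming over an array indexed by scores. It is an illustration written for this text, not one of the cited implementations; it assumes independent positions with background letter probabilities <it>p</it>(<it>x</it>) given as exact fractions, and the same hypothetical list-of-columns representation of <it>M</it>. The array has one cell per integer between 0 and the bound above, so the running time grows with <it>m</it> &#215; |&#931;| times that bound rather than with the size of the input.</p>
          <p>
```python
from fractions import Fraction
from typing import Dict, List, Sequence

# Assumptions of this sketch: non-negative integer coefficients, a matrix
# given as a list of columns (letter -> M(i, x)), and independent positions
# with background probabilities p(x) given as exact fractions.
IntMatrix = Sequence[Dict[str, int]]


def pvalue_integer(matrix: IntMatrix, p: Dict[str, Fraction], alpha: int) -> Fraction:
    """P-value(M, alpha) by dynamic programming over an array of scores.

    dist[s] is the probability that the current prefix slice scores exactly s.
    The array has (sum of the column maxima) + 1 cells, so the running time is
    proportional to m * |Sigma| * (that bound): pseudo-polynomial.
    """
    bound = sum(max(column.values()) for column in matrix)
    dist: List[Fraction] = [Fraction(0)] * (bound + 1)
    dist[0] = Fraction(1)  # the empty slice scores 0 with probability 1
    reached = 0  # highest score reachable by the current prefix slice
    for column in matrix:
        new: List[Fraction] = [Fraction(0)] * (bound + 1)
        for s in range(reached + 1):
            if dist[s] == 0:
                continue
            for letter, coef in column.items():
                new[s + coef] += dist[s] * p[letter]
        dist = new
        reached += max(column.values())
    # Sum the tail of the distribution: all scores at least alpha.
    return sum(dist[max(alpha, 0):], Fraction(0))


half = Fraction(1, 2)
toy = [{"A": 1, "C": 0}, {"A": 2, "C": 0}]
print(pvalue_integer(toy, {"A": half, "C": half}, 2))  # words AA and CA: 1/2
```
          </p>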
          <p>Theoretically, this number can be as high as |&#931;|<sup><it>m</it></sup>. The usual way to deal with real matrices is to round them to a given precision, such as a given number of digits after the decimal point. In this context, the number of scores depends strongly on the chosen precision. Figure <figr fid="F1">1</figr> displays such an example. It shows the number of distinct scores obtained with the matrix MA0041 from the Jaspar database for a variety of rounding values. With a precision set to 10<sup>-6</sup>, we get more than one million distinct scores. Existing algorithms have difficulty dealing with such a large number of scores. An alternative is to use a coarser precision, such as 10<sup>-3</sup>. In this case, the score distribution induced by the rounded matrix is likely to give larger error rates. For example, Figure <figr fid="F2">2</figr> shows the logo <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> of the matrix MA0045 of length 16 from the Jaspar database. We chose 5 as a score threshold, which corresponds approximately to a P-value of 10<sup>-3</sup>. The number of words whose score is greater than or equal to 5 is 4045101 for the original matrix, compared to 4034054 for the matrix rounded with a precision of 10<sup>-3</sup>, a difference of 11047 words. This error naturally affects the accuracy of the P-value. To estimate its extent, we conducted a large-scale experiment on all Jaspar matrices (123 matrices) for a variety of precisions and a uniform P-value set to 10<sup>-3</sup>. We compared the number of words whose score is greater than or equal to the threshold when the P-value is computed from the corresponding rounded matrix to the correct number of words observed with the true matrix, without discretization. In each case, we report the percentage of matrices for which the number of words differs. Results are reported in Table <tblr tid="T1">1</tblr>. When rounding to three digits after the decimal point, 55 percent of the matrices give incorrect results. Even when rounding to six digits after the decimal point, some matrices (7 percent) still give an incorrect result. This demonstrates that it may be necessary to use high-precision scores to obtain accurate results. The choice of the precision is a difficult compromise between accuracy and tractability. To the best of our knowledge, this question is not addressed by existing algorithms.</p>
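          <p>The growth of the number of distinct scores with the precision, illustrated for MA0041 in Figure 1, can be explored on toy matrices with the following Python sketch. It is illustrative only and is not the procedure used to produce Figure 1 or Table 1; the list-of-columns representation and the helper names are hypothetical. Coefficients are converted to exact decimals so that floating-point error does not create spurious distinct scores. Because the set of scores can hold up to |&#931;|<sup><it>m</it></sup> elements, this enumeration is only practical for short matrices.</p>
          <p>
```python
from decimal import ROUND_HALF_EVEN, Decimal
from typing import Dict, List, Sequence, Set

# Hypothetical representation: a list of columns, each mapping a letter
# to its (real) coefficient M(i, x).
Matrix = Sequence[Dict[str, float]]


def to_decimal(value) -> Decimal:
    """Exact decimal copy of a coefficient (via its shortest string form)."""
    return Decimal(str(value))


def round_matrix(matrix: Matrix, digits: int) -> List[Dict[str, Decimal]]:
    """Round every coefficient to `digits` digits after the decimal point."""
    step = Decimal(1).scaleb(-digits)  # 10 ** -digits
    return [
        {x: to_decimal(v).quantize(step, rounding=ROUND_HALF_EVEN)
         for x, v in column.items()}
        for column in matrix
    ]


def distinct_scores(matrix) -> Set[Decimal]:
    """All accessible scores, built column by column (up to |Sigma|**m)."""
    scores: Set[Decimal] = {Decimal(0)}
    for column in matrix:
        values = [to_decimal(v) for v in column.values()]
        scores = {s + v for s in scores for v in values}
    return scores


toy = [{"A": 0.12, "C": 0.14}, {"A": 0.0, "C": 0.03}]
for digits in (1, 2):
    print(digits, len(distinct_scores(round_matrix(toy, digits))))
```
          </p>
          <p>Coarse rounding merges scores that differ in the original matrix, which is the mechanism behind the errors reported in Table 1.</p>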
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
                <p>Number of scores for a rounded matrix</p>
            </caption>
            <text>
                <p><b>Number of scores for a rounded matrix</b>. The matrix MA0041 of length 12 from the Jaspar database has been rounded to a number of digits after the decimal point ranging from 1 to 8. The results are presented as a histogram showing the number of distinct scores that the rounded matrix can achieve. The number of scores is in log scale. The grey bar shows the number of distinct words (that is, 4<sup>12</sup>).</p>
            </text>
            <graphic file="1748-7188-2-15-1"/>
         </fig>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>The MA0045 Jaspar matrix logo</p>
            </caption>
            <text>
                <p><b>The MA0045 Jaspar matrix logo</b>. The logo of the matrix MA0045 from the Jaspar database, used for the experiments in the Background section.</p>
            </text>
            <graphic file="1748-7188-2-15-2"/>
         </fig>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
                <p>Error with rounded matrices. We report the percentage of Jaspar matrices for which the P-value computed from a rounded matrix leads to a different number of words than the P-value computed with the original matrix. The rounding granularity ranges from 10<sup>-2 </sup>to 10<sup>-6</sup>, and the P-value is 10<sup>-3 </sup>for a multinomial background model.</p>
            </caption>
            <tblbdy cols="6">
               <r>
                  <c ca="center">
                     <p>Granularity</p>
                  </c>
                  <c ca="center">
                     <p>10<sup>-2</sup></p>
                  </c>
                  <c ca="center">
                     <p>10<sup>-3</sup></p>
                  </c>
                  <c ca="center">
                     <p>10<sup>-4</sup></p>
                  </c>
                  <c ca="center">
                     <p>10<sup>-5</sup></p>
                  </c>
                  <c ca="center">
                     <p>10<sup>-6</sup></p>
                  </c>
               </r>
               <r>
                  <c cspan="6">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>% matrices with error</p>
                  </c>
                  <c ca="center">
                     <p>76</p>
                  </c>
                  <c ca="center">
                     <p>55</p>
                  </c>
                  <c ca="center">
                     <p>30</p>
                  </c>
                  <c ca="center">
                     <p>15</p>
                  </c>
                  <c ca="center">
                     <p>7</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
          <p>In this paper, we study the theoretical complexity of the P-value problem and prove that it is intrinsically difficult: it is NP-hard. We then introduce a novel algorithm that achieves a significant speed-up over existing algorithms when some error is allowed, as in other methods. This algorithm is also capable of solving the P-value problem without error within a reasonable amount of time.</p>
      </sec>
      <sec>
         <st>
            <p>Complexity of the P-value problem</p>
         </st>
          <p>We begin by formally introducing the P-value problem. We actually define two complementary problems, depending on what is given and what is sought. In both cases, we are given a finite alphabet &#931;, a matrix <it>M </it>of length <it>m </it>and a probability distribution on &#931;<sup><it>m</it></sup>. We say that <it>s </it>in &#8477; is an <it>accessible score </it>if there exists a word <it>u </it>in &#931;<sup><it>m </it></sup>such that Score(<it>u</it>, <it>M</it>) = <it>s</it>.</p>
         <p><it>P-value problem &#8211; from score to P-value: </it>Given a score value <it>&#945;</it>, find the probability of the set {<it>u </it>&#8712; &#931;<sup><it>m</it></sup>, Score(<it>u</it>, <it>M</it>) &#8805; <it>&#945;</it>}. This probability is denoted P-value(<it>M</it>, <it>&#945;</it>).</p>
         <p><it>Threshold problem &#8211; from P-value to score: </it>Given a P-value <it>P </it>(0 &#8804; <it>P </it>&#8804; 1), find the highest accessible score <it>&#945; </it>such that P-value(<it>M</it>, <it>&#945;</it>) &#8805; <it>P</it>. We write Threshold(<it>M</it>, <it>P</it>) for <it>&#945;</it>.</p>
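          <p>For very short matrices, both problems can be solved directly from their definitions by enumerating &#931;<sup><it>m</it></sup>. The Python sketch below does exactly that and is meant only as a reference for checking faster methods on toy inputs, since its running time grows as |&#931;|<sup><it>m</it></sup>. For brevity it assumes a background model with independent positions and letter probabilities <it>p</it>(<it>x</it>) (the problem statement itself allows any distribution on &#931;<sup><it>m</it></sup>); the function names are hypothetical.</p>
          <p>
```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Dict[str, float]]  # list of columns: letter -> M(i, x)


def enumerate_words(matrix: Matrix, p: Dict[str, Fraction]) -> List[Tuple[float, Fraction]]:
    """(Score(u, M), p(u)) for every word u of Sigma^m (independent positions)."""
    pairs = []
    for word in product(sorted(p), repeat=len(matrix)):
        prob = Fraction(1)
        for letter in word:
            prob *= p[letter]
        pairs.append((sum(col[x] for col, x in zip(matrix, word)), prob))
    return pairs


def pvalue_brute(matrix: Matrix, p, alpha) -> Fraction:
    """From score to P-value: probability of {u : Score(u, M) >= alpha}."""
    return sum((prob for s, prob in enumerate_words(matrix, p) if s >= alpha), Fraction(0))


def threshold_brute(matrix: Matrix, p, pval) -> Optional[float]:
    """From P-value to score: highest accessible alpha with P-value >= pval."""
    mass: Dict[float, Fraction] = {}
    for s, prob in enumerate_words(matrix, p):
        mass[s] = mass.get(s, Fraction(0)) + prob
    cumulative = Fraction(0)
    for alpha in sorted(mass, reverse=True):  # accessible scores, high to low
        cumulative += mass[alpha]  # cumulative now equals P-value(M, alpha)
        if cumulative >= pval:
            return alpha
    return None  # only reached when pval exceeds 1
```
          </p>
          <p>Accumulating the probability mass from the highest score downwards yields every P-value(<it>M</it>, <it>&#945;</it>) in one pass, so the threshold is the first accessible score at which the running sum reaches <it>P</it>.</p>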
          <p>As we will see later in this paper, these two problems are closely related. We show here that neither of them admits a polynomial-time algorithm, unless P = NP. To this end, we first define the decision problem ACCESSIBLE SCORE as follows.</p>
         <p><it>Instance: </it>a finite alphabet &#931;, a matrix <it>M </it>of length <it>m </it>whose coefficients are natural numbers, a natural number <it>t</it></p>
         <p><it>Question: </it>does there exist a string <it>u </it>of &#931;<sup><it>m </it></sup>such that Score(<it>u</it>, <it>M</it>) = <it>t</it>?</p>
         <p><b>Theorem 1 </b>ACCESSIBLE SCORE <it>is NP-hard</it>.</p>
          <p>The proof of Theorem 1 is by reduction from the SUBSET SUM problem, which is NP-complete and admits a pseudo-polynomial algorithm <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. SUBSET SUM is defined as follows.</p>
         <p><it>Instance: </it>a set of positive integers <it>A </it>= {<it>a</it><sub>0</sub>,...,<it>a</it><sub><it>n</it></sub>} and a positive integer <it>s</it></p>
         <p><it>Question: </it>does there exist a subset <it>A' </it>of <it>A </it>such that the sum of the elements of <it>A' </it>equals exactly <it>s</it>?</p>
         <p><b>Lemma 1 </b><it>There exists a polynomial reduction from the </it>SUBSET SUM <it>problem to the </it>ACCESSIBLE SCORE <it>problem</it>.</p>
          <p><b>Proof</b>. Let <it>A </it>= {<it>a</it><sub>0</sub>,...,<it>a</it><sub><it>n</it></sub>} be a set of positive integers, and let <it>s </it>be the target integer. We define the matrix <it>M </it>of length <it>n </it>+ 1 on the two-letter alphabet &#931; = {<it>x</it>, <it>y</it>}, with columns indexed from 0 to <it>n</it>, as follows: <it>M </it>(<it>i</it>, <it>x</it>) = <it>a</it><sub><it>i </it></sub>and <it>M </it>(<it>i</it>, <it>y</it>) = 0 for each <it>i</it>, 0 &#8804; <it>i </it>&#8804; <it>n</it>. The set <it>A </it>has 2<sup><it>n</it>+1 </sup>different subsets, which is also the number of words of &#931;<sup><it>n</it>+1</sup>. So we can define a bijection <it>&#966; </it>from the set of subsets of <it>A </it>onto &#931;<sup><it>n</it>+1</sup>: for each subset <it>A'</it>, the <it>i</it>th letter of the word <it>&#966; </it>(<it>A'</it>) is <it>x </it>if <it>a</it><sub><it>i </it></sub>&#8712; <it>A'</it>, and <it>y </it>otherwise. It is easy to see that Score(<it>&#966; </it>(<it>A'</it>), <it>M</it>) = &#8721;<sub><it>a</it>&#8712;<it>A' </it></sub><it>a</it>, so Score(<it>&#966; </it>(<it>A'</it>), <it>M</it>) = <it>s </it>if, and only if, &#8721;<sub><it>a</it>&#8712;<it>A' </it></sub><it>a </it>= <it>s</it>. Hence <it>s </it>is an accessible score of <it>M </it>if, and only if, the SUBSET SUM instance has a solution, and <it>M</it>, which has 2(<it>n </it>+ 1) coefficients, is built in polynomial time.</p>
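          <p>The construction used in this proof is easy to check mechanically on small instances. In the Python sketch below (an illustration only, with hypothetical function names), subset_sum_to_matrix builds <it>M </it>from <it>A</it>, phi encodes a subset of indices as a word over {<it>x</it>, <it>y</it>}, and accessible decides ACCESSIBLE SCORE by collecting every reachable score, which can take exponential time in general and is used here only for verification.</p>
          <p>
```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set


def subset_sum_to_matrix(a: Sequence[int]) -> List[Dict[str, int]]:
    """Matrix of length n + 1 over {x, y}: M(i, x) = a_i, M(i, y) = 0."""
    return [{"x": ai, "y": 0} for ai in a]


def phi(length: int, chosen: FrozenSet[int]) -> str:
    """The i-th letter is x if and only if index i belongs to the subset."""
    return "".join("x" if i in chosen else "y" for i in range(length))


def score(word: str, matrix: Sequence[Dict[str, int]]) -> int:
    return sum(column[letter] for column, letter in zip(matrix, word))


def accessible(matrix: Sequence[Dict[str, int]], t: int) -> bool:
    """ACCESSIBLE SCORE by brute force: collect every reachable score."""
    reachable: Set[int] = {0}
    for column in matrix:
        reachable = {s + v for s in reachable for v in column.values()}
    return t in reachable


a = [3, 5, 7]
matrix_a = subset_sum_to_matrix(a)
# Check Score(phi(A'), M) = sum of A' for all 2^(n+1) subsets A'.
for size in range(len(a) + 1):
    for subset in combinations(range(len(a)), size):
        word = phi(len(a), frozenset(subset))
        assert score(word, matrix_a) == sum(a[i] for i in subset)
print(accessible(matrix_a, 12), accessible(matrix_a, 4))  # True False
```
          </p>
          <p>Here 12 = 5 + 7 is reachable, while no subset of {3, 5, 7} sums to 4.</p>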
          <p>It remains to prove that the ACCESSIBLE SCORE problem polynomially reduces to the <it>From score to P-value </it>and <it>From P-value to score </it>problems. We are now given a finite alphabet &#931;, a matrix <it>M </it>of length <it>m</it>, and a score value <it>t</it>.</p>
         <sec>
            <st>
               <p>Reduction to the <it>From score to P-value </it>problem</p>
            </st>
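             <p>We assume that every word of &#931;<sup><it>m </it></sup>has a non-zero probability. Under this hypothesis, the ACCESSIBLE SCORE problem admits a solution if, and only if, P-value(<it>M</it>, <it>t</it>) &#8800; P-value(<it>M</it>, <it>t </it>+ 1). Indeed, since all coefficients of <it>M </it>are natural numbers, all scores are integers, and the difference P-value(<it>M</it>, <it>t</it>) &#8722; P-value(<it>M</it>, <it>t </it>+ 1) is the probability of the set of words whose score is exactly <it>t</it>.</p>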
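             <p>This equivalence can be checked numerically on a small example. The Python sketch below (illustrative, with hypothetical names) computes P-values exactly with fractions by enumerating &#931;<sup><it>m</it></sup>, for an integer matrix and an assumed independent background in which every letter has a non-zero probability, and reports the values of <it>t</it> for which the two P-values differ.</p>
             <p>
```python
from fractions import Fraction
from itertools import product
from typing import Dict, List


def pvalue(matrix: List[Dict[str, int]], p: Dict[str, Fraction], alpha: int) -> Fraction:
    """Exact P-value(M, alpha) by enumeration of Sigma^m (small m only)."""
    total = Fraction(0)
    for word in product(sorted(p), repeat=len(matrix)):
        if sum(col[x] for col, x in zip(matrix, word)) >= alpha:
            prob = Fraction(1)
            for x in word:
                prob *= p[x]
            total += prob
    return total


def accessible_via_pvalue(matrix, p, t: int) -> bool:
    """t is accessible iff P-value(M, t) differs from P-value(M, t + 1)."""
    return pvalue(matrix, p, t) != pvalue(matrix, p, t + 1)


# Matrix from the SUBSET SUM reduction with A = {3, 5, 7}; p(x), p(y) non-zero.
matrix_a = [{"x": 3, "y": 0}, {"x": 5, "y": 0}, {"x": 7, "y": 0}]
background = {"x": Fraction(1, 3), "y": Fraction(2, 3)}
print([t for t in range(16) if accessible_via_pvalue(matrix_a, background, t)])
# [0, 3, 5, 7, 8, 10, 12, 15]: exactly the subset sums of {3, 5, 7}
```
             </p>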
         </sec>
         <sec>
            <st>
               <p>Reduction to the <it>From P-value to score </it>problem</p>
            </st>
             <p>We assume that the background model on &#931;* is a multinomial model in which all letters are equiprobable. In this context, all words of length <it>m </it>have the same probability, <inline-formula><m:math name="1748-7188-2-15-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mn>1</m:mn><m:mrow><m:msup><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>&#931;</m:mi><m:mo>|</m:mo></m:mrow></m:mrow><m:mi>m</m:mi></m:msup></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqaIXaqmaeaadaabdaqaaiabfo6atbGaay5bSlaawIa7amaaCaaabeqaaiabd2gaTbaaaaaaaa@338C@</m:annotation></m:semantics></m:math></inline-formula> and all P-values are of the form <inline-formula><m:math name="1748-7188-2-15-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mi>k</m:mi><m:mrow><m:msup><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>&#931;</m:mi><m:mo>|</m:mo></m:mrow></m:mrow><m:mi>m</m:mi></m:msup></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqWGRbWAaeaadaabdaqaaiabfo6atbGaay5bSlaawIa7amaaCaaabeqaaiabd2gaTbaaaaaaaa@33FB@</m:annotation></m:semantics></m:math></inline-formula>. Solving the ACCESSIBLE SCORE problem amounts to deciding whether there exists an integer <it>k</it>, 0 &#8804; <it>k </it>&#8804; |&#931;|<sup><it>m</it></sup>, such that Threshold(<it>M</it>, <inline-formula><m:math name="1748-7188-2-15-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mi>k</m:mi><m:mrow><m:msup><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>&#931;</m:mi><m:mo>|</m:mo></m:mrow></m:mrow><m:mi>m</m:mi></m:msup></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqWGRbWAaeaadaabdaqaaiabfo6atbGaay5bSlaawIa7amaaCaaabeqaaiabd2gaTbaaaaaaaa@33FB@</m:annotation></m:semantics></m:math></inline-formula>) = <it>t</it>. The existence of such a <it>k </it>can be decided by repeated calls to <it>From P-value to score </it>for different values of <it>k</it>. Since Threshold(<it>M</it>, <it>k</it>/|&#931;|<sup><it>m</it></sup>) is non-increasing in <it>k</it>, and <it>k </it>takes at most |&#931;|<sup><it>m</it></sup> + 1 values, binary search performs this search in <it>O</it>(log<sub>2 </sub>(|&#931;|<sup><it>m</it></sup>)) = <it>O</it>(<it>m </it>log<sub>2 </sub>|&#931;|) steps, which is polynomial in the size of the input.</p>
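             <p>The binary search over <it>k </it>can be sketched in Python as follows. This is an illustration with hypothetical names: the oracle threshold_k stands in for any algorithm solving <it>From P-value to score</it>, and is implemented here by brute force under the uniform model. With the |&#931;|<sup><it>m</it></sup> word scores sorted in decreasing order, Threshold(<it>M</it>, <it>k</it>/|&#931;|<sup><it>m</it></sup>) is the <it>k</it>th highest score for <it>k </it>&#8805; 1 and the maximal score for <it>k </it>= 0. The search locates the smallest <it>k </it>whose threshold is at most <it>t</it>, then tests equality, using <it>O</it>(log<sub>2 </sub>(|&#931;|<sup><it>m</it></sup>)) oracle calls.</p>
             <p>
```python
from itertools import product
from typing import Dict, List, Sequence


def sorted_scores(matrix: Sequence[Dict[str, int]]) -> List[int]:
    """Scores of all |Sigma|^m words, highest first (brute force)."""
    alphabet = sorted(matrix[0])
    words = product(alphabet, repeat=len(matrix))
    return sorted((sum(c[x] for c, x in zip(matrix, w)) for w in words), reverse=True)


def threshold_k(desc: List[int], k: int) -> int:
    """Threshold(M, k / |Sigma|^m) under the uniform model (oracle stand-in)."""
    return desc[max(k - 1, 0)]


def accessible_by_binary_search(matrix, t: int) -> bool:
    desc = sorted_scores(matrix)
    n_words = len(desc)  # |Sigma|^m; candidate values of k are 0..n_words
    if threshold_k(desc, n_words) > t:  # t is below every score
        return False
    lo, hi = 0, n_words  # invariant: threshold_k(desc, hi) is at most t
    while hi > lo:
        mid = (lo + hi) // 2
        if threshold_k(desc, mid) > t:
            lo = mid + 1
        else:
            hi = mid
    # Threshold is non-increasing in k, so t is accessible iff it is hit here.
    return threshold_k(desc, lo) == t


matrix_a = [{"x": 3, "y": 0}, {"x": 5, "y": 0}, {"x": 7, "y": 0}]
print(accessible_by_binary_search(matrix_a, 12), accessible_by_binary_search(matrix_a, 4))
```
             </p>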
         </sec>
      </sec>
      <sec>
         <st>
            <p>Algorithms for the P-value problems</p>
         </st>
          <p>From now on, we assume that the positions in the sequence are independently distributed. We denote by <it>p</it>(<it>x</it>) the background probability associated with the letter <it>x </it>of the alphabet &#931;. By extension, we write <it>p</it>(<it>u</it>) for the probability of the word <it>u </it>= <it>u</it><sub>1 </sub>... <it>u</it><sub><it>m</it></sub>: <it>p</it>(<it>u</it>) = <it>p</it>(<it>u</it><sub>1</sub>) &#215; &#8943; &#215; <it>p</it>(<it>u</it><sub><it>m</it></sub>).</p>
         <sec>
            <st>
               <p>Definition of the score distribution</p>
            </st>
             <p>The P-value is computed through the <it>score distribution</it>. This concept lies at the core of the large majority of existing algorithms <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B15">15</abbr></abbrgrp>. Given a matrix <it>M </it>of length <it>m </it>and a score <it>&#945;</it>, we define <it>Q</it>(<it>M</it>, <it>&#945;</it>) as the probability that the background model achieves a score equal to <it>&#945;</it>. In other words, <it>Q </it>(<it>M</it>, <it>&#945;</it>) is the probability of the set {<it>u </it>&#8712; &#931;<sup><it>m </it></sup>| Score(<it>u</it>, <it>M</it>) = <it>&#945;</it>}. If <it>s </it>is not an accessible score, then <it>Q</it>(<it>M</it>, <it>s</it>) = 0. The P-value is then obtained by summing the score distribution: P-value(<it>M</it>, <it>&#945;</it>) = &#8721;<sub><it>s </it>&#8805; <it>&#945;</it></sub><it>Q</it>(<it>M</it>, <it>s</it>), where <it>s </it>ranges over the accessible scores.</p>
            <p>The computation of <it>Q </it>is easily performed by dynamic programming. For that purpose, we need some preliminary notation. Given two integers <it>i</it>, <it>j </it>satisfying 0 &#8804; <it>i, j </it>&#8804; <it>m</it>, <it>M </it>[<it>i</it>..<it>j</it>] denotes the submatrix of <it>M </it>obtained by selecting only columns from <it>i </it>to <it>j </it>for all character symbols. <it>M </it>[<it>i</it>..<it>j</it>] is called a <it>slice </it>of <it>M</it>. By convention, if <it>i </it>> <it>j</it>, then <it>M </it>[<it>i</it>..<it>j</it>] is an empty matrix.</p>
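             <p>The slice-by-slice dynamic programming can be sketched in Python. This is an illustrative sketch, not the authors' code: it keeps <it>Q</it>(<it>M</it>[1..<it>i</it>], &#183;) as a dictionary from accessible scores to probabilities, extends the slice by one column at a time under the independence assumption stated above, and sums <it>Q</it>(<it>M</it>, <it>s</it>) over <it>s </it>&#8805; <it>&#945; </it>to obtain the P-value. Names are hypothetical. Coefficients are assumed to be integers (or other exact numbers such as fractions or decimals), so that equal scores reached along different paths share one dictionary key; with floating-point coefficients, rounding could split one score into several keys.</p>
             <p>
```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Sequence

Matrix = Sequence[Dict[str, int]]  # list of columns: letter -> M(i, x)


def score_distribution(matrix: Matrix, p: Dict[str, Fraction]) -> Dict[int, Fraction]:
    """Q(M, s) for every accessible score s, computed slice by slice."""
    q: Dict[int, Fraction] = {0: Fraction(1)}  # empty slice M[1..0]
    for column in matrix:  # Q(M[1..i], .) from Q(M[1..i-1], .)
        extended: Dict[int, Fraction] = defaultdict(Fraction)
        for s, prob in q.items():
            for letter, coef in column.items():
                extended[s + coef] += prob * p[letter]
        q = dict(extended)
    return q


def pvalue_from_distribution(q: Dict[int, Fraction], alpha) -> Fraction:
    """P-value(M, alpha) as the sum of Q(M, s) over scores s >= alpha."""
    return sum((prob for s, prob in q.items() if s >= alpha), Fraction(0))


toy = [{"A": 1, "C": 0}, {"A": 2, "C": 0}]
background = {"A": Fraction(1, 4), "C": Fraction(3, 4)}
q = score_distribution(toy, background)
print(pvalue_from_distribution(q, 2))  # Q(M, 3) + Q(M, 2) = 1/16 + 3/16 = 1/4
```
             </p>
             <p>Storing only accessible scores keeps the table as small as the set of distinct scores, which is exactly the quantity that grows with the rounding precision discussed in the Background section.</p>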
             <p>The score distribution for the slice <it>M </it>[1..<it>i</it>] is expressed in terms of the score distribution of the previous slice <it>M </it>[1..<it>i </it>- 1] as follows.</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1748-7188-2-15-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>Q</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mn>1..0</m:mn>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo>,</m:mo>
                                       <m:mi>s</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>{</m:mo>
                                          <m:mrow>
                                             <m:mtable columnalign="left">
                                                <m:mtr columnalign="left">
                                                   <m:mtd columnalign="left">
                                                      <m:mn>1</m:mn>
                                                   </m:mtd>
                                                   <m:mtd columnalign="left">
                                                      <m:mrow>
                                                         <m:mtext>if&#160;</m:mtext>
                                                         <m:mi>s</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>0</m:mn>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                                <m:mtr columnalign="left">
                                                   <m:mtd columnalign="left">
                                                      <m:mn>0</m:mn>
                                                   </m:mtd>
                                                   <m:mtd columnalign="left">
                                                      <m:mrow>
                                                         <m:mtext>otherwise</m:mtext>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                             </m:mtable>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>Q</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mn>1..</m:mn>
                                       <m:mi>i</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo>,</m:mo>
                                       <m:mi>s</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:munder>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>x</m:mi>
                                                <m:mo>&#8712;</m:mo>
                                                <m:mi>&#931;</m:mi>
                                             </m:mrow>
                                          </m:munder>
                                          <m:mrow>
                                             <m:mi>Q</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">[</m:mo>
                                             <m:mn>1..</m:mn>
                                             <m:mi>i</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">]</m:mo>
                                             <m:mo>,</m:mo>
                                             <m:mi>s</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>x</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#215;</m:mo>
                                             <m:mi>p</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>x</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqbaeaabiWaaaqaaiabdgfarjabcIcaOiabd2eanjabcUfaBjabigdaXiabc6caUiabc6caUiabicdaWiabc2faDjabcYcaSiabdohaZjabcMcaPaqaaiabg2da9aqaamaaceqabaqbaeaabiGaaaqaaiabigdaXaqaaiabbMgaPjabbAgaMjabbccaGiabdohaZjabg2da9iabicdaWaqaaiabicdaWaqaaiabb+gaVjabbsha0jabbIgaOjabbwgaLjabbkhaYjabbEha3jabbMgaPjabbohaZjabbwgaLbaaaiaawUhaaaqaaiabdgfarjabcIcaOiabd2eanjabcUfaBjabigdaXiabc6caUiabc6caUiabdMgaPjabc2faDjabcYcaSiabdohaZjabcMcaPaqaaiabg2da9aqaamaaqafabaGaemyuaeLaeiikaGIaemyta0Kaei4waSLaeGymaeJaeiOla4IaeiOla4IaemyAaKMaeyOeI0IaeGymaeJaeiyxa0LaeiilaWIaem4CamNaeyOeI0Iaemyta0KaeiikaGIaemyAaKMaeiilaWIaemiEaGNaeiykaKIaeiykaKIaey41aqRaemiCaaNaeiikaGIaemiEaGNaeiykaKcaleaacqWG4baEcqGHiiIZcqqHJoWuaeqaniabggHiLdaaaaaa@816B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
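            <p>The following Python sketch illustrates Equation 1 on a small example. It is not the authors' implementation. It assumes a matrix with integer coefficients, represented as a list of columns where each column is a dictionary from letters to scores. It also assumes that the background probabilities <it>p</it>(<it>x</it>) are given as a dictionary. The names score_distribution, M and uniform are hypothetical. Each step pushes the probability mass of every accessible score <it>s </it>forward to <it>s </it>+ <it>M</it>(<it>i</it>, <it>x</it>), which is equivalent to the sum in Equation 1. Only the values for the previous column are kept.</p>
            <p>
```python
from collections import defaultdict


def score_distribution(matrix, background):
    """Return a dict mapping each accessible score s to Q(M, s).

    matrix: list of columns, one dict {letter: integer score} per position.
    background: dict {letter: probability p(x)}, letters assumed i.i.d.
    """
    q = {0: 1.0}  # Q(M[1..0], s) = 1 if s = 0, and 0 otherwise
    for column in matrix:
        nxt = defaultdict(float)
        for s, prob in q.items():
            for x, px in background.items():
                # contributes Q(M[1..i-1], s) * p(x) to Q(M[1..i], s + M(i, x))
                nxt[s + column[x]] += prob * px
        q = dict(nxt)  # only the previous column is kept: O(S) space
    return q


M = [
    {"A": 2, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 3, "G": 0, "T": 1},
    {"A": 1, "C": 1, "G": 2, "T": 0},
]
uniform = {x: 0.25 for x in "ACGT"}
Q = score_distribution(M, uniform)
for s in sorted(Q):
    print(s, Q[s])
```
            </p>
            <p>With these coefficients the accessible scores range from 0 to 7. For instance, Q(M, 7) = 1/64, since only the word ACG reaches the maximal score.</p>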
            <p>The time complexity is in <it>O</it>(<it>m</it>|&#931;|<it>S</it>), and the space complexity in <it>O</it>(<it>S</it>), where <it>S </it>is the number of distinct intermediate scores that have to be visited. The space bound holds because Equation 1 only refers to the values computed for the previous column. If the coefficients of <it>M </it>are natural numbers, then <it>S </it>is bounded by <it>m </it>&#215; max {<it>M </it>(<it>i</it>, <it>x</it>) | <it>x </it>&#8712; &#931;, 1 &#8804; <it>i </it>&#8804; <it>m</it>} + 1. Equation 1 makes it possible to solve both the <it>From score to P-value </it>and the <it>From P-value to score </it>problems. Given a score <it>&#945;</it>, the P-value is obtained with the relation:</p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>P-value</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>M</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#945;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>s</m:mi>
                                    <m:mo>&#8805;</m:mo>
                                    <m:mi>&#945;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>Q</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>s</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeeyla0IaeeODayNaeeyyaeMaeeiBaWMaeeyDauNaeeyzauMaeiikaGIaemyta0KaeiilaWccciGae8xSdeMaeiykaKIaeyypa0ZaaabuaeaacqWGrbqucqGGOaakcqWGnbqtcqGGSaalcqWGZbWCcqGGPaqkaSqaaiabdohaZjabgwMiZkab=f7aHbqab0GaeyyeIuoaaaa@48A8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Conversely, given <it>P</it>, Threshold(<it>M</it>, <it>P</it>) is computed from <it>Q </it>as follows. The accessible scores are scanned in decreasing order and their probabilities are accumulated until the required P-value is reached.</p>
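            <p>Both relations can be sketched in Python as follows. The sketch assumes that the score distribution is available as a dictionary from accessible scores to probabilities, as produced by Equation 1. The toy values below are hypothetical. It also assumes that Threshold(<it>M</it>, <it>P</it>) denotes the greatest accessible score whose P-value is at least <it>P</it>, which is our reading of the search described above. The function names p_value and threshold are illustrative only.</p>
            <p>
```python
def p_value(q, alpha):
    """P-value(M, alpha): total probability of the scores s with s >= alpha."""
    return sum(prob for s, prob in q.items() if s >= alpha)


def threshold(q, p):
    """Scan accessible scores from the greatest downwards, accumulating their
    probabilities, and return the first score whose P-value reaches p.
    Assumes Threshold(M, P) = greatest accessible score with P-value >= P."""
    total = 0.0
    for s in sorted(q, reverse=True):
        total += q[s]
        if total >= p:
            return s
    # Only reached when p exceeds the accumulated mass (about 1), e.g. through
    # rounding: fall back to the smallest accessible score.
    return min(q)


# Hypothetical score distribution, probabilities in multiples of 1/64.
Q = {0: 4 / 64, 1: 12 / 64, 2: 15 / 64, 3: 13 / 64,
     4: 10 / 64, 5: 6 / 64, 6: 3 / 64, 7: 1 / 64}

print(p_value(Q, 5))       # prints 0.15625, i.e. 10/64
print(threshold(Q, 0.1))   # prints 5
```
            </p>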
         </sec>
         <sec>
            <st>
               <p>Computing the score distribution for a range of scores</p>
            </st>
            <p>Equation 1 does not state which score range has to be considered in the intermediate steps of the computation of <it>Q</it>. To address this, we introduce the <it>best score </it>and the <it>worst score </it>of a matrix slice.</p>
            <p><b>Definition 1 (Best and worst scores) </b><it>Let M be a matrix. The </it>best score <it>of the slice M </it>[<it>i</it>..<it>j</it>] <it>is defined as</it></p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>BS</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>M</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>i</m:mi>
                           <m:mn>..</m:mn>
                           <m:mi>j</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mi>j</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo>{</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>|</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:mi>&#931;</m:mi>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeOqaiKaee4uamLaeiikaGIaemyta0Kaei4waSLaemyAaKMaeiOla4IaeiOla4IaemOAaOMaeiyxa0LaeiykaKIaeyypa0ZaaabCaeaacyGGTbqBcqGGHbqycqGG4baEcqGG7bWEcqWGnbqtcqGGOaakcqWGRbWAcqGGSaalcqWG4baEcqGGPaqkcqGG8baFcqWG4baEcqGHiiIZcqqHJoWucqGG9bqFaSqaaiabdUgaRjabg2da9iabdMgaPbqaaiabdQgaQbqdcqGHris5aaaa@5447@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p><it>Similarly, the </it>worst score <it>of the slice M </it>[<it>i</it>..<it>j</it>] <it>is defined as</it></p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>WS</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>M</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>i</m:mi>
                           <m:mn>..</m:mn>
                           <m:mi>j</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mi>j</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>min</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo>{</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>|</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:mi>&#931;</m:mi>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaee4vaCLaee4uamLaeiikaGIaemyta0Kaei4waSLaemyAaKMaeiOla4IaeiOla4IaemOAaOMaeiyxa0LaeiykaKIaeyypa0ZaaabCaeaacyGGTbqBcqGGPbqAcqGGUbGBcqGG7bWEcqWGnbqtcqGGOaakcqWGRbWAcqGGSaalcqWG4baEcqGGPaqkcqGG8baFcqWG4baEcqGHiiIZcqqHJoWucqGG9bqFaSqaaiabdUgaRjabg2da9iabdMgaPbqaaiabdQgaQbqdcqGHris5aaaa@546D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
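            <p>In the rest of this section, best and worst scores are only needed for suffix slices <it>M</it>[<it>i </it>+ 1..<it>m</it>] or <it>M</it>[<it>i</it>..<it>m</it>]. A single backward pass over the columns therefore yields all of them in <it>O</it>(<it>m</it>|&#931;|) time. The minimal Python sketch below assumes that <it>M </it>is a list of dictionaries from letters to scores. The function name suffix_scores and the matrix are hypothetical.</p>
            <p>
```python
def suffix_scores(matrix):
    """Return lists bs, ws with bs[i] = BS(M[i+1..m]) and ws[i] = WS(M[i+1..m])
    for i = 0..m (1-based slices as in Definition 1); the empty slice
    M[m+1..m] has best and worst score 0."""
    m = len(matrix)
    bs = [0] * (m + 1)
    ws = [0] * (m + 1)
    for i in reversed(range(m)):
        # matrix[i] holds column i + 1 of M
        bs[i] = bs[i + 1] + max(matrix[i].values())
        ws[i] = ws[i + 1] + min(matrix[i].values())
    return bs, ws


# Hypothetical log-odds-like matrix with negative coefficients.
M = [
    {"A": 2, "C": -1, "G": 0, "T": -2},
    {"A": -1, "C": 3, "G": -2, "T": 1},
    {"A": 1, "C": 0, "G": 2, "T": -1},
]
bs, ws = suffix_scores(M)
print(bs)  # prints [7, 5, 2, 0]
print(ws)  # prints [-5, -3, -1, 0]
```
            </p>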
            <p>The notion of best score already appears in <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, where it is used to speed up the search for occurrences of a matrix in a text. It gives rise to <it>look ahead scoring</it>: since the maximal remaining score is known, the computation of Score(<it>u</it>, <it>M</it>) can be stopped early, as soon as it is guaranteed that the score threshold cannot be reached. It has been exploited in the same context in <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Here we adapt it to the score distribution problem. Let <it>&#945; </it>and <it>&#946; </it>be two scores such that <it>&#945; </it>&#8804; <it>&#946;</it>, and suppose we want to compute the score distribution <it>Q </it>for the range [<it>&#945;</it>, <it>&#946;</it>]. Given an intermediate score <it>s </it>and a matrix position <it>i</it>, we say that <it>Q</it>(<it>M </it>[1..<it>i</it>], <it>s</it>) is <it>useful </it>if there exists a word <it>v </it>of length <it>m </it>- <it>i </it>such that <it>&#945; </it>&#8804; <it>s </it>+ Score(<it>v</it>, <it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; <it>&#946;</it>. Lemma 2 gives a condition on useful intermediate scores.</p>
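            <p><b>Lemma 2 </b><it>Let M be a matrix of length m, and let &#945; and &#946; be two score bounds defining a score range for which we want to compute the score distribution Q. If Q</it>(<it>M </it>[1..<it>i</it>], <it>s</it>) <it>is useful, then</it></p>
            <p>
               <display-formula><it>&#945; </it>- BS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; <it>s </it>&#8804; <it>&#946; </it>- WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>])</display-formula>
            </p>
            <p><b>Proof</b>. This is a straightforward consequence of Definition 1. For every word <it>v </it>of length <it>m </it>- <it>i</it>, we have WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; Score(<it>v</it>, <it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; BS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]). Hence, if <it>&#945; </it>&#8804; <it>s </it>+ Score(<it>v</it>, <it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; <it>&#946;</it>, then <it>s </it>&#8805; <it>&#945; </it>- BS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]) and <it>s </it>&#8804; <it>&#946; </it>- WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]). The converse does not hold in general, because not every score between the worst and the best score of a slice is necessarily reachable. Nevertheless, any intermediate score violating the condition can be safely discarded.</p>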
            <p>This result is implemented in Algorithm SCOREDISTRIBUTION, displayed in Figure <figr fid="F3">3</figr>. The algorithm ensures that only accessible scores are visited. In practice, the values of <it>Q </it>are stored in a hash table, so that only scores actually reached by some prefix are represented.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Algorithm ScoreDistribution</p>
               </caption>
               <text>
                  <p>Algorithm ScoreDistribution.</p>
               </text>
               <graphic file="1748-7188-2-15-3"/>
            </fig>
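            <p>As an illustration only, the following Python sketch combines Equation 1 with the interval of Lemma 2. It is a hypothetical re-implementation, not the authors' Algorithm SCOREDISTRIBUTION; the function names and the matrix are assumptions. After <it>i </it>columns, an intermediate score <it>t </it>is kept only if <it>&#945; </it>- BS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8804; <it>t </it>&#8804; <it>&#946; </it>- WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]). Any other score cannot lead to a final score in [<it>&#945;</it>, <it>&#946;</it>], so dropping it leaves <it>Q </it>unchanged on that range. A Python dictionary plays the role of the hash table, so only accessible scores are stored.</p>
            <p>
```python
from collections import defaultdict


def suffix_scores(matrix):
    # bs[i] = BS(M[i+1..m]) and ws[i] = WS(M[i+1..m]); both are 0 when i = m
    m = len(matrix)
    bs, ws = [0] * (m + 1), [0] * (m + 1)
    for i in reversed(range(m)):
        bs[i] = bs[i + 1] + max(matrix[i].values())
        ws[i] = ws[i + 1] + min(matrix[i].values())
    return bs, ws


def score_distribution_range(matrix, background, alpha, beta):
    """Q(M, s) for the accessible scores s in [alpha, beta], visiting only
    intermediate scores that pass the interval test of Lemma 2."""
    bs, ws = suffix_scores(matrix)
    q = {0: 1.0}  # hash table for Q(M[1..0], .)
    for i, column in enumerate(matrix, start=1):
        low, high = alpha - bs[i], beta - ws[i]  # Lemma 2 bounds after i columns
        nxt = defaultdict(float)
        for s, prob in q.items():
            for x, px in background.items():
                t = s + column[x]
                if t >= low and high >= t:  # otherwise t cannot end in [alpha, beta]
                    nxt[t] += prob * px
        q = dict(nxt)
    return q


M = [
    {"A": 2, "C": -1, "G": 0, "T": -2},
    {"A": -1, "C": 3, "G": -2, "T": 1},
    {"A": 1, "C": 0, "G": 2, "T": -1},
]
uniform = {x: 0.25 for x in "ACGT"}
print(score_distribution_range(M, uniform, 3, 5))
```
            </p>
            <p>At the last column both suffix bounds are 0, so the returned table contains exactly the scores of [<it>&#945;</it>, <it>&#946;</it>] that are accessible.</p>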
            <p>If only the P-value of a given score is needed, and not the whole score distribution, Algorithm SCOREDISTRIBUTION can be improved further. We introduce a complementary optimization that yields a significant speed-up. The idea is that for <it>good </it>words, we can guarantee that the final score will be at least the given threshold without computing it.</p>
            <p><b>Definition 2 (Good words) </b><it>Let &#945; be a score and i be a position of M. Given u </it>= <it>u</it><sub>1 </sub>... <it>u</it><sub><it>i </it></sub><it>a word of </it>&#931;<sup><it>i</it></sup>, <it>we say that u is </it>good <it>for &#945; if the following conditions are fulfilled:</it></p>
            <p>1. Score(<it>u</it>, <it>M </it>[1..<it>i</it>]) &#8805; <it>&#945; </it>- WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>])</p>
            <p>2. Score(<it>u</it><sub>1 </sub>... <it>u</it><sub><it>i</it>-1</sub>, <it>M </it>[1..<it>i </it>- 1]) &lt; <it>&#945; </it>- WS(<it>M </it>[<it>i</it>..<it>m</it>])</p>
            <p><b>Lemma 3 </b><it>Let u be a good word for &#945;. Then for all v in u</it>&#931;<sup><it>m</it>-|<it>u</it>|</sup>, <it>we have </it>Score(<it>v, M</it>) &#8805; <it>&#945;</it>.</p>
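            <p><b>Proof</b>. Let <it>w </it>be the word of &#931;<sup><it>m</it>-|<it>u</it>| </sup>such that <it>v </it>= <it>uw</it>, and let <it>i </it>be the length of <it>u</it>. Since Score(<it>w</it>, <it>M </it>[<it>i </it>+ 1..<it>m</it>]) &#8805; WS(<it>M </it>[<it>i </it>+ 1..<it>m</it>]) by Definition 1, and <it>u </it>satisfies the first condition of Definition 2, we have</p>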
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mtext>Score</m:mtext>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>v</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mtext>Score</m:mtext>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>u</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mn>1..</m:mn>
                                       <m:mi>i</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mtext>Score</m:mtext>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>w</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1..</m:mn>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>&#8805;</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mtext>Score</m:mtext>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>u</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mn>1..</m:mn>
                                       <m:mi>i</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mtext>WS</m:mtext>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1..</m:mn>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>&#8805;</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mi>&#945;</m:mi>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqbaeaabmWaaaqaaiabbofatjabbogaJjabb+gaVjabbkhaYjabbwgaLjabcIcaOiabdAha2jabcYcaSiabd2eanjabcMcaPaqaaiabg2da9aqaaiabbofatjabbogaJjabb+gaVjabbkhaYjabbwgaLjabcIcaOiabdwha1jabcYcaSiabd2eanjabcUfaBjabigdaXiabc6caUiabc6caUiabdMgaPjabc2faDjabcMcaPiabgUcaRiabbofatjabbogaJjabb+gaVjabbkhaYjabbwgaLjabcIcaOiabdEha3jabcYcaSiabd2eanjabcUfaBjabdMgaPjabgUcaRiabigdaXiabc6caUiabc6caUiabd2gaTjabc2faDjabcMcaPaqaaaqaaiabgwMiZcqaaiabbofatjabbogaJjabb+gaVjabbkhaYjabbwgaLjabcIcaOiabdwha1jabcYcaSiabd2eanjabcUfaBjabigdaXiabc6caUiabc6caUiabdMgaPjabc2faDjabcMcaPiabgUcaRiabbEfaxjabbofatjabcIcaOiabd2eanjabcUfaBjabdMgaPjabgUcaRiabigdaXiabc6caUiabc6caUiabd2gaTjabc2faDjabcMcaPaqaaaqaaiabgwMiZcqaaGGaciab=f7aHbaaaaa@8752@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p><b>Lemma 4 </b><it>Let u be a string of </it>&#931;<sup><it>m </it></sup><it>such that </it>Score(<it>u</it>, <it>M</it>) &#8805; <it>&#945;. Then there exists a unique prefix v of u such that v is good for &#945;</it>.</p>
            <p><b>Proof</b>. We first remark that if Score(<it>u</it>, <it>M</it>) &#8805; <it>&#945;</it>, then Score(<it>u</it>, <it>M</it>) &#8805; <it>&#945; </it>- WS(<it>M</it>[<it>m </it>+ 1..<it>m</it>]), since the worst score of the empty slice <it>M</it>[<it>m </it>+ 1..<it>m</it>] is 0. So there exists at least one prefix of <it>u </it>satisfying the first condition of Definition 2, namely <it>u </it>itself. Now, consider a prefix <it>v </it>of length <it>i </it>such that Score(<it>v</it>, <it>M</it>[1..<it>i</it>]) &#8805; <it>&#945; </it>- WS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]). Then for each letter <it>x </it>of &#931;, we have Score(<it>vx</it>, <it>M</it>[1..<it>i </it>+ 1]) &#8805; <it>&#945; </it>- WS(<it>M</it>[<it>i </it>+ 2..<it>m</it>]). This follows from the fact that <it>M</it>(<it>i </it>+ 1, <it>x</it>) &#8805; WS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]) - WS(<it>M</it>[<it>i </it>+ 2..<it>m</it>]), the right-hand side being the minimal coefficient of column <it>i </it>+ 1. Hence, if a prefix <it>v </it>of <it>u </it>satisfies the first condition of Definition 2, then all longer prefixes of <it>u </it>also do. By the second condition of Definition 2, it follows that only the shortest prefix <it>v </it>such that Score(<it>v</it>, <it>M</it>[1..<it>i</it>]) &#8805; <it>&#945; </it>- WS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]) is a good word.</p>
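            <p>Definition 2 and Lemmas 3 and 4 can be checked exhaustively on a small matrix. The Python sketch below has hypothetical names and assumes that <it>M </it>is given as a list of dictionaries from letters to scores. It enumerates all words of &#931;<sup><it>m</it></sup> and verifies two properties: a word scoring at least <it>&#945; </it>has exactly one good prefix, and a word scoring below <it>&#945; </it>has none. The check assumes <it>&#945; </it>&gt; WS(<it>M</it>). When <it>&#945; </it>&#8804; WS(<it>M</it>), the empty prefix already satisfies the first condition of Definition 2 and no non-empty prefix is good, so the check is only meaningful for thresholds above the worst score.</p>
            <p>
```python
from itertools import product


def worst_suffix_scores(matrix):
    # ws[i] = WS(M[i+1..m]), with ws[m] = 0 for the empty slice
    m = len(matrix)
    ws = [0] * (m + 1)
    for i in reversed(range(m)):
        ws[i] = ws[i + 1] + min(matrix[i].values())
    return ws


def prefix_score(matrix, u):
    # Score(u, M[1..|u|])
    return sum(matrix[k][x] for k, x in enumerate(u))


def is_good(matrix, ws, u, alpha):
    """Definition 2 for a non-empty word u of length i."""
    i = len(u)
    cond1 = prefix_score(matrix, u) >= alpha - ws[i]            # uses WS(M[i+1..m])
    cond2 = alpha - ws[i - 1] > prefix_score(matrix, u[:-1])    # uses WS(M[i..m])
    return cond1 and cond2


def check_lemmas_3_and_4(matrix, alphabet, alpha):
    ws = worst_suffix_scores(matrix)
    if not alpha > ws[0]:
        raise ValueError("expects alpha > WS(M)")
    m = len(matrix)
    for word in product(alphabet, repeat=m):
        good = [i for i in range(1, m + 1) if is_good(matrix, ws, word[:i], alpha)]
        if prefix_score(matrix, word) >= alpha:
            assert len(good) == 1  # Lemma 4: exactly one good prefix
        else:
            assert not good  # Lemma 3 (contrapositive): no good prefix
    return True


M = [
    {"A": 2, "C": -1, "G": 0, "T": -2},
    {"A": -1, "C": 3, "G": -2, "T": 1},
    {"A": 1, "C": 0, "G": 2, "T": -1},
]
print(all(check_lemmas_3_and_4(M, "ACGT", a) for a in range(-4, 9)))
```
            </p>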
            <p><b>Lemma 5 </b><it>Let M be a matrix of length m</it>.</p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i10" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>P-value</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>M</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#945;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mtable>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>&#8804;</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>&#8804;</m:mo>
                                                <m:mi>m</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>x</m:mi>
                                                <m:mo>&#8712;</m:mo>
                                                <m:mi>&#931;</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>s</m:mi>
                                                <m:mo>&lt;</m:mo>
                                                <m:mi>&#945;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mtext>WS</m:mtext>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>M</m:mi>
                                                <m:mo stretchy="false">[</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mn>..</m:mn>
                                                <m:mi>m</m:mi>
                                                <m:mo stretchy="false">]</m:mo>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mi>s</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mi>M</m:mi>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>x</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>&#8805;</m:mo>
                                                <m:mi>&#945;</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mtext>WS</m:mtext>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>M</m:mi>
                                                <m:mo stretchy="false">[</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mn>1..</m:mn>
                                                <m:mi>m</m:mi>
                                                <m:mo stretchy="false">]</m:mo>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                    </m:mtable>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>Q</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:mn>1..</m:mn>
                                 <m:mi>i</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">]</m:mo>
                                 <m:mo>,</m:mo>
                                 <m:mi>s</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#215;</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeeyla0IaeeODayNaeeyyaeMaeeiBaWMaeeyDauNaeeyzauMaeiikaGIaemyta0KaeiilaWccciGae8xSdeMaeiykaKIaeyypa0ZaaabuaeaacqWGrbqucqGGOaakcqWGnbqtcqGGBbWwcqaIXaqmcqGGUaGlcqGGUaGlcqWGPbqAcqGHsislcqaIXaqmcqGGDbqxcqGGSaalcqWGZbWCcqGGPaqkcqGHxdaTcqWGWbaCcqGGOaakcqWG4baEcqGGPaqkaSqaauaabeqaceaaaeaacqaIXaqmcqGHKjYOcqWGPbqAcqGHKjYOcqWGTbqBcqGGSaalcqWG4baEcqGHiiIZcqqHJoWucqGGSaalcqWGZbWCcqGH8aapcqWFXoqycqGHsislcqqGxbWvcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkaeaacqWGZbWCcqGHRaWkcqWGnbqtcqGGOaakcqWGPbqAcqGGSaalcqWG4baEcqGGPaqkcqGHLjYScqWFXoqycqGHsislcqqGxbWvcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGHRaWkcqaIXaqmcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkaaaabeqdcqGHris5aaaa@8CC7@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p><b>Proof</b>. We consider the set <inline-formula><m:math name="1748-7188-2-15-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFA@</m:annotation></m:semantics></m:math></inline-formula>(<it>&#945;</it>) of words whose score is greater than or equal to <it>&#945;</it>: <inline-formula><m:math name="1748-7188-2-15-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFA@</m:annotation></m:semantics></m:math></inline-formula>(<it>&#945;</it>) = {<it>w </it>&#8712; &#931;<sup><it>m</it></sup>|Score(<it>w</it>, <it>M</it>) &#8805; <it>&#945;</it>}. According to Lemma 4, each word of <inline-formula><m:math name="1748-7188-2-15-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFA@</m:annotation></m:semantics></m:math></inline-formula>(<it>&#945;</it>) has a unique prefix that is good for <it>&#945;</it>. Conversely, Lemma 3 ensures that each word whose prefix is good for <it>&#945; </it>belongs to <inline-formula><m:math name="1748-7188-2-15-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFA@</m:annotation></m:semantics></m:math></inline-formula>(<it>&#945;</it>). <inline-formula><m:math name="1748-7188-2-15-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mtext mathvariant="script">P</m:mtext><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaafaaa@2CFA@</m:annotation></m:semantics></m:math></inline-formula>(<it>&#945;</it>) can thus be expressed as a union of disjoint sets.</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1748-7188-2-15-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext mathvariant="script">P</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>&#945;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8746;</m:mo>
                                 <m:mrow>
                                    <m:mi>u</m:mi>
                                    <m:mtext>&#160;is&#160;good&#160;for&#160;</m:mtext>
                                    <m:mi>&#945;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>u</m:mi>
                                 <m:msup>
                                    <m:mi>&#931;</m:mi>
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mrow>
                                          <m:mo>|</m:mo>
                                          <m:mi>u</m:mi>
                                          <m:mo>|</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeiikaGccciGae8xSdeMaeiykaKIaeyypa0ZaambuaeaacqWG1bqDcqqHJoWudaahaaWcbeqaaiabd2gaTjabgkHiTmaaemaabaGaemyDauhacaGLhWUaayjcSdaaaaqaaiabdwha1jabbccaGiabbMgaPjabbohaZjabbccaGiabbEgaNjabb+gaVjabb+gaVjabbsgaKjabbccaGiabbAgaMjabb+gaVjabbkhaYjabbccaGiab=f7aHbqab0GaeSOkIufaaaa@4FD8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>It follows that</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1748-7188-2-15-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtext>P-value</m:mtext>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>M</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#945;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>u</m:mi>
                                    <m:mtext>&#160;is&#160;good&#160;for&#160;</m:mtext>
                                    <m:mi>&#945;</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>u</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaeeiuaaLaeeyla0IaeeODayNaeeyyaeMaeeiBaWMaeeyDauNaeeyzauMaeiikaGIaemyta0KaeiilaWccciGae8xSdeMaeiykaKIaeyypa0ZaaabuaeaacqWGWbaCcqGGOaakcqWG1bqDcqGGPaqkaSqaaiabdwha1jabbccaGiabbMgaPjabbohaZjabbccaGiabbEgaNjabb+gaVjabb+gaVjabbsgaKjabbccaGiabbAgaMjabb+gaVjabbkhaYjabbccaGiab=f7aHbqab0GaeyyeIuoaaaa@5498@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
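            <p>where <it>p</it>(<it>u</it>) denotes the probability of the word <it>u </it>in the background model. Each word <it>u</it> that is good for <it>&#945;</it> can be written <it>u</it> = <it>u</it>&#8242;<it>x</it>, where <it>x</it> &#8712; &#931; is its last letter, at position <it>i</it> = |<it>u</it>|, and <it>u</it>&#8242; has score <it>s</it> for <it>M</it>[1..<it>i</it> &#8722; 1]. In the multinomial background model, <it>p</it>(<it>u</it>) = <it>p</it>(<it>u</it>&#8242;) &#215; <it>p</it>(<it>x</it>). Grouping the words <it>u</it> of Formula 3 by <it>i</it>, <it>x</it> and <it>s</it>, and using the definition of <it>Q</it>, we deduce the expected result.</p>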
            <p>Lemma 5 shows that it is not necessary to build the entire dynamic programming table for <it>Q</it>. Only the values <it>Q</it>(<it>M</it>[1..<it>i</it>], <it>s</it>) such that <it>s </it>&lt; <it>&#945; </it>&#8722; WS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]) need to be computed. This gives rise to the FASTPVALUE algorithm, described in Figure <figr fid="F4">4</figr>.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Algorithm FastPvalue</p>
               </caption>
               <text>
                  <p>Algorithm FastPvalue.</p>
               </text>
               <graphic file="1748-7188-2-15-4"/>
            </fig>
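            <p>The following Python sketch illustrates the decomposition of Lemma 5. It is not a transcription of the algorithm in Figure 4. It assumes integer scores (for instance those of a round matrix), a multinomial background model given as a dictionary of letter probabilities, and a matrix stored as one dictionary per column. The names fast_pvalue and brute_force_pvalue are hypothetical. Besides keeping only the values <it>Q</it>(<it>M</it>[1..<it>i</it>], <it>s</it>) with <it>s</it> &lt; <it>&#945;</it> &#8722; WS(<it>M</it>[<it>i</it> + 1..<it>m</it>]), the sketch also drops prefixes that cannot reach <it>&#945;</it> even with the best remaining letters. This extra pruning is our own addition and does not change the result.</p>
```python
from collections import defaultdict
from itertools import product
from typing import Dict, List

# Hypothetical representation: one dict per column, letter -> integer score.
Matrix = List[Dict[str, int]]


def fast_pvalue(M: Matrix, bg: Dict[str, float], alpha: int) -> float:
    """Sketch of P-value(M, alpha) following the decomposition of Lemma 5."""
    m = len(M)
    # ws[k] and bs[k]: worst and best scores of the suffix made of the
    # 0-based columns k..m-1, i.e. WS(M[k+1..m]) and BS(M[k+1..m]).
    ws = [0] * (m + 1)
    bs = [0] * (m + 1)
    for k in range(m - 1, -1, -1):
        ws[k] = ws[k + 1] + min(M[k].values())
        bs[k] = bs[k + 1] + max(M[k].values())
    pvalue = 0.0
    Q: Dict[int, float] = {0: 1.0}  # the empty prefix has score 0
    for k in range(m):
        # A prefix of length k + 1 is good for alpha once its score reaches
        # alpha - WS of the remaining columns: every completion scores >= alpha.
        good_bound = alpha - ws[k + 1]
        nxt: Dict[int, float] = defaultdict(float)
        for s, q in Q.items():
            for x, px in bg.items():
                t = s + M[k][x]
                if t >= good_bound:
                    pvalue += q * px  # good prefix: add its probability
                elif t + bs[k + 1] >= alpha:
                    nxt[t] += q * px  # undecided: keep it in the table
                # otherwise no completion can reach alpha: drop the prefix
        Q = nxt
    return pvalue


def brute_force_pvalue(M: Matrix, bg: Dict[str, float], alpha: int) -> float:
    """Reference value obtained by enumerating all |Sigma|^m words."""
    total = 0.0
    for word in product(bg, repeat=len(M)):
        if sum(col[x] for col, x in zip(M, word)) >= alpha:
            prob = 1.0
            for x in word:
                prob *= bg[x]
            total += prob
    return total


if __name__ == "__main__":
    M = [
        {"A": 2, "C": -1, "G": 0, "T": -2},
        {"A": -1, "C": 3, "G": -1, "T": 0},
        {"A": 1, "C": 0, "G": 2, "T": -1},
    ]
    bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    for alpha in range(-5, 9):
        print(alpha, fast_pvalue(M, bg, alpha), brute_force_pvalue(M, bg, alpha))
```
            <p>For each threshold, the two printed values should agree up to floating-point rounding. Enumeration is only practical for short matrices, since it visits all |&#931;|<sup><it>m</it></sup> words.</p>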
         </sec>
         <sec>
            <st>
               <p>Permuting columns of the matrix</p>
            </st>
            <p>Algorithms 1 and 2 can also be used in combination with <it>permutated lookahead scoring </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The matrix <it>M </it>can be transformed by permuting its columns without modifying the overall score distribution, because the columns of the matrix are assumed to be independent. We show that this transformation is also relevant for P-value computation.</p>
            <p><b>Lemma 6 </b><it>Let M and N be two matrices of length m such that there exists a permutation &#960; on </it>{1,..., <it>m</it>} <it>satisfying, for each position i, </it>1 &#8804; <it>i </it>&#8804; <it>m, and each letter x of </it>&#931;, <it>M</it>(<it>i</it>, <it>x</it>) = <it>N</it>(<it>&#960;</it><sub><it>i</it></sub>, <it>x</it>)<it>. Then for any &#945;</it>, <it>Q</it>(<it>M</it>, <it>&#945;</it>) = <it>Q</it>(<it>N</it>, <it>&#945;</it>).</p>
            <p><b>Proof</b>. Let <it>u </it>be a word of &#931;<sup><it>m </it></sup>and let <it>v </it>be the word of &#931;<sup><it>m</it></sup> defined by <inline-formula><m:math name="1748-7188-2-15-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>v</m:mi><m:mrow><m:msub><m:mi>&#960;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub><m:mo>=</m:mo><m:msub><m:mi>u</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:semantics></m:math></inline-formula> for each <it>i</it>, 1 &#8804; <it>i </it>&#8804; <it>m</it>. The map <it>u </it>&#8614; <it>v </it>is a bijection of &#931;<sup><it>m</it></sup> onto itself. By construction of <it>N</it>, <it>M</it>(<it>i</it>, <it>u</it><sub><it>i</it></sub>) = <it>N</it>(<it>&#960;</it><sub><it>i</it></sub>, <it>u</it><sub><it>i</it></sub>), and <it>v </it>carries the letter <it>u</it><sub><it>i</it></sub> at position <it>&#960;</it><sub><it>i</it></sub>, so Score(<it>u</it>, <it>M</it>) = Score(<it>v</it>, <it>N</it>). Since <it>u </it>and <it>v </it>contain the same letters and the background model is multinomial, <it>p</it>(<it>u</it>) = <it>p</it>(<it>v</it>). Summing over all words of score <it>&#945;</it> gives <it>Q</it>(<it>M</it>, <it>&#945;</it>) = <it>Q</it>(<it>N</it>, <it>&#945;</it>), which completes the proof.</p>
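            <p>Lemma 6 can be checked numerically on a small example. The Python sketch below uses a hypothetical matrix layout (one dictionary per column, with integer scores), and all function names in it are hypothetical as well. It computes the whole score distribution <it>Q</it> column by column under a multinomial background model. It then compares this distribution for a matrix <it>M</it> and for a column-permuted matrix <it>N</it> with <it>N</it>(<it>&#960;</it><sub><it>i</it></sub>, <it>x</it>) = <it>M</it>(<it>i</it>, <it>x</it>). Probabilities are compared only up to floating-point rounding, because summing in a different column order may change the last bits.</p>
```python
from collections import defaultdict
from typing import Dict, List

Matrix = List[Dict[str, int]]  # hypothetical: one dict per column


def score_distribution(M: Matrix, bg: Dict[str, float]) -> Dict[int, float]:
    """Q(M, s) for every accessible score s, built column by column."""
    Q: Dict[int, float] = {0: 1.0}
    for column in M:
        nxt: Dict[int, float] = defaultdict(float)
        for s, q in Q.items():
            for x, px in bg.items():
                nxt[s + column[x]] += q * px
        Q = dict(nxt)
    return Q


def permute_columns(M: Matrix, pi: List[int]) -> Matrix:
    """Return N with N(pi[i], x) = M(i, x); pi is a 0-based permutation."""
    if sorted(pi) != list(range(len(M))):
        raise ValueError("pi must be a permutation of the column indices")
    N: Matrix = [{} for _ in M]
    for i, target in enumerate(pi):
        N[target] = dict(M[i])
    return N


def same_distribution(P: Dict[int, float], R: Dict[int, float], tol: float = 1e-12) -> bool:
    """Same accessible scores, probabilities equal up to rounding."""
    return P.keys() == R.keys() and all(tol >= abs(P[s] - R[s]) for s in P)


if __name__ == "__main__":
    M = [
        {"A": 2, "C": -1, "G": 0, "T": -2},
        {"A": -1, "C": 3, "G": -1, "T": 0},
        {"A": 1, "C": 0, "G": 2, "T": -1},
    ]
    bg = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    N = permute_columns(M, [2, 0, 1])
    print(same_distribution(score_distribution(M, bg), score_distribution(N, bg)))
```
            <p>The script prints <it>True</it>. Any permutation passed to permute_columns gives the same outcome.</p>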
            <p>The question is how to permute the columns of a given matrix to improve the performance of the algorithms. In <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, it is suggested to sort columns by decreasing information content. We refine this rule of thumb and propose to minimize the total size of all score ranges involved in the dynamic programming decomposition for <it>Q </it>in Algorithm SCOREDISTRIBUTION. For each <it>i</it>, 1 &#8804; <it>i </it>&#8804; <it>m</it>, let <it>&#948;</it><sub><it>i </it></sub>= BS(<it>M</it>[<it>i</it>..<it>i</it>]) &#8722; WS(<it>M</it>[<it>i</it>..<it>i</it>]) be the difference between the best and the worst score of column <it>i</it>.</p>
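            <p>Computing the quantities <it>&#948;</it><sub><it>i</it></sub> and the column order required by Lemma 7 only takes a single sort. In the Python sketch below, the matrix layout and the function names are hypothetical. Columns with equal <it>&#948;</it><sub><it>i</it></sub> keep their input order. This is an arbitrary choice, since all orders satisfying the condition of Lemma 7 give the same total.</p>
```python
from typing import Dict, List

Matrix = List[Dict[str, float]]  # hypothetical: one dict per column


def column_deltas(M: Matrix) -> List[float]:
    """delta_i = BS(M[i..i]) - WS(M[i..i]): best minus worst score of column i."""
    return [max(col.values()) - min(col.values()) for col in M]


def sort_columns_by_delta(M: Matrix) -> List[int]:
    """0-based column order giving delta_1 >= ... >= delta_m.

    Python's sort is stable even with reverse=True, so ties keep input order.
    """
    deltas = column_deltas(M)
    return sorted(range(len(M)), key=lambda i: deltas[i], reverse=True)


if __name__ == "__main__":
    M = [
        {"A": 0, "C": 1, "G": 0, "T": 0},
        {"A": 3, "C": -3, "G": 0, "T": 1},
        {"A": 1, "C": -1, "G": 0, "T": 0},
    ]
    order = sort_columns_by_delta(M)
    print(column_deltas(M), order)  # deltas of the input columns, new order
    print([M[i] for i in order])    # the permuted matrix
```
            <p>For this example matrix, the values <it>&#948;</it><sub><it>i</it></sub> are 1, 6 and 2. The sketch therefore reorders the columns as 2, 3, 1 (in 1-based indices).</p>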
            <p><b>Lemma 7 </b><it>Let M be a matrix such that &#948;</it><sub>1 </sub>&#8805; ... &#8805; <it>&#948;</it><sub><it>m</it></sub>. <it>Then M minimizes the total size of all score ranges amongst all matrices that can be obtained by permutation of M</it>.</p>
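            <p><b>Proof</b>. We write <it>SR</it>(<it>M</it>) for the total size of all score ranges of the matrix <it>M</it>. For 1 &#8804; <it>i </it>&#8804; <it>m</it>, the score range for the prefix <it>M</it>[1..<it>i</it>] runs from <it>&#945; </it>&#8722; BS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]) to <it>&#946; </it>&#8722; WS(<it>M</it>[<it>i </it>+ 1..<it>m</it>]), where BS and WS of an empty matrix are taken to be 0. Hence</p>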
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>R</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>&#946;</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mtext>WS</m:mtext>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">[</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1..</m:mn>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">]</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>&#945;</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mtext>BS</m:mtext>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">[</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1..</m:mn>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">]</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mstyle>
                                       <m:mo>+</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>&#946;</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#945;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>&#946;</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#945;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>2</m:mn>
                                             </m:mrow>
                                             <m:mi>m</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mtext>BS</m:mtext>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">[</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mn>..</m:mn>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">]</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mtext>WS</m:mtext>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>M</m:mi>
                                             <m:mo stretchy="false">[</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mn>..</m:mn>
                                             <m:mi>m</m:mi>
                                             <m:mo stretchy="false">]</m:mo>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>&#946;</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#945;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>2</m:mn>
                                             </m:mrow>
                                             <m:mi>m</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:msubsup>
                                                   <m:mo>&#8721;</m:mo>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mi>i</m:mi>
                                                   </m:mrow>
                                                   <m:mi>m</m:mi>
                                                </m:msubsup>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>&#948;</m:mi>
                                                      <m:mi>j</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>&#946;</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#945;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>+</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:msubsup>
                                             <m:mo>&#8721;</m:mo>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>=</m:mo>
                                                <m:mn>2</m:mn>
                                             </m:mrow>
                                             <m:mi>m</m:mi>
                                          </m:msubsup>
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>i</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:msub>
                                                <m:mi>&#948;</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqbaeaabqWaaaaabaGaem4uamLaemOuaiLaeiikaGIaemyta0KaeiykaKcabaGaeyypa0dabaWaaabmaeaacqGGOaakiiGacqWFYoGycqGHsislcqqGxbWvcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGHRaWkcqaIXaqmcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkcqGHsislcqGGOaakcqWFXoqycqGHsislcqqGcbGqcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGHRaWkcqaIXaqmcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkcqGGPaqkcqGHRaWkcqGGOaakcqWFYoGycqGHsislcqWFXoqycqGGPaqkaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTjabgkHiTiabigdaXaqdcqGHris5aaGcbaaabaGaeyypa0dabaGaemyBa0MaeiikaGIae8NSdiMaeyOeI0Iae8xSdeMaeiykaKIaey4kaSYaaabmaeaacqqGcbGqcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkcqGHsislcqqGxbWvcqqGtbWucqGGOaakcqWGnbqtcqGGBbWwcqWGPbqAcqGGUaGlcqGGUaGlcqWGTbqBcqGGDbqxcqGGPaqkaSqaaiabdMgaPjabg2da9iabikdaYaqaaiabd2gaTbqdcqGHris5aaGcbaaabaGaeyypa0dabaGaemyBa0MaeiikaGIae8NSdiMaeyOeI0Iae8xSdeMaeiykaKIaey4kaSYaaabmaeaadaaeWaqaaiab=r7aKnaaBaaaleaacqWGQbGAaeqaaaqaaiabdQgaQjabg2da9iabdMgaPbqaaiabd2gaTbqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGOmaidabaGaemyBa0ganiabggHiLdaakeaaaeaacqGH9aqpaeaacqWGTbqBcqGGOaakcqWFYoGycqGHsislcqWFXoqycqGGPaqkcqGHRaWkdaaeWaqaaiabcIcaOiabdMgaPjabgkHiTiabigdaXiabcMcaPiab=r7aKnaaBaaaleaacqWGPbqAaeqaaaqaaiabdMgaPjabg2da9iabikdaYaqaaiabd2gaTbqdcqGHris5aaaaaaa@C16D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The term <it>m</it>(<it>&#946;</it> &#8722; <it>&#945;</it>) does not depend on the order of the columns. Since permuting the columns of <it>M</it> permutes the sequence <it>&#948;</it><sub>1</sub>,..., <it>&#948;</it><sub><it>m</it></sub>, and since the weights <it>i</it> &#8722; 1 increase with <it>i</it>, the rearrangement inequality shows that the value <inline-formula><m:math name="1748-7188-2-15-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>2</m:mn></m:mrow><m:mi>m</m:mi></m:msubsup><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>i</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo><m:msub><m:mi>&#948;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaWaaabmaeaacqGGOaakcqWGPbqAcqGHsislcqaIXaqmcqGGPaqkiiGacqWF0oazdaWgaaWcbaGaemyAaKgabeaaaeaacqWGPbqAcqGH9aqpcqaIYaGmaeaacqWGTbqBa0GaeyyeIuoaaaa@3A9D@</m:annotation></m:semantics></m:math></inline-formula> is minimal when <it>&#948;</it><sub>1 </sub>&#8805; <it>&#948;</it><sub>2 </sub>&#8805; ... &#8805; <it>&#948;</it><sub><it>m</it></sub>.</p>
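            <p>The identity established in the proof of Lemma 7 can be checked by brute force on a tiny matrix. The Python sketch below uses hypothetical names and integer scores. Its bounds <it>&#945;</it> = 2 and <it>&#946;</it> = 5 are arbitrary and chosen only for illustration. The sketch sums the score-range sizes term by term and compares the result, for every column order, with the closed form <it>m</it>(<it>&#946;</it> &#8722; <it>&#945;</it>) + &#8721;<sub><it>i</it>=2</sub><sup><it>m</it></sup>(<it>i</it> &#8722; 1)<it>&#948;</it><sub><it>i</it></sub>. It then searches all orders exhaustively for the smallest total.</p>
```python
from itertools import permutations
from typing import Dict, List, Tuple

Matrix = List[Dict[str, int]]  # hypothetical: one dict per column


def total_range_size(M: Matrix, alpha: int, beta: int) -> int:
    """SR(M) summed term by term: prefix M[1..i] spans the interval
    [alpha - BS(M[i+1..m]), beta - WS(M[i+1..m])]."""
    total = 0
    for i in range(1, len(M) + 1):
        suffix = M[i:]  # 1-based columns i+1..m (empty when i = m)
        ws = sum(min(col.values()) for col in suffix)
        bs = sum(max(col.values()) for col in suffix)
        total += (beta - ws) - (alpha - bs)
    return total


def closed_form(M: Matrix, alpha: int, beta: int) -> int:
    """m(beta - alpha) + sum_{i=2}^{m} (i - 1) delta_i, written 0-based."""
    deltas = [max(col.values()) - min(col.values()) for col in M]
    return len(M) * (beta - alpha) + sum(k * d for k, d in enumerate(deltas))


def best_order(M: Matrix, alpha: int, beta: int) -> Tuple[Tuple[int, ...], int]:
    """Exhaustive search over column orders (tiny matrices only)."""
    return min(
        ((p, total_range_size([M[i] for i in p], alpha, beta))
         for p in permutations(range(len(M)))),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    M = [
        {"A": 0, "C": 1, "G": 0, "T": 0},
        {"A": 3, "C": -3, "G": 0, "T": 1},
        {"A": 1, "C": -1, "G": 0, "T": 0},
    ]
    alpha, beta = 2, 5
    for p in permutations(range(len(M))):
        N = [M[i] for i in p]
        assert total_range_size(N, alpha, beta) == closed_form(N, alpha, beta)
    print(best_order(M, alpha, beta))  # ((1, 2, 0), 13)
```
            <p>On this example, the exhaustive search returns the 0-based order (1, 2, 0) with a total of 13, which is exactly the order of decreasing <it>&#948;</it><sub><it>i</it></sub>. The exhaustive search costs <it>m</it>! evaluations and only serves as a check. In practice, a single sort suffices.</p>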
            <p>In the remainder of this paper, we shall always assume that the matrix <it>M </it>has been permuted so that it fulfills the condition on (<it>&#948;</it><sub><it>i</it></sub>)<sub>1&#8804;<it>i</it>&#8804;<it>m </it></sub>of Lemma 7. By Lemma 6, this is a mere preprocessing of the matrix. It does not change the P-value and leaves the algorithms themselves unchanged.</p>
         </sec>
         <sec>
            <st>
               <p>Efficient algorithms for computing the P-value without error</p>
            </st>
            <p>We now present two exact algorithms, which are the main algorithms of this paper. In Algorithms SCOREDISTRIBUTION and FASTPVALUE, the number of accessible scores plays an essential role in the time and space complexity. As mentioned in the Background section, this number can be as large as |&#931;|<sup><it>m</it></sup>. In practice, it strongly depends on the matrix involved and on the way the score distribution is approximated by round matrices. The choice of the precision is therefore critical. Algorithms SCOREDISTRIBUTION and FASTPVALUE have to trade accuracy, which calls for a faithful approximation, against efficiency, which calls for a rough one.</p>
            <p>To overcome this problem, we propose to define successive discretized score distributions with growing accuracy. The key idea is to take advantage of the shape of the score distribution <it>Q</it> and to use small granularity values only in the portions of the distribution where they are required. This is a kind of selective zooming process. Discretized score distributions are built from round matrices.</p>
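            <p>Round matrices, formalized in Definition 3, are simple to build. The Python sketch below uses hypothetical names and stores the matrix as one dictionary per column. For each coefficient, it keeps the integer <it>k</it> = &#8970;<it>M</it>(<it>i</it>, <it>x</it>)/<it>&#949;</it>&#8971;, so that <it>M</it><sub><it>&#949;</it></sub>(<it>i</it>, <it>x</it>) = <it>&#949;k</it>. Storing integers is a choice made here for illustration: it lets a dynamic programming pass work on exact integer scores. Each column loses an amount in [0, <it>&#949;</it>). The score of a word under <it>M</it><sub><it>&#949;</it></sub> is therefore at most its score under <it>M</it>, and falls short of it by less than <it>m&#949;</it>.</p>
```python
import math
from itertools import product
from typing import Dict, List, Sequence

RealMatrix = List[Dict[str, float]]  # hypothetical: one dict per column
IntMatrix = List[Dict[str, int]]


def round_matrix(M: RealMatrix, eps: float) -> IntMatrix:
    """Integers k with M_eps(i, x) = eps * k = eps * floor(M(i, x) / eps)."""
    if not eps > 0:
        raise ValueError("the granularity eps must be positive")
    return [{x: math.floor(v / eps) for x, v in col.items()} for col in M]


def score(M: Sequence[Dict[str, float]], word: Sequence[str]) -> float:
    """Score(w, M): sum of the coefficients selected by the letters of w."""
    return sum(col[x] for col, x in zip(M, word))


if __name__ == "__main__":
    M = [{"A": 1.26, "C": -0.74}, {"A": 0.5, "C": -2.01}]
    for eps in (1.0, 0.5, 0.1):
        K = round_matrix(M, eps)
        for w in product("AC", repeat=len(M)):
            exact, rounded = score(M, w), eps * score(K, w)
            # Each column loses a value in [0, eps): the gap lies in [0, m * eps).
            print(eps, "".join(w), exact, rounded, exact - rounded)
```
            <p>The bound <it>m&#949;</it> on the printed gaps decreases with <it>&#949;</it>. Finer granularities, in turn, generally produce more distinct accessible scores, which is the trade-off discussed above.</p>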
            <p>
               <b>Definition 3 (Round matrix) </b>
               <it>Let M be a matrix of length m with real coefficients and let &#949; be a positive real number. We denote by M</it>
               <sub>
                  <it>&#949; </it>
               </sub>
               <it>the round matrix obtained from M by rounding each value down to a multiple of &#949;:</it>
            </p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i17" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>M</m:mi>
                              <m:mi>&#949;</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>x</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mi>&#949;</m:mi>
                           <m:mrow>
                              <m:mo>&#8970;</m:mo>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:mi>M</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>i</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>x</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mi>&#949;</m:mi>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mo>&#8971;</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyta00aaSbaaSqaaGGaciab=v7aLbqabaGccqGGOaakcqWGPbqAcqGGSaalcqWG4baEcqGGPaqkcqGH9aqpcqWF1oqzdaGbdaqcfayaamaalaaabaGaemyta0KaeiikaGIaemyAaKMaeiilaWIaemiEaGNaeiykaKcabaGae8xTdugaaaGccaGLWJVaay5+4daaaa@4521@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p><it>&#949; is called the </it>granularity. <it>Given &#949;, we can define E, the </it>maximal error <it>induced by M</it><sub><it>&#949;</it></sub>.</p>
            <p>
               <display-formula id="M4">
                  <m:math name="1748-7188-2-15-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>E</m:mi>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>m</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>max</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo>{</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>M</m:mi>
                                    <m:mi>&#949;</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>,</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo>&#8712;</m:mo>
                                 <m:mi>&#931;</m:mi>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyrauKaeyypa0ZaaabCaeaacyGGTbqBcqGGHbqycqGG4baEcqGG7bWEcqWGnbqtcqGGOaakcqWGPbqAcqGGSaalcqWG4baEcqGGPaqkcqGHsislcqWGnbqtdaWgaaWcbaacciGae8xTdugabeaakiabcIcaOiabdMgaPjabcYcaSiabdIha4jabcMcaPiabcYcaSiabdIha4jabgIGiolabfo6atjabc2ha9bWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemyBa0ganiabggHiLdaaaa@519A@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
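            <p>The two definitions above translate directly into code. The following Python sketch is only an illustration, not the authors' C++ implementation. The representation of a matrix as a list of columns (dictionaries mapping letters to scores), the function names round_matrix and max_error, and the toy matrix are all hypothetical. Exact rational arithmetic (Python's fractions.Fraction) is used so that floating-point error cannot distort the floor operation.</p>

```python
import math
from fractions import Fraction

# Hypothetical representation: a matrix is a list of columns (one per
# position i), each column a dict mapping a letter x to the score M(i, x).
# Fractions keep the arithmetic exact, so M_eps is exactly eps * floor(M / eps).


def round_matrix(M, eps):
    """Round matrix: M_eps(i, x) = eps * floor(M(i, x) / eps)."""
    return [{x: eps * math.floor(s / eps) for x, s in col.items()} for col in M]


def max_error(M, eps):
    """Maximal error E = sum over i of max over x of M(i, x) - M_eps(i, x)."""
    M_eps = round_matrix(M, eps)
    return sum(max(col[x] - rcol[x] for x in col) for col, rcol in zip(M, M_eps))


# Made-up 2-column matrix, for illustration only.
M = [
    {"A": Fraction("0.73"), "C": Fraction("-1.27"), "G": Fraction("0.18"), "T": Fraction("-0.45")},
    {"A": Fraction("-0.92"), "C": Fraction("1.14"), "G": Fraction("-0.31"), "T": Fraction("0.05")},
]
eps = Fraction(1, 10)
print(round_matrix(M, eps)[0])  # first column rounded down to multiples of 0.1
print(max_error(M, eps))        # 17/100
```

            <p>Since 0 &#8804; <it>M</it>(<it>i</it>, <it>x</it>) - <it>M</it><sub><it>&#949;</it></sub>(<it>i</it>, <it>x</it>) &lt; <it>&#949;</it> for every entry, <it>E</it> is always strictly smaller than <it>m</it> &#215; <it>&#949;</it>. In this toy example, <it>E</it> = 0.17, below the bound 0.2.</p>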
            <p><b>Lemma 8 </b><it>Let M be a matrix, &#949; the granularity, and E the associated maximal error. For each word u of </it>&#931;<sup><it>m</it></sup><it>, we have </it>0 &#8804; Score(<it>u</it>, <it>M</it>) - Score(<it>u</it>, <it>M</it><sub><it>&#949;</it></sub>) &#8804; <it>E</it>.</p>
            <p><b>Proof</b>. This is a straightforward consequence of Definition 3 for <it>M</it><sub><it>&#949; </it></sub>and <it>E</it>.</p>
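            <p>On small matrices, Lemma 8 can be checked exhaustively. The sketch below uses hypothetical helper names, exact Fraction arithmetic and a random toy matrix. It enumerates all 4<sup><it>m</it></sup> words and tests both inequalities. It is a sanity check rather than part of the method, and exhaustive enumeration is only practical for very short matrices.</p>

```python
import math
import random
from fractions import Fraction
from itertools import product


def round_matrix(M, eps):
    """M_eps(i, x) = eps * floor(M(i, x) / eps), computed exactly with Fractions."""
    return [{x: eps * math.floor(s / eps) for x, s in col.items()} for col in M]


def score(u, M):
    """Score(u, M): sum of M(i, u_i) over the positions of the word u."""
    return sum(col[x] for col, x in zip(M, u))


def check_lemma8(M, eps, alphabet="ACGT"):
    """Enumerate every word of length m and check that
    Score(u, M) - Score(u, M_eps) always lies in the interval [0, E]."""
    M_eps = round_matrix(M, eps)
    E = sum(max(col[x] - rcol[x] for x in col) for col, rcol in zip(M, M_eps))
    for u in product(alphabet, repeat=len(M)):
        d = score(u, M) - score(u, M_eps)
        if 0 > d or d > E:
            return False
    return True


# Random toy matrix: 5 positions, scores in [-3, 3] with two decimals.
rng = random.Random(42)
M = [{x: Fraction(rng.randint(-300, 300), 100) for x in "ACGT"} for _ in range(5)]
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    print(eps, check_lemma8(M, eps))
```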
            <p>
               <b>Lemma 9 </b>
               <it>Let M, N and N' be three matrices of length m, let E and E' be two non-negative real numbers, and let &#945; and &#946; be two scores such that &#945; &#8804; &#946;, satisfying the following hypotheses:</it>
            </p>
            <p><it>(i) for each word u in </it>&#931;<sup><it>m</it></sup>, Score(<it>u</it>, <it>N</it>) &#8804; Score(<it>u</it>, <it>M</it>) &#8804; Score(<it>u</it>, <it>N</it>) + <it>E</it>,</p>
            <p><it>(ii) for each word u in </it>&#931;<sup><it>m</it></sup>, Score(<it>u</it>, <it>N'</it>) &#8804; Score(<it>u</it>, <it>N</it>) &#8804; Score(<it>u</it>, <it>M</it>) &#8804; Score(<it>u</it>, <it>N'</it>) + <it>E'</it>,</p>
            <p><it>(iii) </it>P-value(<it>N</it>, <it>&#945; </it>- <it>E</it>) = P-value(<it>N</it>, <it>&#945;</it>),</p>
            <p><it>(iv) </it>P-value(<it>N'</it>, <it>&#946; </it>- <it>E'</it>) = P-value(<it>N'</it>, <it>&#946;</it>),</p>
            <p>
               <it>then</it>
            </p>
            <p>
               <display-formula>
                  <m:math name="1748-7188-2-15-i19" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mtable>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mi>&#945;</m:mi>
                                                <m:mo>&#8804;</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo>&lt;</m:mo>
                                                <m:mi>&#946;</m:mi>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mi>t</m:mi>
                                                 <m:mtext>&#160;accessible</m:mtext>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                    </m:mtable>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>Q</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>t</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mtable>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mi>&#945;</m:mi>
                                                <m:mo>&#8804;</m:mo>
                                                <m:mi>t</m:mi>
                                                <m:mo>&lt;</m:mo>
                                                <m:mi>&#946;</m:mi>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                       <m:mtr>
                                          <m:mtd>
                                             <m:mrow>
                                                <m:mi>t</m:mi>
                                                 <m:mtext>&#160;accessible</m:mtext>
                                             </m:mrow>
                                          </m:mtd>
                                       </m:mtr>
                                    </m:mtable>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>Q</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>M</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>t</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaWaaabuaeaacqWGrbqucqGGOaakcqWGobGtcqGGSaalcqWG0baDcqGGPaqkaSqaauaabeqaceaaaeaaiiGacqWFXoqycqGHKjYOcqWG0baDcqGH8aapcqWFYoGyaeaacqWG0baDcqqGGaaicqWGHbqycqWGJbWycqWGJbWycqWGLbqzcqWGZbWCcqWGZbWCcqWGPbqAcqWGIbGycqWGSbaBcqWGLbqzaaaabeqdcqGHris5aOGaeyypa0ZaaabuaeaacqWGrbqucqGGOaakcqWGnbqtcqGGSaalcqWG0baDcqGGPaqkaSqaauaabeqaceaaaeaacqWFXoqycqGHKjYOcqWG0baDcqGH8aapcqWFYoGyaeaacqWG0baDcqqGGaaicqWGHbqycqWGJbWycqWGJbWycqWGLbqzcqWGZbWCcqWGZbWCcqWGPbqAcqWGIbGycqWGSbaBcqWGLbqzaaaabeqdcqGHris5aaaa@6C5D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
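            <p><b>Proof</b>. The left-hand side (resp. right-hand side) is the probability of the set of words whose score for <it>N </it>(resp. <it>M</it>) lies in [<it>&#945;</it>, <it>&#946;</it>[. It is therefore enough to show that these two sets coincide. Let <it>u </it>be a string in &#931;<sup><it>m</it></sup>. We establish that <it>&#945; </it>&#8804; Score(<it>u</it>, <it>N</it>) &lt;<it>&#946; </it>if, and only if, <it>&#945; </it>&#8804; Score(<it>u</it>, <it>M</it>) &lt;<it>&#946;</it>. The proof is divided into four parts.</p>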
            <p>- If <it>&#945; </it>&#8804; Score(<it>u</it>, <it>N</it>), then <it>&#945; </it>&#8804; Score(<it>u</it>, <it>M</it>): This is a consequence of Score(<it>u</it>, <it>N</it>) &#8804; Score(<it>u</it>, <it>M</it>) in (i).</p>
            <p>- If <it>&#945; </it>&#8804; Score(<it>u</it>, <it>M</it>), then <it>&#945; </it>&#8804; Score(<it>u</it>, <it>N</it>): By hypothesis (i) on <it>E</it>, <it>&#945; </it>&#8804; Score(<it>u</it>, <it>M</it>) implies <it>&#945; </it>- <it>E </it>&#8804; Score(<it>u</it>, <it>N</it>). By (iii), P-value(<it>N</it>, <it>&#945; </it>- <it>E</it>) = P-value(<it>N</it>, <it>&#945;</it>), so no word has a score for <it>N </it>in [<it>&#945; </it>- <it>E</it>, <it>&#945;</it>[. It follows that <it>&#945; </it>&#8804; Score(<it>u</it>, <it>N</it>).</p>
            <p>- If Score(<it>u</it>, <it>N</it>) &lt;<it>&#946;</it>, then Score(<it>u</it>, <it>M</it>) &lt;<it>&#946;</it>: By hypothesis (ii), Score(<it>u</it>, <it>N</it>) &lt;<it>&#946; </it>implies that Score(<it>u</it>, <it>N'</it>) &lt;<it>&#946;</it>. By (iv), no word has a score for <it>N' </it>in [<it>&#946; </it>- <it>E'</it>, <it>&#946;</it>[, so Score(<it>u</it>, <it>N'</it>) &lt;<it>&#946; </it>- <it>E'</it>, which together with (ii) guarantees Score(<it>u</it>, <it>M</it>) &lt;<it>&#946;</it>.</p>
            <p>- If Score(<it>u</it>, <it>M</it>) &lt;<it>&#946;</it>, then Score(<it>u</it>, <it>N</it>) &lt;<it>&#946;</it>: This is a consequence of Score(<it>u</it>, <it>N</it>) &#8804; Score(<it>u</it>, <it>M</it>) in (i).</p>
            <p>What does this statement tell us? It provides a sufficient condition for the score distribution <it>Q </it>computed with a round matrix to be valid for the initial matrix <it>M</it>. Take <it>N </it>= <it>N' </it>= <it>M</it><sub><it>&#949;</it></sub> and <it>E' </it>= <it>E</it>. Assume that you can observe two plateaux ending respectively at <it>&#945; </it>and <it>&#946; </it>in the score distribution of <it>M</it><sub><it>&#949;</it></sub>, that is, <it>M</it><sub><it>&#949;</it></sub> has no accessible score in [<it>&#945; </it>- <it>E</it>, <it>&#945;</it>[ nor in [<it>&#946; </it>- <it>E</it>, <it>&#946;</it>[. Then the approximation of the total probability for the score range [<it>&#945;</it>, <it>&#946;</it>[ obtained with the round matrix is in fact the exact probability. In other words, there is no need to use smaller granularity values in this region to improve the result.</p>
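            <p>The plateau condition can be tested directly on the score distribution of a round matrix. In this Python sketch, the entries of <it>M</it><sub><it>&#949;</it></sub> are stored as integer multiples of <it>&#949;</it>. <it>Q</it> is computed column by column with the usual dynamic-programming recurrence, which is assumed here to correspond to Equation 1. Hypothesis (iii) of Lemma 9 is then checked with <it>N</it> = <it>M</it><sub><it>&#949;</it></sub>. The uniform background, the toy matrix and all function names are assumptions made for illustration.</p>

```python
import math
from collections import defaultdict
from fractions import Fraction


def int_round_matrix(M, eps):
    """Entries of M_eps stored as integers k, each standing for the score k * eps."""
    return [{x: math.floor(s / eps) for x, s in col.items()} for col in M]


def distribution(K, bg):
    """Score distribution of the round matrix: Q[k] is the probability that a random
    word (letters drawn independently with probabilities bg) scores k * eps.
    Built column by column: Q_i(k) = sum over x of Q_{i-1}(k - K(i, x)) * bg[x]."""
    Q = {0: Fraction(1)}
    for col in K:
        nxt = defaultdict(Fraction)
        for k, p in Q.items():
            for x, kx in col.items():
                nxt[k + kx] += p * bg[x]
        Q = dict(nxt)
    return Q


def pvalue(Q, eps, alpha):
    """P-value(M_eps, alpha): probability of a score greater than or equal to alpha."""
    return sum(p for k, p in Q.items() if k * eps >= alpha)


def plateau(Q, eps, alpha, E):
    """Hypothesis (iii) of Lemma 9 with N = M_eps: no accessible score in [alpha - E, alpha[."""
    return pvalue(Q, eps, alpha - E) == pvalue(Q, eps, alpha)


# Made-up 2-column matrix and uniform background, for illustration only.
M = [
    {"A": Fraction("0.73"), "C": Fraction("-1.27"), "G": Fraction("0.18"), "T": Fraction("-0.45")},
    {"A": Fraction("-0.92"), "C": Fraction("1.14"), "G": Fraction("-0.31"), "T": Fraction("0.05")},
]
bg = {x: Fraction(1, 4) for x in "ACGT"}
eps = Fraction(1, 10)
K = int_round_matrix(M, eps)
E = sum(max(col[x] - kcol[x] * eps for x in col) for col, kcol in zip(M, K))
Q = distribution(K, bg)
print(pvalue(Q, eps, Fraction("1.2")), plateau(Q, eps, Fraction("1.2"), E))  # 1/8 True
print(plateau(Q, eps, Fraction("0.7"), E))  # False: M_eps has the score 0.6 in [0.53, 0.7[
```

            <p>For <it>&#945;</it> = 1.2, the condition holds. As in Lemma 13, this means that P-value(<it>M</it><sub><it>&#949;</it></sub>, 1.2) = 1/8 is also the exact P-value for <it>M</it>. For <it>&#945;</it> = 0.7, it fails, so a finer granularity is needed in that region.</p>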
            <sec>
               <st>
                  <p>From score to P-value</p>
               </st>
               <p>Lemma 9 is used in a stepwise algorithm that computes the P-value of a score threshold. Let <it>&#945; </it>be the score whose P-value we want to determine. We estimate the score distribution <it>Q </it>iteratively. To do so, we consider a series of round matrices <it>M</it><sub><it>&#949; </it></sub>for decreasing values of <it>&#949;</it>, and calculate the successive values P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>). The efficiency of the method relies on two properties. First, a stop condition halts the computation as soon as the exact P-value is guaranteed to have been reached. Second, we carefully select the relevant portions of the score distribution for which the computation should go on, which tends to restrict the score range to inspect at each step. The algorithm is displayed in Figure <figr fid="F5">5</figr>.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Algorithm From Score to P-value</p>
                  </caption>
                  <text>
                     <p>Algorithm From Score to P-value.</p>
                  </text>
                  <graphic file="1748-7188-2-15-5"/>
               </fig>
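               <p>The core of the refinement loop, without the bookkeeping of Figure 5, can be sketched as follows. This simplified Python version is a hypothetical illustration, not the TFM-PVALUE code. It recomputes the whole distribution of <it>M</it><sub><it>&#949;</it></sub> at every granularity instead of maintaining the variables <it>P</it> and <it>&#946;</it> and restricting the score range. It only applies the stop condition P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it> - <it>E</it>) = P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>) used in the proof of Lemma 11. The defaults (initial granularity 0.1, decreasing step <it>k</it> = 10) are the values reported in the Methods section. The function names, the toy matrix and the uniform background are assumptions.</p>

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution(K, bg):
    """Score distribution of a round matrix stored as integers (score = k * eps)."""
    Q = {0: Fraction(1)}
    for col in K:
        nxt = defaultdict(Fraction)
        for k, p in Q.items():
            for x, kx in col.items():
                nxt[k + kx] += p * bg[x]
        Q = dict(nxt)
    return Q


def pvalue_from_score(M, alpha, bg, eps=Fraction(1, 10), k=10, max_steps=10):
    """Simplified refinement: for eps, eps/k, eps/k**2, ... recompute the distribution
    of M_eps and stop once P-value(M_eps, alpha - E) == P-value(M_eps, alpha)."""
    for _ in range(max_steps):
        K = [{x: math.floor(s / eps) for x, s in col.items()} for col in M]
        E = sum(max(col[x] - kcol[x] * eps for x in col) for col, kcol in zip(M, K))
        Q = distribution(K, bg)
        p_alpha = sum(p for key, p in Q.items() if key * eps >= alpha)
        p_low = sum(p for key, p in Q.items() if key * eps >= alpha - E)
        if p_low == p_alpha:  # stop condition: plateau of width E ending at alpha
            return p_alpha, eps
        eps /= k
    raise RuntimeError("stop condition not met; increase max_steps")


def brute_pvalue(M, alpha, bg):
    """Exact P-value(M, alpha) by enumerating all words (tiny matrices only)."""
    total = Fraction(0)
    for u in product(bg, repeat=len(M)):
        if sum(col[x] for col, x in zip(M, u)) >= alpha:
            total += math.prod(bg[x] for x in u)
    return total


# Made-up 2-column matrix and uniform background, for illustration only.
M = [
    {"A": Fraction("0.73"), "C": Fraction("-1.27"), "G": Fraction("0.18"), "T": Fraction("-0.45")},
    {"A": Fraction("-0.92"), "C": Fraction("1.14"), "G": Fraction("-0.31"), "T": Fraction("0.05")},
]
bg = {x: Fraction(1, 4) for x in "ACGT"}
for alpha in (Fraction("1.2"), Fraction("0.7")):
    p, eps = pvalue_from_score(M, alpha, bg)
    print(alpha, p, eps, p == brute_pvalue(M, alpha, bg))
# 6/5 1/8 1/10 True, then 7/10 3/16 1/100 True
```

               <p>On this toy matrix, <it>&#945;</it> = 1.2 is settled at <it>&#949;</it> = 0.1. In contrast, <it>&#945;</it> = 0.7 requires <it>&#949;</it> = 0.01, where the round matrix coincides with <it>M</it> and <it>E</it> = 0. In both cases, the result agrees with exhaustive enumeration.</p>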
               <p>The correctness of the algorithm follows from the next two Lemmas. The first Lemma establishes that the loop invariants hold.</p>
               <p><b>Lemma 10 </b><it>Throughout Algorithm 3, the variables &#946; and P satisfy the invariant relation P </it>= P-value(<it>M</it>, <it>&#946;</it>).</p>
               <p><b>Proof</b>. This is a consequence of invariant 1 and invariant 2 in Algorithm 3. Both invariants hold for the initial conditions. When <it>P </it>= 0 and <it>&#946; </it>= BS(<it>M</it>) + 1, we have P-value(<it>M</it>, BS(<it>M</it>) + 1) = 0, and for invariant 2 we can choose <it>N' </it>= <it>M</it><sub><it>&#949;</it></sub>.</p>
               <p>There are two cases to consider for invariant 1.</p>
               <p>- If <it>s </it>does not exist, then <it>P </it>and <it>&#946; </it>remain unchanged, so we still have <it>P </it>= P-value(<it>M</it>, <it>&#946;</it>). Regarding invariant 2, if a suitable matrix <it>N' </it>exists at the previous step for <it>M</it><sub><it>k&#949;</it></sub>, then it is still suitable for <it>M</it><sub><it>&#949;</it></sub>.</p>
               <p>- If <it>s </it>exists, then, by invariant 1, <it>P </it>is updated to P-value(<it>M</it>, <it>&#946;</it>) + &#8721;<sub><it>s</it>&#8804;<it>t</it>&lt;<it>&#946; </it></sub><it>Q</it>(<it>M</it><sub><it>&#949;</it></sub>, <it>t</it>).</p>
               <p>According to Lemma 9 and invariant 2, we have &#8721;<sub><it>s</it>&#8804;<it>t</it>&lt;<it>&#946; </it></sub><it>Q</it>(<it>M</it><sub><it>&#949;</it></sub>, <it>t</it>) = &#8721;<sub><it>s</it>&#8804;<it>t</it>&lt;<it>&#946; </it></sub><it>Q</it>(<it>M</it>, <it>t</it>). Hence <it>P </it>= P-value(<it>M</it>, <it>s</it>). Since <it>&#946; </it>is updated to <it>s</it>, it follows that <it>P </it>= P-value(<it>M</it>, <it>&#946;</it>). Regarding invariant 2, take <it>N' </it>= <it>M</it><sub><it>&#949;</it></sub>.</p>
               <p>The second Lemma shows that when the stop condition is met, the final value of the variable <it>P </it>is indeed the expected result P-value(<it>M</it>, <it>&#945;</it>).</p>
               <p><b>Lemma 11 </b><it>At the end of Algorithm 3, P </it>= P-value(<it>M</it>, <it>&#945;</it>).</p>
               <p><b>Proof</b>. When <it>s </it>= <it>&#945; </it>- <it>E</it>, then <it>&#946; </it>= <it>&#945;</it>. According to Lemma 10, it implies <it>P </it>= P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>). Since the stop condition implies that P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945; </it>- <it>E</it>) = P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>), Lemma 9 ensures that P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>) = P-value(<it>M</it>, <it>&#945;</it>).</p>
            </sec>
            <sec>
               <st>
                  <p>From P-value to score</p>
               </st>
               <p>Similarly, Lemma 9 is used to design an algorithm that computes the score threshold associated with a given P-value. We first show that the score threshold obtained for a P-value with a round matrix bounds the possible score thresholds for the initial matrix <it>M</it>.</p>
               <p><b>Lemma 12 </b><it>Let M be a matrix, &#949; a granularity and E the maximal error associated. Given P</it>, 0 &#8804; <it>P </it>&#8804; 1, <it>we have</it></p>
               <p>
                  <display-formula>Threshold(<it>M</it><sub><it>&#949;</it></sub>, <it>P</it>) &#8804; Threshold(<it>M</it>, <it>P</it>) &#8804; Threshold(<it>M</it><sub><it>&#949;</it></sub>, <it>P</it>) + <it>E</it></display-formula>
               </p>
               <p><b>Proof</b>. Let <it>&#946; </it>= Threshold(<it>M</it><sub><it>&#949;</it></sub>, <it>P</it>). According to Lemma 8, P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#946;</it>) &#8805; <it>P </it>implies P-value(<it>M</it>, <it>&#946;</it>) &#8805; <it>P</it>, which yields <it>&#946; </it>&#8804; Threshold(<it>M</it>, <it>P</it>). It remains to establish that Threshold(<it>M</it>, <it>P</it>) &#8804; <it>&#946; </it>+ <it>E</it>. If every accessible score for <it>M </it>is at most <it>&#946; </it>+ <it>E</it>, the result is straightforward. Otherwise, let <it>&#946;' </it>be the lowest accessible score for <it>M </it>that is strictly greater than <it>&#946; </it>+ <it>E</it>. Since <it>s </it>&#8594; P-value(<it>M</it>, <it>s</it>) is a non-increasing function of <it>s</it>, it suffices to verify that P-value(<it>M</it>, <it>&#946;'</it>) &lt;<it>P </it>to complete the proof of the Lemma. Assume that P-value(<it>M</it>, <it>&#946;'</it>) &#8805; <it>P</it>. Let <it>&#947; </it>= min {Score(<it>u</it>, <it>M</it><sub><it>&#949;</it></sub>)|<it>u </it>&#8712; &#931;<sup><it>m </it></sup>&#8743; Score(<it>u</it>, <it>M</it>) &#8805; <it>&#946;'</it>}. On the one hand, the definition of <it>&#947; </it>implies that</p>
               <p>
                  <display-formula id="M5">P-value(<it>M</it>, <it>&#946;'</it>) &#8804; P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#947;</it>)</display-formula>
               </p>
               <p>On the other hand, <it>&#947; </it>is an accessible score for <it>M</it><sub><it>&#949; </it></sub>that satisfies <it>&#947; </it>&#8805; <it>&#946;' </it>- <it>E </it>> <it>&#946;</it> (by Lemma 8 and the choice of <it>&#946;'</it>). By definition of <it>&#946; </it>as Threshold(<it>M</it><sub><it>&#949;</it></sub>, <it>P</it>), it follows that</p>
               <p>
                  <display-formula id="M6">P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#947;</it>) &lt;<it>P</it></display-formula>
               </p>
               <p>Equations 5 and 6 contradict the assumption that P-value(<it>M</it>, <it>&#946;'</it>) &#8805; <it>P</it>. Thus P-value(<it>M</it>, <it>&#946;'</it>) &lt;<it>P</it>.</p>
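               <p>Lemma 12 can be checked by brute force on random toy matrices. The sketch below assumes, consistently with the proof above, that Threshold(<it>M</it>, <it>P</it>) is the largest accessible score of <it>M</it> whose P-value is at least <it>P</it>. The helper names, the uniform background and the random matrices are hypothetical. Because it relies on exhaustive enumeration, it only works for very short matrices.</p>

```python
import math
import random
from fractions import Fraction
from itertools import product


def score_masses(M, bg):
    """Map each accessible score of M to its probability (exhaustive enumeration)."""
    masses = {}
    for u in product(bg, repeat=len(M)):
        s = sum(col[x] for col, x in zip(M, u))
        masses[s] = masses.get(s, 0) + math.prod(bg[x] for x in u)
    return masses


def threshold(M, P, bg):
    """Largest accessible score s with P-value(M, s) at least P (assumed definition)."""
    acc = 0
    for s, mass in sorted(score_masses(M, bg).items(), reverse=True):
        acc += mass  # acc is now P-value(M, s)
        if acc >= P:
            return s
    raise ValueError("P must lie in [0, 1]")


def check_lemma12(M, eps, P, bg):
    """Check Threshold(M_eps, P), Threshold(M, P), Threshold(M_eps, P) + E are ordered."""
    M_eps = [{x: eps * math.floor(s / eps) for x, s in col.items()} for col in M]
    E = sum(max(col[x] - rcol[x] for x in col) for col, rcol in zip(M, M_eps))
    t_eps, t = threshold(M_eps, P, bg), threshold(M, P, bg)
    return t >= t_eps and t_eps + E >= t


bg = {x: Fraction(1, 4) for x in "ACGT"}
rng = random.Random(7)
ok = True
for _ in range(20):
    M = [{x: Fraction(rng.randint(-300, 300), 100) for x in "ACGT"} for _ in range(4)]
    for eps in (Fraction(1, 10), Fraction(1, 2)):
        for P in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
            ok = ok and check_lemma12(M, eps, P, bg)
print(ok)
```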
               <p>The algorithm can be sketched as follows. Let <it>P </it>be the desired P-value. We iteratively compute the associated score threshold for successive decreasing values of <it>&#949;</it>. At each step, Lemma 12 is used to speed up the calculation for the matrix <it>M</it><sub><it>&#949;</it></sub>: it restricts the computation of the detailed score distribution <it>Q </it>to a small interval of length 2 &#215; <it>E</it>, while the FASTPVALUE algorithm can be used for the remainder of the distribution. Lemma 13 ensures that when P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945; </it>- <it>E</it>) = P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>), then <it>&#945; </it>is the required score value for <it>M</it>. The algorithm is displayed in more detail in Figure <figr fid="F6">6</figr>.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Algorithm From P-value to Score</p>
                  </caption>
                  <text>
                     <p>Algorithm From P-value to Score.</p>
                  </text>
                  <graphic file="1748-7188-2-15-6"/>
               </fig>
               <p><b>Lemma 13 </b><it>Let M be a matrix, &#949; the granularity and E the maximal error associated. If </it>P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945; </it>- <it>E</it>) = P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>)<it>, then </it>P-value(<it>M</it>, <it>&#945;</it>) = P-value(<it>M</it><sub><it>&#949;</it></sub>, <it>&#945;</it>).</p>
               <p><b>Proof</b>. This is a corollary of Lemma 9 with <it>M</it><sub><it>&#949; </it></sub>in the role of <it>N </it>and <it>N'</it>, and BS(<it>M</it>) + <it>E </it>in the role of <it>&#946;</it>.</p>
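               <p>A correspondingly simplified version of the P-value-to-score loop can be sketched as follows. This is again a hypothetical Python illustration, not the algorithm of Figure 6. It does not use Lemma 12 to restrict the detailed distribution to an interval of length 2 &#215; <it>E</it>, nor FASTPVALUE for the rest. Instead, it recomputes the full distribution of <it>M</it><sub><it>&#949;</it></sub> at each granularity, takes <it>&#945;</it> = Threshold(<it>M</it><sub><it>&#949;</it></sub>, <it>P</it>) (the largest accessible score whose P-value is at least <it>P</it>), and stops when the hypothesis of Lemma 13 holds. The parameter defaults follow the Methods section. The names, the toy matrix and the uniform background are assumptions.</p>

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution(K, bg):
    """Score distribution of a round matrix stored as integers (score = k * eps)."""
    Q = {0: Fraction(1)}
    for col in K:
        nxt = defaultdict(Fraction)
        for k, p in Q.items():
            for x, kx in col.items():
                nxt[k + kx] += p * bg[x]
        Q = dict(nxt)
    return Q


def score_from_pvalue(M, P, bg, eps=Fraction(1, 10), k=10, max_steps=10):
    """Simplified refinement: for eps, eps/k, eps/k**2, ... take alpha = Threshold(M_eps, P)
    and stop once P-value(M_eps, alpha - E) == P-value(M_eps, alpha) (Lemma 13)."""
    if P > 1 or 0 > P:
        raise ValueError("P must lie in [0, 1]")
    for _ in range(max_steps):
        K = [{x: math.floor(s / eps) for x, s in col.items()} for col in M]
        E = sum(max(col[x] - kcol[x] * eps for x in col) for col, kcol in zip(M, K))
        Q = distribution(K, bg)
        acc = 0
        for key in sorted(Q, reverse=True):  # accessible scores of M_eps, best first
            acc += Q[key]
            if acc >= P:
                alpha = key * eps  # Threshold(M_eps, P); acc = P-value(M_eps, alpha)
                break
        p_low = sum(p for key, p in Q.items() if key * eps >= alpha - E)
        if p_low == acc:
            return alpha, eps
        eps /= k
    raise RuntimeError("stop condition not met; increase max_steps")


def brute_pvalue(M, alpha, bg):
    """Exact P-value(M, alpha) by enumerating all words (tiny matrices only)."""
    total = Fraction(0)
    for u in product(bg, repeat=len(M)):
        if sum(col[x] for col, x in zip(M, u)) >= alpha:
            total += math.prod(bg[x] for x in u)
    return total


# Made-up 2-column matrix and uniform background, for illustration only.
M = [
    {"A": Fraction("0.73"), "C": Fraction("-1.27"), "G": Fraction("0.18"), "T": Fraction("-0.45")},
    {"A": Fraction("-0.92"), "C": Fraction("1.14"), "G": Fraction("-0.31"), "T": Fraction("0.05")},
]
bg = {x: Fraction(1, 4) for x in "ACGT"}
alpha, eps = score_from_pvalue(M, Fraction(1, 8), bg)
print(alpha, eps, brute_pvalue(M, alpha, bg))  # 6/5 1/10 1/8
```

               <p>For <it>P</it> = 1/8 on this toy matrix, the loop stops at <it>&#949;</it> = 0.1 with <it>&#945;</it> = 1.2. By contrast, the largest accessible score of <it>M</it> with P-value at least 1/8 is 1.32. As Lemma 13 states, <it>&#945;</it> has the same P-value for <it>M</it> as for <it>M</it><sub><it>&#949;</it></sub> (here 1/8), and the two scores satisfy the bounds of Lemma 12 (1.2 &#8804; 1.32 &#8804; 1.2 + <it>E</it> = 1.37).</p>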
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Experimental Results</p>
         </st>
          <p>The ideas presented in this paper have been incorporated into a software tool called TFM-PVALUE (TFM stands for <it>Transcription factor matrix</it>). The software is written in C++ and implements the FROM SCORE TO P-VALUE and FROM P-VALUE TO SCORE algorithms described in Figures <figr fid="F5">5</figr> and <figr fid="F6">6</figr>, together with permutated lookahead scoring. It is available for download at <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. In the worst case, TFM-PVALUE does not improve the theoretical complexity of the score threshold problem, as expected from the NP-hardness proof given in the second section. Nevertheless, experimental results show considerable speedups in practice.</p>
         <sec>
            <st>
               <p>Methods</p>
            </st>
             <p>To conduct our experiments, we chose a multinomial background model with independent and identically distributed symbols over the four-letter alphabet {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>}. The decreasing step (<it>k</it>) of the algorithm was set to 10 and the initial granularity (<it>&#949;</it>) to 0.1. The test set consists of the Jaspar database of transcription factor binding sites <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It contains 123 matrices, whose lengths range from 4 to 30. The matrices are transformed into log-ratio matrices following the technique given in <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. For each P-value <it>P</it>, we report results only for matrices whose length is suitable for <it>P</it>: we required the probability of a single word to be smaller than <it>P</it>. Thus, a matrix of length <it>m </it>cannot achieve a P-value smaller than <inline-formula><m:math name="1748-7188-2-15-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mn>1</m:mn><m:mrow><m:msup><m:mn>4</m:mn><m:mi>m</m:mi></m:msup></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqaIXaqmaeaacqaI0aandaahaaqabeaacqWGTbqBaaaaaaaa@2FDC@</m:annotation></m:semantics></m:math></inline-formula>. For example, matrices of length 4 were not considered for a P-value equal to 10<sup>-3</sup>, and matrices of length smaller than 10 were not considered for a P-value equal to 10<sup>-6</sup>.</p>
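             <p>The length filter amounts to finding the smallest <it>m</it> such that 1/4<sup><it>m</it></sup> &lt; <it>P</it>. The small Python sketch below reproduces the cut-offs quoted in the text. It assumes a uniform background over {<it>A</it>, <it>C</it>, <it>G</it>, <it>T</it>}, under which every word of length <it>m</it> has probability 1/4<sup><it>m</it></sup>. The function name is hypothetical.</p>

```python
from fractions import Fraction


def min_length(P, alphabet_size=4):
    """Smallest matrix length m such that a single word, of probability
    1 / alphabet_size**m under a uniform background, is strictly less likely than P."""
    m = 1
    while Fraction(1, alphabet_size ** m) >= P:
        m += 1
    return m


for d in range(3, 7):
    print(f"P = 1e-{d}: length at least {min_length(Fraction(1, 10 ** d))}")
```

             <p>It gives minimal lengths of 5, 7, 9 and 10 for P-values of 10<sup>-3</sup>, 10<sup>-4</sup>, 10<sup>-5</sup> and 10<sup>-6</sup>, in line with the exclusions stated above for 10<sup>-3</sup> and 10<sup>-6</sup>.</p>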
             <p>Part of the experimental results concerns the error rate as a function of the chosen granularity. To estimate the error made at a given granularity, we first computed <it>&#945;</it><sub><it>&#949;</it></sub>, the score threshold associated with the P-value for the round matrix <it>M</it><sub><it>&#949;</it></sub>, and <it>&#945;</it>, the score threshold associated with the P-value for the original matrix <it>M</it>. We then counted the words whose score for <it>M </it>lies between <it>&#945;</it><sub><it>&#949; </it></sub>and <it>&#945;</it>. All computation times were measured on a 2.33 GHz Intel Core 2 Duo processor with 2 GB of main memory under Mac OS X 10.4.</p>
             <p>For FROM P-VALUE TO SCORE, we also compared our results with those of the LAZYDISTRIBUTION algorithm described in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. To the best of our knowledge, it is currently the most efficient algorithm for computing the score associated with a P-value. It applies the dynamic programming formulas of Equation 1 lazily and takes advantage of permutated lookahead scoring, as presented in the previous Section. We implemented it in C++, like TFM-PVALUE.</p>
         </sec>
         <sec>
            <st>
               <p>Computation times for a given granularity</p>
            </st>
             <p>In this first experiment, we compare the running time of TFM-PVALUE with that of LAZYDISTRIBUTION when both use the same approximation of the score distribution, that is, round matrices with the same granularity. To impose a maximal granularity on TFM-PVALUE, we interrupt the loop of decreasing granularities at that granularity and output the score threshold found there. We thus obtain exactly the same score threshold as LAZYDISTRIBUTION.</p>
            <sec>
               <st>
                  <p>Granularity 10<sup>-3</sup></p>
               </st>
               <p>We first set the granularity to 10<sup>-3 </sup>for both algorithms and computed the score associated with P-values of 10<sup>-3 </sup>and 10<sup>-6 </sup>for each matrix of the Jaspar database (see Figure <figr fid="F7">7</figr>). TFM-PVALUE outperforms LAZYDISTRIBUTION in both cases. With the P-value set to 10<sup>-3</sup>, the average computation time per matrix is 0.64 second for LAZYDISTRIBUTION, compared with 0.03 second for TFM-PVALUE. Considering each matrix individually, TFM-PVALUE is 61 times faster than LAZYDISTRIBUTION. With the P-value set to 10<sup>-6</sup>, the average computation time per matrix is 0.118 second for LAZYDISTRIBUTION and 0.019 second for TFM-PVALUE. Considering each matrix individually, TFM-PVALUE is 15 times faster than LAZYDISTRIBUTION.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>Time efficiency for granularity 10<sup>-3</sup></p>
                  </caption>
                  <text>
                     <p><b>Time efficiency for granularity 10<sup>-3</sup></b>. We compare the running times of FROM P-VALUE TO SCORE and LAZYDISTRIBUTION for computing the score threshold associated with a given P-value on the Jaspar matrices, with the granularity set to 10<sup>-3</sup>. We chose two P-value levels: 10<sup>-3 </sup>and 10<sup>-6</sup>. There are 122 matrices (resp. 75 matrices) that can achieve a P-value equal to 10<sup>-3 </sup>(resp. 10<sup>-6</sup>). For each algorithm, we classified the matrices into four groups according to the time needed to complete the computation: less than 0.1 second, from 0.1 second to 1 second, from 1 second to 1 minute, and more than 1 minute. The results are represented by a histogram with four bars. The height of each bar gives the percentage of matrices involved, and the number at the top of each bar indicates the corresponding number of matrices.</p>
                  </text>
                  <graphic file="1748-7188-2-15-7"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Granularity 10<sup>-6</sup></p>
               </st>
               <p>We then repeated the same procedure with a smaller granularity, 10<sup>-6 </sup>instead of 10<sup>-3</sup>. Results are reported in Figure <figr fid="F8">8</figr>. When the granularity decreases, the computation time of LAZYDISTRIBUTION increases dramatically. With the P-value set to 10<sup>-3</sup>, LAZYDISTRIBUTION needs more than one minute for 89 percent of the matrices (109 out of 122), whereas TFM-PVALUE needs less than 0.1 second for 85 percent of the matrices (104 out of 122). With the P-value set to 10<sup>-6</sup>, LAZYDISTRIBUTION needs more than one minute for 62 percent of the matrices (47 out of 75), whereas TFM-PVALUE needs less than 0.1 second for 89 percent of the matrices (67 out of 75). Moreover, comparing the histogram for TFM-PVALUE in Figure <figr fid="F8">8</figr> with the histogram for LAZYDISTRIBUTION in Figure <figr fid="F7">7</figr> shows that TFM-PVALUE is still more efficient, even though its granularity is a thousand times smaller. This demonstrates that we are able to provide more accurate results without spending more time. The same conclusion holds for the amount of memory needed to complete the computation (data not shown).</p>
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>Time efficiency for granularity 10<sup>-6</sup></p>
                  </caption>
                  <text>
                     <p><b>Time efficiency for granularity 10<sup>-6</sup></b>. We compare the time needed by TFM-PVALUE and LAZYDISTRIBUTION to compute the score associated with a P-value of 10<sup>-3 </sup>and 10<sup>-6 </sup>on the Jaspar matrices when the granularity is set to 10<sup>-6</sup>. The histogram has the same meaning as in Figure <figr fid="F7">7</figr>.</p>
                  </text>
                  <graphic file="1748-7188-2-15-8"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Ability to compute accurate thresholds</p>
            </st>
             <p>In the second series of experiments, we tested the ability of TFM-PVALUE to compute exact score thresholds within a reasonable amount of time. We ran FROM P-VALUE TO SCORE and FROM SCORE TO P-VALUE without setting a maximal granularity, so that the algorithms stop only when they reach the exact result. We tried several P-values, from 10<sup>-3 </sup>to 10<sup>-6</sup>, for all matrices of suitable length. Runtimes are reported in Figure <figr fid="F9">9</figr> for FROM P-VALUE TO SCORE and in Figure <figr fid="F10">10</figr> for FROM SCORE TO P-VALUE. For FROM P-VALUE TO SCORE, the time required to compute the score thresholds remains very small for a large majority of matrices: less than 0.01 second for 253 out of the 383 computations for P-values from 10<sup>-3 </sup>to 10<sup>-6</sup>, and less than 0.1 second for 337 computations. As expected, the results for FROM SCORE TO P-VALUE are very similar: less than 0.01 second for 332 out of the 383 computations for P-values from 10<sup>-3 </sup>to 10<sup>-6</sup>, and less than 0.1 second for 358 computations.</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Runtime of TFM-Pvalue &#8211; From P-value to Score without any granularity bound</p>
               </caption>
               <text>
                  <p><b>Runtime of TFM-Pvalue &#8211; From P-value to Score without any granularity bound</b>. This histogram shows time measurements for the P-VALUE TO SCORE algorithm without any granularity bound. The algorithm stops when it is guaranteed to find the exact P-value, without error. We ran tests on a variety of P-value parameters: 10<sup>-3</sup>, 10<sup>-4</sup>, 10<sup>-5</sup>, and 10<sup>-6</sup>. As previously, we report the proportion of matrices for which the runtime was less then 0.1 second, between 0.1 second and 1 second, between 1 second and 1 minu