<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-372</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Computational detection of significant variation in binding affinity across two sets of sequences with application to the analysis of replication origins in yeast</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Keich</snm>
               <fnm>Uri</fnm>
               <insr iid="I1"/>
               <email>keich@cs.cornell.edu</email>
            </au>
            <au id="A2">
               <snm>Gao</snm>
               <fnm>Hong</fnm>
               <insr iid="I2"/>
               <email>hg53@cornell.edu</email>
            </au>
            <au id="A3">
               <snm>Garretson</snm>
               <mi>S</mi>
               <fnm>Jeffrey</fnm>
               <insr iid="I1"/>
               <email>jsg73@cornell.edu</email>
            </au>
            <au id="A4">
               <snm>Bhaskar</snm>
               <fnm>Anand</fnm>
               <insr iid="I1"/>
               <email>anand.bhaskar@gmail.com</email>
            </au>
            <au id="A5">
               <snm>Liachko</snm>
               <fnm>Ivan</fnm>
               <insr iid="I3"/>
               <email>il34@cornell.edu</email>
            </au>
            <au id="A6">
               <snm>Donato</snm>
               <fnm>Justin</fnm>
               <insr iid="I4"/>
               <email>jdonato@wisc.edu</email>
            </au>
            <au id="A7">
               <snm>Tye</snm>
               <mi>K</mi>
               <fnm>Bik</fnm>
               <insr iid="I3"/>
               <email>bt16@cornell.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science, Cornell University, Ithaca, NY, 14853, USA </p>
            </ins>
            <ins id="I2">
               <p>Department of Biological Statistics &amp; Computational Biology, Cornell University, Ithaca, NY, 14853, USA </p>
            </ins>
            <ins id="I3">
               <p>Department of Molecular Biology &amp; Genetics, Cornell University, Ithaca, NY, 14853, USA </p>
            </ins>
            <ins id="I4">
               <p>Department of Bacteriology, University of Wisconsin, Madison, WI, 53706, USA </p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>372</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/372</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18786274</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-372</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>12</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>12</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Keich et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>In analyzing the stability of DNA replication origins in <it>Saccharomyces cerevisiae </it>we faced the question of whether one set of sequences is significantly enriched in the number and/or the quality of the matches of a particular position weight matrix relative to another set.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We present SADMAMA, a computational solution to address this problem. SADMAMA implements two types of statistical tests: one type is based on simplified models, while the other relies on bootstrapping and as such might be preferable to users who are averse to such models. The bootstrap approach incorporates a novel "site-protected" resampling procedure, which solves a problem we identify with naive resampling.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>SADMAMA's utility is demonstrated here by offering a plausible explanation for the differential ARS activity observed in our previous <it>mcm1-1 </it>mutant experiments <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, by suggesting the relevance of multiple weak ACS matches to efficient replication origin function in <it>Saccharomyces cerevisiae</it>, and by suggesting an explanation for the observed negative effect <it>FKH2 </it>has on chromatin silencing <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. SADMAMA is available for download from <url>http://www.cs.cornell.edu/~keich/</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>In analyzing the stability of DNA replication origins in <it>S. cerevisiae </it>(see Stable vs. unstable ARSs in <it>mcm1-1 </it>mutant below) we faced the question of whether one set of sequences has more and/or better binding sites of a particular transcription factor than the other. One way to address this question is through wet lab experiments such as chromatin immunoprecipitation. Here we offer a computational alternative, which can be effective provided the PWM (position weight matrix, e.g. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>) representation of the transcription factor is known. An obvious advantage of our computational approach is that it is much cheaper to execute and it provides a built-in statistical significance analysis.</p>
         <p>There are many computational tools that scan for "good" matches of a given PWM (e.g., <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>). Similarly there are tools that look at the significance of PWM matches in a set of sequences (e.g., <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>). None of these, however, directly applies to our problem, where the null assumption is that there is "essentially" no difference in the binding sites between the two input sets, even though both might be enriched, deficient, or neutral in sites when compared to "background" sequences. Elkon et al. look for enrichment in the number of sites in a subset of a genomic scale set of promoters <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In particular their approach is not applicable when the sets of sequences are either disjoint or small. There has also been work on discriminative de-novo motif finding (e.g., <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>) where the goal is to <it>find </it>a PWM that discriminates between two sets of sequences. This is quite different from our stated goal where the PWM is given and the focus is on assigning significance to the difference in the number and/or quality of sites. Robin et al. study a very similar problem to ours, only in the context of a pattern representation of the motif <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. In the Conclusions section we emphasize some of the differences between this paper and their theoretical work. Here we present SADMAMA (Significance Assessment of the Difference in MAtrix MAtches) &#8211; the tool we have developed to address the aforementioned problem. SADMAMA implements two different strategies for testing the difference in site frequency as well as site quality between the two input sets. The quicker approach relies on a couple of simplified statistical models from which we derive and carefully implement the appropriate tests. As an alternative to accepting our simplified models, we offer bootstrapping which, by its nature, requires fewer assumptions, but consumes more time. The development of our bootstrap procedure required some innovation since, as we show below, a naive resampling approach can create false positives. That is, it can indicate a significant difference between two input sets that are essentially equivalent as far as the PWM sites are concerned.</p>
         <p>Our motivation for developing SADMAMA came from our study of replication origins in <it>S. cerevisiae </it>(reviewed in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). DNA replication is a fundamental process essential for cell proliferation. While the proteins that are involved in initiating DNA replication are essentially conserved from yeast to humans, the implicated sequence motifs that these conserved factors interact with are poorly understood outside of <it>S. cerevisiae </it>(<abbrgrp><abbr bid="B13">13</abbr><abbr bid="B12">12</abbr></abbrgrp>). Moreover, even for <it>S. cerevisiae </it>the replication initiation process is not completely understood. For example, it is known that the roughly 400 replication origins in <it>S. cerevisiae </it>(<abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>), called ARSs (Autonomously Replicating Sequences), differ in several important aspects from one another. These include timing and efficiency of origin firing, as well as sensitivity to mutations in proteins involved in replication initiation. However, much of this variability is yet to be explained and this is an active area of research. Our study in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> was designed to identify ARSs that are preferentially used in yeast strains defective for replication initiation. SADMAMA was specifically designed to suggest sequence motifs that might explain the preferential usage we observed in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Such information could help us gain insight into the determinants that regulate replication origin usage. Given our motivation for SADMAMA's development, it is fitting that we demonstrate its utility in that context:</p>
         <p>&#8226; We show that SADMAMA provides a possible explanation for the difference in replication efficiency between two sets of ARSs we identified in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>&#8226; Essential to replication initiation is the binding of the ORC (Origin Recognition Complex) to the ACS (ARS Consensus Sequence) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Using a screen for fragments of <it>S. kluyveri </it>DNA that have ARS function in <it>S. cerevisiae</it>, we provide evidence that supports a recent conjecture that ORC binding in some <it>S. cerevisiae </it>ARSs requires multiple, seemingly redundant ACS matches <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
         <p>&#8226; Finally, we demonstrate how SADMAMA can be used for exploratory data analysis.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Statistical Models and Tests</p>
            </st>
            <sec>
               <st>
                  <p>Scoring words and identifying sites</p>
               </st>
               <p>Since our goal is to assess the difference between the PWM matches in the two input sets, we must first define what we consider a match. To do so, we specify how we score each putative site, or word of length <it>l</it>, where <it>l </it>is the length, or width, of the PWM. We use the log-likelihood ratio score <inline-formula><m:math name="1471-2105-9-372-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>s</m:mi><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mi>log</m:mi><m:mo>&#8289;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>p</m:mi><m:mi>M</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:msub><m:mi>p</m:mi><m:mn>0</m:mn></m:msub><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mfrac></m:mrow></m:semantics></m:math></inline-formula> where <inline-formula><m:math name="1471-2105-9-372-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>p</m:mi><m:mi>M</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>l</m:mi></m:msubsup><m:mrow><m:msub><m:mi>M</m:mi><m:mrow><m:mi>i</m:mi><m:msub><m:mi>w</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow></m:semantics></m:math></inline-formula> and <it>M</it><sub><it>ij </it></sub>is the frequency of letter <it>j </it>in position <it>i </it>of the motif, and <it>p</it><sub>0</sub>(<it>w</it>) is the null likelihood of <it>w</it>. Note that <it>M </it>here represents the PWM as a PFM (Position Frequency Matrix). In this paper we will generally not make the distinction between the two. Also note that, given that our null model is a Markov chain, the notation <it>p</it><sub>0</sub>(<it>w</it>) is somewhat misleading as this probability typically depends on the few characters preceding <it>w </it>in the sequence.</p>
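               <p>The score definition can be illustrated with a minimal sketch. It assumes an independent (order-0) background in place of the Markov-chain null used in the paper, and the function name, the toy PFM and the uniform background are hypothetical values chosen only for illustration.</p>

```python
import math

ALPHABET = "ACGT"

def llr_score(word, pfm, background):
    """Log-likelihood ratio s(w) = log(p_M(w) / p_0(w)).

    pfm[i][j] is the frequency of letter j at motif position i.
    background[j] is the null frequency of letter j; an i.i.d. null is
    an assumption of this sketch, not the paper's Markov-chain null.
    """
    if len(word) != len(pfm):
        raise ValueError("word length must equal the PWM width l")
    score = 0.0
    for i, letter in enumerate(word):
        j = ALPHABET.index(letter)
        score += math.log(pfm[i][j]) - math.log(background[j])
    return score

# Toy PFM of width l = 3 (hypothetical numbers); each row sums to 1.
pfm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
background = [0.25, 0.25, 0.25, 0.25]

print(llr_score("AGT", pfm, background))  # consensus word, positive score
print(llr_score("CCC", pfm, background))  # poor match, negative score
```

               <p>A PFM entry of zero would make the logarithm undefined, so a practical implementation would typically add pseudocounts before scoring.</p>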
               <p>A word <it>w </it>is considered a match if its score exceeds a user specified threshold. For example, only words whose scores lie in the top 0.1% of the null scores are considered matches (in practice this threshold is determined using an appropriate null training model). While this defines whether a single word <it>w </it>is considered a match or not, we would often hesitate to consider two matches that almost completely overlap as two distinct matches. Here again we rely on the user to specify the amount of overlap that is tolerated between distinct sites, and we apply a greedy strategy to choose sites that conform to the specified overlap.</p>
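               <p>The thresholding and overlap rule can be sketched as follows. The text does not specify the order in which the greedy strategy visits candidates; this sketch assumes that higher-scoring words are chosen first, and the function name and its parameters are hypothetical.</p>

```python
def select_sites(scores, width, threshold, max_overlap):
    """Greedily choose matches, best score first (an assumed ordering).

    scores[k] is the score of the word starting at position k.
    A candidate is rejected if it shares more than max_overlap
    positions with a site that has already been chosen.
    """
    candidates = [(s, k) for k, s in enumerate(scores) if s > threshold]
    candidates.sort(key=lambda c: (-c[0], c[1]))
    chosen = []
    for _, k in candidates:
        # Two width-long words starting at k and j share
        # max(0, width - |k - j|) positions.
        if all(max_overlap >= max(0, width - abs(k - j)) for j in chosen):
            chosen.append(k)
    return sorted(chosen)

scores = [1.0, 5.0, 4.5, 0.2, 3.9, 0.1, 4.2]
print(select_sites(scores, width=3, threshold=3.5, max_overlap=1))  # [1, 4, 6]
```

               <p>In this toy input the word at position 2 exceeds the threshold but overlaps the better word at position 1 in two positions, so it is not counted as a distinct site.</p>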
            </sec>
            <sec>
               <st>
                  <p>Measuring the difference in the number of sites: The binomial model</p>
               </st>
               <p>To assign significance to the site-frequency difference between the sets, we assume that matches (sites) occur in each of the sets according to a binomial model <it>B</it>(<it>n</it><sub><it>i</it></sub>, <it>p</it><sub><it>i</it></sub>), <it>i </it>= 1, 2, where <it>n</it><sub><it>i </it></sub>is the number of possible sites in the corresponding set (roughly the set length), and <it>p</it><sub><it>i </it></sub>is the site frequency. Clearly this simplistic model glosses over the problem of dependence between overlapping sites. However, overlap is not a real issue with most PWMs given a reasonably high threshold. Such thresholds are typically chosen in any case, as binding sites are meant to be rather rare.</p>
               <p>The null hypothesis <it>H</it><sub>0 </sub>is that <it>p</it><sub>1 </sub>= <it>p</it><sub>2</sub>. Note that this is different from the "null background", which specifies how the background is generated. In particular, <it>H</it><sub>0 </sub>does not assume that all matches are merely random background matches, but rather that they are some mixture of random background matches and "real sites". The alternative hypothesis can be the two-sided <it>p</it><sub>1 </sub>&#8800; <it>p</it><sub>2 </sub>or one of the one-sided <it>p</it><sub>1 </sub>> <it>p</it><sub>2 </sub>or <it>p</it><sub>2 </sub>> <it>p</it><sub>1</sub>. Assuming our binomial model, we can readily test for violation of the null assumption based on the fact that, conditioned on the joint number of matches, the number of matches in the first set has a hypergeometric distribution if <it>p</it><sub>1 </sub>= <it>p</it><sub>2 </sub>(see the Methods section for details). We therefore compute the two one-sided-alternative <it>p</it>-values by summing up the appropriate tails of the hypergeometric distribution.</p>
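               <p>A minimal sketch of the two tail sums, assuming the conditional hypergeometric distribution described above. The function name and the example counts are hypothetical, and the sketch is not meant to reproduce SADMAMA's own implementation.</p>

```python
from math import comb

def one_sided_pvalues(k1, k2, n1, n2):
    """Tail sums of the hypergeometric distribution of the set-1 count.

    k1, k2: observed numbers of sites in sets 1 and 2.
    n1, n2: numbers of possible sites in sets 1 and 2.
    Returns (p-value for the alternative p1 > p2,
             p-value for the alternative p2 > p1).
    """
    k = k1 + k2
    lo, hi = max(0, k - n2), min(k, n1)
    total = comb(n1 + n2, k)
    pmf = {x: comb(n1, x) * comb(n2, k - x) / total
           for x in range(lo, hi + 1)}
    p_greater = sum(p for x, p in pmf.items() if x >= k1)
    p_less = sum(p for x, p in pmf.items() if k1 >= x)
    return p_greater, p_less

# Hypothetical counts: 12 sites vs. 3 sites in two sets of equal length.
p_greater, p_less = one_sided_pvalues(12, 3, 5000, 5000)
print(p_greater, p_less)
```

               <p>Both tails include the observed count, so the two returned values sum to slightly more than one.</p>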
            </sec>
            <sec>
               <st>
                  <p>Measuring the difference in the quality of sites</p>
               </st>
               <p>We offer two ways to measure the difference in the quality of the sites. Our null assumption is that the scores of the sites from the two sets form two independent samples from the same, unknown, distribution. A plausible alternative is that one distribution tends to produce better scores than the other, or more precisely, that it is stochastically greater. The Mann-Whitney test is a non-parametric test that is optimized for testing the alternative that one distribution is a shifted version of the other. While we cannot assume this particular alternative here, this test should still be a reasonably good choice.</p>
               <p>Alternatively, SADMAMA can perform a t-test of the difference between the two average scores. However, if the motif length <it>l </it>is not very large, the score distribution can be very far from normal (e.g. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>). Since the t-test relies on the normality assumption, its result should be taken with a grain of salt here (in a future release we hope to provide a test of the validity of the assumptions required by this t-test). Since match scores can repeat, especially when <it>l </it>is small, we are often forced to use the tied version of the Mann-Whitney test. This becomes important when the overall number of matches in at least one of the sets is rather small (say &#8804; 10). In this range, the use of the normal approximation to the Mann-Whitney test is generally discouraged and exact calculations should be used. These are significantly more costly for the tied case than for the no-ties case. By default, SADMAMA decides on its own which method to use when estimating the significance of the test. If the samples are sufficiently large it uses the normal approximation. Otherwise, it uses exact methods to evaluate the significance of the test. If no ties are present, it relies on Harding's exact algorithm <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, while if there are ties, it uses a naive dynamic programming approach written by Niranjan Nagarajan.</p>
               <p>Keep in mind that if one tests for a difference in the quality of the sites in addition to the frequency of sites, then one should, in principle, adjust for multiple testing. Note that in general we cannot assume that these two tests are independent of one another.</p>
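               <p>For the large-sample case, the tie-corrected normal approximation to the Mann-Whitney test can be sketched as below. This is a textbook formulation without continuity correction, not SADMAMA's code; the function name is hypothetical, and the exact methods recommended above for small samples are not covered.</p>

```python
import math
from collections import Counter

def mann_whitney_normal(x, y):
    """Return (U statistic of x, one-sided p-value that x tends to exceed y).

    Uses midranks for tied scores and the tie-corrected variance of U.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Midrank of a value: the average of the ranks its copies occupy.
    midrank, start = {}, 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = start + (t - 1) / 2
        start += t
    u1 = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if not var > 0:
        # All scores are identical, so the data carry no evidence.
        return u1, 0.5
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical site scores from two sets, with one tied value.
print(mann_whitney_normal([3, 4, 5], [1, 2, 2]))
```

               <p>The toy samples are far too small for the approximation to be trusted; they only show how ties enter through the midranks and the variance correction.</p>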
            </sec>
            <sec>
               <st>
                  <p>The bootstrap approach</p>
               </st>
               <p>SADMAMA offers a set of tests inspired by the bootstrap <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> as an alternative to the simplified models described above. Bootstrap is a "plug-in" method: to estimate some parameter of a complex distribution we conceptually plug into the appropriate formula an approximating distribution that is typically derived from a small sample of the original distribution. It is often the case that even after plugging in the simplified, approximating distribution we still need to resort to Monte Carlo methods to estimate the target parameter. These methods work by generating random samples from the simplified distribution, computing a relevant statistic, and finally estimating the parameter of interest from all these samples of the statistic. In our case the complex distribution is the one which generated the two sets, which is not really well defined, and which, in particular, does not yield additional samples. Our model of the simplified distribution is that the two input sets are generated by sampling (with replacement) contiguous blocks or substrings of <it>b </it>letters from some joint pool. SADMAMA's default assumption is that this joint sequence pool is simply the concatenation of the two input sets. The parameters we are after are <it>p</it>-values of the statistics that measure the differences in the quality and quantity of sites between the two sets. In particular, SADMAMA can keep track of the difference in site density as well as the difference in mean site score between the two sets.</p>
               <p>For example, to evaluate the difference in site frequency between the two input sets, SADMAMA first finds this difference. It then creates a large number of "bootstrap images" of the two input sets by resampling <it>b</it>-long substrings, or blocks, from the concatenated original sets. Using these Monte Carlo images, SADMAMA generates an empirical distribution of the difference in site density, from which we can readily deduce an "empirical <it>p</it>-value" of the difference in site density between the original input sets. With increasing number of bootstrap images, this empirical <it>p</it>-value should better approximate the <it>p</it>-value defined by our simplified distribution. The latter should in turn be a reasonable approximation of the "real" <it>p</it>-value defined in terms of the original, complex distribution.</p>
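               <p>The resampling scheme just described can be sketched as a naive block bootstrap. The sketch makes several simplifying assumptions: each input set is treated as a single string, an exact occurrence of a fixed word stands in for a thresholded PWM match, and blocks are not site-protected. All function names and parameters are hypothetical.</p>

```python
import random

def count_sites(seq, motif):
    """Stand-in for PWM matching: count exact occurrences of motif."""
    m = len(motif)
    return sum(1 for k in range(len(seq) - m + 1) if seq[k:k + m] == motif)

def resample(pool, length, b, rng):
    """Build a bootstrap image of the given length from b-long blocks."""
    pieces, total = [], 0
    while length > total:
        start = rng.randrange(len(pool) - b + 1)
        pieces.append(pool[start:start + b])
        total += b
    return "".join(pieces)[:length]

def bootstrap_pvalue(set1, set2, motif, b, n_images, seed=0):
    """Empirical one-sided p-value for the site-density difference."""
    rng = random.Random(seed)

    def density(seq):
        return count_sites(seq, motif) / len(seq)

    observed = density(set1) - density(set2)
    pool = set1 + set2  # joint pool: concatenation of the two sets
    hits = 0
    for _ in range(n_images):
        image1 = resample(pool, len(set1), b, rng)
        image2 = resample(pool, len(set2), b, rng)
        if density(image1) - density(image2) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_images + 1)

print(bootstrap_pvalue("ACGT" * 50, "ACGT" * 50, "CGT", b=8, n_images=200))
```

               <p>The add-one correction is a common convention for Monte Carlo <it>p</it>-values and is an assumption of this sketch rather than a documented feature of SADMAMA.</p>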
               <p>While in principle this is how SADMAMA implements bootstrapping, there is one more issue we had to address. Generally, when resampling the input sets we would like to avoid using blocks that are too long, as those hinder "proper mixing". The problem with smaller blocks is that, especially when the block size <it>b </it>is smaller than the motif width, essentially all the original sites that were present in the joint pool are obliterated during resampling. This is not an issue if both sets consist only of the "background signal". However, if the two sets are highly enriched with sites, yet in the same way, this kind of bootstrap test might erroneously report a significant difference in site density. The reason is that the difference between the two enriched sets might be significant when compared with the typical difference between sets that were essentially made to look like background sets by inadvertently destroying all the sites (see the section on Applications of SADMAMA below).</p>
               <p>To avoid such false positives, we implemented in SADMAMA a novel "site-protected bootstrap" approach. It is designed to allow us to sample from the original sites even if the block size is smaller than the motif width. More explicitly, each randomly chosen block might be extended so as to avoid chopping sites. The decision whether or not to extend, or protect, each such block is made in a probabilistic and independent fashion. The length of the extension is the minimal one necessary to avoid chopping any site that started (or ended, if reverse-complement search is considered) within the <it>original </it>block. The probability of extending a block is defined so as to make the expected frequency of sites in the combined bootstrap sets the same as the frequency of sites in the original pool. See the Methods section for details on the technique and the section on Applications of SADMAMA for examples of its utility.</p>
               <p>In general we found that the bootstrap tests closely follow the tests based on the simplified models. While the bootstrap approach might seem more attractive as it is not derived from an arguably overly simplified model, it takes considerably longer to run to get reliable estimates.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Applications of SADMAMA</p>
            </st>
            <sec>
               <st>
                  <p>Stable vs. unstable ARSs in <it>mcm1-1 </it>mutant</p>
               </st>
               <p>Mcm1 is a transcription factor that has been shown to affect the efficiency of replication origins both directly, by binding to replication origins (<abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>), and indirectly, by regulating the expression of several factors of the pre-replication complex <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The <it>mcm1-1 </it>point mutant has been shown to exhibit DNA replication defects in <it>S. cerevisiae </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
               <p>Functionally, ARSs are divided into two types based on their ability to function in <it>mcm </it>mutant strains such as <it>mcm1-1</it>. Stable, or A-type ARSs function efficiently in both wild-type and mutant cells, whereas unstable, or B-type ARSs function poorly in <it>mcm </it>mutant backgrounds.</p>
               <p>Several previous studies have shown a relationship between replication initiation and local transcription patterns (<abbrgrp><abbr bid="B26">26</abbr><abbr bid="B1">1</abbr></abbrgrp>). More precisely, in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> we show that transcriptional interference correlates with reduced ARS activity in that 80% of ARSs located in such transcriptionally active zones are B-type, whereas only 45% are B-type in transcriptionally inactive zones (see Table <tblr tid="T1">1</tblr>). While transcriptional interference is statistically significant, it is clearly not the sole determinant of ARS activity under this unfavorable condition (<it>mcm1-1 </it>mutant).</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Classification of A-type and B-type ARSs based on local transcription patterns</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>Transcription pattern</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>ARS efficiency in <it>mcm1-1</it></p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>Unstable (B-type)</p>
                        </c>
                        <c ca="center">
                           <p>Stable (A-type)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>&#8594; &#8226; &#8594; &#8592; &#8226; &#8592; &#8594; &#8226; &#8592; (+)</p>
                        </c>
                        <c ca="center">
                           <p>32</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>&#8592; &#8226; &#8594; (-)</p>
                        </c>
                        <c ca="center">
                           <p>13</p>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Arrows represent the direction of transcription. Filled circles represent the location of the ARS. (+) = transcriptional interference; (-) = no transcriptional interference. Using Fisher's exact test with a two-sided alternative, independence is rejected at 0.0018.</p>
                  </tblfn>
               </tbl>
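               <p>The test of independence mentioned in the footnote of Table 1 can be sketched as follows. The sketch uses one common definition of the two-sided Fisher's exact <it>p</it>-value (the sum of all table probabilities not exceeding that of the observed table); the function name is hypothetical, and because implementations differ in how they define the two-sided tail, the printed value need not agree exactly with the figure quoted in the footnote.</p>

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def pmf(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(col1, row1)
    p_obs = pmf(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(pmf, range(lo, hi + 1))
               if p_obs * (1 + 1e-9) >= p)

# Counts from Table 1: rows are (+) and (-), columns are B-type and A-type.
print(fisher_exact_two_sided(32, 8, 13, 16))
```

               <p>The row proportions of B-type ARSs, 32/40 and 13/29, correspond to the 80% and 45% quoted in the text.</p>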
               <p>Higher affinity for Mcm1 has been suggested to be a distinguishing feature for telomeric ARSs that are constitutively active in the <it>mcm1-1 </it>mutant <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In particular, footprinting assays identified a set of binding sites of Mcm1 in these ARSs. Interestingly, many of these sites can be considered as "half sites", in that they match only half of the canonical Mcm1 binding site. It is thus tempting to conjecture that stable or A-type ARSs would, in general, exhibit better (possibly half) binding sites for Mcm1 than B-type ARSs. Similarly, it was suggested that Abf1 may also have a positive effect on the formation of the pre-replication complex (e.g., <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>) and is therefore another natural candidate for our differential binding affinity analysis.</p>
               <p>To test these hypotheses we applied SADMAMA to analyze the difference in the quality and/or number of these PWM matches (MCM1, half-MCM1, ABF1) between the stable and unstable sets of transcriptionally active, or (+), ARSs. SADMAMA did not find statistically significant variation in the quality or the number of MCM1 matches (see the Methods section for more details). However, it found that the stable set has more half-MCM1 sites (threshold 0.05%, <it>p</it>-value 0.007) or alternatively better sites (threshold 0.1%, <it>p</it>-value 0.002). Similarly the stable set has more ABF1 sites (threshold 0.1%, <it>p</it>-value 0.002), or alternatively better sites (threshold 0.5%, <it>p</it>-value 0.003). These <it>p</it>-values should be adjusted for the fact that we considered 3 thresholds (0.5%, 0.1%, and 0.05%) but they are still significant at the 5% level even after this adjustment. We had no reason to suspect a difference in ACS matches, and indeed SADMAMA's corresponding <it>p</it>-values were unimpressive even before the multiple testing adjustment. The <it>p</it>-values reported above were generated using the hypergeometric or Mann-Whitney tests. However, Monte Carlo site-protected bootstrap tests gave very similar results (block size <it>b </it>= 12).</p>
               <p>For comparison we also applied SADMAMA to study the difference in these PWM matches between the stable and unstable set of transcriptionally inactive, or (-), ARSs. This time no significant <it>p</it>-values were reported. Taken together these results support the hypothesis that half binding sites of MCM1 as well as sites of ABF1 in flanking regions of an ARS may protect the ARS from incoming transcription traffic (<abbrgrp><abbr bid="B23">23</abbr><abbr bid="B29">29</abbr></abbrgrp>) but would have little influence on the stable ARSs that are not subjected to transcriptional interference.</p>
            </sec>
            <sec>
               <st>
                  <p><it>S. kluyveri </it>vs. <it>S. cerevisiae </it>ACS</p>
               </st>
               <p>To get a better understanding of DNA replication initiation in <it>S. cerevisiae</it>, we performed a screen to isolate fragments of <it>S. kluyveri </it>DNA that have ARS function in <it>S. cerevisiae</it>. Specifically, we cloned random fragments of <it>S. kluyveri </it>DNA into an ARS-less vector, transformed the resulting genomic libraries into <it>S. cerevisiae</it>, and isolated 46 distinct plasmids which showed ARS activity (<it>S. kluyveri </it>ARSs below). Using the same protocol we also isolated 36 native <it>S. cerevisiae </it>ARSs (<it>S. cerevisiae </it>ARSs below). Naturally, one wonders what confers <it>S. cerevisiae </it>replication activity to these <it>S. kluyveri </it>DNA segments. In particular, we should compare them to our native <it>S. cerevisiae </it>ARSs, and SADMAMA is a convenient tool for that.</p>
               <p>We looked for significant differences between the <it>S. kluyveri </it>and the <it>S. cerevisiae </it>sets of ARSs in terms of binding sites of several auxiliary DNA binding factors that are known to be associated with replication initiation: Mcm1, half sites of Mcm1, Rap1, and Abf1. SADMAMA did not find significant variation in any of these. Surprisingly, however, SADMAMA did find significantly more ACS matches in the <it>S. kluyveri </it>ARSs than in the <it>S. cerevisiae </it>ARSs (threshold 0.05%, <it>p</it>-value 0.0004). Interestingly, when it came to quality of sites, SADMAMA reported that the <it>S. cerevisiae </it>ARSs had better sites (threshold 0.05%, <it>p</it>-value 0.008). This analysis suggests that the <it>S. cerevisiae </it>replication initiation machinery can function with multiple weaker ACS sites, such as the ones we found in the <it>S. kluyveri </it>ARSs, as well as with the fewer but better native sites. This conjecture is consistent with a recent related analysis of native <it>S. cerevisiae </it>ARSs that contain multiple ACS matches <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
               <p>It is interesting to note that the <it>S. cerevisiae </it>ARSs (the overwhelming majority of which lie within intergenic regions) have the <it>same </it>AT content as the general <it>S. cerevisiae </it>intergenic average: 66%. However, the AT content of the <it>S. kluyveri </it>ARSs is 69%, which is significantly <it>higher </it>than the 58% AT content of general <it>S. kluyveri </it>intergenic regions. Since the ACS matrix is itself AT-rich, we asked whether these <it>S. kluyveri </it>ARSs owe their functionality only to a local spike in the AT content. Using SADMAMA we addressed this question in two different ways. First we compared the <it>S. kluyveri </it>set of ARSs with 10,000 random permutations of itself. In all of those 10,000 comparisons SADMAMA found that the permuted set had a statistically significantly smaller number of sites (see the Methods section for details).</p>
               <p>Similarly, we used SADMAMA to compare the ACS PWM with 4,000 column-wise random permutations of itself. In only 19 of these 4,000 comparisons did the <it>S. kluyveri </it>set have more sites of the permuted PWM than of the original PWM (keep in mind that the ACS PWM is itself very AT-rich, so many of the permutations should not look that different from the original PWM). Taken together, these two tests indicate that there is more "ACS information" in our <it>S. kluyveri </it>ARSs than their AT content alone yields.</p>
            </sec>
            <sec>
               <st>
                  <p>Site-protected bootstrap</p>
               </st>
               <p>To test the utility of the "site-protected" bootstrap option in a realistic setting we generated two sets of <it>S. cerevisiae </it>ARSs by arbitrarily splitting a subset of the confirmed ARSs in the DNA replication origin database OriDB <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, into two roughly equal sets: an "even" and an "odd" one. Given the arbitrary nature of the split between the sets we expect that there should not be a substantial difference in ACS sites between the two. Note, however, that both sets are highly enriched with ACS sites (see the subsection on Bootstrap tests in the Methods section for details).</p>
               <p>Using the hypergeometric test SADMAMA found, as expected, no significant difference in ACS site-frequency between these two sets. However, when using the naive bootstrap approach with block size <it>b </it>&#8804; 15 SADMAMA consistently reported that one of the sets is significantly enriched in sites. On the other hand, when the "site-protected" bootstrap option was turned on, SADMAMA consistently found the difference in site-frequency between the two sets to be insignificant (even for <it>b </it>= 1).</p>
               <p>A look at Figure <figr fid="F1">1</figr> and Figure <figr fid="F2">2</figr> explains what is going on. The total number of sites in a naively resampled pair of sets is typically much smaller than the number of sites in the input sets, whereas the site-protected option manages to be consistent with the total number of sites in the two input sets. Note how the range of values observed in Figure <figr fid="F1">1</figr> is much smaller than the range observed in Figure <figr fid="F2">2</figr>. This smaller range suggests that normal random fluctuations observed when sampling from the latter distribution might be considered very significant when compared against fluctuations observed when sampling from the former distribution.</p>
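               <p>The loss of sites under naive resampling can be reproduced in a toy setting. The sketch below is not SADMAMA's procedure: it assumes that a site is an exact occurrence of a fixed word rather than a PWM match above a score threshold, it uses synthetic uniform background sequences, and the 17-letter motif and all function names are hypothetical.</p>
               <p>
```python
import random


def make_seq(rng, length, motif):
    """Uniform random DNA with one planted copy of motif (toy input)."""
    bases = [rng.choice("ACGT") for _ in range(length)]
    pos = rng.randrange(length - len(motif) + 1)
    bases[pos:pos + len(motif)] = motif  # same-length slice assignment
    return "".join(bases)


def count_sites(seqs, motif):
    """Count non-overlapping exact occurrences of motif in all sequences."""
    return sum(s.count(motif) for s in seqs)


def naive_block_resample(seqs, b, rng):
    """Rebuild each sequence from length-b blocks drawn uniformly at random
    from the pooled input, ignoring where the sites are located."""
    pool = [s for s in seqs if len(s) >= b]
    resampled = []
    for s in seqs:
        blocks, total = [], 0
        while len(s) > total:
            src = rng.choice(pool)
            start = rng.randrange(len(src) - b + 1)
            blocks.append(src[start:start + b])
            total += b
        resampled.append("".join(blocks)[:len(s)])
    return resampled


rng = random.Random(0)
motif = "ACGGTCATGCAAGTCTG"  # hypothetical 17-mer, longer than the block size
seqs = [make_seq(rng, 200, motif) for _ in range(50)]

original = count_sites(seqs, motif)
resampled = count_sites(naive_block_resample(seqs, 10, rng), motif)
print("sites in input:", original)
print("sites after naive resampling with b = 10:", resampled)
```
               </p>
               <p>Because every block is shorter than the planted word, a site survives only if two adjacent blocks happen to line up, so the resampled count is expected to fall far below the input count. This is the same qualitative effect as the one seen in Figure <figr fid="F1">1</figr>.</p>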
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>A histogram of the total number of sites in 10,000 naively resampled pairs of sets</p>
                  </caption>
                  <text>
                     <p><b>A histogram of the total number of sites in 10,000 naively resampled pairs of sets</b>. The mean total number of sites is 33. For comparison, there are 173 sites in the input pair of sets. Here <it>b </it>= 10 (see the subsection on Bootstrap tests in the Methods section for additional settings).</p>
                  </text>
                  <graphic file="1471-2105-9-372-1"/>
               </fig>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>A histogram of the total number of sites in 10,000 site-protected resampled pairs of sets</p>
                  </caption>
                  <text>
                     <p><b>A histogram of the total number of sites in 10,000 site-protected resampled pairs of sets</b>. The mean total number of sites is 175. For comparison, there are 173 sites in the input pair of sets (<it>b </it>= 10).</p>
                  </text>
                  <graphic file="1471-2105-9-372-2"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Exploratory data analysis with SADMAMA</p>
               </st>
               <p>SADMAMA can also be used to study potential enrichment of binding sites within a single set. For example, we studied whether the set of all 325 confirmed <it>S. cerevisiae </it>ARSs taken from OriDB <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> shows enrichment of PWM matches for any one of the 79 <it>S. cerevisiae </it>transcription factor PWMs defined by Morozov and Siggia <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. For each such PWM SADMAMA tested whether the frequency of sites in the ARS set is significantly higher than in an ARS-less <it>S. cerevisiae </it>intergenic file (see the Methods section). After adjusting for multiple testing, the only PWM that showed such statistically significant site frequency enrichment is the one representing <it>FKH2</it>. Interestingly, Fkh2 is known to interact with Mcm1 to form a complex that regulates the cell-cycle-dependent expression of the CLB2 cluster in G2/M phases in <it>S. cerevisiae </it><abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
               <p>Upon closer inspection of the set of ARSs we found that many of the sites SADMAMA attributed to <it>FKH2 </it>overlapped with ACS matches, and indeed, when properly aligned, the two matrices are quite similar. Moreover, after masking the high-scoring ACS matches in the set of ARSs, SADMAMA found the <it>FKH2 </it>site enrichment insignificant (see the Methods section for more details). Finally, the actual binding location data for <it>FKH2 </it><abbrgrp><abbr bid="B33">33</abbr></abbrgrp> exhibits no significant correlation with ACS sites located in confirmed ARSs.</p>
               <p>This result seems somewhat disappointing given that the enrichment of <it>FKH2 </it>sites can apparently be explained by the obvious enrichment of ACS sites. However, SADMAMA's results still leave us with a potentially interesting question: does the similarity between binding sites of <it>FKH2 </it>and the ACS have any biological importance? Analysis of the literature suggests a positive answer is conceivable. Specifically, when overexpressed, Fkh2p is known to have a negative role in silencing the silent mating-type cassette <it>HMRa </it>in <it>S. cerevisiae </it><abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Moreover, it is known that ORC binding to the ACS is associated with the chromatin silencing process at this locus (e.g. <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>). Consistent with the similarity we observed in their binding sites, it is tempting to conjecture that <it>FKH2 </it>might interfere with the chromatin silencing process by offering some form of competitive binding to ORC. Since the interference of Fkh2p was observed when it was overexpressed, the lack of support from the location data <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> does not rule out this conjecture.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>SADMAMA offers a computational solution to a novel problem: does one set of sequences exhibit a statistically significant increase in the number and/or the quality of sites of a given PWM relative to another set? Note that by setting the second set to be a large background set, SADMAMA can also be used to assign significance to matches in a single input set. SADMAMA implements two types of tests: one type is based on simplified sequence models while the other relies on bootstrapping and as such might be preferable to users who are averse to simplifying models. Generally, when resampling the input sets we would like to avoid using blocks that are too long, as those hinder "proper mixing". However, as we show, a naive resampling procedure using shorter blocks can bias the tests. SADMAMA implements a new stochastic feature, which we term site-protected resampling, and which successfully solves this problem.</p>
         <p>SADMAMA's utility is demonstrated here by offering a plausible explanation for the differential ARS activity observed in our previous <it>mcm1-1 </it>mutant experiments <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, by suggesting the relevance of multiple weak ACS matches to efficient replication origin function in <it>S. cerevisiae</it>, and by suggesting an explanation for the observed negative effect <it>FKH2 </it>has on chromatin silencing <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>To the best of our knowledge, we are the first to present a tool for studying the difference in matrix matches between two sets. Very recently, and independently of us, Robin et al. posed the analogous problem in the context of pattern representation of a motif <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Our hypergeometric test derived from our binomial modelling of the number of sites is somewhat similar to their binomial test, which is derived from a Poisson model. However, since we deal with matrices, we also study the difference in quality of sites which they do not. SADMAMA also offers a bootstrap approach which is not discussed by Robin et al. Finally, we provide a computational tool while they describe statistical tests.</p>
         <p>We identified several ways to improve and expand SADMAMA's current set of features. To name a few, SADMAMA currently assumes that the input sequences are independent, which excludes it from analyzing phylogenetically related sequences. Given the increased availability of related genomes, extending SADMAMA to handle such cases is highly pertinent. Similarly, for some cases one can argue that a more appropriate motif sites model is that each sequence is endowed with a small, say Poisson drawn, number of sites. Currently, SADMAMA fails to correctly handle this model if there are significant differences in the length of the sequences, and extending it to address this model as well is highly desirable. Finally, helping users analyze multiple tests when such are specified could increase SADMAMA's utility, for example when more than one site-threshold is considered, or when both the frequency and the quality of sites are examined.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Hypergeometric test</p>
            </st>
            <p>The abstraction of our binomial model for the number of sites in each of the input sets coupled with our null assumption that <it>p</it><sub>1 </sub>= <it>p</it><sub>2 </sub>= <it>p </it>is as follows. Suppose <it>X </it>is a binomial <it>B</it>(<it>n</it>, <it>p</it>) random variable and <it>Y</it>, which is independent of <it>X</it>, is <it>B</it>(<it>m</it>, <it>p</it>) (same <it>p</it>). Conditioned on <it>X </it>+ <it>Y </it>= <it>k </it>(total number of sites in both sets), <it>X </it>has a hypergeometric distribution <it>H</it>(<it>n</it>, <it>m</it>, <it>k</it>):</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable columnalign="left">
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mi>l</m:mi>
                                       <m:mo>|</m:mo>
                                       <m:mi>X</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>Y</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mi>k</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mi>P</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mi>l</m:mi>
                                             <m:mo>,</m:mo>
                                             <m:mi>Y</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>l</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>P</m:mi>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mi>X</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mi>Y</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mi>k</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>n</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>l</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                             <m:msup>
                                                <m:mi>p</m:mi>
                                                <m:mi>l</m:mi>
                                             </m:msup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>p</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mi>n</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msup>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>m</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mrow>
                                                               <m:mi>k</m:mi>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mi>l</m:mi>
                                                            </m:mrow>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                             <m:msup>
                                                <m:mi>p</m:mi>
                                                <m:mrow>
                                                   <m:mi>k</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>l</m:mi>
                                                </m:mrow>
                                             </m:msup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>p</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mi>m</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mi>k</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>l</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mrow>
                                                               <m:mi>m</m:mi>
                                                               <m:mo>+</m:mo>
                                                               <m:mi>n</m:mi>
                                                            </m:mrow>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>k</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                             <m:msup>
                                                <m:mi>p</m:mi>
                                                <m:mi>k</m:mi>
                                             </m:msup>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>p</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mi>n</m:mi>
                                                   <m:mo>+</m:mo>
                                                   <m:mi>m</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>k</m:mi>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr columnalign="left">
                                 <m:mtd columnalign="left">
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mo>=</m:mo>
                                 </m:mtd>
                                 <m:mtd columnalign="left">
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>n</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>l</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>m</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mrow>
                                                               <m:mi>k</m:mi>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mi>l</m:mi>
                                                            </m:mrow>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo>(</m:mo>
                                                <m:mrow>
                                                   <m:mtable>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mrow>
                                                               <m:mi>m</m:mi>
                                                               <m:mo>+</m:mo>
                                                               <m:mi>n</m:mi>
                                                            </m:mrow>
                                                         </m:mtd>
                                                      </m:mtr>
                                                      <m:mtr>
                                                         <m:mtd>
                                                            <m:mi>k</m:mi>
                                                         </m:mtd>
                                                      </m:mtr>
                                                   </m:mtable>
                                                </m:mrow>
                                                <m:mo>)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo>.</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabmWaaaqaaiabdcfaqjabcIcaOiabdIfayjabg2da9iabdYgaSjabcYha8jabdIfayjabgUcaRiabdMfazjabg2da9iabdUgaRjabcMcaPaqaaiabg2da9aqcfayaamaalaaabaGaemiuaaLaeiikaGIaemiwaGLaeyypa0JaemiBaWMaeiilaWIaemywaKLaeyypa0Jaem4AaSMaeyOeI0IaemiBaWMaeiykaKcabaGaemiuaaLaeiikaGIaemiwaGLaey4kaSIaemywaKLaeyypa0Jaem4AaSMaeiykaKcaaaGcbaaabaGaeyypa0dajuaGbaWaaSaaaeaadaqadaqaauaabeqaceaaaeaacqWGUbGBaeaacqWGSbaBaaaacaGLOaGaayzkaaGaemiCaa3aaWbaaeqabaGaemiBaWgaaiabcIcaOiabigdaXiabgkHiTiabdchaWjabcMcaPmaaCaaabeqaaiabd6gaUjabgkHiTiabigdaXaaadaqadaqaauaabeqaceaaaeaacqWGTbqBaeaacqWGRbWAcqGHsislcqWGSbaBaaaacaGLOaGaayzkaaGaemiCaa3aaWbaaeqabaGaem4AaSMaeyOeI0IaemiBaWgaaiabcIcaOiabigdaXiabgkHiTiabdchaWjabcMcaPmaaCaaabeqaaiabd2gaTjabgkHiTiabcIcaOiabdUgaRjabgkHiTiabdYgaSjabcMcaPaaaaeaadaqadaqaauaabeqaceaaaeaacqWGTbqBcqGHRaWkcqWGUbGBaeaacqWGRbWAaaaacaGLOaGaayzkaaGaemiCaa3aaWbaaeqabaGaem4AaSgaaiabcIcaOiabigdaXiabgkHiTiabdchaWjabcMcaPmaaCaaabeqaaiabd6gaUjabgUcaRiabd2gaTjabgkHiTiabdUgaRbaaaaaakeaaaeaacqGH9aqpaeaajuaGdaWcaaqaamaabmaabaqbaeqabiqaaaqaaiabd6gaUbqaaiabdYgaSbaaaiaawIcacaGLPaaadaqadaqaauaabeqaceaaaeaacqWGTbqBaeaacqWGRbWAcqGHsislcqWGSbaBaaaacaGLOaGaayzkaaaabaWaaeWaaeaafaqabeGabaaabaGaemyBa0Maey4kaSIaemOBa4gabaGaem4AaSgaaaGaayjkaiaawMcaaaaakiabc6caUaaaaaa@A441@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Thus the <it>p</it>-value of an observed value <it>X </it>= <it>x </it>against the one-sided alternative <it>p</it><sub>1 </sub>> <it>p</it><sub>2 </sub>is</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>l</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mi>x</m:mi>
                                 </m:mrow>
                                 <m:mi>k</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mtable>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mn>1</m:mn>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mi>l</m:mi>
                                                   </m:mtd>
                                                </m:mtr>
                                             </m:mtable>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mtable>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mn>2</m:mn>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mrow>
                                                         <m:mi>k</m:mi>
                                                         <m:mo>&#8722;</m:mo>
                                                         <m:mi>l</m:mi>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                             </m:mtable>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mtable>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mn>1</m:mn>
                                                         </m:msub>
                                                         <m:mo>+</m:mo>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mn>2</m:mn>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mtd>
                                                </m:mtr>
                                                <m:mtr>
                                                   <m:mtd>
                                                      <m:mi>k</m:mi>
                                                   </m:mtd>
                                                </m:mtr>
                                             </m:mtable>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaabCaKqbagaadaWcaaqaamaabmaabaqbaeqabiqaaaqaaiabd6gaUnaaBaaabaGaeGymaedabeaaaeaacqWGSbaBaaaacaGLOaGaayzkaaWaaeWaaeaafaqabeGabaaabaGaemOBa42aaSbaaeaacqaIYaGmaeqaaaqaaiabdUgaRjabgkHiTiabdYgaSbaaaiaawIcacaGLPaaaaeaadaqadaqaauaabeqaceaaaeaacqWGUbGBdaWgaaqaaiabigdaXaqabaGaey4kaSIaemOBa42aaSbaaeaacqaIYaGmaeqaaaqaaiabdUgaRbaaaiaawIcacaGLPaaaaaaaleaacqWGSbaBcqGH9aqpcqWG4baEaeaacqWGRbWAa0GaeyyeIuoakiabcYcaSaaa@4B19@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>k </it>is the combined number of sites observed in both sets, and <it>n</it><sub>1 </sub>and <it>n</it><sub>2 </sub>are the numbers of feasible site locations in the two input sets (slightly less than their lengths due to "edge effects": a site cannot begin too close to a sequence end). Technically, we use Catherine Loader's carefully implemented package <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> to execute the core of the computation.</p>
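            <p>For illustration only, the tail sum above can be sketched in a few lines of Python. This is not SADMAMA's implementation, which relies on the package cited above; the function name <it>hypergeom_upper_tail</it> is hypothetical, and exact integer arithmetic is used here as a simple stand-in for a numerically careful routine.</p>
            <p><![CDATA[
```python
from math import comb


def hypergeom_upper_tail(x, k, n1, n2):
    """Return P(X >= x | X + Y = k) for the hypergeometric null.

    n1, n2: numbers of feasible site locations in the two sets.
    k:      combined number of sites observed in both sets.
    x:      number of sites observed in the first set.
    """
    if min(n1, n2) < 0 or not 0 <= k <= n1 + n2:
        raise ValueError("need n1, n2 >= 0 and 0 <= k <= n1 + n2")
    lo = max(x, 0, k - n2)  # fewer than k - n2 sites in set 1 is impossible
    hi = min(k, n1)         # more than min(k, n1) sites in set 1 is impossible
    numerator = sum(comb(n1, l) * comb(n2, k - l) for l in range(lo, hi + 1))
    return numerator / comb(n1 + n2, k)
```
]]></p>
            <p>Exact binomial coefficients keep the sketch short, but they become slow for very large <it>k</it>; a log-scale evaluation would be the usual alternative in that regime.</p>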
         </sec>
         <sec>
            <st>
               <p>Site-protected bootstrap</p>
            </st>
            <p>The success of the site-protected bootstrap option in SADMAMA hinges on its ability to set a "reasonable" value for <it>&#945;</it>, the probability that SADMAMA protects a sampled block (more precisely, a protected block is minimally extended so as to include all sites that start within the originally chosen block). SADMAMA's strategy is to choose <it>&#945; </it>so that the expected total frequency of sites across the two sets is close to (ideally the same as) the site frequency <it>&#957; </it>in the sample pool. By default, the latter is the concatenation of the two input sets. Setting <it>&#945; </it>= 0 amounts to the naive resampling approach, as no block will be extended. As we saw in the section on Applications of SADMAMA, this tends to generate samples with site frequency &lt;<it>&#957; </it>(Figure <figr fid="F1">1</figr>). These site-poor samples can in turn inflate the overall significance of the test. On the other hand, setting <it>&#945; </it>= 1 amounts to protecting every sampled block. This setting does not take into account potential new sites appearing across the seams of sampled blocks and therefore tends to generate samples with site frequency > <it>&#957;</it>. This in turn could yield a test that is too conservative.</p>
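            <p>The block-protection step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not SADMAMA's code: the sample pool is a single string, site start positions and a common site length are given in advance, and the names <it>protect_block</it> and <it>site_protected_resample</it> are hypothetical.</p>
            <p><![CDATA[
```python
import random


def protect_block(start, end, site_starts, site_len, pool_len):
    """Minimally extend [start, end) to cover every site starting inside it."""
    new_end = end
    for s in site_starts:
        if start <= s < end:
            new_end = max(new_end, min(s + site_len, pool_len))
    return start, new_end


def site_protected_resample(pool, site_starts, site_len, block_size,
                            alpha, length, rng=None):
    """Concatenate random blocks from pool until `length` bases are drawn.

    Each block is protected (extended) with probability alpha.
    Assumes len(pool) >= block_size.
    """
    rng = rng or random.Random()
    pieces, total = [], 0
    while total < length:
        start = rng.randrange(len(pool) - block_size + 1)
        end = start + block_size
        if rng.random() < alpha:  # alpha = 0 never protects, alpha = 1 always
            start, end = protect_block(start, end, site_starts,
                                       site_len, len(pool))
        pieces.append(pool[start:end])
        total += end - start
    return "".join(pieces)[:length]  # the last block is typically truncated
```
]]></p>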
            <p>SADMAMA sets <it>&#945; </it>before the main resampling loop begins. Our goal is to set <it>&#945; </it>so that the frequency of sites in an infinitely long sequence constructed from the site-protected resampling procedure will be exactly <it>&#957;</it>. In reality, we settle for a fairly long sequence generated by this procedure. But how long is long enough? Clearly, this length should be a function of <it>&#957;</it>: the smaller <it>&#957; </it>is, the longer the training sequence needs to be. More generally, how can we be confident we have a "reasonable" estimate of a Bernoulli success probability <it>&#957;</it>? One way is to generate sufficiently many trials so that the size of our confidence interval for <it>&#957; </it>is a small fraction, <it>&#947;</it>, of <it>&#957; </it>(<it>&#947; </it>= 0.05 is SADMAMA's default). Here we aim at estimating a site frequency which is roughly <it>&#957;</it>, so using a Wald (normal) confidence interval implies</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>c</m:mi>
                           <m:msqrt>
                              <m:mrow>
                                 <m:mi>&#957;</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#957;</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>/</m:mo>
                                 <m:mi>n</m:mi>
                              </m:mrow>
                           </m:msqrt>
                           <m:mo>&#8804;</m:mo>
                           <m:mi>&#947;</m:mi>
                           <m:mi>&#957;</m:mi>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4yam2aaOaaaeaacqaH9oGBcqGGOaakcqaIXaqmcqGHsislcqaH9oGBcqGGPaqkcqGGVaWlcqWGUbGBaSqabaGccqGHKjYOcqaHZoWzcqaH9oGBcqGGSaalaaa@3CD5@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>c </it>is a small factor determining the size of the confidence interval (<it>c </it>= 3 by default), and <it>n </it>is the resampled sequence length we seek. It follows that we should set</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>n</m:mi>
                           <m:mo>&#8805;</m:mo>
                           <m:msup>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mfrac>
                                          <m:mi>c</m:mi>
                                          <m:mi>&#947;</m:mi>
                                       </m:mfrac>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mn>2</m:mn>
                           </m:msup>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#957;</m:mi>
                              </m:mrow>
                              <m:mi>&#957;</m:mi>
                           </m:mfrac>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOBa4MaeyyzIm7aaeWaaKqbagaadaWcaaqaaiabdogaJbqaaiabeo7aNbaaaOGaayjkaiaawMcaamaaCaaaleqabaGaeGOmaidaaKqbaoaalaaabaGaeGymaeJaeyOeI0IaeqyVd4gabaGaeqyVd4gaaOGaeiOla4caaa@3C6D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>In practice, to keep runtime and memory requirements under control, SADMAMA caps the length of this sequence using a compile-time parameter (currently set at 10<sup>6</sup>).</p>
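            <p>As a worked illustration of the bound on <it>n </it>and the cap, the following Python sketch computes the training sequence length from <it>&#957;</it>, <it>c </it>and <it>&#947;</it>. The function and constant names are hypothetical; only the defaults (<it>c </it>= 3, <it>&#947; </it>= 0.05, cap 10<sup>6</sup>) are taken from the text.</p>
            <p><![CDATA[
```python
import math

MAX_TRAINING_LENGTH = 10**6  # cap quoted in the text (a compile-time value there)


def training_length(nu, c=3.0, gamma=0.05, cap=MAX_TRAINING_LENGTH):
    """Smallest n with c * sqrt(nu * (1 - nu) / n) <= gamma * nu, capped."""
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie strictly between 0 and 1")
    n = math.ceil((c / gamma) ** 2 * (1.0 - nu) / nu)
    return min(n, cap)
```
]]></p>
            <p>With the defaults, (<it>c</it>/<it>&#947;</it>)<sup>2 </sup>= 3600, so a site frequency of 0.01 calls for roughly 3.6 &#215; 10<sup>5 </sup>positions, whereas a frequency of 10<sup>-4 </sup>would call for about 3.6 &#215; 10<sup>7 </sup>and is therefore capped.</p>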
            <p>One approach to setting <it>&#945; </it>would be to design a binary search keeping in mind the stochastic nature of the resampling procedure. The main downside of such an approach is that generating and then scanning a large sequence for sites can be time consuming. While one can imagine various tricks to speed up this process we chose a different shortcut.</p>
            <p>As mentioned above, there are two types of sites in our site-protected resampled sequence of length <it>n</it>. Type I sites are sites that are entirely contained in a resampled block, i.e., they also appear in the original sample pool. Type II sites are newly generated sites that span two or more resampled blocks. These resampled blocks are adjacent in the resampled sequence but not in the original sample pool. Let <it>K </it>be the (random) number of blocks required to generate the resampled sequence and let <it>B</it><sub><it>i </it></sub>denote the <it>i</it>th random block. Then the random number of sites, <it>S</it><sub><it>&#945;</it></sub>, is given by</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i7" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mi>&#945;</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>K</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:msub>
                                    <m:mn>1</m:mn>
                                    <m:mrow>
                                       <m:mo>{</m:mo>
                                       <m:msub>
                                          <m:mi>B</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mtext>&#160;has&#160;a&#160;type&#160;I&#160;site</m:mtext>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>+</m:mo>
                                 <m:msub>
                                    <m:mn>1</m:mn>
                                    <m:mrow>
                                       <m:mo>{</m:mo>
                                       <m:mtext>a&#160;type&#160;II&#160;site&#160;starts&#160;in&#160;</m:mtext>
                                       <m:msub>
                                          <m:mi>B</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>.</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uam1aaSbaaSqaaiabeg7aHbqabaGccqGH9aqpdaaeWbqaaiabigdaXmaaBaaaleaacqGG7bWEcqWGcbGqdaWgaaadbaGaemyAaKgabeaaliabbccaGiabbIgaOjabbggaHjabbohaZjabbccaGiabbggaHjabbccaGiabbsha0jabbMha5jabbchaWjabbwgaLjabbccaGiabbMeajjabbccaGiabbohaZjabbMgaPjabbsha0jabbwgaLjabc2ha9bqabaGccqGHRaWkcqaIXaqmdaWgaaWcbaGaei4EaSNaeeyyaeMaeeiiaaIaeeiDaqNaeeyEaKNaeeiCaaNaeeyzauMaeeiiaaIaeeysaKKaeeysaKKaeeiiaaIaee4CamNaeeyAaKMaeeiDaqNaeeyzauMaeeiiaaIaee4CamNaeeiDaqNaeeyyaeMaeeOCaiNaeeiDaqNaee4CamNaeeiiaaIaeeyAaKMaeeOBa4MaeeiiaaIaemOqai0aaSbaaWqaaiabdMgaPbqabaWccqGG9bqFaeqaaOGaeiOla4caleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGlbWsa0GaeyyeIuoaaaa@7A1C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>To simplify our derivation, we now assume that the block size <it>b </it>is less than the motif length <it>l </it>and we ignore the fact that the last block is typically truncated. In this case, the event {<it>B</it><sub><it>i </it></sub>has a type I site} only occurs if originally a site starts in block <it>B</it><sub><it>i</it></sub>, with probability &#8776; <it>b &#957;</it>, and the block is protected, with probability <it>&#945;</it>. It follows that</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i8" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>E</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:msub>
                                    <m:mi>B</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mtext>&#160;has&#160;a&#160;type&#160;I&#160;site</m:mtext>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8776;</m:mo>
                           <m:mi>b</m:mi>
                           <m:mi>&#957;</m:mi>
                           <m:mi>&#945;</m:mi>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyrauKaeiikaGIaeGymaeZaaSbaaSqaaiabcUha7jabdkeacnaaBaaameaacqWGPbqAaeqaaSGaeeiiaaIaeeiAaGMaeeyyaeMaee4CamNaeeiiaaIaeeyyaeMaeeiiaaIaeeiDaqNaeeyEaKNaeeiCaaNaeeyzauMaeeiiaaIaeeysaKKaeeiiaaIaee4CamNaeeyAaKMaeeiDaqNaeeyzauMaeiyFa0habeaakiabcMcaPiabgIKi7kabdkgaIjabe27aUjabeg7aHjabc6caUaaa@5211@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The number of blocks we extend is negatively correlated with <it>K</it>. However, to first order, <inline-formula><m:math name="1471-2105-9-372-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>E</m:mi><m:mo stretchy="false">(</m:mo><m:mi>K</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mfrac><m:mi>n</m:mi><m:mi>b</m:mi></m:mfrac><m:mo>+</m:mo><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:mi>l</m:mi><m:mi>&#957;</m:mi><m:mi>&#945;</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyrauKaeiikaGIaem4saSKaeiykaKIaeyypa0tcfa4aaSaaaeaacqWGUbGBaeaacqWGIbGyaaGccqGHRaWkcqWGpbWtcqGGOaakcqWGSbaBcqaH9oGBcqaHXoqycqGGPaqkaaa@3C8C@</m:annotation></m:semantics></m:math></inline-formula> and, as <it>l&#957;&#945; </it>is typically negligible compared to <inline-formula><m:math name="1471-2105-9-372-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mi>n</m:mi><m:mi>b</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqWGUbGBaeaacqWGIbGyaaaaaa@2F25@</m:annotation></m:semantics></m:math></inline-formula>, we can assume <inline-formula><m:math name="1471-2105-9-372-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>K</m:mi><m:mo>&#8776;</m:mo><m:mfrac><m:mi>n</m:mi><m:mi>b</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4saSKaeyisISBcfa4aaSaaaeaacqWGUbGBaeaacqWGIbGyaaaaaa@31F5@</m:annotation></m:semantics></m:math></inline-formula> is roughly constant. Therefore,</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-9-372-i12" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>E</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mi>&#945;</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8776;</m:mo>
                           <m:mi>n</m:mi>
                           <m:mi>&#957;</m:mi>
                           <m:mi>&#945;</m:mi>
                           <m:mo>+</m:mo>
                           <m:mfrac>
                              <m:mi>n</m:mi>
                              <m:mi>b</m:mi>
                           </m:mfrac>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyrauKaeiikaGIaem4uam1aaSbaaSqaaiabeg7aHbqabaGccqGGPaqkcqGHijYUcqWGUbGBcqaH9oGBcqaHXoqycqGHRaWkjuaGdaWcaaqaaiabd6gaUbqaaiabdkgaIbaakiabeY7aTbaa@3E4B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <inline-formula><m:math name="1471-2105-9-372-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#956;</m:mi><m:mo>=</m:mo><m:mi>E</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mn>1</m:mn><m:mrow><m:mo>{</m:mo><m:mtext>a&#160;type&#160;II&#160;site&#160;starts&#160;in&#160;</m:mtext><m:msub><m:mi>B</m:mi><m:mi>i</m:mi></m:msub><m:mo>}</m:mo></m:mrow></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiVd0Maeyypa0JaemyrauKaeiikaGIaeGymaeZaaSbaaSqaaiabcUha7jabbggaHjabbccaGiabbsha0jabbMha5jabbchaWjabbwgaLjabbccaGiabbMeajjabbMeajjabbccaGiabbohaZjabbMgaPjabbsha0jabbwgaLjabbccaGiabbohaZjabbsha0jabbggaHjabbkhaYjabbsha0jabbohaZjabbccaGiabbMgaPjabb6gaUjabbccaGiabdkeacnaaBaaameaacqWGPbqAaeqaaSGaeiyFa0habeaaaaa@55AB@</m:annotation></m:semantics></m:math></inline-formula> is the probability of a new, shared-block, site.</p>
            <p>The exact form of (1) is less important to us here than the fact that the right hand side is a linear function of <it>&#945; </it>(see also Figure <figr fid="F3">3</figr> for an empirical demonstration). We therefore estimate <it>E </it>(<it>S</it><sub><it>&#945;</it></sub>) for <it>&#945; </it>= 0 and <it>&#945; </it>= 1 by generating corresponding resampled sequences of length <it>n </it>and counting the number of observed sites, <it>m</it><sub>0 </sub>and <it>m</it><sub>1 </sub>respectively (these resampled sequences are generated solely for the purpose of determining <it>&#945; </it>and are not used further in SADMAMA's main bootstrap tests). SADMAMA then relies on linear interpolation to set</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The expected number of sites as a linear function of <it>&#945;</it></p>
               </caption>
               <text>
                  <p><b>The expected number of sites as a linear function of <it>&#945;</it></b>. Average total number of sites in the sequence per <it>&#945;</it>, the probability that a sampled block is extended. Site threshold, background file and all similar settings were as described in the subsection on Bootstrap tests in the Methods section. The average was taken over 100 random resampled sequences of length <it>n </it>for each value of <it>&#945;</it>.</p>
               </text>
               <graphic file="1471-2105-9-372-3"/>
            </fig>
            <p>
               <display-formula>
                  <m:math name="1471-2105-9-372-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>&#945;</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                                 <m:mi>&#957;</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>m</m:mi>
                                    <m:mn>0</m:mn>
                                 </m:msub>
                              </m:mrow>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>m</m:mi>
                                    <m:mn>1</m:mn>
                                 </m:msub>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>m</m:mi>
                                    <m:mn>0</m:mn>
                                 </m:msub>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqySdeMaeyypa0tcfa4aaSaaaeaacqWGUbGBcqaH9oGBcqGHsislcqWGTbqBdaWgaaqaaiabicdaWaqabaaabaGaemyBa02aaSbaaeaacqaIXaqmaeqaaiabgkHiTiabd2gaTnaaBaaabaGaeGimaadabeaaaaaaaa@3BB5@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>so that <it>E </it>(<it>S</it><sub><it>&#945;</it></sub>) = <it>n&#957;</it>. Note that if <inline-formula><m:math name="1471-2105-9-372-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#957;</m:mi><m:mo>&lt;</m:mo><m:mfrac><m:mrow><m:msub><m:mi>m</m:mi><m:mn>0</m:mn></m:msub></m:mrow><m:mi>n</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd4MaeyipaWtcfa4aaSaaaeaacqWGTbqBdaWgaaqaaiabicdaWaqabaaabaGaemOBa4gaaaaa@3306@</m:annotation></m:semantics></m:math></inline-formula> SADMAMA sets <it>&#945; </it>= 0 and issues a warning that random shuffling of blocks creates more sites than there were to begin with. Similarly, if <inline-formula><m:math name="1471-2105-9-372-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#957;</m:mi><m:mo>></m:mo><m:mfrac><m:mrow><m:msub><m:mi>m</m:mi><m:mn>1</m:mn></m:msub></m:mrow><m:mi>n</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqyVd4MaeyOpa4tcfa4aaSaaaeaacqWGTbqBdaWgaaqaaiabigdaXaqabaaabaGaemOBa4gaaaaa@330C@</m:annotation></m:semantics></m:math></inline-formula> SADMAMA sets <it>&#945; </it>= 1, as that yields the highest site density attainable with this procedure.</p>
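            <p>The interpolation rule and its two boundary cases can be summarized in a short Python sketch. The function name <it>interpolate_alpha </it>and the handling of the degenerate case <it>m</it><sub>0 </sub>= <it>m</it><sub>1 </sub>are assumptions made for illustration; the text does not say how SADMAMA treats that case.</p>
            <p><![CDATA[
```python
import warnings


def interpolate_alpha(nu, n, m0, m1):
    """Pick alpha so that the interpolated expected site count equals n * nu.

    m0, m1: site counts observed in resampled sequences of length n
            generated with alpha = 0 and alpha = 1, respectively.
    """
    target = n * nu
    if target < m0:
        warnings.warn("random shuffling of blocks creates more sites "
                      "than there were to begin with; using alpha = 0")
        return 0.0
    if target > m1:
        return 1.0  # highest site density this procedure can reach
    if m1 == m0:
        return 0.0  # degenerate case; arbitrary choice (assumption)
    return (target - m0) / (m1 - m0)
```
]]></p>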
         </sec>
         <sec>
            <st>
               <p>Background model</p>
            </st>
            <p>In all the tests we report, we used SADMAMA's default setting of a third-order Markov background model. Unless otherwise stated, the training file from which this model was learned was our "standard <it>S. cerevisiae </it>intergenic file". This file was generated by removing from the <it>S. cerevisiae </it>genome downloaded from SGD <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> all protein and RNA coding sequences, including tRNA, rRNA, snoRNA and snRNA, as well as other presumably irrelevant elements such as LTRs and repetitive sequences. We also generated an "ARS-less <it>S. cerevisiae </it>intergenic file" by removing all 325 OriDB-confirmed ARSs <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> from our standard <it>S. cerevisiae </it>intergenic file.</p>
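            <p>To make the notion of a third-order Markov background concrete, the following Python sketch estimates the conditional probability of each base given the three preceding bases. It is a generic illustration, not SADMAMA's training code: the function name <it>train_markov</it>, the additive smoothing constant, and the choice to skip words containing non-ACGT characters are all assumptions.</p>
            <p><![CDATA[
```python
from itertools import product


def train_markov(seqs, order=3, pseudocount=1.0, alphabet="ACGT"):
    """Estimate P(base | preceding `order` bases) from training sequences."""
    counts = {"".join(ctx): {b: pseudocount for b in alphabet}
              for ctx in product(alphabet, repeat=order)}
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            ctx, base = seq[i - order:i], seq[i]
            # Skip words with ambiguous characters such as N (assumption).
            if ctx in counts and base in counts[ctx]:
                counts[ctx][base] += 1
    model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {b: v / total for b, v in row.items()}
    return model
```
]]></p>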
         </sec>
         <sec>
            <st>
               <p>Stable vs. unstable ARSs in <it>mcm1-1 </it>mutant</p>
            </st>
            <p>The ARSs we identified in our <it>mcm1-1 </it>screen were much longer than the typical size of confirmed ARSs in OriDB <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. To perform our statistical analysis we therefore restricted our attention to what we conjectured to be the core of each of these ARSs. Specifically, we picked the best ACS match in each of these ARSs, as predicted by <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, and considered only the 200 bases on each side of this match. Exploring similar lengths gave essentially the same picture. We note that two of the ARSs, one stable and one unstable, had no predicted ACS matches, so we left those out of this analysis.</p>
            <p>The ABF1 and MCM1 matrices were taken from TRANSFAC <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> via TESS <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Given the palindromic nature of the MCM1 sites, the half MCM1 matrix was defined by adding the reverse complements of the second halves of the sites to their first halves. The ORC matrix was taken from <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. All matrices were adjusted using a total pseudocount of 10% added uniformly to all bases. Three site thresholds were used, set so that 0.5%, 0.1%, and 0.05% of the words in the standard <it>S. cerevisiae </it>background file, respectively, scored above the threshold. In this and all other tests in this paper, the maximal overlap allowed between sites was the default 20%.</p>
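            <p>The two matrix-preparation steps described above, adding a uniform pseudocount and choosing a score threshold from the background file, can be sketched as follows. This is a simplified illustration rather than SADMAMA's implementation: it reads "a total pseudocount of 10%" as mixing each column with a uniform distribution, it scores words against a 0th order background instead of the 3rd order Markov model, and all function names are hypothetical.</p>

```python
import math


def smooth_column(freqs, total_pseudocount=0.10):
    """Mix one PWM column with a uniform distribution (assumed reading of 'total pseudocount')."""
    return {b: (1.0 - total_pseudocount) * f + total_pseudocount / 4.0
            for b, f in freqs.items()}


def score(word, pwm, background):
    """Log-odds score of a word under the PWM versus a 0th order background."""
    return sum(math.log(col[b] / background[b]) for b, col in zip(word, pwm))


def threshold_for_rate(background_seq, pwm, background, rate):
    """Return a threshold such that at most a fraction `rate` of background words score above it."""
    w = len(pwm)
    scores = sorted(
        (score(background_seq[i:i + w], pwm, background)
         for i in range(len(background_seq) - w + 1)),
        reverse=True,
    )
    allowed = int(rate * len(scores))  # number of words permitted above the threshold
    # Every score strictly greater than scores[allowed] sits at an index below `allowed`.
    return scores[allowed]
```

            <p>With a rate of 0.0005, for example, the returned threshold would correspond to the 0.05% setting; handling of both strands and of ambiguous bases is left out of this sketch.</p>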
         </sec>
         <sec>
            <st>
               <p><it>S. kluyveri </it>vs. <it>S. cerevisiae </it>ACS</p>
            </st>
            <p>Our <it>S. kluyveri </it>set of ARSs included 46 <it>S. kluyveri </it>DNA segments (defined using DpnII, a 4-cutter restriction enzyme recognizing the sequence GATC) that conferred <it>S. cerevisiae </it>ARS activity on the plasmid. The <it>S. kluyveri </it>set included 37,176 bases, 69% of which were AT. Our <it>S. cerevisiae </it>set, generated using the same protocol, included 36 sequences of <it>S. cerevisiae </it>DNA containing 29,561 bases, 66% of which were AT. The ACS matrix was again taken from <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Given the amount of data from which this PWM was generated, we used a reduced pseudocount of 1% added uniformly to all bases. Our ARS screen was carried out in <it>S. cerevisiae </it>and, as noted, the AT content of the <it>S. kluyveri </it>set was much more in line with that of <it>S. cerevisiae </it>intergenic DNA than with that of <it>S. kluyveri </it>intergenic DNA. We therefore used the aforementioned standard <it>S. cerevisiae </it>intergenic background file. The site threshold was set to 0.05%, that is, roughly 5 in 10,000 words in the background file are above the chosen threshold (similar results were observed with the 0.1% threshold).</p>
            <p>In the first of our two types of permutation tests we ran SADMAMA 10,000 times with the <it>S. kluyveri </it>set serving as both the first and the second input set. SADMAMA was instructed to randomly permute the second input set, which it does by separately permuting each sequence in the set. In each of these 10,000 runs SADMAMA deemed the unpermuted <it>S. kluyveri </it>set to have more sites, with a <it>p</it>-value &#8804; 0.05.</p>
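            <p>The per-sequence permutation used in this test can be illustrated with a short Python sketch. The function name <it>permute_set </it>is hypothetical and the code only mirrors the description above (each sequence is shuffled on its own, so its length and base composition are preserved); it is not taken from SADMAMA.</p>

```python
import random


def permute_set(sequences, rng):
    """Shuffle the bases within each sequence separately.

    Each output sequence has the same length and base composition as the
    corresponding input sequence; only the order of the bases changes.
    """
    permuted = []
    for seq in sequences:
        bases = list(seq)
        rng.shuffle(bases)  # in-place shuffle of this sequence only
        permuted.append("".join(bases))
    return permuted


# Example: one permuted copy of a toy two-sequence set.
toy_set = ["ATTTATGTTTA", "GATCGATC"]
print(permute_set(toy_set, random.Random(0)))
```

            <p>Passing a seeded <it>random.Random </it>instance makes a run reproducible; repeating the call with fresh random states would give the independent permuted sets needed for the 10,000 runs.</p>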
            <p>In the second of our permutation tests we ran SADMAMA 4,000 times comparing the <it>S. kluyveri </it>set against a dummy set while asking SADMAMA to permute the given ACS matrix. SADMAMA then found the threshold for which the background file has a 0.05% rate of sites of the permuted matrix, the same as the percentage of sites for the original, unpermuted ACS matrix. For each permuted matrix we kept a tally of how many sites SADMAMA identified in the <it>S. kluyveri </it>set (in this mode no tests were actually done: SADMAMA simply counted the number of sites above the threshold, which it computed as described above), and we compared those counts with the number of (unpermuted) ACS sites in the same set. It is important to note that, generally, setting the threshold can be rather arbitrary. However, this is not the case when you want to compare site counts of different matrices. Therefore, to make sure the threshold is set to control "background" rather than "real" sites, we used the ARS-less <it>S. cerevisiae </it>background file in this test.</p>
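            <p>A minimal sketch of the bookkeeping behind this second test is given below. Two points are assumptions of the sketch rather than statements about SADMAMA: that "permuting the matrix" means reordering its columns, and that the site counts are summarized as an empirical <it>p</it>-value with an add-one correction. Both function names are hypothetical.</p>

```python
import random


def permute_columns(pwm, rng):
    """Return a copy of the PWM with its columns (positions) in random order."""
    columns = list(pwm)  # copy, so the original matrix is left untouched
    rng.shuffle(columns)
    return columns


def empirical_p_value(observed_count, permuted_counts):
    """Fraction of permuted matrices with at least as many sites as the original.

    The add-one correction keeps the estimate away from exactly zero.
    """
    at_least = sum(1 for c in permuted_counts if c >= observed_count)
    return (at_least + 1) / (len(permuted_counts) + 1)
```

            <p>In use, each permuted matrix would get its own threshold from the background file before its sites are counted, as described above, so that all matrices are compared at the same background rate.</p>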
         </sec>
         <sec>
            <st>
               <p>Bootstrap tests</p>
            </st>
            <p>From the list of 325 confirmed <it>S. cerevisiae </it>ARSs on OriDB <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> we selected all ARSs shorter than 400 base pairs. Ordering these selected ARSs according to their location in the genome, we then assigned all even-numbered ARSs to the "even" set (116 sequences, 28,327 bases) and all the odd-numbered ones to the "odd" set (116 sequences, 28,932 bases). Using the hypergeometric test, SADMAMA assigned <it>p</it>-values of 3 &#215; 10<sup>-90 </sup>and 6 &#215; 10<sup>-74</sup>, respectively, to the enrichment of these sets in ACS sites relative to the ARS-less <it>S. cerevisiae </it>intergenic file (see Methods section). In these runs SADMAMA used the same ACS matrix from <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and a pseudocount of 0.1 as above. The site threshold was set to 0.01% relative to the background file, which was the standard <it>S. cerevisiae </it>intergenic file.</p>
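            <p>For readers unfamiliar with the hypergeometric test, the upper-tail probability it reports can be computed as in the sketch below. This is the textbook formula only; how SADMAMA maps words and sites in the two files onto the urn parameters is not restated here, and the function and parameter names are chosen for this illustration.</p>

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items, K of which are marked.

    Exact integer arithmetic is used for the binomial coefficients, so the
    only rounding happens in the final division.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy example: 4 words in total, 1 of them a site, 2 words drawn.
print(hypergeom_upper_tail(1, 2, 1, 4))
```

            <p>In an enrichment setting one would typically take N as the total number of words in both files, K as the total number of sites, n as the number of words in the tested set and k as the number of sites found there; this mapping is an assumption of the example.</p>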
            <p>Using the same settings, SADMAMA's hypergeometric test comparing the even and the odd sets was not significant, with <it>p</it>-values of 0.87 and 0.16 depending on the chosen one-sided alternative. However, using the naive bootstrap test with block length <it>b </it>= 6, SADMAMA reported that the even set is significantly enriched for ACS sites with a <it>p</it>-value of 0.0005. The difference is still significant at 0.008 for <it>b </it>= 10, and it is even significant for <it>b </it>= 15 at 0.04. With the site-protected feature turned on and <it>b </it>= 6, SADMAMA found the observed difference in ACS site frequency to be not significant, with <it>p</it>-values of 0.87 and 0.13 depending on the chosen one-sided alternative. These <it>p</it>-values remained roughly the same for all other block lengths we looked at, including <it>b </it>= 1. The other bootstrap settings were as follows: site statistics were gathered set-wide, 10,000 resampled pairs were used, and both sets were resampled from a sequence generated by concatenating the two input sets (-tests freqScoresGTT MC -- -MCstatScope setWide -numRandomSets 10000 -set1RandTrainFile _BOTH_ -set2RandTrainFile _BOTH_ -MCmodel bootstrap -v 0.2 -m 3 -pwmPC 0.01 -siteThresholdLearnedFrom 0.0001 nullTrainFile).</p>
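            <p>The resampling step of the naive block bootstrap can be sketched as follows, under the assumption that blocks of length <it>b </it>are drawn uniformly, with replacement, from the concatenation of the two input sets until the target length is reached. The function name is hypothetical, and neither the site-protected variant nor the test statistic is reproduced here.</p>

```python
import random


def block_bootstrap(pooled, target_length, block_length, rng):
    """Build one resampled sequence from randomly chosen blocks of the pooled sequence.

    Assumes block_length is no larger than len(pooled). Blocks are drawn
    with replacement; the last block is truncated to hit target_length.
    """
    pieces = []
    total = 0
    last_start = len(pooled) - block_length
    while target_length > total:
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        pieces.append(pooled[start:start + block_length])
        total += block_length
    return "".join(pieces)[:target_length]


# Example: resample a pseudo-set of 20 bases from a toy pooled sequence, b = 6.
pooled = "ACGTTTATGTTTAGGCATCGATCGGATTACA"
print(block_bootstrap(pooled, 20, 6, random.Random(0)))
```

            <p>Repeating this for both sets, counting sites in each resampled pair and comparing the observed difference with the resulting distribution would give a bootstrap <it>p</it>-value. Because blocks are joined at arbitrary positions, sites can be created or destroyed at the junctions and short blocks can split existing sites.</p>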
         </sec>
         <sec>
            <st>
               <p>Exploratory data analysis with SADMAMA</p>
            </st>
            <p>We downloaded the set of "Phylogibbs PWM predictions" of Morozov and Siggia <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, which contains 79 predicted <it>S. cerevisiae </it>matrices. For each of these matrices SADMAMA looked for enrichment in site frequency in the set of 325 confirmed ARSs relative to the ARS-less <it>S. cerevisiae </it>intergenic file. The threshold was set to 0.05% relative to the standard <it>S. cerevisiae </it>intergenic background file, and a total pseudocount of 10% was added uniformly to all bases. The <it>p</it>-value of the FKH2 matrix is 4.7 &#215; 10<sup>-5 </sup>and the <it>p</it>-values of all other 78 matrices are > 10<sup>-3</sup>, which is not significant after correcting for multiple testing.</p>
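            <p>The text does not name the multiple-testing correction that was applied. Purely as an illustration, and assuming a Bonferroni correction over the 79 matrices at a nominal level of 0.05 (both assumptions of this example), the arithmetic would read:</p>

```latex
\[
  \alpha_{\text{per test}} = \frac{0.05}{79} \approx 6.3 \times 10^{-4}
\]
\[
  p_{\text{FKH2}} = 4.7 \times 10^{-5} \le 6.3 \times 10^{-4},
  \qquad
  p_{\text{other}} > 10^{-3} > 6.3 \times 10^{-4}
\]
```

            <p>Under these assumptions only the FKH2 matrix would remain significant, which agrees with the statement above.</p>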
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>UK designed the statistical tests, drafted the manuscript, implemented the software and executed the tests on the biological data. HG and JG helped implement the software and execute the tests. AB helped execute the tests and draft the manuscript. IL and JD generated the biological data and helped draft the manuscript. BKT helped conceive the experiments and draft the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>It is our pleasure to acknowledge Niranjan Nagarajan's DP code for the tied Mann-Whitney test and Patrick Ng's useful scripts for handling the data.</p>
            <p>This research uses computational resources funded by NIH grant 1S10RR020889 and is supported by the National Science Foundation Grant No. 0644136 to UK and by the National Institutes of Health Grant No. GM072557 to BKT.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Genome-wide hierarchy of replication origin usage in <it>Saccharomyces cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Donato</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>SCC</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>9</issue>
            <fpage>e141</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1560401</pubid>
                  <pubid idtype="pmpid" link="fulltext">16965179</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020141</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Forkhead genes in transcriptional silencing, cell morphology and the cell cycle. Overlapping and distinct functions for FKH1 and FKH2 in <it>Saccharomyces cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Hollenhorst</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Bose</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Mielke</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>154</volume>
            <issue>4</issue>
            <fpage>1533</fpage>
            <lpage>1548</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461039</pubid>
                  <pubid idtype="pmpid" link="fulltext">10747051</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>DNA binding sites: representation and discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Stormo</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>16</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.1.16</pubid>
                  <pubid idtype="pmpid" link="fulltext">10812473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>MATCH: A tool for searching transcription factor binding sites in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Kel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>G&#246;ssling</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Reuter</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Cheremushkin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kel-Margoulis</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3576</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169193</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824369</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg585</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Using TESS to Predict Transcription Factor Binding Sites in DNA Sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Schug</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Current Protocols in Bioinformatics</source>
            <publisher>J Wiley and Sons</publisher>
            <editor>Baxevanis AD</editor>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Finding motifs in promoter regions</p>
            </title>
            <aug>
               <au>
                  <snm>Hertzberg</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zuk</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Getz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Domany</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <issue>3</issue>
            <fpage>314</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/cmb.2005.12.314</pubid>
                  <pubid idtype="pmpid" link="fulltext">15857245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Computing exact P-values for DNA motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tromp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>MQ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>5</issue>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <url>http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/5/531</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl662</pubid>
                  <pubid idtype="pmpid" link="fulltext">17237046</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Detection of functional DNA motifs via statistical over-representation</p>
            </title>
            <aug>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>4</issue>
            <fpage>1372</fpage>
            <lpage>1381</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390287</pubid>
                  <pubid idtype="pmpid" link="fulltext">14988425</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh299</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells</p>
            </title>
            <aug>
               <au>
                  <snm>Elkon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Linhart</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shiloh</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>5</issue>
            <fpage>773</fpage>
            <lpage>80</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430898</pubid>
                  <pubid idtype="pmpid" link="fulltext">12727897</pubid>
                  <pubid idtype="doi">10.1101/gr.947203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>On counting position weight matrix matches in a sequence, with application to discriminative motif finding</p>
            </title>
            <aug>
               <au>
                  <snm>Sinha</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e454</fpage>
            <lpage>63</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl227</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873507</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Statistical tests to compare motif count exceptionalities</p>
            </title>
            <aug>
               <au>
                  <snm>Robin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schbath</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vandewalle</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>84</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1838430</pubid>
                  <pubid idtype="pmpid" link="fulltext">17346349</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-84</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Cell cycle regulation of DNA replication</p>
            </title>
            <aug>
               <au>
                  <snm>Sclafani</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Holzen</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2007</pubdate>
            <volume>41</volume>
            <fpage>237</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2292467</pubid>
                  <pubid idtype="pmpid" link="fulltext">17630848</pubid>
                  <pubid idtype="doi">10.1146/annurev.genet.41.110306.130308</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Making sense of eukaryotic DNA replication origins</p>
            </title>
            <aug>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <issue>5540</issue>
            <fpage>96</fpage>
            <lpage>100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1255916</pubid>
                  <pubid idtype="pmpid" link="fulltext">11588251</pubid>
                  <pubid idtype="doi">10.1126/science.1061724</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Replication dynamics of the yeast genome</p>
            </title>
            <aug>
               <au>
                  <snm>Raghuraman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Winzeler</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Collingwood</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wodicka</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Conway</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lockhart</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brewer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fangman</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <issue>5540</issue>
            <fpage>115</fpage>
            <lpage>21</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.294.5540.115</pubid>
                  <pubid idtype="pmpid" link="fulltext">11588253</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mapping of early firing origins on a replication profile of budding yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Yabuki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Terashima</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kitada</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Genes Cells</source>
            <pubdate>2002</pubdate>
            <volume>7</volume>
            <issue>8</issue>
            <fpage>781</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2443.2002.00559.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12167157</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Genome-wide distribution of ORC and MCM proteins in <it>S. cerevisiae</it>: high-resolution mapping of replication origins</p>
            </title>
            <aug>
               <au>
                  <snm>Wyrick</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barnett</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jennings</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <issue>5550</issue>
            <fpage>2357</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1066101</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>ATP-dependent recognition of eukaryotic origins of DNA replication by a multiprotein complex</p>
            </title>
            <aug>
               <au>
                  <snm>Bell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stillman</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1992</pubdate>
            <volume>357</volume>
            <issue>6374</issue>
            <fpage>128</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/357128a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1579162</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The spatial arrangement of ORC binding modules determines the functionality of replication origins in budding yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Bolon</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Bielinsky</snm>
                  <fnm>AK</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>18</issue>
            <fpage>5069</fpage>
            <lpage>5080</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1635292</pubid>
                  <pubid idtype="pmpid" link="fulltext">16984967</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl661</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Fast index based algorithms and software for matching position specific scoring matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Beckstette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Homann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Giegerich</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>389</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1635428</pubid>
                  <pubid idtype="pmpid" link="fulltext">16930469</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>An Efficient, Minimal-Storage Procedure for Calculating the Mann-Whitney U, Generalized U and Similar Distributions</p>
            </title>
            <aug>
               <au>
                  <snm>Harding</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Applied Statistics</source>
            <pubdate>1984</pubdate>
            <volume>33</volume>
            <fpage>1</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2347656</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The Bootstrap Method for standard errors, confidence intervals, and other measures of statistical accuracy</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Statistical Science</source>
            <pubdate>1986</pubdate>
            <volume>1</volume>
            <fpage>1</fpage>
            <lpage>35</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Mcm1 binds replication origins</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donato</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Christensen</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Merchant</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <issue>8</issue>
            <fpage>6093</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M209827200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12473677</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Mcm1 promotes replication initiation by binding specific elements at replication origins</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Donato</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>24</volume>
            <issue>14</issue>
            <fpage>6514</fpage>
            <lpage>6524</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">434236</pubid>
                  <pubid idtype="pmpid" link="fulltext">15226450</pubid>
                  <pubid idtype="doi">10.1128/MCB.24.14.6514-6524.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Mcm7, a subunit of the presumptive MCM helicase, modulates its own expression in conjunction with Mcm1</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donato</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <issue>28</issue>
            <fpage>25408</fpage>
            <lpage>25416</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M300699200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12738768</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Mutants of <it>S. cerevisiae</it> defective in the maintenance of minichromosomes</p>
            </title>
            <aug>
               <au>
                  <snm>Maine</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Sinha</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1984</pubdate>
            <volume>106</volume>
            <issue>3</issue>
            <fpage>365</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1224244</pubid>
                  <pubid idtype="pmpid" link="fulltext">6323245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The requirement of yeast replication origins for pre-replication complex proteins is modulated by transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Nieduszynski</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blow</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Donaldson</snm>
                  <fnm>AD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>8</issue>
            <fpage>2410</fpage>
            <lpage>2420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1087785</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860777</pubid>
                  <pubid idtype="doi">10.1093/nar/gki539</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Purification of a yeast protein that binds to origins of DNA replication and a transcriptional silencer</p>
            </title>
            <aug>
               <au>
                  <snm>Diffley</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Stillman</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <issue>7</issue>
            <fpage>2120</fpage>
            <lpage>2124</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">279940</pubid>
                  <pubid idtype="pmpid" link="fulltext">3281162</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.7.2120</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Specific interaction between a <it>Saccharomyces cerevisiae</it> protein and a DNA element associated with certain autonomously replicating sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Civalier</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tye</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <issue>3</issue>
            <fpage>743</fpage>
            <lpage>746</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">279631</pubid>
                  <pubid idtype="pmpid" link="fulltext">3277180</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.3.743</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Nucleosomes positioned by ORC facilitate the initiation of DNA replication</p>
            </title>
            <aug>
               <au>
                  <snm>Lipford</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <fpage>21</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(01)00151-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11172708</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>OriDB: a DNA replication origin database</p>
            </title>
            <aug>
               <au>
                  <snm>Nieduszynski</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Hiraga</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Benham</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Donaldson</snm>
                  <fnm>AD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D40</fpage>
            <lpage>D46</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1781122</pubid>
                  <pubid idtype="pmpid" link="fulltext">17065467</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl758</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Connecting protein structure with predictions of regulatory sites</p>
            </title>
            <aug>
               <au>
                  <snm>Morozov</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <issue>17</issue>
            <fpage>7068</fpage>
            <lpage>7073</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1855371</pubid>
                  <pubid idtype="pmpid" link="fulltext">17438293</pubid>
                  <pubid idtype="doi">10.1073/pnas.0701356104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Molecular determinants of the cell-cycle regulated Mcm1p-Fkh2p transcription factor complex</p>
            </title>
            <aug>
               <au>
                  <snm>Boros</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>FL</fnm>
               </au>
               <au>
                  <snm>Darieva</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Pic-Taylor</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Harman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Morgan</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Sharrocks</snm>
                  <fnm>AD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>9</issue>
            <fpage>2279</fpage>
            <lpage>2288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">154233</pubid>
                  <pubid idtype="pmpid" link="fulltext">12711672</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg347</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Transcriptional regulatory code of a eukaryotic genome</p>
            </title>
            <aug>
               <au>
                  <snm>Harbison</snm>
                  <fnm>CT</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <issue>7004</issue>
            <fpage>99</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02800</pubid>
                  <pubid idtype="pmpid" link="fulltext">15343339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Yeast origin recognition complex functions in transcription silencing and DNA replication</p>
            </title>
            <aug>
               <au>
                  <snm>Bell</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stillman</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1993</pubdate>
            <volume>262</volume>
            <issue>5141</issue>
            <fpage>1844</fpage>
            <lpage>1849</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.8266072</pubid>
                  <pubid idtype="pmpid" link="fulltext">8266072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Origin recognition complex (ORC) in transcriptional silencing and DNA replication in <it>S. cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Foss</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McNally</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Laurenson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rine</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1993</pubdate>
            <volume>262</volume>
            <issue>5141</issue>
            <fpage>1838</fpage>
            <lpage>1844</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.8266071</pubid>
                  <pubid idtype="pmpid" link="fulltext">8266071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Differential DNA affinity specifies roles for the origin recognition complex in budding yeast heterochromatin</p>
            </title>
            <aug>
               <au>
                  <snm>Palacios DeBeer</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Fox</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>15</issue>
            <fpage>1817</fpage>
            <lpage>1822</lpage>
            <url>http://dx.doi.org/10.1101/gad.1096703</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196224</pubid>
                  <pubid idtype="pmpid" link="fulltext">12897051</pubid>
                  <pubid idtype="doi">10.1101/gad.1096703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Fast and Accurate Computation of Binomial Probabilities</p>
            </title>
            <aug>
               <au>
                  <snm>Loader</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <pubdate>2000</pubdate>
            <url>http://citeseer.ist.psu.edu/312695.html</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Saccharomyces Genome Database</p>
            </title>
            <aug>
               <au>
                  <cnm>SGD project</cnm>
               </au>
            </aug>
            <url>http://www.yeastgenome.org/</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Prediction of <it>Saccharomyces cerevisiae</it> replication origins</p>
            </title>
            <aug>
               <au>
                  <snm>Breier</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chatterji</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cozzarelli</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>4</issue>
            <fpage>R22</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395781</pubid>
                  <pubid idtype="pmpid" link="fulltext">15059255</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-4-r22</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>TRANSFAC: a database on transcription factors and their DNA binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wingender</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dietze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Karas</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kn&#252;ppel</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>238</fpage>
            <lpage>241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">145586</pubid>
                  <pubid idtype="pmpid" link="fulltext">8594589</pubid>
                  <pubid idtype="doi">10.1093/nar/24.1.238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
