<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-167</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Efficient computation of absent words in genomic sequences</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Herold</snm>
               <fnm>Julia</fnm>
               <insr iid="I1"/>
               <email>jherold@cebitec.uni-bielefeld.de</email>
            </au>
            <au id="A2">
               <snm>Kurtz</snm>
               <fnm>Stefan</fnm>
               <insr iid="I2"/>
               <email>kurtz@zbh.uni-hamburg.de</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Giegerich</snm>
               <fnm>Robert</fnm>
               <insr iid="I1"/>
               <email>robert@techfak.uni-bielefeld.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center of Biotechnology, Bielefeld University, Postfach 10 01 31, 33501 Bielefeld, Germany</p>
            </ins>
            <ins id="I2">
               <p>Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>167</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/167</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18366790</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-167</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>08</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>26</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>26</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Herold et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words <it>absent </it>from a genome have been addressed in two recent studies.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10<sup>9 </sup>down to 10<sup>5 </sup>bp.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 MB of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>Sequence statistics and unique substrings</p>
            </st>
            <p>Word statistics is a traditional field of genome research. For word-length 1, GC-content is a basic characteristic noted for each organism, and dinucleotide relative abundance profiles provide a reliable genomic signature <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Dinucleotide content also distinguishes natural RNA from random sequences <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Trinucleotide (codon) usage can reliably predict bacterial genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> even in the presence of horizontal gene transfer. Short palindromic words mark the characteristic sites of restriction enzymes in bacteria, and are therefore <it>under </it>represented in bacterial genomes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. A theory of <it>over</it>- as well as <it>under</it>-represented words has been laid out in <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>.</p>
            <p><it>Unique </it>words are of particular interest. They provide sequence signatures, and microarray probes are often designed to match them. Unique sequences from several genomes exhibiting a perfect match serve as reliable anchors in a multiple genome alignment <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Recently, Haubold et al. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> addressed the problem of efficiently computing shortest unique substrings (using their terminology) in a sequence, and provided a program called SHUSTRING for this purpose. Using this program, they found that there is typically much more unique sequence in a genome than one would expect in a random sequence of the same length. While this observation by itself is not a surprise, given the repetitive nature of genomes, their approach and software make it possible to quantify this fact. Furthermore, they found unique words to be significantly clustered in upstream regions of genes in human and mouse.</p>
         </sec>
         <sec>
            <st>
               <p>Absent words</p>
            </st>
            <p>One may take such investigations further and investigate words that do <it>not </it>occur in a genome. We suggest the term "unwords" for shortest words from the underlying alphabet that do not show up in a given sequence.</p>
            <p>A first approach to the unwords problem was recently presented by Hampikian and Andersen <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Their motivation was to "discover the constraints on natural DNA and protein sequences". However, there is no evidence that such constraints exist. The absence of certain shortest words in a sequence database, no matter what (finite) size it has, is a mathematical necessity. Speculations about negative selection against certain words have been refuted convincingly in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. There, it is shown that human unwords computed in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> can be explained by a mutational bias rather than negative selection.</p>
            <p>Still, there is twofold interest in the capability of efficiently computing unwords. (1) Statistically, it is interesting to see how the length and number of unwords in a given genome deviate from the expectation in random sequences. (2) Practically, it is useful to know all the unwords when a genome or chromosome is to be extended by insertion of foreign DNA. Combinations of unwords can directly serve as tags that are guaranteed to be unique in the modified DNA sequence.</p>
         </sec>
         <sec>
            <st>
               <p>Software for unwords computation</p>
            </st>
            <p>Unfortunately, the software presented in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> is slow and difficult to use: It reads GenBank files rather than the more space-efficient FASTA format &#8211; and space matters a lot when dealing with genomes as large as human and mouse. It runs an internal conversion routine for over 50 minutes before starting unwords computation. The program generates an excessive number of files, which may overwhelm the file system. The C code is platform dependent and internal constants must be adapted. Finally, the human unwords data computed with the program according to <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> appear to be incomplete (and hence incorrect).</p>
            <p>In order to make unwords computation possible in an efficient and reliable way, we present here a new algorithm and the software implementing it. Efficient computation of unwords can be done from an index data structure such as a suffix tree or an (enhanced) suffix array <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. For example, in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> a suffix tree was used to compute unique substrings. In fact, our first unwords-program was an extension to the VMATCH software <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, which is based on enhanced suffix arrays. However, index data structures must be built in memory and are space-consuming. Hence, we developed a direct approach that works more efficiently, because the overall sequence need not be kept in main memory. Computing the unwords of the human genome, for example, takes about 10 minutes of computation time on a Linux PC with a single 2.4 GHz CPU. The space requirement is 2.5 megabytes.</p>
            <p>In this article, we describe the new program UNWORDS and report its application to the genomes of human, mouse, and other organisms, covering a genome size range from 10<sup>9 </sup>down to 10<sup>5 </sup>bp.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Problem statement</p>
            </st>
            <p>Let &#931; be a finite alphabet of at least two letters. Let |&#931;| denote the cardinality of &#931;. In genome analysis, &#931; = {a, c, g, t} and |&#931;| = 4. A word is a sequence of letters from the alphabet. The terms "word" and "sequence" are equivalent, but are used here to indicate that a word is short and a sequence is long. |<it>w</it>| denotes the length of a word <it>w</it>. If |<it>w</it>| = <it>q</it>, we speak of a <it>q</it>-word.</p>
            <p>A word <it>w </it>over &#931; is an <it>unword </it>of a sequence <it>G </it>if (1) it does not occur as a substring of <it>G</it>, and (2) all words over &#931; shorter than <it>w </it>do occur in <it>G</it>. Note that the unword length is uniquely defined for a given genome <it>G</it>.</p>
            <p>The built-in minimality requirement in this definition is motivated by the fact that when <it>w </it>is an unword of length <it>q </it>in <it>G</it>, it has 2|&#931;| one-letter extensions that also do not occur in <it>G</it>. Therefore, asking for missing words longer than <it>q </it>would introduce a substantial proportion of redundant results.</p>
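            <p>The definition can be illustrated by a brute-force sketch that checks word lengths 1, 2, ... in turn and returns the first length at which some word is missing. It is meant only as a reference for small inputs; the function name unwords_naive and the default alphabet string are assumptions made for this illustration and are not part of the UNWORDS software.</p>

```python
from itertools import product


def unwords_naive(genome, alphabet="acgt"):
    """Return (q, words): the unword length q and all unwords of `genome`.

    Direct transcription of the definition: the first length q at which
    some word over `alphabet` is not a substring of `genome`.
    """
    q = 1
    while True:
        # All substrings of length q that actually occur in the sequence.
        present = {genome[i:i + q] for i in range(len(genome) - q + 1)}
        # Candidate words are generated in lexicographic order.
        absent = ["".join(w) for w in product(alphabet, repeat=q)
                  if "".join(w) not in present]
        if absent:
            return q, absent
        q += 1
```

            <p>Because all shorter lengths have been exhausted before a result is returned, the minimality condition (2) holds by construction. The running time grows with |&#931;|<sup><it>q</it></sup> set lookups per length, so this is unsuitable for genome-scale input.</p>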
            <p>Similar to shortest unique substrings, the length of unwords is expected to increase with genome size. For fixed unword length, the number of unwords is expected to decrease while |<it>G</it>| increases. Given <it>G</it>, let <it>q </it>be the unword length. It is easy to see that 1 &#8804; <it>q</it>. To derive an upper bound on <it>q</it>, let <it>w </it>be a shortest unique substring in <it>G </it>and let &#8467; = |<it>w</it>|. Consider the following cases:</p>
            <p>&#8226; If |<it>w</it>| = |<it>G</it>|, then for any <it>a </it>&#8712; &#931;, <it>wa </it>is an unword. Hence <it>q </it>&#8804; |<it>wa</it>| = &#8467; + 1.</p>
            <p>&#8226; If |<it>w</it>| &lt; |<it>G</it>| and <it>w </it>is not a suffix of <it>G</it>, then <it>wa </it>occurs in <it>G </it>for exactly one letter <it>a</it>. Hence <it>wb </it>for any <it>b </it>&#8712; &#931;\{<it>a</it>} is an unword. This implies <it>q </it>&#8804; |<it>wb</it>| = &#8467; + 1.</p>
            <p>&#8226; If |<it>w</it>| &lt; |<it>G</it>| and <it>w </it>is not a prefix of <it>G</it>, then <it>aw </it>occurs in <it>G </it>for exactly one letter <it>a</it>. Hence <it>bw </it>for any <it>b </it>&#8712; &#931;\{<it>a</it>} is an unword. This implies <it>q </it>&#8804; |<it>bw</it>| = &#8467; + 1.</p>
            <p>Thus we conclude 1 &#8804; <it>q </it>&#8804; &#8467; + 1.</p>
            <p>The problem of <it>unword analysis </it>of a given sequence <it>G </it>(typically a complete genome) is to determine all unwords of <it>G</it>. The double-stranded nature of DNA lets unwords always show up in complementary pairs, as each word present implies the presence of its Watson-Crick complement on the opposite strand. Sometimes, however, an unword is self-complementary, and hence a "pair" represents only a single word. Therefore, we report unword numbers rather than numbers of pairs (in contrast to <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>).</p>
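            <p>The pairing of unwords can be made concrete with a small sketch. It assumes that the Watson-Crick complement of a word read from the opposite strand is its reverse complement; the helper names are hypothetical and chosen for this illustration only.</p>

```python
# Translation table for the four nucleotides (lower case, as in the text).
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(word):
    """Complement every letter, then reverse to read the opposite strand."""
    return word.translate(COMPLEMENT)[::-1]


def is_self_complementary(word):
    """True if the word equals its own reverse complement."""
    return word == reverse_complement(word)
```

            <p>A word such as acgt is its own reverse complement and therefore forms a "pair" consisting of a single word, whereas aacg pairs with cgtt. Words of odd length can never be self-complementary, since the middle letter would have to be its own complement.</p>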
            <p>Computation of <it>q</it>-word statistics for small <it>q </it>is straightforward. Efficient computation of unwords when <it>q </it>is unknown, however, requires more advanced techniques. Our unword analysis algorithm is described in the section on computational methods.</p>
         </sec>
         <sec>
            <st>
               <p>Unword statistics</p>
            </st>
            <p>The unword analysis problem is mathematically well defined. Unwords must exist for any sequence. The interesting question is their size and number, compared to what one would expect given the alphabet size and the length of <it>G</it>.</p>
            <p>Let <it>w </it>be a word of length |<it>w</it>|, <it>w </it>[<it>i</it>] the <it>i</it>-th letter in <it>w</it>, <it>G </it>a genomic sequence and &#8473;[<it>w </it>[<it>i</it>]] the relative frequency of nucleotide <it>w </it>[<it>i</it>] in <it>G</it>. The probability for <it>w </it>to occur by chance (i.e. at a fixed position in a random sequence <it>s </it>of the same composition and length as <it>G</it>) is then <inline-formula><m:math name="1471-2105-9-167-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#8473;</m:mi><m:mo stretchy="false">[</m:mo><m:mi>w</m:mi><m:mo stretchy="false">]</m:mo><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8719;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mrow><m:mo>|</m:mo><m:mi>w</m:mi><m:mo>|</m:mo></m:mrow></m:mrow></m:msubsup><m:mrow><m:mi>&#8473;</m:mi><m:mo stretchy="false">[</m:mo><m:mi>w</m:mi><m:mo stretchy="false">[</m:mo><m:mi>i</m:mi><m:mo stretchy="false">]</m:mo><m:mo stretchy="false">]</m:mo></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFzecucqGGBbWwcqWG3bWDcqGGDbqxcqGH9aqpdaqeWaqaaiab=LriqjabcUfaBjabdEha3jabcUfaBjabdMgaPjabc2faDjabc2faDbWcbaGaemyAaKMaeyypa0JaeGymaedabaWaaqWaaeaacqWG3bWDaiaawEa7caGLiWoaa0Gaey4dIunaaaa@4E3F@</m:annotation></m:semantics></m:math></inline-formula>. The expectation value for (the number of occurrences of) <it>w </it>in <it>s </it>is <inline-formula><m:math name="1471-2105-9-167-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="double-struck">E</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFecFraaa@37B5@</m:annotation></m:semantics></m:math></inline-formula>[<it>w in s</it>] &#8776; &#8473;[<it>w</it>]&#183;|<it>G</it>|.</p>
            <p>Calculating the probability for a word <it>not </it>to occur in a specific sequence is quite difficult and not much literature is available. Following Rahmann et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, a good approximation of the probability can be given using the expectation value. A Poisson Distribution is expected for word counts in a genomic sequence, which is <inline-formula><m:math name="1471-2105-9-167-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>&#8473;</m:mi><m:mo stretchy="false">[</m:mo><m:msub><m:mi>X</m:mi><m:mi>w</m:mi></m:msub><m:mo>=</m:mo><m:mi>k</m:mi><m:mo stretchy="false">]</m:mo><m:mo>=</m:mo><m:mfrac><m:mrow><m:mi>&#955;</m:mi><m:msup><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mi>k</m:mi></m:msup></m:mrow><m:mrow><m:mi>k</m:mi><m:mo>!</m:mo></m:mrow></m:mfrac><m:mo>&#8901;</m:mo><m:msup><m:mi>e</m:mi><m:mrow><m:mo>&#8722;</m:mo><m:mi>&#955;</m:mi><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFzecucqGGBbWwcqWGybawdaWgaaWcbaGaem4DaChabeaakiabg2da9iabdUgaRjabc2faDjabg2da9KqbaoaalaaabaGaeq4UdWMaeiikaGIaem4DaCNaeiykaKYaaWbaaeqabaGaem4AaSgaaaqaaiabdUgaRjabcgcaHaaakiabgwSixlabdwgaLnaaCaaaleqabaGaeyOeI0Iaeq4UdWMaeiikaGIaem4DaCNaeiykaKcaaaaa@521A@</m:annotation></m:semantics></m:math></inline-formula> with <it>&#955;</it>(<it>w</it>) = <inline-formula><m:math name="1471-2105-9-167-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="double-struck">E</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFecFraaa@37B5@</m:annotation></m:semantics></m:math></inline-formula>[<it>w in s</it>], and <it>k </it>the number of occurrences of the word <it>w</it>. Now let <it>k </it>= 0. Then</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-9-167-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>&#8473;</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:msub>
                              <m:mi>X</m:mi>
                              <m:mi>w</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mn>0</m:mn>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>=</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>&#8901;</m:mo>
                           <m:msup>
                              <m:mi>e</m:mi>
                              <m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#955;</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>w</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:msup>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFzecucqGGBbWwcqWGybawdaWgaaWcbaGaem4DaChabeaakiabg2da9iabicdaWiabc2faDjabg2da9iabigdaXiabgwSixlabdwgaLnaaCaaaleqabaGaeyOeI0Iaeq4UdWMaeiikaGIaem4DaCNaeiykaKcaaaaa@49B8@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The expected number <it>N </it>of <it>q</it>-words that do not occur is therefore</p>
            <p>
               <display-formula id="M2"><it>N </it>&#8776; |&#931;|<sup><it>q</it></sup><it>e</it><sup>-<it>&#955;</it>(<it>w</it>)</sup></display-formula>
            </p>
            <p>As an example, for a random sequence <it>G </it>of length 3.1&#183;10<sup>9 </sup>and an unword <it>w </it>of length 14 and typical composition, we obtain a probability of 1.40082&#183;10<sup>-5 </sup>for <it>w </it>not occurring in <it>G</it>. Still, the expected number of unwords of length 14 is 2590.798, while for length 13, it is only 5.823108&#183;10<sup>-13</sup>. For even shorter unwords, it is practically zero.</p>
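            <p>The approximation in the two formulas above can be evaluated with a few lines of code. The sketch below assumes a uniform base composition, so that &#8473;[<it>w</it>] = |&#931;|<sup>-<it>q</it></sup> for every <it>q</it>-word; this simplification and the function name are assumptions of the example, not statements about how the figures in the text were obtained.</p>

```python
import math


def expected_absent_words(genome_length, q, alphabet_size=4):
    """Return (p_absent, n_absent) under the Poisson approximation.

    p_absent: probability that one fixed q-word does not occur.
    n_absent: expected number of q-words that do not occur.
    Assumes every letter has probability 1 / alphabet_size.
    """
    n_words = alphabet_size ** q
    lam = genome_length / n_words      # lambda(w) = P[w] * |G|
    p_absent = math.exp(-lam)          # P[X_w = 0] = e^(-lambda(w))
    return p_absent, n_words * p_absent
```

            <p>For a sequence length of 3.1&#183;10<sup>9 </sup>this gives an expected count on the order of 2.6&#183;10<sup>3 </sup>for <it>q </it>= 14 and a value far below one for <it>q </it>= 13, which shows how sharply the expectation changes between two consecutive word lengths.</p>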
         </sec>
         <sec>
            <st>
               <p>Unwords algorithm</p>
            </st>
            <p>For convenience, we map each of the four letters of the DNA-alphabet to an integer in the range 0 to 3 as follows: <it>&#257; </it>= 0, <inline-formula><m:math name="1471-2105-9-167-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>c</m:mi><m:mo>&#175;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4yamMbaebaaaa@2D3C@</m:annotation></m:semantics></m:math></inline-formula> = 1, <inline-formula><m:math name="1471-2105-9-167-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>g</m:mi><m:mo>&#175;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafm4zaCMbaebaaaa@2D44@</m:annotation></m:semantics></m:math></inline-formula> = 2, <inline-formula><m:math name="1471-2105-9-167-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mover accent="true"><m:mi>t</m:mi><m:mo>&#175;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmiDaqNbaebaaaa@2D5E@</m:annotation></m:semantics></m:math></inline-formula> = 3. Moreover, for any fixed value <it>q</it>, we use a standard method to map each possible <it>q</it>-word to a number in the range [0, 4<sup><it>q </it></sup>- 1]. That is, we define <inline-formula><m:math name="1471-2105-9-167-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>&#981;</m:mi><m:mi>q</m:mi></m:msub><m:mo stretchy="false">(</m:mo><m:mi>w</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>q</m:mi></m:msubsup><m:mrow><m:mover accent="true"><m:mrow><m:mi>w</m:mi><m:mo stretchy="false">[</m:mo><m:mi>i</m:mi><m:mo stretchy="false">]</m:mo></m:mrow><m:mo stretchy="true">&#175;</m:mo></m:mover></m:mrow></m:mstyle><m:mo>&#8901;</m:mo><m:msup><m:mn>4</m:mn><m:mrow><m:mi>q</m:mi><m:mo>&#8722;</m:mo><m:mi>i</m:mi></m:mrow></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqy1dO2aaSbaaSqaaiabdghaXbqabaGccqGGOaakcqWG3bWDcqGGPaqkcqGH9aqpdaaeWaqaamaanaaabaGaem4DaCNaei4waSLaemyAaKMaeiyxa0faaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemyCaehaniabggHiLdGccqGHflY1cqaI0aandaahaaWcbeqaaiabdghaXjabgkHiTiabdMgaPbaaaaa@46BC@</m:annotation></m:semantics></m:math></inline-formula> for any <it>q</it>-word <it>w</it>. In other words, <it>q</it>-words are mapped to their rank in the corresponding lexicographic order. Substrings in <it>G </it>containing at least one wildcard (e.g. N) are ignored. The integer value <it>&#966;</it><sub><it>q </it></sub>(<it>w</it>) serves as an index into a bit table &#937;<sub><it>q </it></sub>such that for all sequences <it>w </it>of length <it>q </it>we have: &#937;<sub><it>q </it></sub>[<it>&#966;</it><sub><it>q </it></sub>(<it>w</it>)] = 1 if and only if <it>w </it>occurs as a substring in the genome <it>G</it>. Let |&#937;<sub><it>q</it></sub>| denote the number of 1-entries in &#937;<sub><it>q</it></sub>.</p>
            <p>Initially we set all bits in &#937;<sub><it>q </it></sub>to 0. This requires <inline-formula><m:math name="1471-2105-9-167-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>O</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mfrac><m:mrow><m:msup><m:mn>4</m:mn><m:mi>q</m:mi></m:msup></m:mrow><m:mi>&#969;</m:mi></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4ta80aaeWaaKqbagaadaWcaaqaaiabisda0maaCaaabeqaaiabdghaXbaaaeaacqaHjpWDaaaakiaawIcacaGLPaaaaaa@337D@</m:annotation></m:semantics></m:math></inline-formula> time, where <it>&#969; </it>is the computer word size. Then we sweep a window of width <it>q </it>over <it>G </it>from left to right. For the first window <it>G </it>[1..<it>q</it>] we determine the integer code <it>&#966;</it><sub><it>q </it></sub>(<it>G </it>[1..<it>q</it>]) as defined above in <it>O</it>(<it>q</it>) time. For each of the remaining <it>n </it>- <it>q </it>windows, where <it>n </it>= |<it>G</it>|, say at start position <it>i </it>+ 1, we compute <it>&#966;</it><sub><it>q </it></sub>(<it>G</it>[<it>i </it>+ 1..<it>i </it>+ <it>q</it>]) in constant time from <it>&#966;</it><sub><it>q </it></sub>(<it>G</it>[<it>i..i </it>+ <it>q </it>- 1]) according to the following equation:</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-9-167-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>&#981;</m:mi>
                              <m:mi>q</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>G</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>i</m:mi>
                           <m:mo>+</m:mo>
                           <m:mn>1 ..</m:mn>
                           <m:mi> i</m:mi>
                           <m:mo>+</m:mo>
                           <m:mi>q</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>&#981;</m:mi>
                              <m:mi>q</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>G</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:mi>i</m:mi>
                           <m:mn> .. </m:mn>
                           <m:mi>i</m:mi>
                           <m:mo>+</m:mo>
                           <m:mi>q</m:mi>
                           <m:mo>&#8722;</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:msup>
                              <m:mn>4</m:mn>
                              <m:mrow>
                                 <m:mi>q</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:msup>
                           <m:mo>&#8901;</m:mo>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>G</m:mi>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo stretchy="false">]</m:mo>
                              </m:mrow>
                              <m:mo stretchy="true">&#175;</m:mo>
                           </m:mover>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>&#8901;</m:mo>
                           <m:mn>4</m:mn>
                           <m:mo>+</m:mo>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>G</m:mi>
                                 <m:mo stretchy="false">[</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mi>q</m:mi>
                                 <m:mo stretchy="false">]</m:mo>
                              </m:mrow>
                              <m:mo stretchy="true">&#175;</m:mo>
                           </m:mover>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqy1dO2aaSbaaSqaaiabdghaXbqabaGccqGGOaakcqWGhbWrcqGGBbWwcqWGPbqAcqGHRaWkcqaIXaqmcqGGUaGlcqGGUaGlcqWGPbqAcqGHRaWkcqWGXbqCcqGGDbqxcqGGPaqkcqGH9aqpcqGGOaakcqaHvpGAdaWgaaWcbaGaemyCaehabeaakiabcIcaOiabdEeahjabcUfaBjabdMgaPjabc6caUiabc6caUiabdMgaPjabgUcaRiabdghaXjabgkHiTiabigdaXiabc2faDjabcMcaPiabgkHiTiabisda0maaCaaaleqabaGaemyCaeNaeyOeI0IaeGymaedaaOGaeyyXIC9aa0aaaeaacqWGhbWrcqGGBbWwcqWGPbqAcqGGDbqxaaGaeiykaKIaeyyXICTaeGinaqJaey4kaSYaa0aaaeaacqWGhbWrcqGGBbWwcqWGPbqAcqGHRaWkcqWGXbqCcqGGDbqxaaaaaa@69AD@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Thus the computation of the <it>n </it>- <it>q </it>+ 1 integer codes requires <it>O</it>(<it>n</it>) time. The multiplication and addition in the equation above can be implemented by fast bit-shift and bit-or operations. If <it>j </it>is the current integer code and &#937;<sub><it>q </it></sub>[<it>j</it>] is 0, then we set &#937;<sub><it>q </it></sub>[<it>j</it>] to 1 and increment a counter of the number of 1-entries in &#937;<sub><it>q</it></sub>. This can be done in constant time. Note that once |&#937;<sub><it>q</it></sub>| = 4<sup><it>q</it></sup>, we can stop scanning <it>G</it>. While the time requirement of this algorithm is <inline-formula><m:math name="1471-2105-9-167-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>O</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>n</m:mi><m:mo>+</m:mo><m:mfrac><m:mrow><m:msup><m:mn>4</m:mn><m:mi>q</m:mi></m:msup></m:mrow><m:mi>&#969;</m:mi></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4ta80aaeWaaeaacqWGUbGBcqGHRaWkjuaGdaWcaaqaaiabisda0maaCaaabeqaaiabdghaXbaaaeaacqaHjpWDaaaakiaawIcacaGLPaaaaaa@35C4@</m:annotation></m:semantics></m:math></inline-formula> it uses <it>O</it>(1) + 2<it>q </it>+ 4<sup><it>q </it></sup>bits of space, as only <it>q </it>consecutive letters in <it>G </it>need to be stored in memory.</p>
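            <p>The scan described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name <monospace>occurrence_table</monospace>, the letter codes a = 0, c = 1, g = 2, t = 3, and the treatment of other characters (such as N) as word boundaries are assumptions. For readability the sketch stores one byte per table entry instead of one bit, and it takes the code modulo 4<sup><it>q</it></sup> instead of subtracting the leading letter explicitly, which yields the same value.</p>

```python
def occurrence_table(genome, q):
    """Return (table, count); table[j] is 1 iff the q-word with code j occurs."""
    code_of = {"a": 0, "c": 1, "g": 2, "t": 3}  # assumed letter encoding
    size = 4 ** q
    table = bytearray(size)  # one byte per entry; a bit vector would save space
    count = 0                # number of 1-entries set so far
    code = 0                 # integer code of the last (up to) q letters
    valid = 0                # length of the current run of a/c/g/t letters
    for ch in genome.lower():
        if ch not in code_of:
            # Assumption: any other character interrupts the current word.
            code = 0
            valid = 0
            continue
        # Shift in the new letter and drop the letter that leaves the window.
        code = (code * 4 + code_of[ch]) % size
        valid += 1
        if valid >= q and table[code] == 0:
            table[code] = 1
            count += 1
            if count == size:
                break  # every q-word has been seen, stop scanning
    return table, count
```

            <p>For example, <monospace>occurrence_table("aacg", 2)</monospace> marks the codes 0, 1 and 6 (for aa, ac and cg) and returns a count of 3.</p>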
            <p>If |&#937;<sub><it>q</it></sub>| = 4<sup><it>q</it></sup>, i.e. all 4<sup><it>q </it></sup>entries in &#937;<sub><it>q </it></sub>are 1, then we know that all possible <it>q</it>-words occur in <it>G</it>. Hence there is no unword of length <it>q </it>in <it>G</it>. On the other hand, if after processing all <it>q</it>-words in <it>G</it>, |&#937;<sub><it>q</it></sub>| &lt; 4<sup><it>q</it></sup>, there are some unwords of length <it>q</it>. If additionally |&#937;<sub><it>q</it>-1</sub>| = 4<sup><it>q</it>-1</sup>, then we know that <it>q </it>is the smallest value such that unwords of length <it>q </it>exist. The unwords can easily be computed by determining all <it>j </it>such that &#937;<sub><it>q </it></sub>[<it>j</it>] = 0. Given <it>j</it>, one determines the corresponding <it>q</it>-word <it>w </it>satisfying <it>&#966;</it><sub><it>q </it></sub>(<it>w</it>) = <it>j </it>in <it>O</it>(<it>q</it>) time. Thus the unwords are enumerated in <it>O</it>(4<sup><it>q </it></sup>+ <it>qz</it>) time, where <it>z </it>is the number of unwords.</p>
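            <p>The enumeration step can be illustrated by the following sketch. The function names <monospace>decode</monospace> and <monospace>unwords_from_table</monospace> are hypothetical, and the letter codes a = 0, c = 1, g = 2, t = 3 are an assumption. Decoding reads the base-4 digits of <it>j </it>and therefore takes <it>O</it>(<it>q</it>) steps per unword, while the scan over the table accounts for the 4<sup><it>q </it></sup>term.</p>

```python
def decode(j, q):
    """Return the q-word whose integer code is j (assumed codes a=0, c=1, g=2, t=3)."""
    letters = "acgt"
    word = []
    for _ in range(q):
        word.append(letters[j % 4])  # least significant base-4 digit is the last letter
        j //= 4
    return "".join(reversed(word))


def unwords_from_table(table, q):
    """List all q-words whose table entry is 0, in increasing code order."""
    return [decode(j, q) for j in range(4 ** q) if table[j] == 0]
```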
            <p>Let <it>q</it>* be the smallest value such that there are unwords of length <it>q</it>*. Consider the possible range of values for <it>q </it>for a given genome length <it>n</it>. Let <it>q</it><sup>max </sup>= &#8968;log<sub>4 </sub>(<it>n </it>+ 1)&#8969;. Then <inline-formula><m:math name="1471-2105-9-167-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo></m:mrow></m:msup></m:mrow></m:msup><m:mo>=</m:mo><m:msup><m:mn>4</m:mn><m:mrow><m:mrow><m:mo>&#8968;</m:mo><m:mrow><m:msub><m:mrow><m:mi>log</m:mi><m:mo>&#8289;</m:mo></m:mrow><m:mn>4</m:mn></m:msub><m:mo stretchy="false">(</m:mo><m:mi>n</m:mi><m:mo>+</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo></m:mrow><m:mo>&#8969;</m:mo></m:mrow></m:mrow></m:msup><m:mo>&#8805;</m:mo><m:mi>n</m:mi><m:mo>+</m:mo><m:mn>1</m:mn><m:mo>></m:mo><m:mi>n</m:mi><m:mo>&#8805;</m:mo><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:msup><m:mi>q</m:mi><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo></m:mrow></m:msup><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiGbc2gaTjabcggaHjabcIha4baaaaGccqGH9aqpcqaI0aandaahaaWcbeqaamaahmaabaGagiiBaWMaei4Ba8Maei4zaC2aaSbaaWqaaiabisda0aqabaWccqGGOaakcqWGUbGBcqGHRaWkcqaIXaqmcqGGPaqkaiaaw6o+caGL5JpaaaGccqGHLjYScqWGUbGBcqGHRaWkcqaIXaqmcqGH+aGpcqWGUbGBcqGHLjYScqWGUbGBcqGHsislcqWGXbqCdaahaaWcbeqaaiGbc2gaTjabcggaHjabcIha4baakiabgUcaRiabigdaXaaa@575A@</m:annotation></m:semantics></m:math></inline-formula>. Note that <it>G </it>contains <it>n </it>- <it>q</it><sup>max </sup>+ 1 substrings of length <it>q</it><sup>max</sup>. Hence <it>G </it>is too short to accommodate all possible <it>q</it><sup>max</sup>-words and therefore there are some unwords of length <it>q</it><sup>max</sup>. Thus <it>q</it>* &#8804; <it>q</it><sup>max</sup>, i.e. we can restrict the search for <it>q</it>* to the range [1, <it>q</it><sup>max</sup>].</p>
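            <p>As a small worked check of this bound, the following sketch computes <it>q</it><sup>max </sup>with integer arithmetic only, which avoids rounding problems of floating-point logarithms. The function name <monospace>q_max</monospace> is hypothetical. For a sequence of length 3.1 &#215; 10<sup>9</sup>, the approximate size given for the human genome in Table 2, it yields 16, because 4<sup>15 </sup>is smaller than <it>n </it>+ 1 and 4<sup>16 </sup>is not.</p>

```python
def q_max(n):
    """Smallest q with 4**q >= n + 1, i.e. the ceiling of log4(n + 1)."""
    q = 0
    power = 1  # invariant: power == 4**q
    while n + 1 > power:
        power *= 4
        q += 1
    return q


print(q_max(3_100_000_000))  # search range for q* is [1, q_max(n)]
```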
            <p>There are two basic strategies to determine <it>q</it>*. The first strategy (linear search) starts with <it>q </it>= 1 and increments <it>q </it>until |&#937;<sub><it>q</it></sub>| &lt; 4<sup><it>q</it></sup>. Then <it>q</it>* = <it>q</it>. The space requirement is <it>O</it>(1) + 2<it>q</it>* + 4<sup><it>q</it>*</sup> bits and the running time is</p>
            <p>
               <display-formula id="M4">
                  <m:math name="1471-2105-9-167-i14" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>O</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mn>4</m:mn>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>q</m:mi>
                                    <m:mo>&#8727;</m:mo>
                                 </m:msup>
                              </m:mrow>
                           </m:msup>
                           <m:mo>+</m:mo>
                           <m:msup>
                              <m:mi>q</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mi>z</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>+</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>q</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:msup>
                                       <m:mi>q</m:mi>
                                       <m:mo>&#8727;</m:mo>
                                    </m:msup>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>O</m:mi>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mn>4</m:mn>
                                                <m:mi>q</m:mi>
                                             </m:msup>
                                          </m:mrow>
                                          <m:mi>&#969;</m:mi>
                                       </m:mfrac>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>=</m:mo>
                           <m:mi>O</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mn>4</m:mn>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>q</m:mi>
                                    <m:mo>&#8727;</m:mo>
                                 </m:msup>
                              </m:mrow>
                           </m:msup>
                           <m:mo>+</m:mo>
                           <m:msup>
                              <m:mi>q</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mi>z</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>+</m:mo>
                           <m:mi>O</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msup>
                              <m:mi>q</m:mi>
                              <m:mo>&#8727;</m:mo>
                           </m:msup>
                           <m:mi>n</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>+</m:mo>
                           <m:mi>O</m:mi>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msup>
                                          <m:mn>4</m:mn>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mi>q</m:mi>
                                                <m:mo>&#8727;</m:mo>
                                             </m:msup>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                    <m:mi>&#969;</m:mi>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4ta8KaeiikaGIaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiabgEHiQaaaaaGccqGHRaWkcqWGXbqCdaahaaWcbeqaaiabgEHiQaaakiabdQha6jabcMcaPiabgUcaRmaaqahabaGaem4ta80aaeWaaeaacqWGUbGBcqGHRaWkjuaGdaWcaaqaaiabisda0maaCaaabeqaaiabdghaXbaaaeaacqaHjpWDaaaakiaawIcacaGLPaaaaSqaaiabdghaXjabg2da9iabigdaXaqaaiabdghaXnaaCaaameqabaGaey4fIOcaaaqdcqGHris5aOGaeyypa0Jaem4ta8KaeiikaGIaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiabgEHiQaaaaaGccqGHRaWkcqWGXbqCdaahaaWcbeqaaiabgEHiQaaakiabdQha6jabcMcaPiabgUcaRiabd+eapjabcIcaOiabdghaXnaaCaaaleqabaGaey4fIOcaaOGaemOBa4MaeiykaKIaey4kaSIaem4ta80aaeWaaKqbagaadaWcaaqaaiabisda0maaCaaabeqaaiabdghaXnaaCaaabeqaaiabgEHiQaaacqGHRaWkcqaIXaqmaaaabaGaeqyYdChaaaGccaGLOaGaayzkaaGaeiilaWcaaa@6B29@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>z </it>is the number of unwords. Note that we have <inline-formula><m:math name="1471-2105-9-167-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>n</m:mi><m:mo>&#8805;</m:mo><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup><m:mo>=</m:mo><m:mfrac><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow><m:mrow><m:msup><m:mn>4</m:mn><m:mn>2</m:mn></m:msup></m:mrow></m:mfrac><m:mo>&#8805;</m:mo><m:mfrac><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow><m:mi>&#969;</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOBa4MaeyyzImRaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiabgEHiQaaaliabgkHiTiabigdaXaaakiabg2da9KqbaoaalaaabaGaeGinaqZaaWbaaeqabaGaemyCae3aaWbaaeqabaGaey4fIOcaaiabgUcaRiabigdaXaaaaeaacqaI0aandaahaaqabeaacqaIYaGmaaaaaOGaeyyzImBcfa4aaSaaaeaacqaI0aandaahaaqabeaacqWGXbqCdaahaaqabeaacqGHxiIkaaGaey4kaSIaeGymaedaaaqaaiabeM8a3baaaaa@4752@</m:annotation></m:semantics></m:math></inline-formula> under the realistic assumption that the machine word size <it>&#969; </it>is at least 4<sup>2</sup>. Hence <it>n </it>dominates the last term in (4). Thus the overall running time for the linear search is <it>O</it>(4<sup><it>q</it>*</sup> + <it>q</it>* (<it>n </it>+ <it>z</it>)).</p>
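            <p>The linear search can be sketched as follows. To keep the example short and self-contained, the bit table is replaced by a Python set of substrings, which is a simplification and not the data structure analyzed above; the function names are hypothetical, and the input is assumed to consist only of the letters a, c, g and t.</p>

```python
def count_distinct_qwords(genome, q):
    """Number of distinct substrings of length q (stand-in for the table count)."""
    return len({genome[i:i + q] for i in range(len(genome) - q + 1)})


def shortest_unword_length_linear(genome):
    """Increase q from 1 until some q-word is missing; that q is q*."""
    q = 1
    while count_distinct_qwords(genome, q) == 4 ** q:
        q += 1
    return q
```

            <p>For the sequence acgt, all four 1-words occur but only three of the sixteen 2-words do, so the function returns 2. The loop always terminates because a sequence of length <it>n </it>cannot contain all words of length <it>q</it><sup>max</sup>.</p>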
            <p>The second strategy determines <it>q</it>* by a binary search in the range [1, <it>q</it><sup>max</sup>], as described in Table <tblr tid="T1">1</tblr>. The strategy is optimal in the sense that it tests a minimal number of possible values of <it>q </it>before it arrives at <it>q</it>*. Unfortunately, a value <it>q' </it>determined in line 8 of Table <tblr tid="T1">1</tblr> may or may not be modified later in the loop, so one has to store the corresponding table &#937;<sub><it>q' </it></sub>or recompute it later. The running time of the binary search is <inline-formula><m:math name="1471-2105-9-167-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mo>&#8727;</m:mo></m:msup></m:mrow></m:msup><m:mo>+</m:mo><m:msup><m:mi>q</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mi>z</m:mi><m:mo stretchy="false">)</m:mo><m:mo>+</m:mo><m:msub><m:mrow><m:mi>log</m:mi><m:mo>&#8289;</m:mo></m:mrow><m:mn>2</m:mn></m:msub><m:msup><m:mi>q</m:mi><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo></m:mrow></m:msup><m:mo stretchy="false">(</m:mo><m:mi>n</m:mi><m:mo>+</m:mo><m:mfrac><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo></m:mrow></m:msup><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow><m:mi>&#969;</m:mi></m:mfrac><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4ta8KaeiikaGIaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiabgEHiQaaaaaGccqGHRaWkcqWGXbqCdaahaaWcbeqaaiabgEHiQaaakiabdQha6jabcMcaPiabgUcaRiGbcYgaSjabc+gaVjabcEgaNnaaBaaaleaacqaIYaGmaeqaaOGaemyCae3aaWbaaSqabeaacyGGTbqBcqGGHbqycqGG4baEaaGccqGGOaakcqWGUbGBcqGHRaWkjuaGdaWcaaqaaiabisda0maaCaaabeqaaiabdghaXnaaCaaabeqaaiGbc2gaTjabcggaHjabcIha4baacqGHsislcqaIXaqmaaaabaGaeqyYdChaaOGaeiykaKcaaa@5259@</m:annotation></m:semantics></m:math></inline-formula>. Its space requirement is <it>O</it>(1) + 2<it>q</it><sup>max </sup>+ <inline-formula><m:math name="1471-2105-9-167-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msup><m:mn>4</m:mn><m:mrow><m:msup><m:mi>q</m:mi><m:mrow><m:mi>max</m:mi><m:mo>&#8289;</m:mo></m:mrow></m:msup></m:mrow></m:msup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeGinaqZaaWbaaSqabeaacqWGXbqCdaahaaadbeqaaiGbc2gaTjabcggaHjabcIha4baaaaaaaa@32B7@</m:annotation></m:semantics></m:math></inline-formula>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Algorithm for computing <it>q</it>* by a binary search strategy.</p>
               </caption>
               <tblbdy cols="1">
                  <r>
                     <c ca="left">
                        <p>1: determine sequence length <it>n</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2: <it>l </it>&#8592; 1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3: <it>r </it>&#8592; &#8968;log<sub>4 </sub>(<it>n </it>+ 1)&#8969;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4: <b>while </b><it>l </it>&#8804; <it>r </it><b>do</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5: &#160;&#160;&#160;<it>q </it>&#8592; &#8970;(<it>l </it>+ <it>r</it>)/2&#8971;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6: &#160;&#160;&#160;compute &#937;<sub><it>q</it></sub></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7: &#160;&#160;&#160;<b>if </b>|&#937;<sub><it>q</it></sub>| &lt; 4<sup><it>q </it></sup><b>then</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8: &#160;&#160;&#160;&#160;&#160;&#160;<it>q' </it>&#8592; <it>q</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9: &#160;&#160;&#160;&#160;&#160;&#160;&#937;<sub><it>q' </it></sub>&#8592; &#937;<sub><it>q</it></sub></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10: &#160;&#160;&#160;&#160;&#160;&#160;<it>r </it>&#8592; <it>q </it>- 1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>11: &#160;&#160;&#160;<b>else</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>12: &#160;&#160;&#160;&#160;&#160;&#160;<it>l </it>&#8592; <it>q </it>+ 1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>13: &#160;&#160;&#160;<b>end if</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>14: <b>end while</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>15: <it>q</it>* &#8592; <it>q'</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>16: &#937;<sub><it>q</it>* </sub>&#8592; &#937;<sub><it>q'</it></sub></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>17: <b>for all </b><it>j </it>&#8712; [0, 4<sup><it>q</it>*</sup> - 1] <b>do</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>18: &#160;&#160;&#160;<b>if </b>&#937;<sub><it>q</it>* </sub>[<it>j</it>] = 0 <b>then</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>19: &#160;&#160;&#160;&#160;&#160;&#160;print <it>w </it>such that <it>&#966;</it><sub><it>q</it>* </sub>(<it>w</it>) = <it>j</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20: &#160;&#160;&#160;<b>end if</b></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>21: <b>end for</b></p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
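            <p>The procedure of Table 1 can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the program described in this article: the tables &#937;<sub><it>q </it></sub>are replaced by sets of substrings, the function names are hypothetical, and the input is assumed to be a non-empty string over the letters a, c, g and t. The search is valid because every unword of length <it>q </it>can be extended to an unword of length <it>q </it>+ 1.</p>

```python
from itertools import product


def q_max(n):
    """Smallest q with 4**q >= n + 1 (upper end of the search range)."""
    q, power = 0, 1
    while n + 1 > power:
        power *= 4
        q += 1
    return q


def distinct_qwords(genome, q):
    """Set of all substrings of length q; stands in for the table of Table 1."""
    return {genome[i:i + q] for i in range(len(genome) - q + 1)}


def shortest_unwords_binary(genome):
    """Return (q*, sorted list of unwords of length q*) by binary search."""
    left, right = 1, q_max(len(genome))
    best_q, best_seen = None, None
    while right >= left:
        q = (left + right) // 2
        seen = distinct_qwords(genome, q)
        if 4 ** q > len(seen):
            # Some q-word is missing: remember this table and try shorter words.
            best_q, best_seen = q, seen
            right = q - 1
        else:
            # All q-words occur: unwords must be longer.
            left = q + 1
    all_words = ("".join(t) for t in product("acgt", repeat=best_q))
    return best_q, [w for w in all_words if w not in best_seen]
```

            <p>For the sequence acgt the sketch first tests <it>q </it>= 1, finds all four letters, then tests <it>q </it>= 2 and reports the 13 missing 2-words.</p>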
         </sec>
         <sec>
            <st>
               <p>Testing</p>
            </st>
            <p>We used our first implementation of an unwords algorithm, which is based on suffix arrays, to cross-validate the program presented here. Applied to the human genome, both algorithms (which are completely independent) produce the same set of unwords. We are therefore confident that our set of 104 human unwords is complete, in contrast to the 80 unwords reported in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. (If a smaller genome assembly or repeat-masked sequences had been used in this earlier study, more rather than fewer unwords should have been detected.) We computed unwords for six eukaryotic genomes: <it>Homo sapiens</it>, Release NCBI36 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, <it>Mus musculus</it>, Release NCBIm36 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, <it>Drosophila melanogaster</it>, Release 5.1 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, <it>Caenorhabditis elegans</it>, Release WS170 <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, <it>Neurospora crassa </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and <it>Saccharomyces cerevisiae</it>, Release SGD1.01 <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, including sequences that could not be assigned to a chromosome. Additionally, unwords for two bacterial genomes were calculated: <it>Staphylococcus aureus subsp. aureus </it>strain MSSA476, RefSeq number NC_002953 and <it>Mycoplasma genitalium</it>, RefSeq number NC_000908, as well as for two archaeal genomes: <it>Thermococcus kodakarensis</it>, Release KOD1 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and <it>Methanocaldococcus jannaschii</it>, Release DSM 2661 <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Table <tblr tid="T2">2</tblr> gives a summary of genome sizes and unword lengths and numbers. In Table <tblr tid="T3">3</tblr>, we show the unwords computed from the human genome. We also indicate the number of occurrences that would be expected for each unword if the genome were a random sequence, which of course is not the case. Deviation of GC content in unwords is summarized in Table <tblr tid="T4">4</tblr>. Unwords for the other genomes are given in Tables <tblr tid="T5">5</tblr>, <tblr tid="T6">6</tblr>, <tblr tid="T7">7</tblr>, <tblr tid="T8">8</tblr>, <tblr tid="T9">9</tblr>, <tblr tid="T10">10</tblr>, <tblr tid="T11">11</tblr>, <tblr tid="T12">12</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Genome sizes (including sequences not assigned to a chromosome), the logarithms of the genome size to the bases 10 and 4, and the number and length of unwords of the analyzed genomes.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Organism</p>
                     </c>
                     <c ca="center">
                        <p>Genome size</p>
                     </c>
                     <c ca="right">
                        <p>&#8970;log<sub>10 </sub>|<it>G</it>|&#8971;</p>
                     </c>
                     <c ca="right">
                        <p>log<sub>4 </sub>|<it>G</it>|</p>
                     </c>
                     <c ca="right">
                        <p>#unwords</p>
                     </c>
                     <c ca="right">
                        <p>length</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>H. sapiens</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 3.1 Gb</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>15.8</p>
                     </c>
                     <c ca="right">
                        <p>104</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. musculus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 2.7 Gb</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>15.7</p>
                     </c>
                     <c ca="right">
                        <p>192</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 132 Mb</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>13.5</p>
                     </c>
                     <c ca="right">
                        <p>104</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 100 Mb</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>13.3</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>N. crassa</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 34 Mb</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>12.5</p>
                     </c>
                     <c ca="right">
                        <p>2262</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. cerevisiae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 12 Mb</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>11.8</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. aureus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 2.79 Mb</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>10.7</p>
                     </c>
                     <c ca="right">
                        <p>248</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>T. kodakarensis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 2.08 Mb</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>10.5</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. jannaschii</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 1.66 Mb</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>10.3</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. genitalium</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 0.58 Mb</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>9.6</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Unwords for the human genome and their expected number of occurrences. The four words which are also unwords for the mouse genome are shown in a box.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>accgatacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>accgttcgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgaccgttcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgatcgtcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>acgcgcgatat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acggtacgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>agcgtcgtacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atatcgcgcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>atatcgcgcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgtcgacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atgtcgcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>catatcgcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>ccgaatacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgacgatcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgacgatcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgatacgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>ccgcgcgatat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgtcgaacgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgttacgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgaacggtcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgaatcgacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgaatcgcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgaccgatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgaacgag</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgacgaacggt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>
                              <b>cgacgcgatac</b>
                           </monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgcgtata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacggacgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgacgtaacgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgtaccgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgtatcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatcgtgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgattacgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgattcggcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgacgcata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgacgttaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgcataata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>319</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgcgatatg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgctatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtaacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtaatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtaatcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtatcggt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtattcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgttacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>
                              <b>cgctcgacgta</b>
                           </monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggtcgtacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtacgaaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgtacgacgct</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtatacgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtatagcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtatcggtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgtattacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgactatc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgctcgaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgttcgac</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgttacgcgtc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtttcgtacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>222</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ctacgcgtcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ctcgttcgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>gacgcgtaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>gatagtcgacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>gcgcgacgtta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>gcgcgtaccga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>gcgttcgacgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ggtacgcgtaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>
                              <b>gtatcgcgtcg</b>
                           </monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>gtccgagcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>gtcgaacgacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>taacgtcgcgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tacgcgattcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tacgcgcgaca</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tacgctcggac</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tacggtcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tacgtccgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>
                              <b>tacgtcgagcg</b>
                           </monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tagcgtaccga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tatacgcgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tatcgcgtcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tatgcgtcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tattatgcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>321</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tattcgcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgacgcgata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgacgcgtag</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tcgatcgtcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgattacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgcacgatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgccgaatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgaccgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgacgtaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgcgaata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgcgacat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgtaatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgcgtatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcggtacgcgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>106</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcggtacgcta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>tcgtacgaccg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgtcgacgat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tcgtcgattcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>222</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>tgtcgcgcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>ttaacgtcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ttacgcgtacc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ttacgtcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>221</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ttcgagcgacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>153</monospace>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>GC content of the genomes of Human, Mouse, <it>Drosophila melanogaster</it>, <it>Caenorhabditis elegans</it>, <it>Saccharomyces cerevisiae</it>, <it>Staphylococcus aureus</it> and <it>Mycoplasma genitalium</it>, together with the GC content of the associated unwords.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>Organism</p>
                     </c>
                     <c ca="center">
                        <p>Genome GC%</p>
                     </c>
                     <c ca="center">
                        <p>Unword GC%</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>H. sapiens</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 38</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 45&#8211;72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. musculus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 40</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 54&#8211;72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 40</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 45&#8211;90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 35</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. cerevisiae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 38</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 89&#8211;100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. aureus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 33</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 50&#8211;100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. genitalium</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 32</p>
                     </c>
                     <c ca="center">
                        <p>&#8776; 66&#8211;100</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Unwords for the Mouse genome.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>aacgcgtatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>aatcgcgcgat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acccgcgtacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>accgcgatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgaacgtcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgacgcgata</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>acgacgtacgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgattcgacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgattcgcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgcgaaacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgcgaatcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgcgtcgaaa</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>acgcgtcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgcgtcgcta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acggtcgtcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgttcgaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>acgttcgaccg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>actcgtcgcga</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>atcgacgcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgcgcgatt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgcggtacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgtaccgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgtacgccg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>atcgtcgaccg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>attacgcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>attacgcgcgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>attacgtcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>attcgcgcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>attgcgtcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cccgatacgcg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>ccgatacgcgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgcgatacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgcgcgataa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgcgcgtaat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgcgcgtata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccggtcgtacg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>ccgtacgtcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ccgtcgaatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgaatttcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgagcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgcgataa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgcgatac</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgacgcgtaac</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacggatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgtaacgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgacgttaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgactaacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatacgacga</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgatacgccga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatacgcgtt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatagtcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatcgacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatcgcgtaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgatcgtacga</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgatcgtcgca</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgattcgacgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgattgacgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcatatcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgccgattacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgaaattcg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgaccgata</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgacgcaat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgacgtaat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgactatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatacgaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatacgac</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatatcac</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatatccg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatatgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgatcggta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgcgtaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgcgtcgat</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcggtacgat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtaacgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtatcggg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtcaatcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtcacgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtcgatcg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgcgtcgatta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgcgttagtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgctcgacgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggacgtcgta</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggatatcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggcgtacgat</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cggcgtcgtaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgggcgtaacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggtcgaacgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cggtcgacgat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtaatcgcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtaatcggcg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgtaccgcgat</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtacgaccgg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtacgatcgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtacgcgggt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtatccgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtatcgcgag</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgtatcgcggt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtccgatcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgaatcgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgacgagc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgcgttaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgcgttag</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgtcgttacgc</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttaacgtcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttacgcccg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttacgcgcg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttcgaacgt</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttcgaccga</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <monospace>cgttgcgcgaa</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>cgttgcgtcga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ctaacgcgacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ctcgcgatacg</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>ctcgcgtacga</monospace>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <monospace>gcgatcgtacg</monospace>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
            