<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-4-66</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl rating="0">
         <title>
            <p>Fast and sensitive multiple alignment of large genomic sequences</p>
         </title>
         <aug>
            <au id="A1" ca="yes" ce="no" pa="no" da="no">
               <snm>Brudno</snm>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>brudno@cs.stanford.edu</email>
            </au>
            <au id="A2" ca="no" ce="no" pa="no" da="no">
               <snm>Chapman</snm>
               <fnm>Michael</fnm>
               <insr iid="I2"/>
               <email>mac54@cus.cam.ac.uk</email>
            </au>
            <au id="A3" ca="no" ce="no" pa="no" da="no">
               <snm>G&#246;ttgens</snm>
               <fnm>Berthold</fnm>
               <insr iid="I2"/>
               <email>bg200@cam.ac.uk</email>
            </au>
            <au id="A4" ca="no" ce="no" pa="no" da="no">
               <snm>Batzoglou</snm>
               <fnm>Serafim</fnm>
               <insr iid="I1"/>
               <email>serafim@cs.stanford.edu</email>
            </au>
            <au id="A5" ca="yes" ce="no" pa="no" da="no">
               <snm>Morgenstern</snm>
               <fnm>Burkhard</fnm>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>bmorgen@gwdg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science, Stanford University, Stanford, CA 94305, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Hills Road, Cambridge CB2 2XY, United Kingdom</p>
            </ins>
            <ins id="I3">
               <p>International Graduate School in Bioinformatics and Genome Research, Universit&#228;t Bielefeld, Postfach 100131, 33501 Bielefeld, Germany</p>
            </ins>
            <ins id="I4">
               <p>University of G&#246;ttingen, Institute of Microbiology and Genetics, Goldschmidtstr. 1, 37077 G&#246;ttingen, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>1</issue>
         <fpage>66</fpage>
         <url>http://www.biomedcentral.com/1471-2105/4/66</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid" link="fulltext">14693042</pubid>
               <pubid idtype="doi" link="fulltext">10.1186/1471-2105-4-66</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>9</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>23</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>23</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Brudno et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, <it>multiple alignment </it>is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an <it>anchored-alignment </it>approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that, in this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Cross-species sequence comparison is playing an increasingly important role in genome analysis and annotation; see <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> for reviews. The functional parts of genomes are under selective pressure and therefore evolve more slowly than non-functional parts, where random mutations can be tolerated without affecting the evolutionary fitness of the organism. Consequently, conserved sequences often correspond to functional elements. Comparative sequence analysis has been used for a variety of purposes, e.g. gene prediction <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, identification of regulatory elements <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> and identification of signature sequences to detect pathogenic microorganisms <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. One major advantage of comparative approaches is that they rely on simple measures of sequence similarity and require little additional information about the features to be detected. More traditional methods need large sets of training data to construct species-specific statistical models of genes or regulatory elements. Comparative methods, by contrast, depend essentially on the availability of syntenic sequences at an appropriate evolutionary distance. This makes them effective for the analysis of newly sequenced genomes, for which little training data is available.</p>
         <p>In recent years, a number of algorithms have been proposed for pair-wise genomic alignment; these algorithms combine local and global alignment features by returning ordered chains of local similarities. Some approaches use suffix-tree or hashing algorithms to identify pairs of <it>k</it>-mers of a certain minimum length (and, possibly, a maximum number of mismatches) <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. These methods are extremely time-efficient but are most effective at aligning sequences from closely related genomes, e.g. from different strains of a bacterium <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. A more flexible approach has been implemented in the PipMaker <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> set of tools, which uses BLASTZ <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, a local alignment program implementing a gapped BLAST algorithm.</p>
         <p>A sensitive and versatile tool for <it>multiple </it>alignment of distantly related sequences is DIALIGN <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. This approach was originally developed to align protein and DNA sequences of limited length, e.g. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, but more recent studies have also applied the program to large genomic sequences. G&#246;ttgens <it>et al. </it><abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp> used DIALIGN to detect small regulatory sites in vertebrate genome sequences. Fitch <it>et al. </it>identified consensus sequences in the genomes of pathogenic viruses based on DIALIGN multiple alignments; these consensus sequences were used to identify sequence <it>signatures </it>for pathogen detection <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Unfortunately, the use of DIALIGN for genomic sequence analysis has been limited by its long running time: the original algorithm for pair-wise alignment required time proportional to the product of the lengths of the input sequences <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, which is too slow for long sequences.</p>
         <p>One way of combining speed and sensitivity for genomic alignment is to use an <it>anchored-alignment </it>approach. In a first step, a fast search tool identifies a chain of high-scoring sequence similarities. These similarities are then used as anchor points for the final alignment, in which a more sensitive method aligns the regions remaining between the anchor points. Such an approach was first proposed by Batzoglou <it>et al. </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, who developed <it>GLASS</it>, a system that aligns genomic sequences based on matching <it>k</it>-mers. The denser the chain of anchor points, the more the search space is reduced and the faster the final procedure runs. On the other hand, too many anchor points can overly restrict the search space and thereby decrease alignment quality. The main challenge of the anchored-alignment approach is therefore to find a trade-off between speed and alignment quality: anchor points should be as dense as possible while still leading to optimal or near-optimal alignments.</p>
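         <p>The effect of anchoring on the search space can be illustrated with a small Python sketch. This is a generic illustration under simplifying assumptions, not the DIALIGN procedure. Anchors are assumed to be ungapped (b1, e1, b2, e2) segment pairs with inclusive end positions, already ordered and non-overlapping in both sequences. The cost of the final step is approximated by the number of cells a quadratic dynamic-programming aligner would fill in the rectangles left between anchors. The function names anchored_subproblems and dp_cells are hypothetical.</p>
         <p>
```python
def anchored_subproblems(len1, len2, anchors):
    """Split the len1 x len2 search space into the rectangles between anchors.

    anchors: ordered list of (b1, e1, b2, e2) with inclusive ends, assumed
    non-overlapping and increasing in both sequences.
    Returns half-open rectangles (start1, end1, start2, end2).
    """
    boxes = []
    start1 = start2 = 0
    for b1, e1, b2, e2 in anchors:
        boxes.append((start1, b1, start2, b2))  # region before this anchor
        start1, start2 = e1 + 1, e2 + 1          # resume right after the anchor
    boxes.append((start1, len1, start2, len2))   # region after the last anchor
    return boxes


def dp_cells(boxes):
    """Cells a quadratic aligner would fill over all rectangles."""
    return sum((e1 - s1) * (e2 - s2) for s1, e1, s2, e2 in boxes)


if __name__ == "__main__":
    boxes = anchored_subproblems(100, 100, [(20, 29, 20, 29), (60, 69, 62, 71)])
    print(boxes)
    print(dp_cells(boxes), "cells instead of", 100 * 100)
```
         </p>
         <p>With the two anchors in the example, 2,200 cells remain instead of 10,000. Additional anchors shrink these rectangles further, but each anchor also fixes part of the final alignment, which is the trade-off described above.</p>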
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>In this section we first describe the CHAOS procedure for local alignment of two sequences. We then explain how pair-wise similarities identified by CHAOS can be used as anchor points for pairwise or multiple alignment. Finally, we evaluate our approach in detail, using pair-wise and multiple test data sets.</p>
         <sec>
            <st>
               <p>CHAOS local alignment algorithm</p>
            </st>
            <p>The CHAOS algorithm works by chaining together pairs of similar regions, one from each of the two input DNA sequences; we call such pairs of regions <it>seeds</it>. More precisely, a seed is a pair of words of length <it>k </it>with at least <it>n </it>identical base pairs (<it>bp</it>). A seed <it>s</it><sup>(1) </sup>can be chained to another seed <it>s</it><sup>(2) </sup>whenever (i) the indices of <it>s</it><sup>(1) </sup>in both sequences are higher than the indices of <it>s</it><sup>(2)</sup>, and (ii) <it>s</it><sup>(1) </sup>and <it>s</it><sup>(2) </sup>are "near" each other, where "near" is defined by a distance criterion and a gap criterion, as illustrated in Figure <figr fid="F1">1</figr>. The final score of a chain is the total number of matching <it>bp </it>in it. By default, CHAOS uses words of length 10 with a degeneracy of one (<it>n </it>= <it>k </it>- 1), a distance cutoff of 20 bp, a gap cutoff of 5 bp, and a score cutoff of 25. The detailed algorithms used for finding seeds and computing the maximal chains are specified in Methods.</p>
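            <p>The seed definition can be illustrated with a short Python sketch. It is not the procedure given in Methods. As a stand-in, it uses a simple hashing trick: each word is indexed k times, once with each position replaced by a wildcard. Any two words that differ in at most one position therefore share at least one key. This covers the default degeneracy of one (n = k - 1) and exact seeds (n = k) only. The function name find_seeds is hypothetical.</p>
            <p>
```python
from collections import defaultdict


def find_seeds(a, b, k=10, n=9):
    """Return sorted (i, j, matches) for all k-word pairs with at least n identical bp.

    i and j are 0-based start positions in a and b. Only n == k - 1 (one
    mismatch allowed) and n == k (exact words) are supported by this sketch.
    """
    if n > k or k - n > 1:
        raise ValueError("this sketch supports only n == k or n == k - 1")
    index = defaultdict(list)  # masked word -> start positions in a
    for i in range(len(a) - k + 1):
        word = a[i:i + k]
        for p in range(k):
            index[word[:p] + "." + word[p + 1:]].append(i)
    seeds = set()
    for j in range(len(b) - k + 1):
        word = b[j:j + k]
        for p in range(k):
            for i in index.get(word[:p] + "." + word[p + 1:], ()):
                # count identical positions of the candidate word pair
                matches = sum(x == y for x, y in zip(a[i:i + k], word))
                if matches >= n:
                    seeds.add((i, j, matches))
    return sorted(seeds)


if __name__ == "__main__":
    print(find_seeds("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch at offset 4
```
            </p>
            <p>The two 10-bp words in the example differ only at offset 4, so the sketch reports the single seed (0, 0, 9).</p>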
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>The figure shows a matrix representation of sequence alignment</p>
               </caption>
               <text>
                  <p>The figure shows a matrix representation of sequence alignment. The seed shown can be chained to any seed that lies inside the <it>search box</it>. All seeds located less than <it>distance </it>bp from the <it>current location </it>are stored in a skip list, on which we perform a range query for seeds located within the <it>gap </it>cutoff of the diagonal of the current seed. Seeds located in the grey areas are not available for chaining; this makes the algorithm independent of sequence order.</p>
               </text>
               <graphic file="1471-2105-4-66-1"/>
            </fig>
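            <p>The chaining rule can be sketched in Python as follows. Several details are assumptions of this sketch and are not taken from the CHAOS implementation. Seeds are (i, j, matches) triples with start positions i and j, and two chained seeds must not overlap. The distance criterion is applied to the number of bp between the end of the earlier word and the start of the later one in the first sequence. The gap criterion bounds the difference between the two diagonals. Instead of a skip list with range queries, the sketch simply scans all earlier seeds, which takes quadratic time. The function name chain_seeds is hypothetical.</p>
            <p>
```python
def chain_seeds(seeds, k=10, distance=20, gap=5):
    """Return (score, chain) for the highest-scoring chain of seeds.

    seeds: iterable of (i, j, matches) tuples, i and j being the start
    positions of the two k-long words. The chain score is the total
    number of matching bp.
    """
    order = sorted(seeds)          # process seeds by position in sequence 1
    if not order:
        return 0, []
    best = []                      # best[x]: best score of a chain ending at order[x]
    back = []                      # back[x]: predecessor index in that chain
    for x, (i, j, m) in enumerate(order):
        best.append(m)
        back.append(None)
        for y in range(x):
            pi, pj, _ = order[y]
            d1 = i - (pi + k)      # bp between the two words in sequence 1
            d2 = j - (pj + k)      # bp between the two words in sequence 2
            # distance criterion, plus gap criterion (diagonal difference)
            near = distance >= d1 and gap >= abs(d1 - d2)
            if d1 >= 0 and d2 >= 0 and near and best[y] + m > best[x]:
                best[x] = best[y] + m
                back[x] = y
    x = max(range(len(order)), key=best.__getitem__)
    score, chain = best[x], []
    while x is not None:           # follow back-pointers to the chain start
        chain.append(order[x])
        x = back[x]
    return score, chain[::-1]


if __name__ == "__main__":
    seeds = [(0, 0, 10), (15, 15, 9), (30, 33, 10), (100, 100, 10)]
    print(chain_seeds(seeds))
```
            </p>
            <p>In the example, the first three seeds are chained with a score of 29. The seed at position 100 lies beyond the distance cutoff and remains on its own.</p>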
            <p>After computing the maximal chains, CHAOS scores each chain using match and mismatch scores for the letters of each seed. For two seeds separated by <it>x </it>and <it>y </it>base pairs in the first and second sequences, respectively, a gap penalty proportional to |<it>x - y</it>| is incurred. Chains that score below a threshold <it>t </it>are discarded immediately. We augment this scoring method with a rapid rescoring step: chains that score above <it>t </it>are rescored by performing ungapped extensions in both directions from each seed and finding the optimal location to insert exactly one gap of size |<it>x - y</it>| between consecutive seeds. The matches and mismatches can be scored with an arbitrary substitution matrix. CHAOS can be used as a stand-alone program for local sequence alignment or as a pre-processing step to find anchor points for global alignment procedures.</p>
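            <p>The single-gap rescoring step can be illustrated for one pair of consecutive seeds. The Python sketch below makes several assumptions. The two stretches between the seeds (of lengths x and y) are given as strings. A simple match/mismatch score stands in for a full substitution matrix. A hypothetical per-position penalty gap_per_bp is charged for the gap. Prefix and suffix sums let every possible gap location be tried in linear time. The ungapped extensions at the chain ends are not shown, and the function name rescore_gap is hypothetical.</p>
            <p>
```python
def rescore_gap(x_seg, y_seg, match=2, mismatch=-1, gap_per_bp=1):
    """Best score for aligning two inter-seed stretches with exactly one gap.

    The gap has size |len(x_seg) - len(y_seg)|; returns (score, position),
    where position is the number of aligned letter pairs before the gap.
    """
    if len(x_seg) >= len(y_seg):
        longer, shorter = x_seg, y_seg
    else:
        longer, shorter = y_seg, x_seg
    d, m = len(longer) - len(shorter), len(shorter)

    def s(u, v):  # stand-in for a substitution matrix lookup
        return match if u == v else mismatch

    prefix = [0] * (m + 1)  # prefix[g]: shorter[:g] against longer[:g]
    for g in range(m):
        prefix[g + 1] = prefix[g] + s(shorter[g], longer[g])
    suffix = [0] * (m + 1)  # suffix[g]: shorter[g:] against longer[g + d:]
    for g in range(m - 1, -1, -1):
        suffix[g] = suffix[g + 1] + s(shorter[g], longer[g + d])
    best = max(range(m + 1), key=lambda g: prefix[g] + suffix[g])
    return prefix[best] + suffix[best] - gap_per_bp * d, best


if __name__ == "__main__":
    print(rescore_gap("ACGTTTACGT", "ACGTACGT"))
```
            </p>
            <p>For the example, the sketch prints (14, 3). The two surplus bases of the longer stretch are placed in a gap after the first three aligned letters, so all eight remaining pairs match (8 x 2 = 16), minus 2 for the gap.</p>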
         </sec>
         <sec>
            <st>
               <p>Anchored pair-wise and multiple alignment</p>
            </st>
            <p>In the present study, we use CHAOS to identify <it>chains </it>of local sequence similarities that can be used as anchor points for DIALIGN. Once CHAOS has identified a collection of local alignments for a pair of input sequences, we use an algorithm based on the longest increasing subsequence problem <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> to find the highest scoring chain of local alignments in time <it>O</it>(<it>N </it>log <it>N</it>), where <it>N </it>is the number of local alignments. For <it>pair-wise </it>alignment, this chain is directly used to anchor the DIALIGN alignment as described in <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
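            <p>The chaining step can be sketched as a sweep over the first sequence. This Python sketch rests on several assumptions. Each local alignment is given by inclusive begin and end coordinates in both sequences plus a score. Consecutive alignments in a chain must be strictly ordered in both sequences. A sorted "staircase" of chain scores, indexed by end position in the second sequence, answers the predecessor queries by binary search. Python list insertion takes linear time in the worst case, so the sketch reaches the O(<it>N </it>log <it>N</it>) bound only if the list is replaced by a balanced search tree. The names Local and best_chain are hypothetical.</p>
            <p>
```python
from bisect import bisect_left
from typing import NamedTuple


class Local(NamedTuple):
    b1: int        # begin in sequence 1 (inclusive)
    e1: int        # end in sequence 1 (inclusive)
    b2: int        # begin in sequence 2 (inclusive)
    e2: int        # end in sequence 2 (inclusive)
    score: float


def best_chain(alns):
    """Return (score, indices) of the highest-scoring chain of local alignments."""
    if not alns:
        return 0, []
    keys, vals = [], []  # staircase: end positions in seq 2, best (score, index)

    def query(pos):
        # best chain whose last alignment ends strictly before pos in sequence 2
        x = bisect_left(keys, pos) - 1
        return vals[x] if x >= 0 else (0, None)

    def insert(pos, entry):
        if query(pos + 1)[0] >= entry[0]:
            return  # an entry ending no later already scores at least as high
        x = bisect_left(keys, pos)
        keys.insert(x, pos)
        vals.insert(x, entry)
        while len(keys) > x + 1 and entry[0] >= vals[x + 1][0]:
            del keys[x + 1], vals[x + 1]  # drop entries the new one dominates

    # queries at begin positions, insertions at end positions (sequence 1);
    # at equal positions, queries (kind 0) are handled before insertions (kind 1)
    events = sorted([(a.b1, 0, i) for i, a in enumerate(alns)] +
                    [(a.e1, 1, i) for i, a in enumerate(alns)])
    total, back = [0] * len(alns), [None] * len(alns)
    for _, kind, i in events:
        a = alns[i]
        if kind == 0:
            prev_score, prev = query(a.b2)
            total[i], back[i] = prev_score + a.score, prev
        else:
            insert(a.e2, (total[i], i))
    i = max(range(len(alns)), key=total.__getitem__)
    score, chain = total[i], []
    while i is not None:
        chain.append(i)
        i = back[i]
    return score, chain[::-1]


if __name__ == "__main__":
    alns = [Local(0, 9, 0, 9, 10), Local(20, 29, 22, 31, 8),
            Local(15, 40, 5, 30, 12), Local(50, 59, 50, 59, 5)]
    print(best_chain(alns))
```
            </p>
            <p>In the example, the best chain consists of alignments 0, 1 and 3 with a total score of 23. Alignment 2 overlaps alignment 0 in the second sequence and is left out.</p>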
            <p>For anchored <it>multiple alignment</it>, we proceed as follows. In a first step, we apply CHAOS to all possible pairs of input sequences; this way, we obtain a list of similarities that we treat as <it>candidate </it>anchor points. These similarities may contradict each other, <it>i.e. </it>it may not be possible to include all of them simultaneously in a single multiple alignment. To solve this <it>consistency </it>problem, we use the same greedy algorithm that DIALIGN uses to find consistent sets of local pair-wise alignments during multiple alignment calculation <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. A quality score is associated with each candidate anchor, and the candidates are sorted by these scores. Starting with the highest-scoring candidate, anchors are accepted one by one as final anchor points, provided they are consistent with all previously accepted candidates. Inconsistent similarities are discarded. In this way, we obtain a consistent set of pair-wise anchor points, <it>i.e. </it>a set of anchor points that fits into a single multiple alignment. See also <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B24">24</abbr><abbr bid="B30">30</abbr></abbrgrp>, where this greedy procedure is explained in the context of the DIALIGN algorithm.</p>
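            <p>The greedy selection can be sketched in Python. This is a simplified stand-in, not DIALIGN's own, more efficient procedure, chosen to keep the consistency test short and exact. Each candidate anchor is represented as an ungapped segment pair. Aligned residues are merged into columns with a union-find structure. A set of anchors is called consistent if no column contains two residues of the same sequence and the columns, ordered by residue order within each sequence, form an acyclic graph. Rebuilding this graph for every candidate is slow, so the sketch is suitable only for small examples. The names Anchor, consistent and greedy_anchors are hypothetical.</p>
            <p>
```python
from collections import defaultdict
from typing import NamedTuple


class Anchor(NamedTuple):
    s1: int        # first sequence
    p1: int        # start position in the first sequence
    s2: int        # second sequence
    p2: int        # start position in the second sequence
    length: int    # length of the ungapped segment pair
    score: float


def consistent(pairs, lengths):
    """True if the aligned residue pairs fit into one multiple alignment."""
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for u, v in pairs:
        parent[find(u)] = find(v)  # put both residues into one column
    occupied, succ, indeg = set(), defaultdict(set), defaultdict(int)
    for s, n in enumerate(lengths):
        for p in range(n):
            col = find((s, p))
            if (col, s) in occupied:
                return False  # two residues of one sequence in one column
            occupied.add((col, s))
            if p:
                prev = find((s, p - 1))  # column of the preceding residue
                if col not in succ[prev]:
                    succ[prev].add(col)
                    indeg[col] += 1
    cols = {col for col, _ in occupied}
    ready = [c for c in cols if indeg[c] == 0]
    seen = 0
    while ready:  # Kahn's algorithm: all columns are visited iff no cycle
        c = ready.pop()
        seen += 1
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return seen == len(cols)


def greedy_anchors(candidates, lengths):
    """Accept candidates by decreasing score if consistent with earlier ones."""
    accepted, pairs = [], []
    for a in sorted(candidates, key=lambda c: c.score, reverse=True):
        new = [((a.s1, a.p1 + x), (a.s2, a.p2 + x)) for x in range(a.length)]
        if consistent(pairs + new, lengths):
            accepted.append(a)
            pairs += new
    return accepted


if __name__ == "__main__":
    A = Anchor(0, 3, 1, 0, 1, 5.0)
    B = Anchor(1, 3, 2, 0, 1, 4.0)
    C = Anchor(0, 0, 2, 3, 1, 3.0)
    D = Anchor(0, 4, 1, 4, 1, 1.0)
    print(greedy_anchors([A, B, C, D], [5, 5, 5]))  # A, B and D are kept
```
            </p>
            <p>In the example, anchors A and B together imply that position 3 of sequence 0 precedes position 0 of sequence 2. Candidate C aligns position 0 of sequence 0 with position 3 of sequence 2, so it is rejected, even though it conflicts with neither A nor B on its own.</p>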
         </sec>
         <sec>
            <st>
               <p>Program Evaluation</p>
            </st>
            <p>It is common practice to evaluate sequence alignment programs by applying them to real-world sequences with known functional sites or 3D structure. For protein alignment, several sets of benchmark sequences are available <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>; they are routinely used as standards of truth to evaluate and compare the performance of multiple alignment programs. For pair-wise comparison of genomic sequences, benchmark data have been compiled by Jareborg <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and Batzoglou <it>et al. </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; these data have been used in studies of comparative gene finding. So far, however, there are no generally accepted reference data for evaluating software for <it>multiple </it>genomic alignment. Herein, we first use the Jareborg benchmark data to demonstrate that our anchored-alignment procedure improves the running time of DIALIGN by up to two orders of magnitude, while the resulting alignments are essentially the same as those produced by the original non-anchored algorithm. Second, we apply our method to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene. For all evaluations, repeats in the sequences were first masked with RepeatMasker. We analyze the resulting multiple alignment in detail. We show that CHAOS anchoring not only improves the speed of DIALIGN but also enables the detection of important functional elements missed by the original DIALIGN. Additional multiple sequence sets are used to demonstrate how the improvement in running time depends on the length of the input sequences.</p>
         </sec>
         <sec>
            <st>
               <p>Running time for pair-wise alignment</p>
            </st>
            <p>The Jareborg data set consists of 42 annotated sequence pairs from human and mouse, ranging in length from less than 6 <it>kb </it>to more than 227 <it>kb</it>, with an average length of 38 <it>kb</it>. These sequences were previously used in a systematic comparison of five genomic alignment programs <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. That study found DIALIGN to be superior to the other methods in terms of alignment quality but inferior in terms of running time. Since these results have already been published, we do not repeat the evaluation of DIALIGN for pair-wise alignment. Instead, we focus on how our anchoring procedure affects running time and alignment quality compared with the non-anchored DIALIGN.</p>
            <p>We first applied CHAOS to our data to obtain chains of anchor points. Next, we aligned the sequence pairs with DIALIGN, first without anchoring and then using the anchor points identified by CHAOS. We then compared the program running time and the quality of the resulting alignments. DIALIGN was run with the <it>translation </it>option, in which local similarity among DNA sequences is evaluated at the peptide level, see <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. With default parameters, CHAOS returned on average 2.1 anchor points per <it>kb</it>. The results in terms of alignment quality and program running time are summarized in Table <tblr tid="T1">1</tblr>. With a CHAOS <it>cutoff </it>value of 20, the running time of the anchored DIALIGN was reduced by 95% compared to the non-anchored program, while the scores of the resulting alignments were reduced by about 1%. Alignment quality was measured at two distinct levels: (<it>a</it>) by the <it>numerical </it>score of the produced alignments and (<it>b</it>) by their <it>biological </it>quality. For the latter, alignments were compared to annotated protein-coding exons, and sensitivity and specificity were measured at the nucleotide level. A nucleotide that is part of a selected fragment is considered a <it>true positive </it>(TP) if it is also part of an annotated exon and a <it>false positive </it>(FP) if it is not; true and false negatives (TN and FN) are defined accordingly. We used the usual measures of prediction accuracy, namely <it>sensitivity </it>= TP/(TP + FN), <it>specificity </it>= TP/(TP + FP), and <it>approximate correlation </it>= 0.5 (TP/(TP + FN) + TP/(TP + FP) + TN/(TN + FP) + TN/(TN + FN)) - 1.</p>
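            <p>The nucleotide-level measures can be computed as in the following Python sketch. For illustration only, it assumes that predicted fragments and annotated exons are given as half-open [start, end) intervals on a single sequence. The function names nucleotide_counts and accuracy are hypothetical. Each ratio is undefined when its denominator is zero, and the sketch does not handle that case.</p>
            <p>
```python
def nucleotide_counts(length, predicted, annotated):
    """Count TP, FP, TN, FN over positions 0..length-1 (half-open intervals)."""
    pred = [False] * length
    exon = [False] * length
    for start, end in predicted:
        for x in range(start, end):
            pred[x] = True
    for start, end in annotated:
        for x in range(start, end):
            exon[x] = True
    tp = sum(p and e for p, e in zip(pred, exon))
    fp = sum(p and not e for p, e in zip(pred, exon))
    fn = sum(e and not p for p, e in zip(pred, exon))
    tn = length - tp - fp - fn
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Sensitivity, specificity and approximate correlation as defined in the text."""
    sn = tp / (tp + fn)
    sp = tp / (tp + fp)
    ac = 0.5 * (sn + sp + tn / (tn + fp) + tn / (tn + fn)) - 1
    return sn, sp, ac


if __name__ == "__main__":
    counts = nucleotide_counts(10, predicted=[(2, 6)], annotated=[(4, 8)])
    print(counts, accuracy(*counts))
```
            </p>
            <p>The toy example uses a 10-bp sequence with predicted fragment [2, 6) and annotated exon [4, 8). This gives TP = FP = FN = 2 and TN = 4, hence sensitivity = specificity = 0.5 and an approximate correlation of 1/6.</p>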
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Total CPU time and alignment quality for DIALIGN (D) and DIALIGN anchored with CHAOS (C+D) applied to a set of 42 pairs of genomic sequences from human and mouse <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. CHAOS was run with varying cutoff parameters. Lower cutoff values for CHAOS produced more anchor points. This decreased the search space for the final DIALIGN alignment procedure and thus improved the running time, at the cost of slightly decreased alignment quality. The average number of anchor points per kilobase is shown (anc./<it>kb</it>); %CPU and %score are given relative to the non-anchored DIALIGN (D). <it>Score </it>is the total <it>numerical </it>score of all produced DIALIGN alignments, <it>i.e. </it>the sum of the scores of the segment pairs in the alignments. As a rough measure of the <it>biological </it>quality of the produced alignments, we compared local sequence similarities identified by DIALIGN and CHAOS to known protein-coding regions. Here, Sn, Sp and AC are <it>sensitivity</it>, <it>specificity </it>and <it>approximate correlation</it>, respectively. For the D and C+D results, DIALIGN was evaluated by comparing <it>all </it>segment pairs contained in the alignment to annotated exons.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c ca="left">
                        <p>program</p>
                     </c>
                     <c ca="center">
                        <p>cutoff</p>
                     </c>
                     <c ca="center">
                        <p>anc./<it>kb</it></p>
                     </c>
                     <c ca="right">
                        <p>CPU</p>
                     </c>
                     <c ca="right">
                        <p>%CPU</p>
                     </c>
                     <c ca="right">
                        <p>score</p>
                     </c>
                     <c ca="right">
                        <p>%score</p>
                     </c>
                     <c ca="center">
                        <p>Sn</p>
                     </c>
                     <c ca="center">
                        <p>Sp</p>
                     </c>
                     <c ca="center">
                        <p>AC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>179,001</p>
                     </c>
                     <c ca="right">
                        <p>100.0</p>
                     </c>
                     <c ca="right">
                        <p>54,214</p>
                     </c>
                     <c ca="right">
                        <p>100.0</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>1.4</p>
                     </c>
                     <c ca="right">
                        <p>14,334</p>
                     </c>
                     <c ca="right">
                        <p>8.0</p>
                     </c>
                     <c ca="right">
                        <p>53,839</p>
                     </c>
                     <c ca="right">
                        <p>99.3</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>1.7</p>
                     </c>
                     <c ca="right">
                        <p>11,717</p>
                     </c>
                     <c ca="right">
                        <p>6.5</p>
                     </c>
                     <c ca="right">
                        <p>53,820</p>
                     </c>
                     <c ca="right">
                        <p>99.2</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>2.1</p>
                     </c>
                     <c ca="right">
                        <p>11,485</p>
                     </c>
                     <c ca="right">
                        <p>6.4</p>
                     </c>
                     <c ca="right">
                        <p>53,654</p>
                     </c>
                     <c ca="right">
                        <p>98.9</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>2.8</p>
                     </c>
                     <c ca="right">
                        <p>8,964</p>
                     </c>
                     <c ca="right">
                        <p>5.0</p>
                     </c>
                     <c ca="right">
                        <p>53,642</p>
                     </c>
                     <c ca="right">
                        <p>98.9</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>4.2</p>
                     </c>
                     <c ca="right">
                        <p>7,404</p>
                     </c>
                     <c ca="right">
                        <p>4.1</p>
                     </c>
                     <c ca="right">
                        <p>53,208</p>
                     </c>
                     <c ca="right">
                        <p>98.1</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C+D</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>6.5</p>
                     </c>
                     <c ca="right">
                        <p>6,696</p>
                     </c>
                     <c ca="right">
                        <p>3.7</p>
                     </c>
                     <c ca="right">
                        <p>52,684</p>
                     </c>
                     <c ca="right">
                        <p>97.1</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
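            <p>The two percentage columns of Table <tblr tid="T1">1</tblr> are relative to the non-anchored DIALIGN run (D). As a quick arithmetic check, the Python snippet below recomputes them from the CPU and score columns. The published values are reproduced when the ratios are truncated, rather than rounded, to one decimal place. For example, 53,820/54,214 is 99.27%, listed as 99.2. The helper name truncated_percent is hypothetical.</p>
            <p>
```python
# Non-anchored DIALIGN (D) values from Table 1
BASE_CPU, BASE_SCORE = 179_001, 54_214

# CHAOS cutoff -> (CPU, score) of the anchored runs (C+D), from Table 1
rows = {
    35: (14_334, 53_839),
    30: (11_717, 53_820),
    25: (11_485, 53_654),
    20: (8_964, 53_642),
    15: (7_404, 53_208),
    10: (6_696, 52_684),
}


def truncated_percent(value, base):
    """Percentage of base, truncated to one decimal place (integer arithmetic)."""
    return (1000 * value // base) / 10


for cutoff, (cpu, score) in rows.items():
    print(cutoff, truncated_percent(cpu, BASE_CPU), truncated_percent(score, BASE_SCORE))
```
            </p>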
         </sec>
         <sec>
            <st>
               <p>Multiple alignment of the stem-cell-leukemia (SCL) region</p>
            </st>
            <p>To test the combined CHAOS-DIALIGN algorithm for multiple alignment, we used a set of five genomic sequences around the stem cell leukaemia (SCL) gene. SCL is a critical regulator of haematopoiesis, with a pattern of expression that is conserved in all species studied, from mammals to teleost fish <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The locations of the exons and of a number of important regulatory regions have previously been determined experimentally. We used sequences from five species: human, mouse, chicken, pufferfish, and zebrafish. For each species, the sequence started immediately after the upstream gene and extended either to the end of the sequence or to just after the downstream gene, whichever was longer. We aligned these sequences with DIALIGN, both with and without prior CHAOS anchoring, and then examined the alignments for regions of sequence conservation among all five species.</p>
            <p>A total of 265,145 bases were aligned. Using a new <it>mixed-alignment </it>option together with the -o option, the combined CHAOS-DIALIGN algorithm completed the task in 1 hour and 35 minutes, while non-anchored DIALIGN took 6 hours and 6 minutes. With the <it>mixed-alignment </it>option, local similarities are evaluated in two ways: at the <it>nucleotide </it>level, and at the <it>peptide </it>level, where segments are translated according to the genetic code and the resulting peptide segments are compared. This option is appropriate for genomic sequences that may contain both coding and non-coding homologies, but it is relatively time-consuming. The -o option reduces running time; see the DIALIGN user guide for details. When the sequences were compared at the peptide level only ('translated' option), the running time was 13.8 minutes with the anchoring procedure and 49.2 minutes with non-anchored DIALIGN. These test runs were carried out on a Linux PC with a 2.4 GHz Pentium 4 processor. With both program options, the CHAOS anchoring procedure reduced the running time by more than 70 percent, while the <it>numerical </it>scores of the output alignments differed by less than 1 percent ('translated' option) and less than 0.1 percent ('mixed alignment' option).</p>
            <p>Of the four fish SCL exons, all of which have homologues in the higher species <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B15">15</abbr></abbrgrp>, the three coding exons were aligned across all species by both algorithms. The downstream gene, membrane-associated protein 17 (MAP17), is not present in pufferfish and contains four rather short exons; moreover, the chicken sequence extends only to the first of these. It is therefore not surprising that both algorithms aligned these exons only between human and mouse. Within the non-coding DNA, one further region of homology across all species was identified (see Figure <figr fid="F2">2</figr>). This region, just upstream of exon 1, has promoter activity in haematopoietic cell lines and also contains a midbrain enhancer <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Within this region, CHAOS-DIALIGN perfectly aligned, in all species, five motifs, each of which is essential for the appropriate pattern or level of SCL transcription <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. Unanchored DIALIGN misaligned the first GATA binding site; otherwise, the two alignments of the SCL promoter were identical. Immediately downstream, within the non-coding exon 1, CHAOS-DIALIGN alone identified a further motif. This motif is a perfect match to the binding consensus (5'-AANATGGC-3') of the zinc finger transcription factor YY1 <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. It is conserved in all five species and may act as a transcriptional enhancer for the nearby promoter; alternatively, it may be an RNA-binding element involved in post-transcriptional processing. 
There is one further non-coding sequence known to be conserved in all five species that neither DIALIGN variant aligns: the AAUAAA polyadenylation sequence <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. However, this region has previously been aligned only with <it>a priori </it>knowledge of its existence, after extracting and locally aligning the relevant sequences. Other multiple alignment algorithms (MAVID <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, LAGAN <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>) also fail to align this region. Notably, in two cases (the first GATA site and the YY1 motif) the CHAOS-DIALIGN combination produced biologically better alignments than unanchored DIALIGN. This is probably because the anchor points restrict the search space of DIALIGN, preventing it from accepting a numerically superior alignment that is biologically incorrect.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>CHAOS-DIALIGN correctly aligns the SCL promoter and a conserved non-coding sequence in exon 1</p>
               </caption>
               <text>
                  <p>CHAOS-DIALIGN correctly aligns the SCL promoter and a conserved non-coding sequence in exon 1. The alignment was extracted from the CHAOS-DIALIGN global alignment of SCL sequences from human, mouse, chicken, zebrafish, and pufferfish. Consensus binding motifs are labelled. All except YY1 have been previously demonstrated to be essential for the appropriate pattern or level of SCL expression. The factors binding conserved sequence (CS) 1 and 2 are unknown. Shading of bases is at (grey) and (black) conservation.</p>
               </text>
               <graphic file="1471-2105-4-66-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Running time for longer sequences</p>
            </st>
            <p>We wanted to explore how the relative improvement in running time achieved by our anchoring method depends on the length of the input sequences. The main benefit of a faster DIALIGN is that the program becomes applicable to genomic sequences that were previously beyond its scope, so we wanted to estimate how the running time behaves for very long sequences. It has been shown previously <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> that, under certain assumptions about the distribution of anchor points along the sequences, the running time of an anchored alignment algorithm is linear in the sequence length. In reality, the distribution of distances between anchor points is difficult to predict, since it depends on the sequences being compared. Nevertheless, for our data we could confirm that the relative improvement in running time for pair-wise alignment was far greater for longer sequences than for shorter ones (Figure <figr fid="F3">3</figr>).</p>
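            <p>A rough cost model makes the trend in Figure 3 plausible. The sketch below is our own simplification, not the analysis of the cited work: we assume that non-anchored pair-wise alignment of two sequences of length <it>L </it>costs roughly c L<sup>2</sup>, that the anchor points split the problem into segments of length about <it>d</it>, and that the time spent finding anchors with CHAOS is negligible.</p>

```latex
T_{\mathrm{plain}}(L) \approx c\,L^{2}, \qquad
T_{\mathrm{anch}}(L) \approx \frac{L}{d}\,c\,d^{2} = c\,L\,d, \qquad
\frac{T_{\mathrm{anch}}(L)}{T_{\mathrm{plain}}(L)} \approx \frac{d}{L}.
```

            <p>Under these assumptions the anchored cost grows linearly in <it>L </it>for a fixed anchor spacing, and the relative running time falls roughly as 1/<it>L</it>. In practice <it>d </it>varies with the sequences compared, so real data will deviate from this idealized curve.</p>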
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Relative improvement in program running time for 42 pairs of human and mouse genomic sequences of different lengths</p>
               </caption>
               <text>
                  <p>Relative improvement in program running time for 42 pairs of human and mouse genomic sequences of different lengths. Each point represents one sequence pair. The <it>x</it>-axis is the mean sequence length of the pair, and the <it>y</it>-axis is the running time of the anchored-alignment procedure relative to that of the non-anchored procedure.</p>
               </text>
               <graphic file="1471-2105-4-66-3"/>
            </fig>
            <p>The SCL sequences that we used to test multiple alignment were only about 53 <it>kb </it>long on average, so we carried out two additional test runs to evaluate the performance of our approach on longer multiple sequence sets. First, we applied the anchored and non-anchored procedures to a set of three genomic sequences from the interleukin region of human, mouse and dog <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> with an average length of 222 <it>kb</it>. We used the <it>translated </it>option together with the -o option. Without anchoring, the running time of DIALIGN was 8 <it>h </it>and 36 <it>min</it>; with anchor points created by CHAOS, it was reduced to only 24 <it>min </it>and 40 <it>s</it>, a reduction in CPU time of more than 95%. At the same time, all annotated features (all exons and known regulatory sequences) were properly aligned. The numerical score of the anchored alignment was 1.5% below that of the non-anchored alignment. As a third example, we aligned syntenic sequences from human chromosome 20, mouse chromosome 2 and rat chromosome 3 with an average length of more than 1 <it>Mb</it>. The anchored program run terminated after 8 <it>h </it>and 17 <it>min</it>. We did not complete the non-anchored run, but based on its first 2 days we estimated that it would have taken about 18 days, so for these sequences the running time was reduced by around 98%.</p>
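            <p>The percentages quoted for the SCL and longer-sequence runs can be re-derived from the reported times with a few lines of Python. This is only an arithmetic check: the dictionary keys are our own labels, and the 18-day figure is the estimate given above, not a measured time.</p>

```python
# (non-anchored, anchored) running times in minutes, as reported in the text.
RUNS = {
    "SCL mixed": (6 * 60 + 6, 1 * 60 + 35),
    "SCL translated": (49.2, 13.8),
    "interleukin translated": (8 * 60 + 36, 24 + 40 / 60),
    "syntenic 1 Mb (estimated)": (18 * 24 * 60, 8 * 60 + 17),
}


def reduction(unanchored: float, anchored: float) -> float:
    """Percentage of running time saved by anchoring."""
    return 100.0 * (1.0 - anchored / unanchored)


def speedup(unanchored: float, anchored: float) -> float:
    """How many times faster the anchored run is."""
    return unanchored / anchored


for name, (plain, anch) in RUNS.items():
    print(f"{name}: {reduction(plain, anch):.1f}% saved, {speedup(plain, anch):.1f}x faster")
```

            <p>For the SCL runs this gives savings of about 72 to 74 percent (a 3.6- to 3.9-fold speed-up), and for the interleukin and chromosome 20 runs about 95 and 98 percent, in line with the figures in the text.</p>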
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Multiple alignment of large genomic sequences is now a crucial tool for genome data analysis and annotation. Several studies have demonstrated that DIALIGN is a highly efficient and versatile tool for this purpose. It has been used to identify biologically relevant signals in raw sequence data, such as regulatory elements <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp> or protein-coding regions <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and a new gene-prediction program called AGenDA (<b>A</b>lignment-based <b>Gen</b>e <b>D</b>etection <b>A</b>lgorithm) has been developed that relies on DIALIGN alignments as input <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B46">46</abbr></abbrgrp>. Most recently, DIALIGN has been successfully used to identify signature patterns for pathogenic microorganisms <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. However, DIALIGN was originally designed to align protein and short DNA sequences, and its application to genomic sequences was severely limited by its long running time. To make the program applicable to larger sequences, we implemented an <it>anchored-alignment </it>option in which pre-defined anchor points are used to reduce the search space and running time of the alignment procedure. To identify appropriate anchor points, we developed a fast similarity search tool called <it>CHAOS</it>. With the new anchoring option and anchor points created by CHAOS, DIALIGN can now be applied to data sets that were previously beyond its scope.</p>
         <p>Most heuristic local alignment methods, such as BLAST <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and FASTA <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, were developed when the bulk of available sequences were proteins. It has been shown that such algorithms are less effective at aligning non-coding sequences <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Now that genomic sequences are widely available, it is appropriate to refine local alignment algorithms so that they more closely reflect the way in which genomic sequences are conserved. Unlike other fast algorithms for genomic alignment, CHAOS does not depend on long exact matches, does not require extensive ungapped homology, and allows mismatches in seeds; all of these properties are important when comparing distantly related organisms or non-coding regions, where conservation is generally much weaker than in coding regions.</p>
         <p>Some previous algorithms for anchored global alignment work by first identifying very strong local similarities among the input sequences and adding weaker similarities later. The problem with this approach is that a single high-scoring spurious match can lead to a wrong output alignment, while many weaker but biologically important homologies may be missed. By contrast, CHAOS searches for the <it>highest-scoring </it>chain of local alignments. In this way, a numerically high-scoring but biologically wrong local alignment can be counterbalanced by a chain of several weaker local alignments, provided that the total score of these alignments exceeds the score of the single wrong alignment.</p>
         <p>We demonstrate that the chains of local alignments returned by CHAOS can be used to anchor the DIALIGN alignment procedure, substantially reducing running time with little or no loss in the quality of the output alignments. To compare the quality of the anchored and non-anchored alignments, we applied both versions of the program to a database of genomic sequence pairs from human and mouse. We compared the <it>numerical </it>scores of the resulting alignments as well as their <it>biological </it>quality. For <it>multiple </it>genomic alignment, no benchmark data are currently available for systematically comparing the performance of different alignment algorithms. However, the first step of the DIALIGN multiple-alignment procedure is the pair-wise alignment of all possible pairs of input sequences; fragments of these pair-wise alignments are then used to assemble a multiple alignment. Thus, the results that we obtained for pair-wise alignment should largely carry over to multiple alignment.</p>
         <p>We confirmed this in a detailed study of a set of five genomic sequences around the <it>stem cell leukemia (SCL) </it>gene from vertebrates ranging from fish to human. As in our pair-wise test runs, the anchoring procedure led to a considerable improvement in running time, while the output alignments were virtually the same as without anchoring. The <it>numerical </it>scores of the anchored multiple alignments differed by less than 1 percent from those of the non-anchored alignments and, again, the <it>biological </it>quality of the anchored alignments was even improved. For the SCL sequences, the improvement in running time was less dramatic than for the human-mouse sequence pairs used to evaluate the pair-wise alignment procedure. There are two obvious reasons for this. (<it>a</it>) The SCL sequences are shorter than the sequences used for pair-wise alignment and, as discussed above, the relative improvement in running time increases with sequence length. (<it>b</it>) The SCL sequences are more distantly related than the human-mouse sequence pairs, so the <it>density </it>of anchor points identified by CHAOS is lower than in the previous examples.</p>
         <p>In the SCL example, we demonstrated that our method is able to identify small regulatory elements. Comparisons between distantly related species do, however, have a number of limitations for identifying putative regulatory regions. In the SCL locus, many known mammalian enhancers cannot be identified in chicken or fish <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B14">14</abbr></abbrgrp>. This may be because sequence divergence is extensive enough to mask short regulatory motifs. This explanation is supported by the observation that some functional regions (e.g. exon 1 and the polyA site) could be aligned only with <it>a priori </it>knowledge of their location, extraction of the surrounding sequence, and subsequent local alignment <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Alternatively, regulatory mechanisms may differ. An example is the enhancer of the IgH locus in catfish, which is active in mammalian transgenics but differs from its mammalian counterpart in both location and critical regulatory motifs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Where non-coding homology exists in comparisons of distant species, it is usually a powerful indicator of a regulatory region. The CHAOS-DIALIGN algorithm detected the SCL promoter in a five-way alignment of sequences from human, mouse, chicken, pufferfish, and zebrafish. Furthermore, it correctly aligned all the critical motifs within this region, as well as a further YY1 motif in exon 1. As discussed above, homology of this latter motif in all five species had previously been demonstrated only after extraction and local alignment of the relevant sequences using DIALIGN <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Other multiple alignment algorithms (MAVID <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, LAGAN <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>) fail to align this motif. 
Therefore, for the SCL dataset, the biological relevance of the CHAOS-DIALIGN output is superior to that of the other multiple global alignment tools tested here. It is also better than that of unanchored DIALIGN, while the anchored program reduces running time by more than 70 percent (roughly a 3.5- to 4-fold speed-up) on these sequences.</p>
         <p>Finally, we want to emphasize the need for further work in the general area of multiple alignment. Perhaps the most pressing problem is that alignment programs can currently be evaluated only on examples that have been annotated by biologists. At the same time, methods that simulate the evolution of DNA sequences, such as ROSE <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, cannot generate biologically realistic sequences. A measure of alignment quality that can be computed on real sequences without biological annotation is therefore needed.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper, we present a fast local pair-wise alignment tool called <it>CHAOS </it>(CHAins Of Seeds) and use it to speed up the DIALIGN program. For a pair of input sequences, CHAOS returns a chain of local sequence alignments that can be used as anchor points to reduce the search space and running time of any sensitive global alignment procedure; it has also been used for anchoring in the LAGAN <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> alignment tool. We extend the anchoring approach to the problem of <it>multiple </it>alignment of large genomic sequences. Multiple alignments are likely to contain much more information about functional sites than pair-wise alignments, and with the growing amount of genome sequence data, developing methods for multiple alignment is a high priority.</p>
         <p>Systematic test runs with pair-wise alignments demonstrate that, with CHAOS anchoring, the running time of DIALIGN can be reduced by one to two orders of magnitude while the quality of the resulting alignments is only minimally affected. Moreover, the relative improvement in speed increases with the length of the input sequences, making our approach particularly effective for the alignment of large genomic sequences.</p>
         <p>We also applied CHAOS/DIALIGN to a set of five genomic sequences from human, mouse, chicken, zebrafish, and pufferfish around the stem cell leukemia (SCL) locus. Our method correctly aligned three coding exons and five motifs involved in transcriptional regulation. To make our method easily available to the scientific community, we set up an Internet server where CHAOS/DIALIGN can be used through a WWW interface.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>In this section we describe the details of the CHAOS local alignment algorithm.</p>
         <sec>
            <st>
               <p>Finding the seeds</p>
            </st>
            <p>Formally, a seed is a pair of words of length <it>k </it>with at least <it>n </it>identical base pairs (<it>bp</it>). The seeds are located using a simplified version of the Aho-Corasick <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> algorithm. A variation on the <it>trie </it>data structure <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, which we call a <it>threaded trie </it>(T-trie), is used to store the <it>k</it>-mers of one sequence. A trie is a tree for storing strings in which there is one node for every common prefix: the node corresponding to the word <it>w</it><sub>1</sub>...<it>w</it><sub><it>p </it></sub>has as its parent the node corresponding to <it>w</it><sub>1</sub>...<it>w</it><sub><it>p</it>-1</sub>. A trie that contains all of the <it>k</it>-mers of some string has each leaf at depth <it>k</it>, and each leaf stores all of the locations where this <it>k</it>-mer occurs in the indexed sequence.</p>
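            <p>The seed definition and the plain <it>k</it>-mer trie can be sketched in Python as follows. This is a minimal illustration under our own assumptions (sequences as uppercase strings, a dictionary of children per node); the class and function names are hypothetical and not taken from the CHAOS source code.</p>

```python
def is_seed(word_a: str, word_b: str, k: int, n: int) -> bool:
    """Two length-k words form a seed if at least n positions are identical."""
    if len(word_a) != k or len(word_b) != k:
        return False
    matches = sum(1 for a, b in zip(word_a, word_b) if a == b)
    return matches >= n


class TrieNode:
    def __init__(self) -> None:
        self.children = {}   # letter -> child TrieNode (one node per common prefix)
        self.positions = []  # start positions; filled only at depth k (the leaves)


def build_kmer_trie(seq: str, k: int) -> TrieNode:
    """Insert every k-mer of seq; each leaf records where its k-mer starts."""
    root = TrieNode()
    for start in range(len(seq) - k + 1):
        node = root
        for letter in seq[start:start + k]:
            node = node.children.setdefault(letter, TrieNode())
        node.positions.append(start)
    return root


def lookup(root: TrieNode, word: str) -> list:
    """Return the start positions of word in the indexed sequence (empty if absent)."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return []
    return node.positions
```

            <p>For example, indexing ACGTACG with <it>k </it>= 3 stores the positions 0 and 4 at the leaf for ACG.</p>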
            <p>A T-trie differs from a regular trie in that a node that corresponds to the string <it>w</it><sub>1</sub>...<it>w</it><sub><it>p </it></sub>will also have a <it>back pointer </it>to the node which corresponds to <it>w</it><sub>2</sub>...<it>w</it><sub><it>p</it></sub>. We start by inserting into the T-trie all of the <it>k</it>-mers of one of the sequences, which we will call the <it>database</it>. Then we do a "walk" using the other, <it>query </it>sequence, where we start by making the root of the T-trie our current node, and for every letter of the query:</p>
            <p><b>1. </b>If the <it>current </it>node has a child corresponding to this letter, we make this child our current node and return any seeds stored in it.</p>
            <p><b>2. </b>Otherwise, we make the node pointed to by the current node's <it>back pointer </it>our new <it>current </it>node and return to step 1.</p>
            <p>To see why this method works well in practice, assume that all possible <it>k</it>-mers are present in the database, which is usually the case. Then finding the <it>k</it>-mer that ends at the next letter of the query requires only two pointer operations: first following the back pointer from the level-<it>k </it>node that is our <it>current </it>node, and then following a down pointer from the resulting node to the appropriate child. To allow degeneracy, we permit multiple current nodes, which correspond to the possible degenerate words. The T-trie also saves space compared with the traditional Aho-Corasick automaton, because it stores one rather than four "failure links" per node.</p>
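            <p>The T-trie walk can be sketched as below. This hypothetical Python version handles exact <it>k</it>-mer matches only (it omits the multiple current nodes used for degenerate seeds), and, as our own implementation choice, it also inserts the shorter words at the end of the database so that the node for <it>w</it><sub>2</sub>...<it>w</it><sub><it>p</it></sub> always exists. The back pointers then play the role of Aho-Corasick failure links.</p>

```python
class TNode:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.children = {}   # letter -> TNode
        self.back = None     # node for this word with its first letter dropped
        self.positions = []  # database start positions (depth-k nodes only)


def build_t_trie(db: str, k: int) -> TNode:
    root = TNode(0)
    for start in range(len(db)):
        # Inserting the (possibly shorter) word at every start makes the node set
        # closed under dropping the first letter, so every back pointer exists.
        node = root
        for letter in db[start:start + k]:
            if letter not in node.children:
                node.children[letter] = TNode(node.depth + 1)
            node = node.children[letter]
        if node.depth == k:
            node.positions.append(start)
    # Back pointers: node(w1..wp).back = node(w2..wp); depth-1 nodes point to root.
    stack = [(root, "")]
    while stack:
        node, word = stack.pop()
        for letter, child in node.children.items():
            child_word = word + letter
            back = root
            for ch in child_word[1:]:
                back = back.children[ch]
            child.back = back
            stack.append((child, child_word))
    return root


def walk(root: TNode, query: str, k: int):
    """Yield (query_start, db_start) for every exact k-mer match."""
    current = root
    for j, letter in enumerate(query):
        # Step 2: follow back pointers until the letter can be consumed.
        while letter not in current.children and current is not root:
            current = current.back
        # Step 1: move down to the child for this letter, if there is one.
        if letter in current.children:
            current = current.children[letter]
        if current.depth == k:
            for db_start in current.positions:
                yield j - k + 1, db_start
```

            <p>With database ACGTTGCA, query TTGCAC and <it>k </it>= 3, the walk reports the query/database start pairs (0, 3), (1, 4) and (2, 5), which are exactly the shared 3-mers TTG, TGC and GCA.</p>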
         </sec>
         <sec>
            <st>
               <p>Chaining the seeds</p>
            </st>
            <p>A seed <it>s</it><sup>(1) </sup>can be chained to another seed <it>s</it><sup>(2) </sup>whenever (i) the indices of <it>s</it><sup>(1) </sup>in both sequences are higher than those of <it>s</it><sup>(2)</sup>, and (ii) <it>s</it><sup>(1) </sup>and <it>s</it><sup>(2) </sup>are "near" each other, where "near" is defined by both a distance criterion and a gap criterion, as illustrated in Figure <figr fid="F1">1</figr>.</p>
            <p>To find the chains of seeds, we use the following algorithm. Let <it>D </it>be the maximum distance between two adjacent seeds. The seeds generated while examining the last <it>D </it>base pairs of the query sequence are stored in a skip list, a probabilistic data structure that allows fast searches and easy in-order traversal of its elements <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. The seeds are ordered by the difference of their indices in the two sequences (the <it>diagonal number</it>). For each seed <it>s </it>found at the <it>current location</it>, we search the skip list for previously stored seeds whose diagonal numbers lie within the permitted gap criterion of the diagonal number of <it>s</it>. This yields the possible previous seeds with which <it>s </it>can be chained. The highest-scoring of these chains is picked and extended by <it>s</it>, and it can be further extended by future seeds. To enforce the distance criterion, we then remove from the skip list all seeds that were generated more than <it>D </it>base pairs before the position of the new seeds, and insert the new seeds into the skip list.</p>
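            <p>A simplified Python sketch of the chaining step follows. It rests on assumptions that are ours rather than the paper's: seeds are (query position, database position, score) triples, the distance criterion is measured along the query, the gap criterion bounds the difference in diagonal numbers, and a chain scores the sum of its seed scores. Python has no built-in skip list, so a sorted list maintained with the standard <it>bisect </it>module stands in for it; this keeps the same ordered search by diagonal but not the same insertion cost.</p>

```python
import bisect
from collections import deque
from typing import List, Tuple

# Hypothetical seed layout: (query_pos, db_pos, score).
Seed = Tuple[int, int, float]


def chain_seeds(seeds: List[Seed], max_dist: int, max_gap: int):
    """Best chain score ending at each seed, plus predecessor links."""
    order = sorted(range(len(seeds)), key=lambda i: seeds[i][0])
    window = []       # sorted (diagonal, index) pairs: stand-in for the skip list
    recent = deque()  # indices currently in the window, in query order
    best = [0.0] * len(seeds)
    prev = [-1] * len(seeds)
    for i in order:
        q, d, score = seeds[i]
        # Distance criterion: evict seeds more than max_dist query positions back.
        while recent and q - seeds[recent[0]][0] > max_dist:
            j = recent.popleft()
            old = (seeds[j][0] - seeds[j][1], j)
            del window[bisect.bisect_left(window, old)]
        # Gap criterion: only consider seeds whose diagonal is within max_gap.
        diag = q - d
        lo = bisect.bisect_left(window, (diag - max_gap, -1))
        hi = bisect.bisect_right(window, (diag + max_gap, len(seeds)))
        best[i] = score
        for _, j in window[lo:hi]:
            qj, dj, _ = seeds[j]
            # Chaining requires strictly increasing indices in both sequences.
            if q > qj and d > dj and best[j] + score > best[i]:
                best[i] = best[j] + score
                prev[i] = j
        bisect.insort(window, (diag, i))
        recent.append(i)
    return best, prev


def highest_scoring_chain(seeds: List[Seed], best, prev) -> List[Seed]:
    """Follow predecessor links back from the best-scoring chain end."""
    if not seeds:
        return []
    i = max(range(len(seeds)), key=lambda n: best[n])
    chain = []
    while i != -1:
        chain.append(seeds[i])
        i = prev[i]
    return chain[::-1]
```

            <p>With seeds (10, 10, 20), (30, 31, 20), (50, 50, 20) and a spurious seed (40, 200, 50), <it>D </it>= 25 and a maximum gap of 5, the three weaker seeds chain to a total score of 60 and outscore the single spurious seed (50), illustrating how a chain of weaker local alignments can outweigh one high-scoring spurious match.</p>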
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>The combined CHAOS-DIALIGN software is available online at <it>G&#246;ttingen Bioinformatics Compute Server (GoBiCS)</it>: <url>http://dialign.gobics.de/chaos-dialign-submission</url></p>
         <p>The source code for CHAOS is available at <url>http://www.cs.stanford.edu/~brudno/chaos/</url>, together with a Perl script that converts CHAOS output into the format used to anchor DIALIGN. A version of DIALIGN that accepts such anchors is available at <url>http://bibiserv.techfak.uni-bielefeld.de/dialign/</url></p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MB developed CHAOS and drafted parts of the manuscript. MC and BG analyzed the <it>SCL </it>genomic sequences and drafted parts of the manuscript. SB contributed ideas to CHAOS development and rewrote portions of the manuscript. BM implemented the new version of DIALIGN and drafted parts of the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Chuong B. Do for help with CHAOS development, Nadine Werner for assistance with the manuscript, and Inna Dubchak for valuable conversations during this study. Rasmus Steinkamp developed the WWW interface for the CHAOS/DIALIGN software at GoBiCS. MB is supported by the NSF Graduate Research Fellowship. MC and BG are supported by the Wellcome Trust and the Leukaemia Research Fund. The work is partly supported by DFG grant MO 1048/1-1.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1" rating="0">
            <title>
               <p>Comparison of genomic DNA sequences: solved and unsolved problems</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>391</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/17.5.391</pubid>
                  <pubid idtype="pmpid" link="fulltext">11331233</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2" rating="0">
            <title>
               <p>Cross-species sequence comparisons: A review of methods and available resources</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.222003</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3" rating="0">
            <title>
               <p>An applications-focused review of comparative genomics tools: capabilities, limitations, and future challenges</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chain</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ohlebusch</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Slezak</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Briefings in Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>105</fpage>
            <lpage>123</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12846393</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4" rating="0">
            <title>
               <p>Gene recognition via spliced sequence alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gelfand</snm>
                  <fnm>MS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mironov</snm>
                  <fnm>AA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pevzner</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <issue>17</issue>
            <fpage>9061</fpage>
            <lpage>9066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">38595</pubid>
                  <pubid idtype="pmpid" link="fulltext">8799154</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.93.17.9061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5" rating="0">
            <title>
               <p>The conserved exon method for gene finding</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bafna</snm>
                  <fnm>V</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>190</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/16.3.190</pubid>
                  <pubid idtype="pmpid" link="fulltext">10869012</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6" rating="0">
            <title>
               <p>Human and mouse gene structure: comparative analysis and application to exon prediction</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Berger</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>7</issue>
            <fpage>950</fpage>
            <lpage>958</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.10.7.950</pubid>
                  <pubid idtype="pmpid" link="fulltext">10899144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7" rating="0">
            <title>
               <p>Integrating genomic homology into gene structure prediction</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Flicek</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Duan</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brent</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>S140</fpage>
            <lpage>S148</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11473003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8" rating="0">
            <title>
               <p>SGP-1: Prediction and validation of homologous genes based on sequence alignments</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wiehe</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gebauer-Jung</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mitchell-Olds</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Guig&#243;</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1574</fpage>
            <lpage>1583</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.177401</pubid>
                  <pubid idtype="pmpid" link="fulltext">11544202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9" rating="0">
            <title>
               <p>AGenDA: Gene prediction by comparative sequence analysis</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rinner</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>In Silico Biology</source>
            <pubdate>2002</pubdate>
            <volume>2</volume>
            <fpage>195</fpage>
            <lpage>205</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12542405</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10" rating="0">
            <title>
               <p>Exon discovery by genomic sequence alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rinner</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Abdedda&#239;m</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haase</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mayer</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dress</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mewes</snm>
                  <fnm>H-W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>777</fpage>
            <lpage>787</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/18.6.777</pubid>
                  <pubid idtype="pmpid" link="fulltext">12075013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11" rating="0">
            <title>
               <p>Locus control regions of mammalian &#946;-globin gene clusters: combining phylogenetic analyses and experimental results to gain functional insights</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Slightom</snm>
                  <fnm>JL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gumucio</snm>
                  <fnm>DL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Goodman</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stojanovic</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1998</pubdate>
            <volume>205</volume>
            <fpage>73</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1016/S0378-1119(97)00474-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12" rating="0">
            <title>
               <p>Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jareborg</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>815</fpage>
            <lpage>824</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.9.9.815</pubid>
                  <pubid idtype="pmpid" link="fulltext">10508839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13" rating="0">
            <title>
               <p>Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Loots</snm>
                  <fnm>GG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Locksley</snm>
                  <fnm>RM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blankespoor</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wang</snm>
                  <fnm>ZE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>288</volume>
            <issue>5463</issue>
            <fpage>136</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.288.5463.136</pubid>
                  <pubid idtype="pmpid" link="fulltext">10753117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14" rating="0">
            <title>
               <p>Analysis of vertebrate SCL loci identifies conserved enhancers</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barton</snm>
                  <fnm>LM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gilbert</snm>
                  <fnm>JGR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bench</snm>
                  <fnm>AJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sanchez</snm>
                  <fnm>MJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bahn</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mistry</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McMurray</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vaudin</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Amaya</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bentley</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Nature Biotechnology</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <fpage>181</fpage>
            <lpage>186</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/72635</pubid>
                  <pubid idtype="pmpid" link="fulltext">10657125</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15" rating="0">
            <title>
               <p>Transcriptional regulation of the stem cell leukemia gene (SCL)&#8211;comparative analysis of five vertebrate SCL loci</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barton</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chapman</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sinclair</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Knudsen</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gilbert</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bentley</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>749</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">186570</pubid>
                  <pubid idtype="pmpid" link="fulltext">11997341</pubid>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.45502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16" rating="0">
            <title>
               <p>Long-range comparison of human and mouse SCL loci: localized regions of sensitivity to restriction endonucleases correspond precisely with peaks of conserved noncoding sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gilbert</snm>
                  <fnm>JGR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barton</snm>
                  <fnm>LM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bentley</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>87</fpage>
            <lpage>97</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.153001</pubid>
                  <pubid idtype="pmpid" link="fulltext">11156618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17" rating="0">
            <title>
               <p>Annotating regulatory DNA based on man-mouse genomic comparison</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dieterich</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rateitschak</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Krause</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>S84</fpage>
            <lpage>S90</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12385988</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18" rating="0">
            <title>
               <p>Rapid Development of Nucleic Acid Diagnostics</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fitch</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gardner</snm>
                  <fnm>SN</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kuczmarski</snm>
                  <fnm>TA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Myers</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ott</snm>
                  <fnm>LL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Slezak</snm>
                  <fnm>TR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vitalis</snm>
                  <fnm>EA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zemla</snm>
                  <fnm>AT</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McCready</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Proceedings of the IEEE</source>
            <pubdate>2002</pubdate>
            <volume>90</volume>
            <fpage>1708</fpage>
            <lpage>1721</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1109/JPROC.2002.804680</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19" rating="0">
            <title>
               <p>Alignment of whole genomes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Delcher</snm>
                  <fnm>LA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fleischmann</snm>
                  <fnm>AD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <issue>11</issue>
            <fpage>2369</fpage>
            <lpage>2376</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">148804</pubid>
                  <pubid idtype="pmpid" link="fulltext">10325427</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/27.11.2369</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20" rating="0">
            <title>
               <p><it>REPuter</it>: Fast computation of maximal repeats in complete genomes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schleiermacher</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <issue>5</issue>
            <fpage>426</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/15.5.426</pubid>
                  <pubid idtype="pmpid" link="fulltext">10366664</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21" rating="0">
            <title>
               <p>Computation and visualization of degenerate repeats in complete genomes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ohlebusch</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schleiermacher</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Giegerich</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology</source>
            <publisher>Menlo Park, CA: AAAI Press</publisher>
            <pubdate>2000</pubdate>
            <fpage>228</fpage>
            <lpage>238</lpage>
         </bibl>
         <bibl id="B22" rating="0">
            <title>
               <p>PipMaker&#8211;a web server for aligning two genomic DNA sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bouck</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gibbs</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>577</fpage>
            <lpage>586</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.10.4.577</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23" rating="0">
            <title>
               <p>Human-mouse alignments with BLASTZ</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>103</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.809403</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529312</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24" rating="0">
            <title>
               <p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>211</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/15.3.211</pubid>
                  <pubid idtype="pmpid" link="fulltext">10222408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25" rating="0">
            <title>
               <p>Evolution of bHLH transcription factors: modular evolution by domain shuffling?</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Atchley</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1999</pubdate>
            <volume>16</volume>
            <fpage>1654</fpage>
            <lpage>1663</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10605108</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26" rating="0">
            <title>
               <p>A simple and space-efficient fragment-chaining algorithm for alignment of DNA and protein sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Applied Mathematics Letters</source>
            <pubdate>2002</pubdate>
            <volume>15</volume>
            <fpage>11</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1016/S0893-9659(01)00085-4</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27" rating="0">
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gusfield</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology</source>
            <publisher>Cambridge, UK: Cambridge University Press</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B28" rating="0">
            <title>
               <p>Fast and sensitive alignment of large genomic sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>In Proceedings IEEE Computer Society Bioinformatics Conference: 14&#8211;16 August 2002; Palo Alto</source>
            <publisher>IEEE Computer Society</publisher>
            <editor>Vicky Markstein and Peter Markstein</editor>
            <pubdate>2002</pubdate>
            <fpage>138</fpage>
            <lpage>147</lpage>
         </bibl>
         <bibl id="B29" rating="0">
            <title>
               <p>Multiple DNA and protein sequence alignment based on segment-to-segment comparison</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dress</snm>
                  <fnm>AWM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Werner</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>12098</fpage>
            <lpage>12103</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">37949</pubid>
                  <pubid idtype="pmpid" link="fulltext">8901539</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.93.22.12098</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30" rating="0">
            <title>
               <p>Speeding up the DIALIGN multiple alignment program by using the 'greedy alignment of biological sequences library' (GABIOS-LIB)</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Abdedda&#239;m</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Lecture Notes in Computer Science</source>
            <pubdate>2001</pubdate>
            <volume>2066</volume>
            <fpage>1</fpage>
            <lpage>11</lpage>
         </bibl>
         <bibl id="B31" rating="0">
            <title>
               <p>Comparative analysis of multiple protein-sequence alignment methods</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McClure</snm>
                  <fnm>MA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vasi</snm>
                  <fnm>TK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <fpage>571</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8078398</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32" rating="0">
            <title>
               <p>BAliBASE: A benchmark alignment database for the evaluation of multiple sequence alignment programs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>87</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/15.1.87</pubid>
                  <pubid idtype="pmpid" link="fulltext">10068696</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33" rating="0">
            <title>
               <p>Quality assessment of multiple alignment programs</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>FEBS Letters</source>
            <pubdate>2002</pubdate>
            <volume>529</volume>
            <fpage>126</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0014-5793(02)03189-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12354624</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34" rating="0">
            <title>
               <p>The SCL gene: from case report to critical hematopoietic regulator</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Begley</snm>
                  <fnm>CG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>1999</pubdate>
            <volume>93</volume>
            <fpage>2760</fpage>
            <lpage>2770</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10216069</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35" rating="0">
            <title>
               <p>Regulation of the stem cell leukemia (SCL) gene: a tale of two fishes</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barton</snm>
                  <fnm>LM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gering</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gilbert</snm>
                  <fnm>JG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Grafham</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bentley</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Patient</snm>
                  <fnm>R</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>6747</fpage>
            <lpage>6752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">34424</pubid>
                  <pubid idtype="pmpid" link="fulltext">11381108</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.101532998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36" rating="0">
            <title>
               <p>Distinct mechanisms direct SCL/TAL-1 expression in erythroid cells and CD34 positive primitive myeloid cells</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bockamp</snm>
                  <fnm>EO</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McLaughlin</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Murrell</snm>
                  <fnm>AM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Elefanty</snm>
                  <fnm>AG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1997</pubdate>
            <volume>272</volume>
            <fpage>8781</fpage>
            <lpage>8790</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1074/jbc.272.13.8781</pubid>
                  <pubid idtype="pmpid" link="fulltext">9079714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37" rating="0">
            <title>
               <p>Lineage-restricted regulation of the murine SCL/TAL-1 promoter</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bockamp</snm>
                  <fnm>EO</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McLaughlin</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Murrell</snm>
                  <fnm>AM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Robb</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Begley</snm>
                  <fnm>CG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>1995</pubdate>
            <volume>86</volume>
            <fpage>1502</fpage>
            <lpage>1514</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7632958</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38" rating="0">
            <title>
               <p>Distinct 5' SCL enhancers direct transcription to developing brain, spinal cord, and endothelium: neural expression is mediated by GATA factor binding sites</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sinclair</snm>
                  <fnm>AM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>G&#246;ttgens</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Barton</snm>
                  <fnm>LM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stanley</snm>
                  <fnm>ML</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pardanaud</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Klaine</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bahn</snm>
                  <fnm>MGS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sanchez</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bench</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>1999</pubdate>
            <volume>209</volume>
            <fpage>128</fpage>
            <lpage>142</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1006/dbio.1999.9236</pubid>
                  <pubid idtype="pmpid" link="fulltext">10208748</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39" rating="0">
            <title>
               <p>GATA- and SP1-binding sites are required for the full activity of the tissue-specific promoter of the TAL-1 gene</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lecointe</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bernard</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Naert</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Joulin</snm>
                  <fnm>V</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Larsen</snm>
                  <fnm>JC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Romeo</snm>
                  <fnm>PH</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mathieu-Mahul</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>1994</pubdate>
            <volume>9</volume>
            <fpage>2623</fpage>
            <lpage>2632</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8058326</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40" rating="0">
            <title>
               <p>DNA binding sites for the transcriptional activator/repressor YY1</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hyde-DeRuyscher</snm>
                  <fnm>RP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jennings</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Shenk</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1995</pubdate>
            <volume>23</volume>
            <fpage>4457</fpage>
            <lpage>4465</lpage>
         </bibl>
         <bibl id="B41" rating="0">
            <title>
               <p>MAVID multiple alignment server</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3525</fpage>
            <lpage>3526</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">169029</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824358</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/gkg623</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42" rating="0">
            <title>
               <p>LAGAN and multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Do</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cooper</snm>
                  <fnm>G</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kim</snm>
                  <fnm>MF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Davydov</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <cnm>NISC Sequencing Consortium</cnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>721</fpage>
            <lpage>731</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.926603</pubid>
                  <pubid idtype="pmpid" link="fulltext">12654723</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43" rating="0">
            <title>
               <p>Active conservation of noncoding sequences revealed by three-way species comparisons</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Loots</snm>
                  <fnm>GG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Mayor</snm>
                  <fnm>C</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1304</fpage>
            <lpage>1306</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.142200</pubid>
                  <pubid idtype="pmpid" link="fulltext">10984448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44" rating="0">
            <title>
               <p>Algorithms for phylogenetic footprinting</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schwikowski</snm>
                  <fnm>B</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>211</fpage>
            <lpage>223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1089/10665270252935421</pubid>
                  <pubid idtype="pmpid" link="fulltext">12015878</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45" rating="0">
            <title>
               <p>Discovery of regulatory elements by a computational method for phylogenetic footprinting</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>739</fpage>
            <lpage>748</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">186562</pubid>
                  <pubid idtype="pmpid" link="fulltext">11997340</pubid>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.6902</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46" rating="0">
            <title>
               <p>AGenDA: Homology-based gene prediction</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Taher</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rinner</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gargh</snm>
                  <fnm>ASS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1575</fpage>
            <lpage>1577</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/btg181</pubid>
                  <pubid idtype="pmpid" link="fulltext">12912840</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47" rating="0">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48" rating="0">
            <title>
               <p>Improved tools for biological sequence comparison</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3162770</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49" rating="0">
            <title>
               <p>Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1335</fpage>
            <lpage>1345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1101/gr.178701</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50" rating="0">
            <title>
               <p>An IgH enhancer that drives transcription through basic helix-loop-helix and Oct transcription factor binding motifs. Functional analysis of the E(mu)3' enhancer of the catfish</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cioffi</snm>
                  <fnm>CC</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Middleton</snm>
                  <fnm>DL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wilson</snm>
                  <fnm>MR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>NW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Clem</snm>
                  <fnm>LW</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Warr</snm>
                  <fnm>GW</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>27825</fpage>
            <lpage>27830</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1074/jbc.M100110200</pubid>
                  <pubid idtype="pmpid" link="fulltext">11375977</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51" rating="0">
            <title>
               <p>Rose: Generating sequence families</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Evers</snm>
                  <fnm>D</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Meyer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>157</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/14.2.157</pubid>
                  <pubid idtype="pmpid" link="fulltext">9545448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52" rating="0">
            <title>
               <p>Efficient string matching: an aid to bibliographic search</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Aho</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Corasick</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Comm ACM</source>
            <pubdate>1975</pubdate>
            <volume>18</volume>
            <fpage>333</fpage>
            <lpage>340</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1145/360825.360855</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53" rating="0">
            <title>
               <p>Trie memory</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fredkin</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Comm ACM</source>
            <pubdate>1960</pubdate>
            <volume>3</volume>
            <fpage>490</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1145/367390.367400</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54" rating="0">
            <title>
               <p>Skip lists: A probabilistic alternative to balanced trees</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pugh</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Comm ACM</source>
            <pubdate>1990</pubdate>
            <volume>33</volume>
            <fpage>668</fpage>
            <lpage>676</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1145/78973.78977</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
