<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-70</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>Paircomp, FamilyRelationsII and Cartwheel: tools for interspecific sequence comparison</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Brown</snm>
               <fnm>C Titus</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>titus@caltech.edu</email>
            </au>
            <au id="A2">
               <snm>Xie</snm>
               <fnm>Yuan</fnm>
               <insr iid="I2"/>
               <email>yuan@warmjune.com</email>
            </au>
            <au id="A3">
               <snm>Davidson</snm>
               <mi>H</mi>
               <fnm>Eric</fnm>
               <insr iid="I1"/>
               <email>davidson@caltech.edu</email>
            </au>
            <au id="A4">
               <snm>Cameron</snm>
               <fnm>R Andrew</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>acameron@caltech.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Division of Biological Sciences, California Institute of Technology, Pasadena, CA 91125, USA</p>
            </ins>
            <ins id="I2">
               <p>Center for Computational Regulatory Genomics, California Institute of Technology, Pasadena, CA 91125, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>70</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/70</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15790396</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-70</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>11</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Brown et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Comparative sequence analysis is an effective and increasingly common way to identify <it>cis</it>-regulatory regions in animal genomes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We describe three tools for comparative analysis of pairs of BAC-sized genomic regions. Paircomp is a tool that does windowed (ungapped) comparisons of two sequences and reports all matches above a set threshold. FamilyRelationsII is a graphical viewer for comparisons that enables interactive exploration of several different kinds of comparisons. Cartwheel is a Web site and compute-cluster management system used to execute and store comparisons for display by FamilyRelationsII. These tools are specialized for the discovery of <it>cis</it>-regulatory regions in animal genomes. All tools and their source code are freely available at <url>http://family.caltech.edu/</url>.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>These tools have been shown to effectively identify regulatory regions in echinoderms, mammals, and nematodes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Comparative sequence analysis is fast becoming a standard method for discovering <it>cis</it>-regulatory modules <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The technique relies on the signatures of conservation left by functional genomic regions as the background sequence evolves. It is often the only way to computationally discover <it>cis</it>-regulatory modules in animal genomes when definite knowledge of upstream regulators is lacking, and it can serve as an excellent complement to experimental techniques.</p>
         <p>Paircomp, FamilyRelationsII (FRII), and Cartwheel form an integrated system for comparing two BAC-sized (~100 kb) genomic sequences, viewing the comparison, manipulating thresholds and views, and extracting the results. These tools and their predecessors, seqcomp and FamilyRelations, have been used extensively in the years since we first made them available <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The addition of Cartwheel, a Web server system for performing, storing, and revisiting analyses, now makes the combined toolkit considerably more useful to the experimental biologist.</p>
         <p>The first analysis done with FamilyRelations was a comparison of the <it>otx </it>region between two sea urchins; 11 of the 17 conserved blocks were shown to drive expression of a reporter <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Kirouac and Sternberg <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> showed that features conserved between <it>C. elegans </it>and <it>C. briggsae </it>encode functional regulatory regions. Romano and Wray <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> used FamilyRelations to show that primary sequence identity was conserved in only part of the previously identified <it>endo16 cis</it>-regulatory region, when the <it>L. variegatus </it>sequence was used as a partner to the <it>S. purpuratus </it>sequence. Leung <it>et al. </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp> used FRII to verify that regions bound by NFKB were conserved between mouse and human. Most recently, Revilla-i-Domingo <it>et al. </it><abbrgrp><abbr bid="B7">7</abbr></abbrgrp> identified a small conserved region in the <it>delta </it>genomic locus as a <it>cis</it>-regulatory element responsible for localized expression of <it>delta </it>in <it>S. purpuratus</it>. Similar analyses of the regulation of <it>gatae</it>, <it>krox</it>, <it>wnt8</it>, <it>brachyury</it>, <it>tbrain</it>, <it>foxa </it>and <it>deadringer </it>in <it>S. purpuratus </it>are forthcoming from this lab. While most published use of FRII and Cartwheel has been in sea urchins and nematodes, users have reported that the tools accurately identify regulatory regions in vertebrates and plants.</p>
         <p>FRII and Cartwheel are specialized for identifying conservation within relatively small genomic regions, and can be used for comparing BAC sequences between organisms for which no whole genome assembly exists (e.g. <it>S. purpuratus/L. variegatus</it>). The exhaustive "dot-plot"-style search algorithm used (described below) assumes nothing about the relative positioning or orientation of regulatory regions and can be used to detect rearrangements that might be missed by a global alignment algorithm (see e.g. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>). Because of these features, FRII and Cartwheel are particularly useful in targeted searches for regulatory regions.</p>
         <p>In this paper, we present these effective tools for comparative sequence analysis to the wider biological community.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>Paircomp is a program that performs windowed comparisons of two sequences. It is an expanded reimplementation of the seqcomp program <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Paircomp contains several algorithms for exhaustive fixed-width-window sequence comparisons, each optimized for different parameters. The default algorithm uses a sliding window to do a "rolling comparison" and runs in O(N &#215; M) time for two sequences of lengths N and M. Paircomp is written in C++ and has a Python interface.</p>
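         <p>As an illustration only, a rolling comparison of this kind might be sketched in Python as follows. This sketch is not taken from the paircomp source: the function name rolling_compare, its parameters, and the (top start, bottom start, matching bases) tuple layout are assumptions made for the example, and only the forward strand is handled. The sketch walks each diagonal of the comparison matrix once, updating the identity count as one base enters the window and one base leaves it, so the total work is proportional to N &#215; M.</p>

```python
def rolling_compare(top, bot, window=20, min_matches=18):
    """Forward-strand, ungapped windowed comparison (illustrative sketch).

    Returns (top_start, bot_start, n_matching) for every pair of windows
    that share at least min_matches identical bases.
    """
    n, m = len(top), len(bot)
    hits = []
    if window > n or window > m:
        return hits
    # Each diagonal d fixes the offset (bottom index minus top index).
    for d in range(-(n - window), m - window + 1):
        i = max(0, -d)   # first top position on this diagonal
        j = i + d        # corresponding bottom position
        length = min(n - i, m - j)
        # Count identities in the first window on the diagonal.
        count = sum(top[i + k] == bot[j + k] for k in range(window))
        if count >= min_matches:
            hits.append((i, j, count))
        # Roll: add the base entering the window, drop the one leaving.
        for s in range(1, length - window + 1):
            count += (top[i + s + window - 1] == bot[j + s + window - 1])
            count -= (top[i + s - 1] == bot[j + s - 1])
            if count >= min_matches:
                hits.append((i + s, j + s, count))
    return hits
```

         <p>In this sketch, window=20 with min_matches=18 corresponds to a 20 bp window at a 90% threshold.</p>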
         <p>FamilyRelationsII (FRII) is a graphical viewer for sequence analyses. It is a C++ reimplementation of the original Java/Jython FamilyRelations <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. FRII uses the cross-platform FLTK windowing toolkit to present a common interface on Windows, Mac OS X, and Linux/X11.</p>
         <p>Cartwheel is a server-side system that presents a uniform interface for job coordination and execution. It has several components, including a Web interface through which users can establish analyses; a remote interface for programs to retrieve analysis data; and a batch job queueing system based on a method of parallel processing known as a Linda tuple space. All of the components are built on top of a PostgreSQL database. Cartwheel is written in Python and provides libraries in Python, Java, and C++ for remote access.</p>
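         <p>The tuple-space idea behind the job queue can be sketched as follows. This is a minimal in-memory illustration under our own assumptions, not Cartwheel's implementation (which is built on PostgreSQL): the class name TupleSpace, the method names out and take, and the use of None as a wildcard are hypothetical. Producers place job tuples into a shared space, and workers remove the first tuple that matches a pattern, blocking until one is available.</p>

```python
import threading


class TupleSpace:
    """A tiny in-memory Linda-style tuple space (illustrative sketch)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Linda 'out': place a tuple into the space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _find(self, pattern):
        # None in the pattern acts as a wildcard for that field.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def take(self, pattern, timeout=None):
        """Linda 'in': remove and return a matching tuple.

        Blocks until a match exists; returns None if timeout expires.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup
```

         <p>A worker in this sketch would call take(("job", None, None)) in a loop, run the analysis named in the returned tuple, and then use out to post a result tuple.</p>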
         <p>A technical history of the design decisions made in the implementation of these tools has been published online (<abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, article "Python in Bioinformatics").</p>
         <sec>
            <st>
               <p>Availability</p>
            </st>
            <p>FRII is freely available for download in a binary distribution for Mac OS X and Windows <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>; FRII will also run under most UNIX distributions but must be compiled individually. The Center for Computational Regulatory Genomics at Caltech maintains a public Cartwheel server <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. A tutorial for FRII is available online <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and an example homework assignment for an undergraduate class is also available. The source code for paircomp, FRII and Cartwheel and all their components is freely available under the GPL/LGPL through the above Web sites. Paircomp, FamilyRelationsII and Cartwheel are Copyright <sup>&#169; </sup>2001&#8211;2004 the California Institute of Technology.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Paircomp</p>
            </st>
            <p>Several different classes of algorithms are available for comparing two genomic sequences. Windowed comparisons do an exhaustive comparison of two sequences with a fixed-width window, and record strict (ungapped) sequence identity within that window <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B12">12</abbr></abbrgrp>. Local alignment algorithms such as BLAST search for common "words" of DNA in a pair of sequences and build a gapped alignment around these words <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. These gapped alignments are often scored by overall length, so that e.g. a 500 bp match at 90% is ranked higher than a 200 bp match at 90%. Global alignment algorithms such as AVID <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and LAGAN <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> seek to build a start-to-end gapped alignment of syntenic genomic regions. Windowed comparisons and local alignment algorithms usually search for matches in both forward and reverse complement directions, while global alignment algorithms typically try to build an alignment without inversions. Implementations of all three strategies for genomic comparisons have been publicly available for some time: Dotter and seqcomp implement windowed comparisons <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B12">12</abbr></abbrgrp>; PipMaker uses a local alignment algorithm, blastz <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>; and Vista relies on a global alignment generated by AVID <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. All three comparison strategies have been successful at finding regulatory regions <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B19">19</abbr></abbrgrp>.</p>
            <p>Of the three general classes of algorithms, we chose to use windowed comparisons in our search for <it>cis</it>-regulatory modules. Our decision was based on several criteria. First, these comparisons report matches based solely on strict sequence identity with no gapping, unlike alignment algorithms. This is a good <it>ab initio </it>requirement when comparing sequences in search of <it>cis</it>-regulatory modules, whose evolution is still poorly understood; in particular, binding sites could be sensitive to indels, which are somewhat elided in gapped alignments. Moreover, we had no <it>a priori </it>expectation for the locations, sizes, or degrees of similarity of conserved regions, necessitating an exhaustive search strategy that did not bias scores based on the length or position of matches. And, finally, from a user-interface perspective the parameters for paircomp &#8211; windowsize and threshold &#8211; are simple and intuitively linked to the results. Our success with this basic approach means that we have not needed to move to alternative algorithms.</p>
            <p>Paircomp is a standalone program that executes windowed comparisons (see Implementation). It searches for matches in both the forward and reverse complement directions. Paircomp runs within Cartwheel; the results are stored in a database and communicated to FRII.</p>
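            <p>One simple way to search both directions, shown here only as a sketch, is to compare the top sequence against the bottom sequence and then against the reverse complement of the bottom sequence, mapping the second set of coordinates back to the forward strand. The function names and the tuple layout below are assumptions for illustration and are not paircomp's interface; the comparison itself is a naive quadratic loop kept short for clarity.</p>

```python
# Complement table for unambiguous DNA bases; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_matches(top, bot, window, min_matches):
    """Naive ungapped comparison of all window pairs on one strand."""
    hits = []
    for i in range(len(top) - window + 1):
        for j in range(len(bot) - window + 1):
            same = sum(a == b for a, b in zip(top[i:i + window],
                                              bot[j:j + window]))
            if same >= min_matches:
                hits.append((i, j, same))
    return hits


def compare_both_strands(top, bot, window, min_matches):
    """Return (top_start, bot_start, n_matching, strand) tuples.

    For strand -1 hits, bot_start is the start of the matching window
    in the forward coordinates of bot.
    """
    hits = [(i, j, s, +1)
            for i, j, s in window_matches(top, bot, window, min_matches)]
    rc = reverse_complement(bot)
    for i, j, s in window_matches(top, rc, window, min_matches):
        # A window starting at j in rc covers forward positions
        # len(bot) - j - window through len(bot) - j - 1.
        hits.append((i, len(bot) - j - window, s, -1))
    return hits
```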
         </sec>
         <sec>
            <st>
               <p>Cartwheel</p>
            </st>
            <p>Cartwheel is a Web site through which analyses are executed and from which analyses are loaded into FamilyRelationsII. It provides an easy-to-use interface through which to establish a set of analyses on a pair of sequences. Cartwheel also allows the annotation of sequences with a variety of features; features can be uploaded to Cartwheel in the standard GFF format. A tutorial for setting up pairwise comparisons is available online <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
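            <p>For readers unfamiliar with GFF, each feature is a single line of nine tab-separated columns (sequence name, source, feature type, start, end, score, strand, frame, and attributes). The helper below is a hypothetical sketch of how such a line might be generated; the function name and the example values are ours, and which GFF version and attribute syntax Cartwheel expects is not specified here and should be checked against the Cartwheel documentation.</p>

```python
def gff_line(seqname, source, feature, start, end,
             score=".", strand=".", frame=".", attributes=""):
    """Format one feature as a tab-separated, nine-column GFF record.

    start and end are 1-based and inclusive, following GFF convention.
    A "." marks a column with no value.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, frame, attributes]
    return "\t".join(fields)


# Example with made-up names and coordinates.
example = gff_line("sp_otx_bac", "manual", "exon", 126500, 126800,
                   strand="+", attributes="otx_exon1")
```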
         </sec>
         <sec>
            <st>
               <p>FamilyRelationsII</p>
            </st>
            <p>FamilyRelationsII, or FRII, displays comparisons of BAC-sized (~100 kb) genomic sequences. It is a graphical program that runs directly from a desktop and loads data from the Cartwheel server. From within FRII, users can zoom in to look more closely at features, alter scoring thresholds for comparisons, change the color of features, and turn on or off the display of specific analyses. FRII can also display closeup views of comparisons and alignments against DNA and protein sequence.</p>
            <p>Figure <figr fid="F1">1</figr> shows the main FRII view of a comparison between the <it>otx </it>locus in <it>S. purpuratus </it>and <it>L. variegatus</it>, two sea urchins that diverged approximately 50 million years ago. The genomic sequences were obtained from BAC libraries as described in <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In the case of <it>S. purpuratus</it>, the BAC contains the entire <it>otx </it>coding region; the <it>L. variegatus </it>sequence contains only the 5' region of the gene, and not the final exon.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>A paircomp comparison of the <it>otx </it>gene locus from <it>S. purpuratus </it>(top) with <it>L. variegatus </it>(bottom)</p>
               </caption>
               <text>
                  <p>A paircomp comparison of the <it>otx </it>gene locus from <it>S. purpuratus </it>(top) with <it>L. variegatus </it>(bottom). We used paircomp to compare all 20 bp subsequences from a 160 kb <it>S. purpuratus </it>BAC with a 62 kb <it>L. variegatus </it>BAC; those 20 bp subsequences that match at 19/20 or 20/20 bases are connected with a red line. Only the 80 kb surrounding the <it>otx </it>gene is shown on the top. Matches to the known <it>S. purpuratus </it>cDNA sequence are shown in red on the top sequence, and TBLASTX matches in <it>L. variegatus </it>to the same cDNA sequence are shown in blue on the bottom sequence. The <it>L. variegatus </it>genomic sequence does not extend to cover the 3' region of the coding sequence. On the top of the view are tabs to switch between the "pair view" (shown) and the "dot plot" view (see Figure 2). On the right side of the view are control buttons that allow the user to change both the color and the threshold at which matches are displayed. The user can also view a closeup of a region by selecting the region on the sequence (e.g. as on the bottom sequence, where a region from 40 kb to 61.6 kb is selected) and then pressing the "View closeup..." button. An example closeup view is shown in Figure 3.</p>
               </text>
               <graphic file="1471-2105-6-70-1"/>
            </fig>
            <p>The comparison shown is a paircomp comparison performed with a 20 bp window at 90% and then displayed at a 95% threshold. The general collinearity of the matches suggests that the majority of the similar regions are conserved with respect to size, orientation, and relative distance from the exons. This collinearity is typical of conserved features in our comparisons. The diagonal lines crossing the comparison often identify low complexity regions such as simple sequence repeats present throughout both genomic regions. This pairwise mapping view is one of the two large-scale views in FRII; the other large-scale view is a dot-plot view, shown in Figure <figr fid="F2">2</figr>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>A "dot-plot" style view of a subregion of the <it>otx </it>comparison (see Figure 1)</p>
               </caption>
               <text>
                  <p>A "dot-plot" style view of a subregion of the <it>otx </it>comparison (see Figure 1). The top sequence is a zoomed-in view of the <it>otx </it>genomic region from <it>S. purpuratus</it>, as in Figure 1; the region runs from 119.6 kb to 133.0 kb. The side sequence is a zoomed-in view of the orthologous region from <it>L. variegatus</it>, running from 38.5 kb to 51.5 kb. The region surrounding the first exon (in red) of the <it>sp &#945;-otx </it>transcript is selected on the top (<it>S. purpuratus</it>) sequence, and the corresponding TBLASTX matches are highlighted on the left (<it>L. variegatus</it>) sequence in blue. The selection box in the center of the view contains the paircomp matches in this region, showing only 20 bp matches that match at 19/20 or 20/20 (corresponding to a 95% threshold). A closeup view of this region, showing the DNA sequence of the two regions with the corresponding matches, is shown in Figure 3.</p>
               </text>
               <graphic file="1471-2105-6-70-2"/>
            </fig>
            <p>Figure <figr fid="F2">2</figr> shows a dot-plot view of an expanded region of the comparison, centered on the first exon of the <it>&#945;-otx </it>transcript. In addition to the exon itself, there is patchy conservation throughout the region; again, this is typical of many comparisons. This view also shows that all of the elements are collinear on scales of ~10 kb.</p>
            <p>In both the dot-plot and pairwise mapping view, multiple comparisons done with different parameters can be displayed in different colors. The threshold for the matches shown can be adjusted until the desired view is obtained, and sequence can be exported from any of the views via a pop-up menu.</p>
            <p>Once a threshold is chosen, the user can expand the view of a particular region. Figure <figr fid="F3">3</figr> shows a closeup view of the region outlined in blue in Figure <figr fid="F2">2</figr>. The sequence shown in Figure <figr fid="F3">3</figr> is a small patch of conservation upstream of the first exon, displayed at a 19/20 threshold. Here the user scans along the sequence and visually compares both the boundaries of the matches and the complexity of the sequence. Sequences are directly exported to other applications via the "paste" buffer.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>A closeup view of the paircomp comparison of the genomic sequence surrounding the first exon of <it>otx </it>in <it>S. purpuratus </it>(top sequence) and <it>L. variegatus </it>(bottom sequence)</p>
               </caption>
               <text>
                  <p>A closeup view of the paircomp comparison of the genomic sequence surrounding the first exon of <it>otx </it>in <it>S. purpuratus </it>(top sequence) and <it>L. variegatus </it>(bottom sequence). The top half of the closeup view shows orthologous 2 kb genomic regions (126.2 kb &#8211; 128.3 kb in the <it>S. purpuratus </it>BAC, 44.4 kb &#8211; 46.5 kb in the <it>L. variegatus </it>BAC). Matches of 19/20 or 20/20 bases are drawn in red between the sequences, and the exon matches from Figure 2 are shown in black on the sequence lines. The bottom half of the closeup view shows the part of the sequence selected in blue on the top half of the view. Lines are drawn in black between individual matching bases, and the matching bases are colored in red. Note that both blocks shown match at 19/20 because of the single mismatch in the middle of the blocks.</p>
               </text>
               <graphic file="1471-2105-6-70-3"/>
            </fig>
            <p>FRII also performs searches for motifs using the IUPAC notation in which e.g. W represents A or T. This feature allows users to search for matches to known "consensus" binding sites for transcription factors. Searches are either stored on the Cartwheel server and displayed as individual features on FRII views, or executed directly in FRII. One particularly convenient feature is the ability to ask for motifs that have mismatches in up to 5 positions; this lets users search for weaker matches to known consensus sequences.</p>
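            <p>A motif search of this kind can be sketched as below. The code is an illustration under our own assumptions rather than FRII's implementation: the function name find_motif and its return format are hypothetical, only the forward strand is scanned, and the example motif and sequence are made up. The table maps each standard IUPAC nucleotide code to the bases it represents, and a site is reported when the number of positions falling outside the allowed bases does not exceed the mismatch limit.</p>

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_motif(seq, motif, max_mismatches=0):
    """Return (start, n_mismatches) for each forward-strand site in seq
    that differs from the IUPAC motif at no more than max_mismatches
    positions."""
    seq = seq.upper()
    allowed = [IUPAC[code] for code in motif.upper()]
    sites = []
    for start in range(len(seq) - len(allowed) + 1):
        mismatches = sum(
            seq[start + k] not in bases for k, bases in enumerate(allowed)
        )
        if mismatches > max_mismatches:
            continue
        sites.append((start, mismatches))
    return sites


# Example: exact matches to a hypothetical WGATAR consensus.
exact_sites = find_motif("TTGATAAGCTATCAA", "WGATAR")
```

            <p>Raising max_mismatches in this sketch relaxes the search in the same spirit as the mismatch option described above.</p>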
         </sec>
         <sec>
            <st>
               <p>Other analyses</p>
            </st>
            <p>FRII displays a variety of analyses. In addition to paircomp windowed comparisons, FRII displays and manipulates Vista-style comparisons, BLAST and blastz comparisons, BLAST database searches, cDNA and protein comparisons, and the results of several different gene finders (genscan, geneid, and hmmgene <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>). All of these analyses may be executed directly on the Cartwheel server, excepting only Vista comparisons using the (default) AVID alignment program. The data for Vista comparisons must be uploaded from the results returned by the Vista Web site; however, Vista-style comparisons with the LAGAN global alignment tool are executed directly on Cartwheel.</p>
         </sec>
         <sec>
            <st>
               <p>Discovering and analyzing regulatory regions</p>
            </st>
            <p>We and others have successfully used paircomp, FRII, and Cartwheel to discover a number of regulatory regions (see Background). Once we have a pair of genomic regions to compare, the steps we follow are essentially invariant from region to region:</p>
            <p>1. We set up two to three paircomp analyses at the following window sizes and thresholds: 10 bp/90%; 20 bp/80%; 50 bp/60%.</p>
            <p>2. We match the cDNA or protein of interest against both regions, to determine where the coding regions lie.</p>
            <p>3. We also compare the RefSeq database from NCBI against both regions, to find other genes in the region.</p>
            <p>4. We load these analyses into FRII and zoom in to a view that includes as much intergenic sequence around the gene as is possible without also including other genes. We then adjust the thresholds on the 20 bp and 50 bp analyses until we obtain a roughly collinear pattern of conserved blocks. Typical values for these thresholds are 80&#8211;100% for a 20 bp windowed comparison, and 60&#8211;80% for a 50 bp windowed comparison.</p>
            <p>5. We use the closeup view to extract the conserved blocks, and design PCR primers to isolate all of the contiguous blocks of conserved sequence. We then individually subclone or fuse them into a GFP reporter construct together with a basal promoter. These constructs are then introduced into the sea urchin by microinjection and analyzed for appropriate spatiotemporal expression.</p>
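            <p>The window sizes and thresholds in steps 1 and 4 translate into a minimum number of identical bases per window, and raising a display threshold amounts to re-filtering stored matches rather than rerunning the comparison. The following sketch shows that arithmetic; the function names and the (top start, bottom start, matching bases) tuple layout are assumptions for illustration and do not describe the internals of paircomp or FRII.</p>

```python
def min_matching_bases(window, percent):
    """Smallest number of identical bases in a window that meets percent."""
    return -(-window * percent // 100)  # integer ceiling division


# The three analyses from step 1, as (window size in bp, threshold in %).
ANALYSES = [(10, 90), (20, 80), (50, 60)]

# Minimum identical bases required by each analysis.
REQUIRED = [min_matching_bases(w, p) for w, p in ANALYSES]


def refilter(matches, window, percent):
    """Keep (top_start, bot_start, n_matching) tuples that meet a
    stricter display threshold, without rerunning the comparison."""
    needed = min_matching_bases(window, percent)
    return [m for m in matches if m[2] >= needed]
```

            <p>For example, matches computed with a 20 bp window at 80% (at least 16 identical bases) can be re-filtered at 95%, which keeps only those with 19 or 20 identical bases.</p>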
            <p>In our experience, we have always been able to identify the relevant enhancer elements using this procedure. A similar procedure in which putatively negative elements are fused with a ubiquitous driver of expression often identifies necessary repressive elements. One caveat of these procedures is that for some genes, e.g. transcription factors, many conserved regions often appear to do nothing. These may be regulatory regions that affect expression at times or in places that are not under consideration, or could be other genomic features not relevant to gene regulation.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Paircomp, FamilyRelationsII, and Cartwheel are an effective, easy-to-use set of tools for analyzing conservation in BAC-sized genomic regions. Over 100 people are currently using them, and they have been effective in finding regulatory regions in a variety of organisms. In this paper we have described the tools and provided an introduction for biologists who wish to use them.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>See Implementation, above, for information on server-side software.</p>
         <p><b>Project name: </b>FamilyRelationsII</p>
         <p>
            <b>Project home page: </b>
            <url>http://family.caltech.edu/</url>
         </p>
         <p><b>Operating systems: </b>Mac OS X, Windows NT/XP, UNIX/Linux (X Windows)</p>
         <p><b>Programming language: </b>C++</p>
         <p><b>License: </b>GPL/LGPL</p>
         <p>No restrictions placed on use.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CTB designed and implemented the majority of the functionality described. YX implemented a significant portion of the XML-RPC functionality used for client-server interaction. EHD laid out the design requirements, aided in writing the paper, and supervised the development of FRII. RAC is responsible for running the servers and did the majority of bug testing, and also contributed to the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Tristan De Buysscher and Madeleine Price, under the supervision of Dr. Barbara Wold, developed the original seqcomp and contributed to FamilyRelations. Ramon Cendejas and Kevin Berney aided in the development of features and helped exercise the Cartwheel server; a complete list of contributors to FamilyRelationsII and Cartwheel can be found on the Cartwheel Web site, under Developers. We especially thank Carolina Livi, Pei-Yun Lee, Dr. Ellen Rothenberg and Dr. Erich Schwarz for extensive user-interface testing over the years. Dr. Ellen Rothenberg and Dr. Erich Schwarz both contributed significantly to discussions of new features; in addition, Sagar Damle, Tracy Teal and Dr. Erich Schwarz gave many helpful comments on this paper. We also thank two anonymous reviewers for their comments. CTB is supported by National Institutes of Health Grant GM61005, and the Beckman Institute Center for Computational Regulatory Genomics is supported by National Institutes of Health Grant RR15044.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Genomic regulatory regions: insights from comparative sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>6</issue>
            <fpage>604</fpage>
            <lpage>610</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gde.2003.10.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">14638322</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>New computational approaches for analysis of cis-regulatory networks</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Rust</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Schilstra</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>De Buysscher</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Griffin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wold</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Bolouri</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2002</pubdate>
            <volume>246</volume>
            <issue>1</issue>
            <fpage>86</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2002.0619</pubid>
                  <pubid idtype="pmpid" link="fulltext">12027436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Patchy interspecific sequence similarities efficiently identify positive cis-regulatory elements in the sea urchin</p>
            </title>
            <aug>
               <au>
                  <snm>Yuh</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Livi</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Rowen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2002</pubdate>
            <volume>246</volume>
            <issue>1</issue>
            <fpage>148</fpage>
            <lpage>161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2002.0618</pubid>
                  <pubid idtype="pmpid" link="fulltext">12027440</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>cis-Regulatory control of three cell fate-specific genes in vulval organogenesis of Caenorhabditis elegans and C. briggsae</p>
            </title>
            <aug>
               <au>
                  <snm>Kirouac</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>PW</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2003</pubdate>
            <volume>257</volume>
            <issue>1</issue>
            <fpage>85</fpage>
            <lpage>103</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0012-1606(03)00032-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12710959</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Conservation of Endo16 expression in sea urchins despite evolutionary divergence in both cis and trans-acting components of transcriptional regulation</p>
            </title>
            <aug>
               <au>
                  <snm>Romano</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Wray</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2003</pubdate>
            <volume>130</volume>
            <issue>17</issue>
            <fpage>4187</fpage>
            <lpage>4199</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/dev.00611</pubid>
                  <pubid idtype="pmpid" link="fulltext">12874137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers</p>
            </title>
            <aug>
               <au>
                  <snm>Leung</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Hoffmann</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baltimore</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>118</volume>
            <issue>4</issue>
            <fpage>453</fpage>
            <lpage>464</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.08.007</pubid>
                  <pubid idtype="pmpid" link="fulltext">15315758</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>R11: a cis-regulatory node of the sea urchin embryo gene network that controls early expression of SpDelta in micromeres</p>
            </title>
            <aug>
               <au>
                  <snm>Revilla-i-Domingo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Minokawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2004</pubdate>
            <volume>274</volume>
            <issue>2</issue>
            <fpage>438</fpage>
            <lpage>451</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ydbio.2004.07.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">15385170</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>PyZine online magazine</p>
            </title>
            <url>http://www.pyzine.com/Issue006/index.html</url>
         </bibl>
         <bibl id="B9">
            <title>
               <p>FamilyRelations Web site</p>
            </title>
            <url>http://family.caltech.edu/</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Caltech Cartwheel server, "Woodward"</p>
            </title>
            <url>http://woodward.caltech.edu/canal/</url>
         </bibl>
         <bibl id="B11">
            <title>
               <p>FamilyRelations tutorial</p>
            </title>
            <url>http://family.caltech.edu/tutorial/</url>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1995</pubdate>
            <volume>167</volume>
            <issue>1&#8211;2</issue>
            <fpage>GC1</fpage>
            <lpage>10</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0378-1119(95)00714-8</pubid>
                  <pubid idtype="pmpid">8566757</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <issue>3</issue>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>AVID: A global alignment program</p>
            </title>
            <aug>
               <au>
                  <snm>Bray</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>97</fpage>
            <lpage>102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430967</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529311</pubid>
                  <pubid idtype="doi">10.1101/gr.789803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Davydov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Sidow</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>721</fpage>
            <lpage>731</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430158</pubid>
                  <pubid idtype="pmpid" link="fulltext">12654723</pubid>
                  <pubid idtype="doi">10.1101/gr.926603</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>PipTools: a computational toolkit to annotate and analyze pairwise comparisons of genomic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Petrykowska</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Florea</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2002</pubdate>
            <volume>80</volume>
            <issue>6</issue>
            <fpage>681</fpage>
            <lpage>690</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/geno.2002.7018</pubid>
                  <pubid idtype="pmpid" link="fulltext">12504859</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PipMaker &#8211; a web server for aligning two genomic DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bouck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>4</issue>
            <fpage>577</fpage>
            <lpage>586</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310868</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779500</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.577</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>VISTA: computational tools for comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Poliakov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Web Server</issue>
            <fpage>W273</fpage>
            <lpage>279</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441596</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215394</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Rabbit muscle creatine kinase: genomic cloning, sequencing, and analysis of upstream sequences important for expression in myocytes</p>
            </title>
            <aug>
               <au>
                  <snm>Yi</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Walsh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Schimmel</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1991</pubdate>
            <volume>19</volume>
            <issue>11</issue>
            <fpage>3027</fpage>
            <lpage>3033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">328266</pubid>
                  <pubid idtype="pmpid">2057360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Prediction of complete gene structures in human genomic DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <issue>1</issue>
            <fpage>78</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>GeneID in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Blanco</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>4</issue>
            <fpage>511</fpage>
            <lpage>515</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310871</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779490</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.511</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Using database matches with HMMGene for automated gene detection in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>4</issue>
            <fpage>523</fpage>
            <lpage>528</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310864</pubid>
                  <pubid idtype="pmpid" link="fulltext">10779492</pubid>
                  <pubid idtype="doi">10.1101/gr.10.4.523</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
