<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-9-106</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Rapid detection and curation of conserved DNA via <it>enhanced</it>-BLAT and <it>EvoPrinterHD </it>analysis</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Yavatkar</snm>
               <mi>S</mi>
               <fnm>Amarendra</fnm>
               <insr iid="I1"/>
               <email>yavatka@ninds.nih.gov</email>
            </au>
            <au id="A2">
               <snm>Lin</snm>
               <fnm>Yong</fnm>
               <insr iid="I1"/>
               <email>linyon@ninds.nih.gov</email>
            </au>
            <au id="A3">
               <snm>Ross</snm>
               <fnm>Jermaine</fnm>
               <insr iid="I2"/>
               <email>rossje@ninds.nih.gov</email>
            </au>
            <au id="A4">
               <snm>Fann</snm>
               <fnm>Yang</fnm>
               <insr iid="I1"/>
               <email>Fann@ninds.nih.gov</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Brody</snm>
               <fnm>Thomas</fnm>
               <insr iid="I2"/>
               <email>brodyt@ninds.nih.gov</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Odenwald</snm>
               <mi>F</mi>
               <fnm>Ward</fnm>
               <insr iid="I2"/>
               <email>ward@codon.nih.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Division of Intramural Research, Information Technology Program, NINDS, NIH, Bethesda, Maryland, USA</p>
            </ins>
            <ins id="I2">
               <p>The Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>106</fpage>
         <url>http://www.biomedcentral.com/1471-2164/9/106</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18307801</pubid>
               <pubid idtype="doi">10.1186/1471-2164-9-106</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>17</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>28</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Yavatkar et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Multi-genome comparative analysis has yielded important insights into the molecular details of gene regulation. We have developed <it>EvoPrinter</it>, a web-accessed genomics tool that provides a single uninterrupted view of conserved sequences as they appear in a species of interest. An <it>EvoPrint </it>reveals with near base-pair resolution those sequences that are essential for gene function.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We describe here <it>EvoPrinterHD</it>, a 2<sup>nd</sup>-generation comparative genomics tool that automatically generates from a single input sequence an enhanced view of sequence conservation between evolutionarily distant species. Currently available for 5 nematode, 3 mosquito, 12 <it>Drosophila</it>, 20 vertebrate, 17 <it>Staphylococcus </it>and 20 enteric bacteria genomes, <it>EvoPrinterHD </it>employs a modified BLAT algorithm [<it>enhanced</it>-BLAT (<it>e</it>BLAT)], which detects up to 75% more conserved bases than identified by the BLAT alignments used in the earlier <it>EvoPrinter </it>program. The new program also identifies conserved sequences within rearranged DNA, highlights repetitive DNA, and detects sequencing gaps. <it>EvoPrinterHD </it>currently holds over 112 billion bp of indexed genomes in memory and allows a subset of genomes to be selected for analysis. An <it>EvoDifferences </it>profile is also generated to portray conserved sequences that are uniquely lost in any one of the orthologs. Finally, <it>EvoPrinterHD </it>incorporates options that allow for (1) re-initiation of the analysis using a different genome's aligning region as the reference DNA to detect species-specific changes in less-conserved regions, (2) rapid extraction and curation of conserved sequences, and (3) for bacteria, identification of unique or uniquely shared sequences present in subsets of genomes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p><it>EvoPrinterHD </it>is a fast, high-resolution comparative genomics tool that automatically generates an uninterrupted species-centric view of sequence conservation and enables the discovery of conserved sequences within rearranged DNA. When combined with <it>cis</it>-Decoder, a program that discovers sequence elements shared among tissue-specific enhancers, <it>EvoPrinterHD </it>facilitates the analysis of conserved sequences that are essential for coordinate gene regulation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Comparative analysis of orthologous DNA has revealed that many <it>cis</it>-regulatory enhancers contain multi-species conserved sequences (MCSs) that are essential for their transcriptional regulation (reviewed by <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>). We have previously described <it>EvoPrinter </it>and <it>cis</it>-Decoder, both web-accessed tools for discovering and comparing conserved sequences that are shared among three or more orthologs <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Generated from superimposition of multiple pair-wise BLAT alignments <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, an <it>EvoPrint </it>provides an ordered uninterrupted representation of conserved sequences as they exist in the genome of interest. When multiple species are included in the analysis, near base-pair resolution of conserved sequences required for gene function can be achieved. For example, when 12 <it>Drosophila </it>species, representing ~200 million years of cumulative evolutionary divergence, are included in the <it>EvoPrint </it>process, one can identify sequences that are essential for <it>cis</it>-regulatory function (both enhancers and minimal promoters), conserved protein encoding sequences, and micro-RNA binding sites. <it>EvoPrinterHD </it>is a second-generation alignment tool that automates the comparative analysis to rapidly identify a significantly higher percentage of conserved sequences shared among evolutionarily distant orthologs even if they exist within rearranged DNA. In contrast to most comparative multi-sequence alignment tools (reviewed by <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>), which display columns of sequences that contain gaps to optimize alignments, the species-centric <it>EvoPrint </it>is a single uninterrupted sequence and thus displays more bases in a single view than is possible with conventional alignments. 
In addition, the uninterrupted readout allows for the rapid extraction and automated curation of conserved DNA from the genome of interest.</p>
         <p>At the core of the original multi-genome <it>EvoPrinter </it>alignment algorithms is the BLAT algorithm <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> for pairwise alignments. Although BLAT alignments generate uninterrupted representations of the aligning regions, one drawback of BLAT when aligning evolutionarily distant DNAs, as initially noted by Kent <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, is that short regions of homology that span the non-overlapping 11-mers go undetected. We developed <it>e</it>BLAT to overcome the inability of BLAT to detect these short blocks of homology. To accomplish this, each genome is indexed in three independent ways, each staggered differently; in addition, the alignment parameters have been adjusted to enhance the detection of short blocks of sequence conservation. By performing three independent alignments using the staggered indices with the optimized alignment parameters and then superimposing the resulting alignments to show all aligning sequences, the overall detection of conserved sequences has been improved by as much as 75% when evolutionarily distant orthologous sequences are aligned.</p>
         <p>In addition to the automated alignments for bacteria, nematode, mosquito, <it>Drosophila</it>, and vertebrate genomes, and the higher <it>e</it>BLAT resolution, <it>EvoPrinterHD </it>includes algorithms that search the intra-genomic aligning regions for rearrangements, duplications and sequencing gaps. <it>EvoPrints </it>generated with composite <it>e</it>BLATs highlight conserved sequences within the reference DNA irrespective of genomic rearrangements within one or more of the aligning regions. Four additional programs have been added: (1) an <it>EvoDifferences </it>profile, portraying in a single view the conserved sequences that are detected in all but one of the species included in the <it>EvoPrint</it>; (2) input reference DNA exchange, allowing for detection of species-specific changes in the less-conserved DNA flanking MCSs; (3) automated extraction and curation of conserved sequence blocks (CSBs), facilitating their comparative analysis <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>; and (4) for bacteria, an <it>EvoUnique </it>print that highlights unique or uniquely shared sequences among subsets of genomes. Due in part to its speed and flexibility of genome selection, <it>EvoPrinterHD </it>interfaces well with other web-accessed tools. The time required to undertake a comparative genome analysis of sequences that contain putative <it>cis</it>-regulatory enhancers is significantly reduced. For example, a 12 <it>Drosophila EvoPrint </it>analysis and curation of CSBs within a 2 kb genomic region that contains a cluster of transcription factor DNA-binding sites (discovered using the <it>FlyEnhancer </it>genome motif search tool <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>) requires less than 30 seconds. Once CSBs are discovered, subsequent analysis via <it>cis</it>-Decoder algorithms enables the generation of conserved sequence tag libraries that further facilitate enhancer comparative studies.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>The following is a description of the sequential steps and accompanying algorithms used by <it>EvoPrinterHD </it>to identify conserved sequences shared among multiple genomes. Instructions and a tutorial for optimizing its use can be accessed at the <it>EvoPrinterHD </it>web site <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Genome Indexing</p>
            </st>
            <p>In addition to the original non-overlapping 11-mer genomic index of BLAT <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, <it>EvoPrinterHD </it>indexes each genome into a second set of non-overlapping 11-mers, offset by four base pairs from the initial indexing, and into a third set of non-overlapping 9-mers. The resulting staggered indexing increases the likelihood that homologous regions missed by any one of the individual indices will be identified. The use of multiple genome indices and optimization of the alignment phase parameters (see below) is the basis of the enhanced detection of conserved sequences between evolutionarily distant orthologous DNAs.</p>
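            <p>The staggered indexing described above can be illustrated with a minimal sketch. It is not the <it>EvoPrinterHD </it>implementation: the function names are hypothetical, the index is a plain in-memory dictionary, and the starting offset of the 9-mer index (zero here) is an assumption, since only the four-base offset of the second 11-mer index is stated.</p>
            <p><![CDATA[
```python
def nonoverlapping_kmers(seq, k, offset=0):
    """Map each non-overlapping k-mer (tiled from `offset`) to its start positions."""
    index = {}
    for start in range(offset, len(seq) - k + 1, k):
        index.setdefault(seq[start:start + k], []).append(start)
    return index


def build_staggered_indices(genome):
    """Build three indices: 11-mers, 11-mers offset by 4 bp, and 9-mers.

    The 9-mer offset of 0 is an assumption made for illustration.
    """
    return [
        nonoverlapping_kmers(genome, 11, 0),
        nonoverlapping_kmers(genome, 11, 4),
        nonoverlapping_kmers(genome, 9, 0),
    ]


if __name__ == "__main__":
    toy_genome = "ACGT" * 10  # toy 40 bp sequence, not real data
    for idx in build_staggered_indices(toy_genome):
        print(sorted(p for positions in idx.values() for p in positions))
```
]]></p>
            <p>Because the tile boundaries fall at different positions in each index, a short block of homology that straddles a boundary in one index can lie wholly within a tile of another.</p>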
            <p><it>EvoPrinterHD </it>currently holds in memory three independent indices of each of 37 bacteria, 3 mosquito, 5 nematode, 12 <it>Drosophila </it>and 20 vertebrate genomes, representing ~112 billion bp in total memory.</p>
         </sec>
         <sec>
            <st>
               <p>Modification of BLAT search and alignment parameters</p>
            </st>
            <p>The sensitivity of <it>EvoPrinterHD </it>for discovering short blocks of conserved sequence homology between evolutionarily distant orthologs was increased by optimizing the Genomic Finding (gf) client program parameters of the original BLAT algorithm <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The search and alignment parameters were adjusted by: (1) increasing the stringency factor for low homology alignments from 0.0005 to 0.001, (2) reducing the initial expansion gap between adjacent hits from four to three, (3) reducing the additional expansion gap penalty from three to one, (4) increasing the maximum allowable gaps and inserts from 12 to 16, and (5) changing the allowable codon gap parameter from two to three to optimize for codon polymorphisms in open reading frames.</p>
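            <p>For reference, the five adjustments can be tabulated as follows. The dictionary keys are descriptive labels chosen for this sketch only; they are not the actual option names of the gf client program, and only the numeric values are taken from the text.</p>
            <p><![CDATA[
```python
# Descriptive labels only (hypothetical); NOT real gf client option names.
BLAT_ORIGINAL = {
    "stringency_factor": 0.0005,
    "initial_expansion_gap": 4,
    "additional_expansion_gap_penalty": 3,
    "max_gaps_and_inserts": 12,
    "codon_gap": 2,
}

EBLAT_ADJUSTED = {
    "stringency_factor": 0.001,
    "initial_expansion_gap": 3,
    "additional_expansion_gap_penalty": 1,
    "max_gaps_and_inserts": 16,
    "codon_gap": 3,
}


def changed_parameters(old, new):
    """Return {label: (old_value, new_value)} for every setting that differs."""
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}


if __name__ == "__main__":
    for label, (before, after) in changed_parameters(BLAT_ORIGINAL, EBLAT_ADJUSTED).items():
        print(f"{label}: {before} -> {after}")
```
]]></p>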
         </sec>
         <sec>
            <st>
               <p>Detecting conserved sequences with EvoPrinterHD algorithms</p>
            </st>
            <p>To maximize the identification of short CSBs between evolutionarily divergent orthologs, <it>EvoPrinterHD </it>generates 3 different input reference DNA vs. test genome BLAT alignments to the same aligning region using the three indices described above. As an output of the client program, <it>EvoPrinterHD </it>then generates a superimposed composite of the 3 different alignments. The algorithm does this by first creating an array of nucleotide strings, one for each of the 3 input reference DNA BLAT alignment sequences, and then looping through the strings one base at a time, outputting a capital letter when at least one of the 3 readouts has an aligning base at that position, thereby generating a composite readout that displays all conserved bases. The program also generates BLAT readouts of the test genome aligning region; both sets of readouts are stored in memory for later analysis, for <it>EvoPrint </it>generation, and for exchange of the input reference DNA, which is accomplished by selecting one of the aligning region sequences as the new reference sequence to reinitiate the analysis. The algorithm also generates <it>e</it>BLATs for the second and third highest scoring aligning regions for each of the selected genomes.</p>
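            <p>The superimposition step can be sketched as follows. This is an illustrative reconstruction rather than the program's code: it assumes each readout is a string of the input reference DNA, equal in length, in which uppercase letters mark aligning bases and lowercase letters mark non-aligning bases. The function name is hypothetical.</p>
            <p><![CDATA[
```python
def superimpose(readouts):
    """Combine alignment readouts of the same reference sequence.

    A position is uppercase in the composite when at least one readout
    has an aligning (uppercase) base there; otherwise it is lowercase.
    """
    if len({len(r) for r in readouts}) != 1:
        raise ValueError("all readouts must cover the same reference sequence")
    composite = []
    for column in zip(*readouts):
        base = column[0]
        if any(b.isupper() for b in column):
            composite.append(base.upper())
        else:
            composite.append(base.lower())
    return "".join(composite)


if __name__ == "__main__":
    # Toy readouts from three hypothetical index-specific alignments.
    print(superimpose(["ACgtacgt", "acGTacgt", "acgtacGt"]))  # ACGTacGt
```
]]></p>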
            <p>The mosquito, nematode, <it>Drosophila </it>and <it>Staphylococcus EvoPrinterHD </it>algorithms automatically generate, respectively, 27, 45, 108 and 153 pairwise BLAT alignments, assemble 9, 15, 36, and 51 <it>e</it>BLAT readouts, and then superimpose the individual pairwise <it>e</it>BLAT alignments (3 per genome) to generate a color-coded composite-<it>e</it>BLAT (c<it>e</it>BLAT) for each genome. The vertebrate <it>EvoPrinterHD </it>and enteric bacteria <it>EvoPrinterHD </it>both generate up to 180 pairwise BLAT alignments, assembling 60 <it>e</it>BLAT readouts and 20 c<it>e</it>BLATs. To reduce alignment times, <it>EvoPrinterHD </it>algorithms currently employ two <it>Dell PowerEdge </it>(2.8 GHz/64 GB RAM; 6950 series) dual quad-core processor servers operating in parallel with the <it>RedHat Enterprise Linux </it>5 operating system and the Network File System to simultaneously query multiple indexed genomes.</p>
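            <p>These counts are consistent with three indices and three top-scoring aligning regions per genome. The short calculation below makes that arithmetic explicit; the factorization is an interpretation of the numbers given in the text, and the function and constant names are hypothetical.</p>
            <p><![CDATA[
```python
INDICES_PER_GENOME = 3       # two staggered 11-mer indices plus one 9-mer index
TOP_REGIONS_PER_GENOME = 3   # first, second and third highest scoring regions


def alignment_counts(n_genomes):
    """Return (pairwise BLAT alignments, eBLAT readouts, ceBLATs) for n genomes."""
    eblat_readouts = n_genomes * TOP_REGIONS_PER_GENOME
    blat_alignments = eblat_readouts * INDICES_PER_GENOME
    ceblats = n_genomes  # one composite per genome
    return blat_alignments, eblat_readouts, ceblats


if __name__ == "__main__":
    genome_sets = {"mosquito": 3, "nematode": 5, "Drosophila": 12,
                   "Staphylococcus": 17, "vertebrate": 20}
    for name, n in genome_sets.items():
        print(name, alignment_counts(n))
```
]]></p>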
            <p>To assess the efficacy of <it>e</it>BLAT alignments in comparison to the original BLAT, we compared the pairwise alignment scores (the total number of aligning bases in the input DNA) of <it>e</it>BLAT to those obtained with BLAT, using 10 different genomic regions from the <it>Drosophila melanogaster </it>genome (Figure <figr fid="F1">1</figr>). The genomic fragments (1.3 to 4.7 kb in length, totaling 27.7 kb) were selected because each had previously been shown to contain <it>cis</it>-regulatory transcriptional enhancers. They include DNA flanking the following genes: <it>gooseberry-neuro </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, <it>snail </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, <it>hunchback </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, <it>slit </it>(enhancer 2.6 RV) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, <it>string </it>(enhancer 5.8) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, <it>atonal </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, <it>Sex combs reduced </it>(enhancer 3.0 RR) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, <it>Toll </it>(enhancer 6.5 RL/LR) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and <it>Par domain protein 1 </it>(1<sup>st </sup>intron enhancer) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Nine of these regions are described in <it>RedFly</it>, the regulatory element database for <it>Drosophila </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, while the tenth, the <it>nerfin-1 </it>neuroblast enhancer, was identified by A. Kuzin in the Odenwald laboratory (personal communication). In addition, twelve-genome <it>EvoPrint </it>analysis of each of the ten genomic fragments revealed that each contained highly conserved sequences that were shared by all Drosophilids (data not shown). 
As demonstrated in Figure <figr fid="F1">1</figr>, the pairwise <it>e</it>BLAT alignment exhibited only a modest increase in the identification of shared sequences between closely related species over the conventional BLAT alignment; however, <it>e</it>BLAT identified significantly more conserved sequences when the <it>D. melanogaster </it>genomic fragments were aligned to the more evolutionarily distant orthologs. The increased identification of shared sequences varied from a 7.5% increase for <it>D. simulans </it>(evolutionary divergence time from <it>D. melanogaster </it>is ~2 My) to 74.8% for <it>D. grimshawi </it>(separated from <it>D. melanogaster </it>for ~40 My). The same enhanced discovery of sequence conservation was also observed when evolutionarily distant nematode or vertebrate species were compared. For example, <it>e</it>BLAT alignments between <it>C. elegans </it>and <it>C. briggsae </it>or human and <it>Xenopus </it>orthologous DNAs both identified over 70% more shared sequences than the original BLAT alignments (data not shown).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Increased identification of conserved DNA in evolutionarily distant orthologs via <it>enhanced</it>-BLAT pairwise alignments</p>
               </caption>
               <text>
                  <p><b>Increased identification of conserved DNA in evolutionarily distant orthologs via <it>enhanced</it>-BLAT pairwise alignments</b>. Shown is the total number of aligning bases in pairwise BLAT and pairwise <it>enhanced</it>-BLAT alignments from 10 different <it>Drosophila melanogaster </it>genomic regions that contain conserved sequence blocks (1.3 to 4.7 kb; 27.7 kb in total) aligned to the orthologous DNAs from <it>D. melanogaster</it>, <it>D. simulans</it>, <it>D. sechellia</it>, <it>D. yakuba</it>, <it>D. erecta</it>, <it>D. ananassae</it>, <it>D. pseudoobscura</it>, <it>D. virilis</it>, <it>D. mojavensis </it>or <it>D. grimshawi</it>. The average percent increase in the number of <it>e</it>BLAT aligning bases vs. BLAT alignments is also shown. The approximate evolutionary separation/divergence time (in million years) between <it>D. melanogaster </it>and the other Drosophilids is indicated in brackets.</p>
               </text>
               <graphic file="1471-2164-9-106-1"/>
            </fig>
            <p>Another measure of <it>e</it>BLAT efficacy in identifying evolutionary conservation is to compare the detection of conserved sequences when <it>e</it>BLAT vs. BLAT alignments are used to generate an <it>EvoPrint</it>. To demonstrate the increased alignment sensitivity of <it>e</it>BLAT over BLAT in the <it>EvoPrint </it>analysis, the <it>Drosophila melanogaster Kr&#252;ppel </it>central domain enhancer <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> was <it>EvoPrinted </it>using 11 of the <it>Drosophila </it>species (Figure <figr fid="F2">2A</figr>). The original <it>EvoPrinter </it>(which uses the BLAT algorithm) detected a total of 169 conserved bases, compared with 254 conserved bases identified with an <it>e</it>BLAT-generated <it>EvoPrint</it>, a 50% increase in alignment recognition. In addition, the <it>EvoDifferences </it>profile identified additional bases (shown in color) that are conserved in all but one of the genomes used to generate the <it>EvoPrint </it>(Figure <figr fid="F2">2B</figr> and see below).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p><it>EvoPrints </it>generated with <it>e</it>BLAT alignments reveal additional conserved sequences when compared to the original method</p>
               </caption>
               <text>
                  <p><b><it>EvoPrints </it>generated with <it>e</it>BLAT alignments reveal additional conserved sequences when compared to the original method</b>. A) Shown is a composite <it>EvoPrint </it>of the <it>Drosophila melanogaster Kr&#252;ppel </it>central domain (CD2) enhancer region generated by superimposing an <it>EvoPrint </it>generated from <it>e</it>BLAT alignments and a second prepared from BLAT alignments. Pairwise alignments between <it>D. melanogaster </it>and <it>D. sechellia, D. simulans, D. erecta, D. yakuba, D. ananassae, D. pseudoobscura, D. persimilis, D. virilis, D. willistoni, D. mojavensis </it>and <it>D. grimshawi </it>were used to generate both <it>EvoPrints</it>. Conserved sequences identified by both procedures are shown as uppercase black nucleotides, and yellow highlighted nucleotides represent the additional sequences recognized by <it>EvoPrinterHD</it>. The boxed region contains the <it>cis</it>-regulatory DNA required for enhancer function as determined by Hoch et al. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. B) An <it>EvoDifferences </it>profile identifies those DNA sequences that are shared by all but one of the species included in the analysis. As in the <it>EvoPrint</it>, black uppercase letters indicate sequences shared by all species, and colored uppercase letters, which denote individual species, represent sequences that were not detected by the <it>e</it>BLAT alignment for just one of the genomes included in the <it>EvoPrint </it>analysis (<it>D. erecta</it>, dark-red; <it>D. yakuba</it>, teal; <it>D. pseudoobscura</it>, light-blue; <it>D. persimilis</it>, brown; <it>D. ananassae</it>, pink; <it>D. virilis</it>, orange; <it>D. willistoni</it>, blue; <it>D. mojavensis</it>, green; or <it>D. grimshawi</it>, red). The underline indicates the region of the <it>EvoDifferences </it>profile that is compared with the alignments obtained from the UCSC genome browser (shown in panel C). C) Comparison of the <it>EvoDifferences </it>profile with the UCSC genome alignments. 
Shown is the underlined sequence in panel (B) aligned to the corresponding alignments obtained at the <it>Drosophila </it>UCSC comparative genome bioinformatics web site.</p>
               </text>
               <graphic file="1471-2164-9-106-2"/>
            </fig>
            <p>We also compared <it>EvoPrinterHD</it>-generated <it>EvoPrints </it>to multi-genome alignments obtained from the UCSC comparative genome bioinformatics alignment program <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The alignment resolution of <it>EvoPrinterHD </it>is equivalent to the multi-species UCSC alignments in detecting CSBs. The two alignment programs detect the same conserved sequences with 93% to 95% correspondence in five different enhancers compared (Figure <figr fid="F2">2C</figr>; and data not shown).</p>
         </sec>
         <sec>
            <st>
               <p>EvoPrinterHD repeat finder</p>
            </st>
            <p>One prominent feature of all bacterial and metazoan genomes is that they harbor diverse populations of repetitive elements that range in copy number from single duplications to thousands of transposable elements dispersed throughout the genome. Given that many of these repeats contain highly conserved sequences that may interfere with alignments between evolutionarily distant orthologs, it is important to first identify the repetitive sequence(s) within the reference genome before comparative analysis is considered. To accomplish this, the <it>EvoPrinterHD </it>repeat finder algorithm superimposes the first, second and third highest scoring <it>e</it>BLAT alignments of the input DNA to its own genome and then color-codes the readout to identify single or multiple repeat sequences within the input reference DNA (Figure <figr fid="F3">3</figr>). Sequences that have one additional copy in the reference genome are shown as blue-colored uppercase bases, while those that are present three or more times are highlighted with red-colored bases. The algorithm also reveals whether one of the multiple repeat sequences is more homologous to the repeat present in the input DNA by highlighting single repeat sequences that flank the core multi-repeat element (Figure <figr fid="F3">3</figr>). Repeat sequences are underlined in the <it>EvoPrint </it>and <it>EvoDifferences </it>readouts, which flags potential 'false positive' alignments that have their origin in repetitive elements.</p>
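            <p>The color-coding rule can be sketched as follows, under the reading given in the Figure 3 legend: a base aligning in only one of the second or third highest scoring self-alignments is treated as a single-copy repeat (blue), and a base aligning in both as a multi-copy repeat (red). The readout format (uppercase for aligning bases) and the function name are assumptions made for illustration.</p>
            <p><![CDATA[
```python
def classify_repeats(second, third):
    """Label each input base from its 2nd and 3rd best self-alignment readouts.

    Uppercase marks an aligning base. Returns one label per position:
    'unique' (no extra copy), 'blue' (one extra copy), 'red' (two or more).
    """
    if len(second) != len(third):
        raise ValueError("readouts must cover the same input DNA")
    labels = []
    for b2, b3 in zip(second, third):
        extra_copies = int(b2.isupper()) + int(b3.isupper())
        labels.append(("unique", "blue", "red")[extra_copies])
    return labels


if __name__ == "__main__":
    # Toy readouts, not real alignment output.
    print(classify_repeats("ACGtacgt", "acGTAcgt"))
```
]]></p>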
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p><it>EvoPrinterHD </it>repeat finder algorithm identifies repetitive elements within the input DNA</p>
               </caption>
               <text>
                  <p><b><it>EvoPrinterHD </it>repeat finder algorithm identifies repetitive elements within the input DNA</b>. The repeat finder algorithm superimposes the three highest scoring <it>e</it>BLAT input reference DNA to reference genome alignments to reveal those sequences within the input DNA that are repeated within the input DNA itself and/or elsewhere in the reference genome. Single-copy repeat sequences, identified just once in the second or third highest scoring <it>e</it>BLATs but not in both, are highlighted by blue-colored bases. Multiple (&#8805; 3 copies) repeats are highlighted with red-colored bases. Shown is a 1,958 bp genomic fragment that flanks the 3' end of the <it>Caenorhabditis elegans egl-26 </it>gene (+5,290 to +7,248 bp from the start of transcription) that was initially part of a 20 kb input DNA repeat finder readout. Note, the single copy repeat (blue-colored) sequences that flank the multi-copy repeat sequences (red-colored) indicate that one of the repeat copies located elsewhere in the reference genome is more homologous to the input DNA repeat sequence than with its other repeat family members.</p>
               </text>
               <graphic file="1471-2164-9-106-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Alignment scorecard</p>
            </st>
            <p>As a prelude to generating an <it>EvoPrint</it>, the inter-genome comparative program first displays the results of the different alignments in a tabular form referred to here as the alignment scorecard (Figure <figr fid="F4">4</figr> and see examples at the website tutorial <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>). The scorecard shown in Figure <figr fid="F4">4</figr> was generated from a <it>cis</it>-regulatory enhancer region associated with the <it>Drosophila melanogaster fushi tarazu </it>gene (see below for more details). The alignment score for each species' <it>e</it>BLAT alignments shows the total number of aligning bases in the input reference DNA. The positions of the first and last aligning bases in the input reference DNA are also noted, along with the number of sequencing gaps detected in the aligning regions of the test genomes and the total number of "Ns" (the presumed number of missing bases as indicated in the database). Links to the alignment readouts for each species are provided on the scorecard, allowing the user to view the individual reference DNA and test species alignments. A second link for each species leads to a color-coded composite <it>e</it>BLAT of all 3 of its alignments that highlights sequence rearrangements and/or duplications in the test species (see below). The data are arrayed in descending order of alignment score. By default, top scoring genomes with no sequencing gaps in their highest scoring alignments are selected for the initial <it>EvoPrint </it>analysis. After the initial <it>EvoPrint </it>and <it>EvoDifferences </it>profile have been examined, it is recommended that the lower scoring species be included one at a time to extend the evolutionary comparison (see below).</p>
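            <p>The ordering and default selection can be sketched as below. The limit of six species comes from the Figure 4 legend; how that limit combines with the gap-free requirement is an assumption here, as are the class, field and function names. The species labels and scores in the usage example are placeholders, not results.</p>
            <p><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class ScorecardRow:
    species: str
    score: int            # total aligning bases in the input reference DNA
    sequencing_gaps: int  # gaps detected in the highest scoring aligning region


def default_selection(rows, max_species=6):
    """Rank rows by descending score and pick gap-free top scorers.

    Returns (ranked rows, names of species selected by default).
    """
    ranked = sorted(rows, key=lambda row: row.score, reverse=True)
    gap_free = [row for row in ranked if row.sequencing_gaps == 0]
    return ranked, [row.species for row in gap_free[:max_species]]


if __name__ == "__main__":
    rows = [ScorecardRow("sp_A", 900, 0),
            ScorecardRow("sp_B", 1200, 2),
            ScorecardRow("sp_C", 1500, 0)]
    ranked, selected = default_selection(rows, max_species=2)
    print([row.species for row in ranked], selected)
```
]]></p>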
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p><it>EvoPrinterHD </it>alignment scorecard</p>
               </caption>
               <text>
                  <p><b><it>EvoPrinterHD </it>alignment scorecard</b>. A) Once the <it>e</it>BLAT alignment phase is completed, the algorithm initially displays the data in a tabular/scorecard form. The total number of aligning bases for each pair-wise alignment (the homology score) is shown along with the position of the first and last aligning bases within the input reference DNA sequence. The genomes are arrayed in descending order of alignment score and the 3 highest pairwise alignment scores for each species are shown. The intra-genomic algorithm compares the second and third scoring alignments of each genome to its highest scoring alignment to identify potential regions that harbor conserved sequences that have either rearranged and/or duplicated, in addition to identifying sequencing gaps within the aligning regions. The input reference DNA <it>e</it>BLAT readouts and the aligning region BLAT for each alignment can be accessed by clicking on the species name, and links to the composite <it>e</it>BLATs are also provided. Each species can be selected or deselected for <it>EvoPrinting </it>and by default, <it>EvoPrinterHD </it>selects the 6 highest scoring species for generating the initial <it>EvoPrint </it>and <it>EvoDifferences </it>profile readouts. "Ns" represent the number of sequencing gaps detected in each of the aligning regions. The "R" value (indicative of a putative rearrangement) for the second and third alignments indicates the number of aligning bases not detected in the first alignment and the "D" value (indicative of a putative duplication) is the number of aligning bases shared with the first alignment. A link is provided for changing the input reference DNA to the aligning region of one of the other species. Shown is the alignment scorecard for a 3,570 bp <it>Drosophila melanogaster </it>sequence that is located 6 kb upstream of the <it>fushi tarazu </it>gene. 
As indicated by the "R/D" values for each of the species, the intra-genomic comparative program has identified potential rearrangements and duplications. The color code reveals 1) whether the R or D value is derived from the second or third alignment and 2) whether a putative rearrangement or duplication has been detected.</p>
               </text>
               <graphic file="1471-2164-9-106-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Identification of rearranged and duplicated conserved sequences</p>
            </st>
            <p>Once the initial <it>e</it>BLAT alignments are completed, the <it>EvoPrinterHD </it>intra-genomic comparative algorithm automatically determines: (1) the number of aligning bases in the second and third <it>e</it>BLAT alignments that are not identified in the first (highest scoring) alignment for each species, called the "R" value indicating putative rearrangements in the test species, (2) the number of aligning bases in the second and third alignments that are also aligning in the highest score alignment, termed the "D" value for putative duplications, and (3) the number of aligning bases that are shared by all three alignments, indicating conserved sequences within putative repetitive elements. For example, the alignment scorecard of a <it>D. melanogaster </it>3,570 bp input reference sequence, located 6 kb 5' to the <it>fushi tarazu </it>gene, reveals that 5 of the 11 species included in the analysis have undergone putative rearrangements in their aligning regions compared to the reference genome (Figure <figr fid="F4">4</figr>). The rearrangements within 4 of the 5 genomes (<it>D. mojavensis, D. grimshawi, D. willistoni </it>and <it>D. virilis</it>) flank the aligning bases in each of their highest score aligning regions (noted by the color coded number in the R column) (Figure <figr fid="F4">4</figr>). c<it>e</it>BLATs of these 5 species identified that each contained at least two different MCS rearrangements relative to the input <it>D. melanogaster </it>reference DNA (Figure <figr fid="F5">5A</figr> and data not shown).</p>
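            <p>The "R" and "D" values and the count of bases shared by all three alignments described above amount to set operations on aligned reference positions. The following Python sketch is illustrative only and is not the <it>EvoPrinterHD </it>implementation; the function names and the representation of each alignment as a set of 0-based reference positions are assumptions made for this example.</p>

```python
def rd_values(first, other):
    """Return (R, D) for a lower-ranked alignment versus the top alignment.

    Both arguments are collections of reference positions that align
    (a hypothetical representation chosen for this sketch).
    """
    first = set(first)
    other = set(other)
    r_value = len(other - first)              # aligned only in the lower-ranked alignment
    d_value = len(other.intersection(first))  # aligned in both alignments
    return r_value, d_value


def shared_by_all(first, second, third):
    """Count reference positions that align in all three alignments."""
    return len(set(first).intersection(second, third))


# Toy data: positions of the reference that align in each of three alignments.
first = set(range(0, 100))
second = set(range(80, 130))
third = set(range(90, 95))

print(rd_values(first, second))             # (30, 20)
print(rd_values(first, third))              # (0, 5)
print(shared_by_all(first, second, third))  # 5
```

            <p>In this toy case the second alignment contributes 30 bases that the first alignment missed (a putative rearrangement) and re-aligns 20 bases already covered by the first alignment (a putative duplication).</p>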
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Intra-species c<it>e</it>BLATs and composite-<it>EvoPrints </it>identify conserved sequences within the input reference DNA that have rearranged in the aligning regions of other genomes</p>
               </caption>
               <text>
                  <p><b>Intra-species c<it>e</it>BLATs and composite-<it>EvoPrints </it>identify conserved sequences within the input reference DNA that have rearranged in the aligning regions of other genomes</b>. A) Shown is a <it>D. melanogaster </it>(reference DNA) to <it>D. virilis </it>c<it>e</it>BLAT alignment that spans a 3,570 bp sequence located upstream of the <it>fushi tarazu </it>gene (-7,184 to -3,434 bp from its transcription start). Black-colored uppercase nucleotides represent aligning bases found only in the highest scoring <it>D. virilis e</it>BLAT alignment, green-colored bases identify aligning bases that are unique to the second highest scoring alignment and blue-colored bases are aligning bases unique to the third highest scoring <it>e</it>BLAT alignment. B) Shown is an <it>EvoPrint </it>of the input <it>D. melanogaster </it>sequence shown in (A) that was generated with c<it>e</it>BLATs of the <it>D. simulans</it>, <it>D. sechellia</it>, <it>D. yakuba</it>, <it>D. erecta</it>, <it>D. ananassae</it>, <it>D. pseudoobscura</it>, <it>D. persimilis</it>, <it>D. virilis</it>, <it>D. mojavensis</it>, <it>D. grimshawi </it>and <it>D. willistoni </it>alignments. C) The accompanying <it>EvoDifferences </it>profile of the <it>EvoPrint </it>shown in (B). Black uppercase letters are aligning bases shared by all species examined. Colored uppercase letters, which denote individual species, represent sequences that were not aligned in the c<it>e</it>BLAT for just one of the genomes included in the analysis (<it>D. simulans</it>, teal; <it>D. sechellia</it>, dark-red; <it>D. yakuba</it>, brown; <it>D. erecta</it>, light-blue; <it>D. ananassae</it>, orange; <it>D. pseudoobscura</it>, pink; <it>D. virilis</it>, blue; <it>D. mojavensis</it>, green; or <it>D. grimshawi</it>, red).</p>
               </text>
               <graphic file="1471-2164-9-106-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Generating EvoPrints, EvoDifferences profiles and EvoUnique prints</p>
            </st>
            <p>Based on the data provided on the alignment scorecard, different combinations of c<it>e</it>BLAT alignments can be chosen to generate an <it>EvoPrint</it>. The <it>EvoPrinter </it>algorithm <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> creates an array of nucleotide strings from each of the selected alignments and then looks for conservation of sequence by looping through each of the strings one base at a time, outputting an uppercase base for only those input reference DNA nucleotides that are aligned in all of the different c<it>e</it>BLATs included in the analysis (Figure <figr fid="F5">5B</figr>). Those bases within the input DNA that are not shared with all species are represented as lowercase nucleotides. The "All Alignments or None" option for each species allows for rapid changes in the repertoire of species alignments used to generate an <it>EvoPrint</it>. As a default setting, <it>EvoPrinterHD </it>selects c<it>e</it>BLATs to generate an <it>EvoPrint</it>; however, the user can select just the highest scoring alignment to generate an <it>EvoPrint</it>, and doing so eliminates potential false positives that are identified as repeat sequences. As discussed above, when evolutionarily distant species are included in the analysis, MCS containing genomic rearrangements in one or more of the selected genomes are identified in the second and third <it>e</it>BLAT alignments. To include the rearranged sequences in the analysis, c<it>e</it>BLATs are used to generate the <it>EvoPrint</it>. The use of the intra-species c<it>e</it>BLATs in the <it>EvoPrint </it>procedure, rather than selecting first, second or third alignments for generation of the <it>EvoPrint</it>, enhances the ability of <it>EvoPrinterHD </it>to identify and display, in a single uninterrupted sequence, conserved sequences within the input DNA even though the MCSs reside within genomic rearrangements in one or more of the orthologous DNAs included in the comparative analysis. 
Our experience indicates that highly repetitious sequences do not interfere with the use of c<it>e</it>BLATs, because the presence and position of repeats varies across the species used to generate the <it>EvoPrint</it>. For analyses of the 20 vertebrate genomes or of the enteric bacteria genomes, individual genomes can be added to or removed from the initial analysis simply by returning to the selection page and selecting or deselecting different genomes. Because <it>EvoPrinterHD </it>holds the previous alignments in memory, the time required to add genomes to the comparative analysis is significantly reduced.</p>
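            <p>The base-by-base loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published <it>EvoPrinter </it>code: the function name is hypothetical, and each alignment is assumed to be encoded as a copy of the reference sequence in which aligned bases are uppercase and non-aligned bases are lowercase.</p>

```python
def evoprint(alignment_strings):
    """Return the reference with uppercase only where every alignment aligns.

    alignment_strings: list of equal-length, case-coded copies of the
    reference sequence (an assumed encoding for this sketch).
    """
    lengths = {len(s) for s in alignment_strings}
    if len(lengths) != 1:
        raise ValueError("all alignment strings must have the same length")
    out = []
    # Walk the strings one base (column) at a time.
    for column in zip(*alignment_strings):
        base = column[0]
        if all(c.isupper() for c in column):
            out.append(base.upper())  # aligned in every selected genome
        else:
            out.append(base.lower())  # missing from at least one genome
    return "".join(out)


strings = ["ACGTacgt", "ACgtACgt", "ACGTACGT"]
print(evoprint(strings))  # ACgtacgt
```

            <p>Removing a genome from the analysis corresponds to dropping its string from the list, which can only keep or increase the number of uppercase bases in the readout.</p>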
            <p>An additional readout, the <it>EvoDifferences </it>profile, is also displayed along with the <it>EvoPrint</it>; it highlights the unique differences (conserved sequence losses) that each species contributes to the comparative analysis (Figures <figr fid="F2">2B</figr> and <figr fid="F5">5C</figr>). The <it>EvoDifferences </it>profile can also be considered a "relaxed <it>EvoPrint</it>" since bases identified by the different colors are present in all species except for the single species denoted by that color. The apparent absence of a conserved sequence or base change in a single species could have several explanations: (1) the difference represents a unique evolutionary change, (2) it may be the result of a sequencing error, and/or (3) the sequence is present but not identified by the c<it>e</it>BLAT due to three or more genomic rearrangements in the aligning region.</p>
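            <p>One way to picture the <it>EvoDifferences </it>profile is as a per-base count of the genomes that fail to align. The Python sketch below is a hedged illustration rather than the authors' implementation; the function name, the case-coded alignment strings and the label values are all assumptions introduced for the example.</p>

```python
def evodifferences(alignments):
    """Label each reference base by which genomes fail to align there.

    alignments: dict mapping species name to a case-coded copy of the
    reference (uppercase = aligned); an assumed encoding for this sketch.
    Returns a list of (base, label) pairs where label is "all", the name
    of the single non-aligning species, or None.
    """
    species = list(alignments)
    strings = [alignments[name] for name in species]
    profile = []
    for column in zip(*strings):
        missing = [name for name, c in zip(species, column) if not c.isupper()]
        base = column[0]
        if not missing:
            profile.append((base.upper(), "all"))       # shared by every genome
        elif len(missing) == 1:
            profile.append((base.upper(), missing[0]))  # lost in exactly one genome
        else:
            profile.append((base.lower(), None))        # lost in two or more genomes
    return profile


example = {
    "D. simulans": "ACGTac",
    "D. virilis": "ACgtAc",
    "D. grimshawi": "ACGtAc",
}
for base, label in evodifferences(example):
    print(base, label)
```

            <p>In a rendered profile the species label would map to a color; bases labeled with a species name are the ones that a "relaxed <it>EvoPrint</it>" treats as conserved.</p>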
            <p>For bacteria, a third readout, the color-coded <it>EvoUnique </it>print, highlights those bases in the input reference DNA that are unique (that do not align with any of the other genomes included in the analysis) and those bases that align with only one or two of the other genomes included in the analysis (data not shown).</p>
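            <p>The <it>EvoUnique </it>classification can likewise be sketched as a count of how many of the other genomes align at each reference base. The code below is an illustrative assumption, not the <it>EvoPrinterHD </it>source: the function name, the category labels and the case-coded alignment strings are hypothetical.</p>

```python
def evounique(alignments):
    """Classify each reference base by the number of other genomes aligning.

    alignments: dict mapping genome name to a case-coded copy of the
    reference (uppercase = aligned); an assumed encoding for this sketch.
    """
    labels = []
    for column in zip(*alignments.values()):
        hits = sum(1 for c in column if c.isupper())
        if hits == 0:
            labels.append("unique")  # aligns with none of the other genomes
        elif hits == 1:
            labels.append("one")     # aligns with a single other genome
        elif hits == 2:
            labels.append("two")     # aligns with exactly two other genomes
        else:
            labels.append("many")    # aligns with three or more genomes
    return labels


genomes = {
    "genome_a": "ACgt",
    "genome_b": "Acgt",
    "genome_c": "ACGt",
}
print(evounique(genomes))  # ['many', 'two', 'one', 'unique']
```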
         </sec>
         <sec>
            <st>
               <p>Parsing and curation of selected conserved sequences</p>
            </st>
            <p>To facilitate the comparative analysis of different conserved sequences from different enhancers, <it>EvoPrinterHD </it>allows for the curation of CSBs by enabling the user to automatically extract and collate CSBs in both forward and reverse-complemented orientations (data not shown). The "extract conserved sequence block" option (located at the top of each <it>EvoPrint </it>readout) provides for the automatic extraction, naming and consecutive numbering of 6 bp or longer CSBs from selected regions of an <it>EvoPrint </it>or <it>EvoDifferences </it>profile (see tutorial <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>). In addition to the annotated list of forward and reverse sequences, the readout shows the selected <it>EvoPrinted </it>region from which the conserved sequences were extracted. A link is also provided to the <it>cis</it>-Decoder CSB comparative algorithms <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
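            <p>The extraction step can be approximated as a scan for uninterrupted runs of uppercase bases. The following Python sketch assumes an <it>EvoPrint </it>supplied as a plain string with conserved bases in uppercase; the function name, the "CSB" naming prefix and the tuple layout are hypothetical choices for illustration and do not describe the actual <it>EvoPrinterHD </it>output format.</p>

```python
import re

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_csbs(evoprint, min_length=6, prefix="CSB"):
    """Extract uppercase runs of at least min_length bases from an EvoPrint.

    Returns a list of (name, start, forward, reverse_complement) tuples,
    numbered consecutively in order of appearance; start is 0-based.
    """
    pattern = re.compile("[ACGT]{%d,}" % min_length)
    blocks = []
    for number, match in enumerate(pattern.finditer(evoprint), start=1):
        forward = match.group()
        reverse = forward.translate(COMPLEMENT)[::-1]
        blocks.append((f"{prefix}-{number}", match.start(), forward, reverse))
    return blocks


for block in extract_csbs("ttAACGTTGCaaGGtcTTTAAACCCg"):
    print(block)
# ('CSB-1', 2, 'AACGTTGC', 'GCAACGTT')
# ('CSB-2', 16, 'TTTAAACCC', 'GGGTTTAAA')
```

            <p>The 2 bp uppercase run "GG" in the toy input is skipped because it falls below the 6 bp threshold.</p>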
         </sec>
         <sec>
            <st>
               <p>Identifying species-specific changes in less-conserved DNA</p>
            </st>
            <p><it>EvoPrinterHD </it>allows for the rapid exchange of the input reference DNA; it draws from memory the genomic sequence of the highest aligning region of any species identified in the initial analysis. Once a change in reference DNA is requested (at the additional alignment options page <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>), the alignment process is automatically reinitiated using the highest scoring aligning region of the selected genome as the new input reference DNA. Figure <figr fid="F6">6</figr> highlights the genome-specific variability of less-conserved sequences between vertebrate MCS regions. Within the second intron of the human <it>CASZ1 </it>gene <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, a homolog of the <it>Drosophila castor </it>gene <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, two highly conserved MCSs were identified that are each present once in most, if not all, vertebrate genomes. Using the human <it>CASZ1 </it>2<sup>nd </sup>intron as the input reference DNA and all 20 vertebrate genomes, a relaxed <it>EvoPrint </it>reveals that the intervening distance between the MCSs in the human genome is 441 bp (Figure <figr fid="F6">6A</figr>). By exchanging the human sequence with the highest scoring aligning region from the zebrafish genome and repeating the analysis, the separation between the conserved sequence clusters was found to be 7,502 bp (Figure <figr fid="F6">6B</figr>). Both human and zebrafish relaxed <it>EvoPrints </it>identified the same conserved bases in the two MCS clusters with few exceptions, and the spacing between conserved sequence blocks within the MCSs remained almost unchanged. Additional reference DNA swapping revealed that the non- or less-conserved intervening sequence between these MCSs is quite variable. For example, in fish the length varied from 1,609 to 7,502 bp and in frogs and chickens the distance was 1,610 and 408 bp, respectively (data not shown).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Genome-specific flexibility in less-conserved sequences revealed by exchanging input reference DNAs</p>
               </caption>
               <text>
                  <p><b>Genome-specific flexibility in less-conserved sequences revealed by exchanging input reference DNAs</b>. By swapping the input reference DNA for one of the aligning regions in another genome and reinitiating the <it>EvoPrint </it>analysis, one can identify species-specific changes in the spacing between conserved sequences. A) <it>EvoPrint </it>analysis of the human <it>CASZ1 </it>gene identified two highly conserved MCSs within its second intron that are separated by 441 bp. Shown is a relaxed <it>EvoPrint </it>that was generated with c<it>e</it>BLAT alignments of the human sequence to: chimpanzee, rhesus, mouse, rat, dog, cat, horse, cow, hedgehog, elephant, armadillo, opossum, chicken, <it>X. tropicalis, Fugu, Tetraodon, Medaka</it>, stickleback, and zebrafish genomes. Uppercase black-colored bases are present in all orthologs or found in all but one of the aligning regions. B) Shown is a relaxed <it>EvoPrint </it>obtained when the human input reference sequence, used to generate the <it>EvoPrint </it>shown in (A), is exchanged for the highest scoring aligning region in the zebrafish genome. The zebrafish <it>CASZ1 </it>relaxed <it>EvoPrint </it>reveals that the intervening genomic region between the two highly conserved MCSs in the zebrafish orthologue is 7,061 bp longer than that found in the human genome.</p>
               </text>
               <graphic file="1471-2164-9-106-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p><it>EvoPrinterHD </it>affords a rapid, convenient way to detect and curate DNA sequence conservation between related and evolutionarily distant animals. When multiple genomes are included in the analysis, the uninterrupted <it>EvoPrint </it>readout provides a species-centric view of conserved sequences that are required for gene function. <it>EvoPrinterHD </it>advances the <it>EvoPrint </it>method by providing an automated higher-definition view of sequence conservation from which the conserved sequence blocks can be rapidly curated for subsequent analysis. <it>EvoPrinterHD </it>also identifies genomic regions within one or more of the selected species that harbor rearrangements of the conserved DNA, and identifies unique or uniquely shared DNA sequences within bacterial genomes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Genome sequence files and their assembly dates</p>
            </st>
            <p>The following genome sequence files were curated from the Genome Bioinformatics Group of University of California, Santa Cruz <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>: Human, March 2006 (hg18); Chimpanzee, March 2006 (panTro2); Rhesus, January 2006 (rheMac2); Rat, November 2004 (rn4); Mouse, February 2006 (mm8); Cat, March 2006 (felCat3); Dog, May 2005 (canFam2); Horse, January 2007 (equCab1); Cow, March 2005 (bosTau2); Opossum, January 2006 (monDom4); Chicken, May 2006 (galGal3); <it>Xenopus tropicalis</it>, August 2005 (xenTro2); Zebrafish, March 2006 (danRer4); <it>Tetraodon</it>, February 2004 (tetNig1); <it>Fugu</it>, October 2004 (fr2); Stickleback, February 2006 (gasAcu1); <it>Medaka</it>, April 2006 (oryLat1); <it>D. melanogaster</it>, April 2006 (dm3); <it>D. simulans</it>, April 2005 (droSim1); <it>D. sechellia</it>, October 2005 (droSec1); <it>D. yakuba</it>, November 2005 (droYak2); <it>D. erecta</it>, August 2005 (droEre1); <it>D. ananassae</it>, August 2005 (droAna2); <it>D. pseudoobscura</it>, November 2005 (dp3); <it>D. persimilis</it>, October 2005 (droPer1); <it>D. virilis</it>, August 2005 (droVir2); <it>D. mojavensis</it>, August 2005 (droMoj2); <it>D. grimshawi</it>, August 2005 (droGri1); <it>C. elegans</it>, January 2007 (ce4); <it>C. brenneri</it>, January 2007 (caePb1); <it>C. briggsae</it>, January 2007 (cb3); <it>C. remanei</it>, March 2006 (caeRem2); and <it>P. pacificus</it>, February 2007 (priPac1). The genome sequence files for the Elephant, June 2005; Hedgehog, June 2006 and Armadillo, June 2005 were downloaded from the Broad Institute <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
            <p>The following bacterial genome sequence files were curated from the BacMap database of University of Alberta <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>: <it>Staphylococcus aureus </it>COL; <it>Staphylococcus aureus </it>MRSA252; <it>Staphylococcus aureus </it>MSSA476; <it>Staphylococcus aureus </it>Mu50; <it>Staphylococcus aureus </it>MW2; <it>Staphylococcus aureus </it>N315; <it>Staphylococcus aureus subsp. aureus </it>NCTC 8325; <it>Staphylococcus aureus </it>RF122; <it>Staphylococcus aureus subsp. aureus </it>USA300; <it>Staphylococcus epidermidis </it>ATCC 12228; <it>Staphylococcus epidermidis </it>RP62; <it>Staphylococcus haemolyticus </it>JCSC1435; <it>Escherichia coli </it>536; <it>Escherichia coli </it>APEC O1; <it>Escherichia coli </it>CFT073; <it>Escherichia coli </it>O157:H7 EDL933; <it>Escherichia coli </it>K12 MG1655; <it>Escherichia coli </it>W3110; <it>Escherichia coli </it>O157:H7 <it>Sakai</it>; <it>Klebsiella pneumoniae </it>MGH 78578; <it>Salmonella enterica </it>Choleraesuis SC-B67; <it>Salmonella enterica </it>Paratyphi A ATCC 9150; <it>Salmonella typhimurium </it>LT2; <it>Salmonella enterica </it>CT18; <it>Salmonella enterica </it>Ty2; <it>Shigella boydii </it>Sb227; <it>Shigella dysenteriae </it>Sd197; <it>Shigella flexneri </it>2a 2457T; and <it>Shigella flexneri </it>301. The genome sequence files for <it>Staphylococcus aureus subsp. aureus </it>JH1, <it>Staphylococcus aureus subsp. aureus </it>JH9, <it>Staphylococcus aureus </it>Mu3, and <it>Staphylococcus aureus subsp. aureus str</it>. Newman were curated from the European Bioinformatics Institute of the European Molecular Biology Laboratory <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. 
The genome sequence file for <it>Escherichia coli </it>UTI89 was taken from the Enteropathogen Resource Integration Center <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and genome sequence data for <it>Salmonella bongori </it>was downloaded from the Sanger Institute Sequencing Centre <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <p>The mosquito genome sequence files for <it>Aedes aegypti</it>, <it>Anopheles gambiae </it>and <it>Culex pipiens </it>were curated from the VectorBase database [31].</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>ASY, YL and YF participated in the design and implementation of the algorithms. JR participated in the web page design and tutorial. TB and WFO conceived the study, participated in the design and coordination of the algorithms and prepared the manuscript. All authors have read and approved the final draft of the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful to Jim Kent, Kory Johnson and Howard Nash for helpful discussions and advice during the <it>EvoPrinterHD </it>development phase. We also thank Ken Weeks and Jack Bishop for their technical expertise and acknowledge the editorial expertise and assistance of Judith Brody. This research was supported by the Intramural Research Program of the NIH, NINDS.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Human-mouse genome comparisons to locate regulatory sites</p>
            </title>
            <aug>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Palumbo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>26</volume>
            <fpage>225</fpage>
            <lpage>228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/79965</pubid>
                  <pubid idtype="pmpid" link="fulltext">11017083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Patchy interspecific sequence similarities efficiently identify positive cis-regulatory elements in the sea urchin</p>
            </title>
            <aug>
               <au>
                  <snm>Yuh</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Livi</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Rowen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2002</pubdate>
            <volume>246</volume>
            <fpage>148</fpage>
            <lpage>161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2002.0618</pubid>
                  <pubid idtype="pmpid" link="fulltext">12027440</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guryev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>170</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">314294</pubid>
                  <pubid idtype="pmpid" link="fulltext">14672977</pubid>
                  <pubid idtype="doi">10.1101/gr.1642804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p><it>cis </it>-Decoder discovers constellations of conserved DNA sequences shared among tissue-specific enhancers</p>
            </title>
            <aug>
               <au>
                  <snm>Brody</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rasband</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Baler</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kuzin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kundu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Odenwald</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R75</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2007-8-5-r75</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>EVOPRINTER: a multi-genomic comparative tool for rapid identification of functionally important DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Odenwald</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Rasband</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kuzin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brody</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>14700</fpage>
            <lpage>14705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1239946</pubid>
                  <pubid idtype="pmpid" link="fulltext">16203978</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506915102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>BLAT-the BLAST-like alignment tool</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>656</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187518</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932250</pubid>
                  <pubid idtype="doi">10.1101/gr.229202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Computation and analysis of genomic multi-sequence alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>193</fpage>
            <lpage>213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.8.080706.092300</pubid>
                  <pubid idtype="pmpid" link="fulltext">17489682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A regulatory code for neurogenic gene expression in the <it>Drosophila </it>embryo</p>
            </title>
            <aug>
               <au>
                  <snm>Markstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zinzen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Markstein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yee</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Erives</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stathopoulos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2004</pubdate>
            <volume>131</volume>
            <fpage>2387</fpage>
            <lpage>2394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/dev.01124</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128669</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>EvoPrinter</p>
            </title>
            <url>http://evoprinter.ninds.nih.gov/</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Separable regulatory elements mediate the establishment and maintenance of cell states by the <it>Drosophila </it>segment-polarity gene <it>gooseberry</it></p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Gutjahr</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Noll</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1993</pubdate>
            <volume>12</volume>
            <fpage>1427</fpage>
            <lpage>1436</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">413354</pubid>
                  <pubid idtype="pmpid">8096813</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Neurogenic expression of <it>snail </it>is controlled by separable CNS and PNS promoter elements</p>
            </title>
            <aug>
               <au>
                  <snm>Ip</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bier</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1994</pubdate>
            <volume>120</volume>
            <fpage>199</fpage>
            <lpage>207</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8119127</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Posterior stripe expression of <it>hunchback </it>is driven from two promoters by a common enhancer element</p>
            </title>
            <aug>
               <au>
                  <snm>Margolis</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Borowsky</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Steingrimsson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Shim</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Lengyel</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Posakony</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1995</pubdate>
            <volume>121</volume>
            <fpage>3067</fpage>
            <lpage>3077</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7555732</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>CNS midline enhancers of the <it>Drosophila slit </it>and <it>Toll </it>genes</p>
            </title>
            <aug>
               <au>
                  <snm>Wharton</snm>
                  <fnm>KA</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Crews</snm>
                  <fnm>ST</fnm>
               </au>
            </aug>
            <source>Mech Dev</source>
            <pubdate>1993</pubdate>
            <volume>40</volume>
            <fpage>141</fpage>
            <lpage>154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0925-4773(93)90072-6</pubid>
                  <pubid idtype="pmpid">8494768</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p><it>Cis</it>-regulatory elements of the mitotic regulator, <it>string/Cdc25</it></p>
            </title>
            <aug>
               <au>
                  <snm>Lehman</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Balzer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Britton</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Saint</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>BA</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1999</pubdate>
            <volume>126</volume>
            <fpage>1793</fpage>
            <lpage>1803</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10101114</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Transcriptional regulation of <it>atonal </it>during development of the <it>Drosophila </it>peripheral nervous system</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jan</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>Jan</snm>
                  <fnm>YN</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1998</pubdate>
            <volume>125</volume>
            <fpage>3731</fpage>
            <lpage>3740</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9716538</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Characterization of the cis-regulatory region of the <it>Drosophila </it>homeotic gene <it>Sex combs reduced</it></p>
            </title>
            <aug>
               <au>
                  <snm>Gindhart</snm>
                  <fnm>JG</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1995</pubdate>
            <volume>139</volume>
            <fpage>781</fpage>
            <lpage>795</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1206381</pubid>
                  <pubid idtype="pmpid" link="fulltext">7713432</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>The <it>Drosophila </it>PAR domain protein 1 (Pdp1) gene encodes multiple differentially expressed mRNAs and proteins through the use of multiple enhancers and promoters</p>
            </title>
            <aug>
               <au>
                  <snm>Reddy</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Wohlwill</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dzitoeva</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Holbrook</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Storti</snm>
                  <fnm>RV</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2000</pubdate>
            <volume>224</volume>
            <fpage>401</fpage>
            <lpage>414</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2000.9797</pubid>
                  <pubid idtype="pmpid" link="fulltext">10926776</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p><it>REDFly</it>: a regulatory element database for <it>Drosophila</it></p>
            </title>
            <aug>
               <au>
                  <snm>Gallo</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Halfon</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>381</fpage>
            <lpage>383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti794</pubid>
                  <pubid idtype="pmpid" link="fulltext">16303794</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Gene expression mediated by cis-acting sequences of the Kr&#252;ppel gene in response to the <it>Drosophila </it>morphogens bicoid and hunchback</p>
            </title>
            <aug>
               <au>
                  <snm>Hoch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Seifert</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>J&#228;ckle</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1991</pubdate>
            <volume>10</volume>
            <fpage>2267</fpage>
            <lpage>2278</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">452917</pubid>
                  <pubid idtype="pmpid">2065664</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genome Bioinformatics Group of UC Santa Cruz</p>
            </title>
            <url>http://hgdownload.cse.ucsc.edu/downloads.html</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>708</fpage>
            <lpage>715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383317</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060014</pubid>
                  <pubid idtype="doi">10.1101/gr.1933104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Molecular cloning and characterization of human <it>Castor</it>, a novel human gene up-regulated during cell differentiation</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cullion</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Thiele</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Biochem Biophys Res Commun</source>
            <pubdate>2006</pubdate>
            <volume>344</volume>
            <fpage>834</fpage>
            <lpage>844</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.bbrc.2006.03.207</pubid>
                  <pubid idtype="pmpid" link="fulltext">16631614</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p><it>castor </it>encodes a novel zinc finger protein required for the development of a subset of CNS neurons in <it>Drosophila</it></p>
            </title>
            <aug>
               <au>
                  <snm>Mellerick</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Kassis</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Odenwald</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Neuron</source>
            <pubdate>1992</pubdate>
            <volume>9</volume>
            <fpage>789</fpage>
            <lpage>803</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0896-6273(92)90234-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">1418995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Regulation of POU genes by <it>castor </it>and <it>hunchback </it>establishes layered compartments in the <it>Drosophila </it>CNS</p>
            </title>
            <aug>
               <au>
                  <snm>Kambadur</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Koizumi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Stivers</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nagle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Poole</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Odenwald</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1998</pubdate>
            <volume>12</volume>
            <fpage>246</fpage>
            <lpage>260</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316437</pubid>
                  <pubid idtype="pmpid" link="fulltext">9436984</pubid>
                  <pubid idtype="doi">10.1101/gad.12.2.246</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Broad Institute</p>
            </title>
            <url>http://www.broad.mit.edu/mammals/</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>BacMap database of University of Alberta</p>
            </title>
            <url>http://wishart.biology.ualberta.ca/BacMap/</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>European Bioinformatics Institute of the European Molecular Biology Laboratory</p>
            </title>
            <url>http://www.ebi.ac.uk/genomes/bacteria.html</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Enteropathogen Resource Integration Center</p>
            </title>
            <url>http://www.ericbrc.org/portal/eric/ecoliut189</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Sequencing Centre Sanger Institute</p>
            </title>
            <url>http://xbase.bham.ac.uk/genome.pl?id=1843</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>VectorBase: a home for invertebrate vectors of human pathogens</p>
            </title>
            <aug>
               <au>
                  <snm>Lawson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Arensburger</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Atkinson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Besansky</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Bruggner</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Christophides</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Christley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dialynas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Emmert</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hammond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Kennedy</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Lobo</snm>
                  <fnm>NF</fnm>
               </au>
               <au>
                  <snm>MacCallum</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Madey</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Megy</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Redmond</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Severson</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Stinson</snm>
                  <fnm>EO</fnm>
               </au>
               <au>
                  <snm>Topalis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Kafatos</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Louis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D503</fpage>
            <lpage>D505</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1751530</pubid>
                  <pubid idtype="pmpid" link="fulltext">17145709</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl960</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
