<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-2-r15</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Vavouri</snm>
               <fnm>Tanya</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>tv1@sanger.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Walter</snm>
               <fnm>Klaudia</fnm>
               <insr iid="I3"/>
               <email>klaudia.walter@mrc-bsu.cam.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Gilks</snm>
               <mi>R</mi>
               <fnm>Walter</fnm>
               <insr iid="I4"/>
               <email>wally.gilks@maths.leeds.ac.uk</email>
            </au>
            <au id="A4" ce="yes">
               <snm>Lehner</snm>
               <fnm>Ben</fnm>
               <insr iid="I5"/>
               <email>ben.lehner@crg.es</email>
            </au>
            <au id="A5" ce="yes">
               <snm>Elgar</snm>
               <fnm>Greg</fnm>
               <insr iid="I2"/>
               <email>g.elgar@qmul.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK</p>
            </ins>
            <ins id="I2">
               <p>School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, UK</p>
            </ins>
            <ins id="I3">
               <p>MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK</p>
            </ins>
            <ins id="I4">
               <p>Department of Statistics, University of Leeds, Leeds LS2 9JT, UK</p>
            </ins>
            <ins id="I5">
               <p>EMBL/CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), UPF, C/Dr. Aiguader 88, Barcelona 08003, Spain</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>2</issue>
         <fpage>R15</fpage>
         <url>http://genomebiology.com/2007/8/2/R15</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17274809</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-2-r15</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>25</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>20</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>2</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>02</day>
               <month>02</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Vavouri et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Parallel evolution of conserved noncoding elements</p>
      </shorttitle>
      <shortabs>
         <p>Invertebrate conserved noncoding elements (CNEs) are associated with the same core set of genes as vertebrate CNEs, and may reflect the parallel evolution of enhancers in the gene regulatory networks that define alternative animal body plans.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The human genome contains thousands of non-coding sequences that are often more conserved between vertebrate species than protein-coding exons. These highly conserved non-coding elements (CNEs) are associated with genes that coordinate development, and have been proposed to act as transcriptional enhancers. Despite their extreme sequence conservation in vertebrates, sequences homologous to CNEs have not been identified in invertebrates.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we report that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts. CNEs thus represent a very unusual class of sequences that are extremely conserved within specific animal lineages yet are highly divergent between lineages. Nematode CNEs are also associated with developmental regulatory genes, and include well-characterized enhancers and transcription factor binding sites, supporting the proposed function of CNEs as <it>cis</it>-regulatory elements. Most remarkably, 40 of 156 human CNE-associated genes with invertebrate orthologs are also associated with CNEs in both worms and flies.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>A core set of genes that regulate development is associated with CNEs across three animal groups (worms, flies and vertebrates). We propose that these CNEs reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups. This 're-wiring' of gene regulatory networks containing key developmental coordinators was probably a driving force during the evolution of animal body plans. CNEs may, therefore, represent the genomic traces of these 'hard-wired' core gene regulatory networks that specify the development of each alternative animal body plan.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Comparisons of the human genome against the genomes of distantly related vertebrates have revealed an abundance of highly conserved non-coding elements (CNEs) that appear to have been 'frozen' throughout vertebrate evolution <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The exact number of elements shared between any set of species depends on the precise definition of similarity and on the divergence of the genomes compared. For example, a comparison of the human genome against the mouse and rat genomes revealed 256 elements, with no evidence of transcription, that are 100% identical over at least 200 base pairs (bp) in all three species <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Furthermore, the human genome and the genome of the Japanese pufferfish (<it>Fugu rubripes</it>), which diverged from a common ancestor approximately 450 million years ago (MYA), share 1,373 CNEs with an average length of 199 bp and an average identity of 84% <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>A striking property of human CNEs is that they cluster in genomic regions that contain genes encoding transcription factors and signaling proteins involved in the regulation of development ('trans-dev' genes) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr></abbrgrp>. CNEs have therefore been proposed to act as <it>cis</it>-regulatory sequences for these trans-dev genes. In support of this, most of the CNEs assayed so far can act as tissue-specific enhancers for a transgene in zebrafish or mice <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>Vertebrate CNEs show extreme sequence conservation among distantly related species, often showing higher conservation than protein-coding exons <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. However, there appear to be no traces of vertebrate CNEs in invertebrate genomes that can be identified by sequence similarity searches <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr><abbr bid="B11">11</abbr></abbrgrp>. The evolutionary origin of most vertebrate CNEs therefore remains unknown <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Although CNEs have also been identified in invertebrate genomes <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, they have been found to be smaller and less frequent than vertebrate CNEs. Recently, Glazov <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> identified 20,301 non-coding elements that are conserved over at least 50 bp between the very closely related genomes of <it>Drosophila melanogaster </it>and <it>Drosophila pseudoobscura </it>and showed that these elements were also found preferentially near genes encoding transcription factors and developmental regulatory genes. <it>D. melanogaster </it>and <it>D. pseudoobscura </it>diverged from their common ancestor 25 to 55 MYA <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and show sequence divergence similar to that between the human and mouse genomes <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Consequently, it is difficult to distinguish functionally conserved elements from background sequence conservation by comparing these two genomes alone. In fact, only two of these elements were conserved in the more distantly related genome of <it>Anopheles gambiae</it>, which shared a common ancestor with the <it>Drosophila </it>species approximately 250 MYA <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. 
Therefore, it is still unclear how widespread highly conserved non-coding elements are among different animal genomes and whether similar genes are associated with the most conserved non-coding elements in both invertebrate and vertebrate genomes.</p>
         <p>To provide further insight into the function and evolution of CNEs, we have focused on the simplest animal group for which multiple genome sequences are currently available. Two nematode genomes, <it>Caenorhabditis elegans </it>and <it>Caenorhabditis briggsae</it>, have been fully sequenced and assembled <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. These two species diverged from a common ancestor approximately 80 to 110 MYA <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Although <it>C. elegans </it>and <it>C. briggsae </it>diverged at about the same time as human and mouse, the neutral substitution rate estimated for these two <it>Caenorhabditis </it>genomes is roughly three-fold higher than for human-mouse <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, so the two nematode genomes have diverged substantially more than the human and mouse genomes. Whole-genome shotgun sequence has also been released recently for a third nematode, <it>C. remanei</it>, a sister species of <it>C. briggsae</it>; these two genomes show sequence divergence similar to that between the human and mouse genomes <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
         <p>The CNEs that we have identified in <it>C. elegans </it>have many properties that mirror those of vertebrate CNEs. Although smaller than vertebrate CNEs, worm CNEs also reside near developmental regulatory genes. Moreover, they share both a striking base composition transition signal and a similar A+T content with vertebrate CNEs. Worm CNEs identify many previously characterized transcriptional enhancers and transcription factor binding sites. Most strikingly, we find that vertebrate and invertebrate CNEs are often associated with orthologous genes. Our analysis indicates that CNEs are commonly associated with the same developmental genes in different animal groups. Therefore, it seems likely that CNEs evolved in parallel in different animal lineages to regulate the expression of a core set of regulatory genes. The extreme sequence conservation of CNEs likely reflects the functional importance of these elements as components of the gene regulatory networks that define each different evolutionarily stable animal body plan.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Identification of worm conserved non-coding elements</p>
            </st>
            <p>To identify highly conserved non-coding elements in the genome of <it>C. elegans</it>, we searched for sequences that contain large blocks of identity with the genome of <it>C. briggsae </it>and show no evidence of transcription. We used MegaBlast (with soft masking, an e-value threshold of 0.001 and all other parameters set to their default values) to identify sequences that contain at least 30 (word seed size 30, W30) to 100 (word seed size 100, W100) consecutive nucleotides identical between the two nematode genomes, and removed any elements overlapping protein-coding exons, non-coding RNAs or repetitive sequences (see Materials and methods for details). We identified no non-coding elements with W100, 19 elements with W75, 304 elements with W50, 746 elements with W40 and 3,061 elements with W30. All further analysis was carried out on the W30 set. Of these elements, 69% are also found in the early draft genome sequence of <it>C. remanei</it>. We refer to these non-coding sequences conserved in all three genomes as worm CNEs (wCNEs); they comprise 1,460 intergenic elements with no evidence of transcription and 624 intronic elements, and together cover approximately 144 kb. These wCNEs have a mean length of 69 bp (minimum 30 bp, maximum 432 bp, median 59 bp) and a mean identity of 96% between <it>C. elegans </it>and <it>C. briggsae</it>, with 990 elements being 100% identical among all three species. Using the PhastCons method <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, 93% of the total sequence contained in wCNEs is estimated to be under purifying selection rather than evolving neutrally. This figure is probably conservative, because the lack of sequence from other nematode species may cause branch lengths to be underestimated <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Therefore, the vast majority of wCNEs are likely to be functional elements under negative selection.</p>
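            <p>The filtering step above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes the MegaBlast hits have already been parsed into half-open genomic intervals, and the names Interval, overlaps and filter_non_coding are hypothetical. A candidate is kept only if it is at least 30 bp long and overlaps none of the masked features (exons, non-coding RNAs and repeats).</p>

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval type used only for this sketch."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (half-open, as in BED files)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a.chrom == b.chrom and b.end > a.start and a.end > b.start


def filter_non_coding(candidates, masked_features, min_length=30):
    """Keep candidate conserved blocks that are at least min_length bp long
    and do not overlap any masked feature (exon, ncRNA or repeat)."""
    return [
        c for c in candidates
        if c.end - c.start >= min_length
        and not any(overlaps(c, f) for f in masked_features)
    ]
```

            <p>For example, with candidates X:100-160, X:500-520 and I:1000-1070 and a single masked feature at I:1050-1200, only X:100-160 is kept: the second block is shorter than 30 bp and the third overlaps the masked feature. The pairwise check is quadratic, so a genome-wide run would need an interval tree or a sorted sweep instead.</p>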
         </sec>
         <sec>
            <st>
               <p>wCNEs cluster around genes and are enriched on the X chromosome</p>
            </st>
            <p>wCNEs are not distributed evenly along the chromosomes of <it>C. elegans</it>. Rather, they tend to reside in the gene-rich centers of the autosomes (Additional data files 1 and 2) and, as with human CNEs (hCNEs) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, multiple wCNEs are often clustered around a single gene (mean of 1.7 and maximum of 14 wCNEs per gene). Moreover, 884 of the 2,084 wCNEs (42.4%) are found on the single <it>C. elegans </it>sex chromosome, X, which is more than expected by chance (<it>p </it>value &lt; 0.001, based on 1,000 randomizations; Figure <figr fid="F1">1</figr>). The <it>C. elegans </it>X chromosome is almost devoid of essential genes and is instead enriched for genes with regulatory functions <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The enrichment of wCNEs on the X chromosome may, therefore, reflect a larger proportion of X-linked genes requiring complex <it>cis</it>-regulatory architectures. It may also explain why synteny blocks are larger on the X chromosome than on the autosomes in <it>C. elegans </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. In vertebrates, it has been proposed that the requirement to maintain linkage between CNEs and their target genes constrains chromosomal rearrangements <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and the same may be true of the <it>C. elegans </it>X chromosome.</p>
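            <p>A randomization test of the kind cited above can be sketched as follows. The null model is an assumption made for illustration: each element is placed on a chromosome with probability proportional to chromosome length, and the lengths in CHROM_MB are approximate values for the <it>C. elegans </it>assembly. Figure 1 indicates that gene number is an equally relevant null, and the function name x_enrichment_pvalue is hypothetical.</p>

```python
import random

# Approximate C. elegans chromosome lengths in Mb (illustrative values only).
CHROM_MB = {"I": 15.07, "II": 15.28, "III": 13.78, "IV": 17.49, "V": 20.92, "X": 17.72}


def x_enrichment_pvalue(n_total, n_on_x, n_iter=1000, seed=0):
    """Empirical p value for observing n_on_x or more of n_total elements on X
    when every element is placed on a chromosome with probability
    proportional to that chromosome's length."""
    rng = random.Random(seed)
    chroms = list(CHROM_MB)
    weights = [CHROM_MB[c] for c in chroms]
    hits = 0
    for _ in range(n_iter):
        placed = rng.choices(chroms, weights=weights, k=n_total)
        if placed.count("X") >= n_on_x:
            hits += 1
    # Add-one correction so the empirical p value is never exactly zero.
    return (hits + 1) / (n_iter + 1)
```

            <p>With n_total = 2,084 and n_on_x = 884, this length-based null places roughly 370 elements on X on average, so none of the 1,000 simulations reaches 884 and the function returns 1/1001, the smallest value attainable with 1,000 iterations.</p>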
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>The distribution of CNEs in the <it>C. elegans </it>genome reveals enrichment on chromosome X</p>
               </caption>
               <text>
                  <p>The distribution of CNEs in the <it>C. elegans </it>genome reveals enrichment on chromosome X. Chromosome X contains 884 out of 2,084 wCNEs. This enrichment for wCNEs on chromosome X cannot be explained by either <b>(a) </b>its size or <b>(b) </b>the number of genes it contains compared to the autosomes.</p>
               </text>
               <graphic file="gb-2007-8-2-r15-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Vertebrate and invertebrate CNEs share a striking nucleotide frequency pattern at their boundaries</p>
            </st>
            <p>Vertebrate CNEs have a characteristic pattern of nucleotide composition, showing a sharp change in base composition at their boundaries <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Fugu and human CNEs contain 59% and 62% A+T nucleotides <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, respectively, which is 6 and 3 percentage points above the respective genome averages <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. A gradual G+C enrichment followed by a sharp A+T-rich peak at the CNE boundaries marks the transition in base composition from the flanking DNA to the CNE DNA (Figure <figr fid="F2">2</figr>). The genome of <it>C. elegans </it>has a higher A+T content (65%) than vertebrate genomes, yet wCNEs have an A+T content (58%) very similar to that of vertebrate CNEs. Moreover, worm CNEs show a similar nucleotide frequency transition at their borders: A+T content decreases from the genome average (65%) to 50% at the wCNE border and then rises sharply to 58% within the wCNE (Figure <figr fid="F2">2</figr>). The same signal is also present at the boundaries of CNEs from <it>D. melanogaster </it>(Figure <figr fid="F2">2d</figr>) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> (T Down, personal communication). The significance of this signal remains unknown, although its conservation from nematodes to humans indicates that it probably reflects a functional property of CNEs. For example, it might indicate a particular DNA conformation: because AA/TT dinucleotides increase DNA rigidity, CNEs could be relatively rigid elements flanked by flexible DNA; alternatively, the signal may facilitate DNA unwinding and base unpairing <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The conservation of this signal from nematodes to humans could also help in discovering functional non-coding elements that are less conserved than CNEs (T Down, personal communication).</p>
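            <p>A border profile like the one in Figure 2 could be computed along the lines of the sketch below. It assumes that CNE coordinates are half-open positions on a single chromosome string and that 200 bp of flank and 15 bp of CNE are taken at each border, as in the figure. The right-hand border is reverse complemented so that both borders read from flanking DNA into the CNE. Complementation does not change A+T content, so this step only reorients the window. The function names are hypothetical.</p>

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def at_profile(genome, cnes, flank=200, inside=15):
    """Per-position A+T fraction across CNE borders.

    genome: one chromosome sequence; cnes: (start, end) half-open coordinates.
    Each CNE contributes two windows of flank + inside bases, both oriented
    flanking DNA first and CNE DNA last. CNEs too close to the chromosome
    ends, or shorter than inside, are skipped."""
    width = flank + inside
    at_counts = [0] * width
    n_windows = 0
    for start, end in cnes:
        if start - flank >= 0 and len(genome) >= end + flank and end - start >= inside:
            left = genome[start - flank:start + inside]
            right = revcomp(genome[end - inside:end + flank])
            for window in (left, right):
                for i, base in enumerate(window.upper()):
                    if base in "AT":
                        at_counts[i] += 1
                n_windows += 1
    return [c / n_windows for c in at_counts] if n_windows else []
```

            <p>Plotting the returned list against position should show the dip in A+T content over the flank and the jump at index 200, where the CNE begins.</p>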
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>CNEs share a striking nucleotide signature from <it>C. elegans </it>to vertebrates</p>
               </caption>
               <text>
                  <p>CNEs share a striking nucleotide signature from <it>C. elegans </it>to vertebrates. The plot shows the percentage of A+T nucleotides for 200 bp of sequence flanking CNEs (black) and 15 bp of CNE (red) at the CNE border defined by sequence conservation (the sequence on one end of each CNE is reverse complemented) for <b>(a) </b><it>F. rubripes</it>, <b>(b) </b><it>H. sapiens</it>, <b>(c) </b><it>C. elegans </it>and <b>(d) </b><it>D. melanogaster</it>. In all four species there is a decrease of A+T content in the 200 bp of sequence flanking the CNEs followed by a sharp A+T increase at the CNE border.</p>
               </text>
               <graphic file="gb-2007-8-2-r15-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>wCNEs are associated with developmental transcription factors and signaling genes</p>
            </st>
            <p>CNEs in the human genome are associated with genes involved in the regulation of development and, in particular, with transcription factors ('trans-dev' genes) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. To assess whether CNEs are associated with particular types of genes in <it>C. elegans</it>, we associated each wCNE with the protein-coding gene whose transcription start site is nearest. The mean distance between a wCNE and the nearest transcription start site is 2,929 bp, and 1,206 (82.6%) of the intergenic wCNEs lie more than 500 bp from the nearest transcription start site (Additional data file 3).</p>
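            <p>The nearest-transcription-start-site assignment described above can be sketched as follows. The input layout and the choice to measure distance from the CNE midpoint are assumptions for illustration, since the text does not say whether distances were measured from the CNE edge or its midpoint. The function name nearest_tss is hypothetical. Start sites are sorted per chromosome so that each lookup is a binary search with Python's bisect module rather than a scan over every gene.</p>

```python
import bisect
from collections import defaultdict


def nearest_tss(cnes, genes):
    """Assign each CNE to the gene whose transcription start site is closest.

    cnes: iterable of (cne_id, chrom, start, end);
    genes: iterable of (gene_id, chrom, tss).
    Distance is measured from the CNE midpoint (an assumption of this sketch).
    Returns {cne_id: (gene_id, distance)}; CNEs on chromosomes without genes
    are left out."""
    by_chrom = defaultdict(list)
    for gene_id, chrom, tss in genes:
        by_chrom[chrom].append((tss, gene_id))
    for sites in by_chrom.values():
        sites.sort()
    result = {}
    for cne_id, chrom, start, end in cnes:
        sites = by_chrom.get(chrom)
        if not sites:
            continue
        mid = (start + end) // 2
        # Index of the first start site at or after the midpoint.
        i = bisect.bisect_left(sites, (mid,))
        # The nearest start site is either just before or just after the midpoint.
        candidates = sites[max(i - 1, 0):i + 1]
        tss, gene_id = min(candidates, key=lambda s: abs(s[0] - mid))
        result[cne_id] = (gene_id, abs(tss - mid))
    return result
```

            <p>Counting how many of the returned distances exceed 500 bp would give a summary analogous to the 82.6% reported above.</p>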
            <p>In both the human and the <it>C. elegans </it>genome the most significantly enriched functions, according to the Gene Ontology (GO) terms <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, for CNE-associated genes are related to transcription factor activity and development (Figure <figr fid="F3">3</figr>). For example, 2.82% (18/638) of genes associated with wCNEs are annotated with the GO term 'development', whilst only 0.63% (52/8,301) of all annotated genes in <it>C. elegans </it>are annotated with this term (<it>p </it>value = 6.13e-11 for log odds ratio = 1.87 and <it>p </it>value &lt; 0.001, based on 1,000 randomizations). Similarly, 10.82% (69/638) of genes associated with wCNEs are annotated with the term 'transcription factor activity', whilst only 6.17% (512/8,301) of all annotated genes in <it>C. elegans </it>are annotated with this term (<it>p </it>value = 2.81e-7 for log odds ratio = 0.68 and <it>p </it>value = 0.006, based on 1,000 randomizations). The reverse association is also true: developmental genes in general are associated with wCNEs, as 34.62% (18/52) of annotated developmental genes (that is, annotated with the GO term 'development') are associated with wCNEs while only 7.69% (638/8,301) of all annotated genes in <it>C. elegans </it>are associated with wCNEs. Glazov <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> have noted a similar trend for elements conserved between two very closely related <it>Drosophila </it>species, indicating that the association of highly conserved non-coding elements with trans-dev genes is a property conserved from worms to humans.</p>
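            <p>The log odds ratios quoted above can be recovered from the reported counts if CNE-associated genes are compared with the remaining annotated genes. For 'development', that comparison is 18 of 638 CNE-associated genes versus 34 of the other 7,663 genes. The sketch below performs this calculation. The 95% interval uses Woolf's log-scale normal approximation, a standard choice, but it is only an assumption about how the confidence intervals in Figure 3 were obtained, and the sketch does not attempt to reproduce the reported <it>p </it>values. The function name log_odds_ratio is hypothetical.</p>

```python
import math


def log_odds_ratio(k_sel, n_sel, k_all, n_all, z=1.96):
    """Natural-log odds ratio for a GO term among selected genes versus the
    remaining annotated genes, with a Woolf (log-scale normal) 95% CI.

    k_sel of n_sel selected genes carry the term; k_all of n_all annotated
    genes carry it, so the comparison set is the n_all - n_sel other genes."""
    a = k_sel                    # selected genes with the term
    b = n_sel - k_sel            # selected genes without the term
    c = k_all - k_sel            # other genes with the term
    d = (n_all - n_sel) - c      # other genes without the term
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, (lor - z * se, lor + z * se)
```

            <p>Here log_odds_ratio(18, 638, 52, 8301) gives a log odds ratio of about 1.87 for 'development', and log_odds_ratio(69, 638, 512, 8301) gives about 0.68 for 'transcription factor activity', matching the values above.</p>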
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>CNEs are associated with genes involved in transcription regulation and development in both <it>H. sapiens </it>and <it>C. elegans</it></p>
               </caption>
               <text>
                  <p>CNEs are associated with genes involved in transcription regulation and development in both <it>H. sapiens </it>and <it>C. elegans</it>. The log odds ratios and the 95% confidence intervals are shown for all GOslim terms that appear in the annotation of genes spatially associated with CNEs significantly more often than in the rest of the genome for <it>H. sapiens </it>(black) and <it>C. elegans </it>(red). GOslim terms marked with three asterisks are significantly enriched in both <it>H. sapiens </it>and <it>C. elegans </it>CNE genes; those marked with two asterisks are significantly enriched only in <it>C. elegans</it>; and the term with one asterisk is significantly enriched only in <it>H. sapiens</it>. The terms are ordered according to their <it>p </it>value in <it>H. sapiens </it>(lowest <it>p </it>value in <it>H. sapiens </it>at the top). All terms related to transcription factor activity and development (that is, 'trans-dev' genes [4]) show a strong bias in the annotation of genes near CNEs in both genomes. In the <it>C. elegans </it>gene set, there is also a trend for genes to be involved in signal transduction and ion binding. The GO terms shown in this figure constitute all GOslim terms (excluding the term 'biological-process') with a positive log odds ratio and <it>p </it>value &#8804; 7.19 &#215; 10<sup>-3</sup> (5% false discovery rate cut-off) in either <it>H. sapiens </it>or <it>C. elegans</it>.</p>
               </text>
               <graphic file="gb-2007-8-2-r15-3"/>
            </fig>
            <p>In addition, wCNE-associated genes are enriched for cell-signaling GO terms, as has also been noted for the elements in <it>Drosophila </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, but this enrichment is less striking in humans <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Nonetheless, several major signaling genes involved in development are associated with CNEs in the human genome, a classic example being the sonic hedgehog gene at 7q36.3 <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. This difference is intriguing given that the human genome contains a larger proportion of signaling genes (1,790/15,023 = 11.92% of annotated human genes carry the term 'signal transduction') than the <it>C. elegans </it>genome (599/8,301 = 7.22%), whereas signaling genes are less common among human CNE-associated genes (15/274 = 5.47%) than among wCNE-associated genes in <it>C. elegans </it>(84/638 = 13.17%). A possible explanation is that signaling genes in vertebrates are associated with elements less conserved than the CNEs we previously identified. In support of this hypothesis, a set of vertebrate non-coding elements identified with less stringent criteria is significantly enriched for the GO term 'signal transduction' <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
            <p>To further analyze the types of genes associated with CNEs, we looked at the InterPro protein domains <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> encoded by these genes. Both <it>Homo sapiens </it>and <it>C. elegans </it>CNEs are enriched in the neighborhoods of genes encoding DNA-binding transcription factor domains (including Homeodomain-like, Winged-helix repressor DNA-binding, Zinc finger C2H2-type and HMG1/2; Additional data file 4). We also examined the enrichment for transcription factors among the wCNE-associated genes using predicted transcription factors from two high-quality databases: DBD, a database of transcription factors predicted computationally through homology to known DNA-binding domains <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>; and wTF2.0, a compendium of computationally and manually curated transcription factors in <it>C. elegans </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Of the 1,241 wCNE-associated genes, 108 (8.7%) are annotated as transcription factors in DBD and 137 (11.0%) in wTF2.0; both proportions are significantly higher than in the genome as a whole (Additional data file 5).</p>
            <p>Both <it>H. sapiens </it>and <it>C. elegans </it>CNEs are also associated with genes encoding cell-signaling domains, although, as also noted from the GO terms, this is more pronounced in <it>C. elegans</it>. These domains include those found in extracellular proteins, cell surface receptors, and intracellular signaling proteins. The lack of thoroughly annotated sets of signaling genes could potentially exaggerate differences between human and worm CNE-associated genes.</p>
            <p>Human CNEs are not always found directly adjacent to their most likely target genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, and a CNE may also regulate more than one gene, for example, in the case of bidirectional promoters. Therefore, these statistics probably underestimate the true association of CNEs with developmental regulatory genes. We conclude that CNEs are associated with genes involved in transcription regulation and development and, to a certain degree, with cell-signaling genes in both vertebrates and invertebrates, although the association with cell-signaling genes appears to be stronger in invertebrates.</p>
         </sec>
         <sec>
            <st>
               <p>Vertebrate and invertebrate CNEs target a common set of core developmental genes</p>
            </st>
            <p>Most strikingly, we find that many of the genes associated with CNEs in the <it>C. elegans </it>genome are the direct orthologs of CNE-associated genes in the human genome. Of 397 human CNE-associated genes, 190 have identifiable orthologs in <it>C. elegans </it>and, of these, 60 are also associated with wCNEs in <it>C. elegans</it>. This is much greater than expected by chance (<it>p </it>&lt; 0.001, by randomization). For example, the <it>C. elegans </it>gene <it>mab-18 </it>is associated with ten wCNEs and its human ortholog PAX6 is associated with two hCNEs. For worm CNE-associated genes that have been duplicated in the vertebrate lineage, multiple paralogs are often associated with hCNEs. For example, the <it>C. elegans </it>gene <it>sem-4 </it>is associated with four wCNEs, and has four human orthologs. Of these, SALL1 is associated with two hCNEs, SALL3 with eleven hCNEs and SALL4 with one hCNE.</p>
            <p>Remarkably, of the 60 human CNE-associated genes that have <it>C. elegans </it>orthologs associated with wCNEs, 40 also have orthologs in <it>Drosophila </it>that are associated with the conserved elements identified by Glazov <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. In summary, 40 of the 156 human CNE-associated genes that have orthologs in both <it>C. elegans </it>and <it>D. melanogaster </it>are also associated with CNEs in these two species. These genes represent a core set of developmental regulatory genes that are associated with CNEs across three different animal phyla (Table <tblr tid="T1">1</tblr>). Thus, despite the extensive evolutionary distance and the duplication events that have occurred since the divergence of <it>C. elegans</it>, <it>D. melanogaster </it>and <it>H. sapiens</it>, a core set of orthologous genes is associated with highly conserved non-coding elements in all three organisms.</p>
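            <p>The three-way comparison summarized above amounts to set intersections over ortholog mappings. The sketch below illustrates it with hypothetical gene identifiers and toy data, and it takes the ortholog assignments as given. Mapping each human gene to a set of orthologs preserves one-to-many relationships such as that between <it>sem-4 </it>and its SALL paralogs. The function name core_cne_genes is hypothetical.</p>

```python
def core_cne_genes(human_cne, to_worm, to_fly, worm_cne, fly_cne):
    """Return (human CNE genes with orthologs in both worm and fly,
    the subset whose worm and fly orthologs are also CNE-associated).

    to_worm / to_fly map a human gene to the set of its orthologs, so
    one-to-many relationships created by vertebrate duplications are kept."""
    with_both = {h for h in human_cne if to_worm.get(h) and to_fly.get(h)}
    core = {
        h for h in with_both
        if not to_worm[h].isdisjoint(worm_cne)
        and not to_fly[h].isdisjoint(fly_cne)
    }
    return with_both, core


# Toy, hypothetical input: H3 lacks a fly ortholog, and only H1's fly
# ortholog is CNE-associated.
human_cne = {"H1", "H2", "H3"}
to_worm = {"H1": {"W1"}, "H2": {"W2"}, "H3": {"W3"}}
to_fly = {"H1": {"F1"}, "H2": {"F2"}}
with_both, core = core_cne_genes(human_cne, to_worm, to_fly, {"W1", "W2"}, {"F1"})
```

            <p>With this toy input, with_both is {H1, H2} and core is {H1}. These two sets play the roles of the 156 and 40 genes reported above.</p>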
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Orthologous genes associated with CNEs (and uc-elements) in humans, flies and worms</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>
                           <it>H. sapiens</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cluster number</p>
                     </c>
                     <c ca="left">
                        <p>Gene name</p>
                     </c>
                     <c ca="center">
                        <p>Number of associated wCNEs</p>
                     </c>
                     <c ca="left">
                        <p>Gene name</p>
                     </c>
                     <c ca="center">
                        <p>Number of associated uc-elements</p>
                     </c>
                     <c ca="left">
                        <p>Gene name</p>
                     </c>
                     <c ca="center">
                        <p>Number of associated hCNEs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>ZC123.3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>zfh2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ATBF1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZFHX4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ceh-31</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>B-H1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>BARHL2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ceh-30</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>B-H2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ceh-44</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ct</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>CUTL2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc-3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>kn</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>EBF3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>C18B12.3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>al</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>ENSG00000165606</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>egl-43</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>CG31753</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>EVI1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PRDM16</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>lin-39</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Scr</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>HOXA5</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>HOXB5</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>HOXC5</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>irx-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>caup</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>IRX4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>mirr</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>IRX6</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ara</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>mab-21</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>CG4766</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>MAB21L1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>mab-2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>MAB21L2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>cog-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>HGTX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>NKX6-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>nhr-67</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>dsf</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>NR2E1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>nhr-6</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Hr38</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>NR4A2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>vab-3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>toy</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PAX6</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc-30</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ptx1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PITX2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>15</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc-86</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>acj6</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>POU4F1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>POU4F2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ptc-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ptc</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PTCH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>egl-27</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>gug</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>RERE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc-10</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Rim</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>RIMS2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>19</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>rnt-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>run</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>RUNX3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>sem-4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Salm</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SALL1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SALL3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SALL4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>sox-3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SoxN</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SOX1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SOX2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>tbx-2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>bi</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>TBX2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>K06A1.1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>AP-2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>TFAP2A</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>TFAP2D</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>zag-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>zfh1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZFHX1B</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ref-2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>opa</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZIC1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZIC2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZIC4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>tlp-1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>elB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZNF503</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>noc</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZNF703</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>wCNEs identify transcriptional enhancer sequences and may function by encoding transcription factor binding sites</p>
            </st>
            <p>Human CNEs have been proposed to act as <it>cis</it>-elements that regulate the transcription of developmental genes, and, of the relatively few vertebrate CNEs that have been tested, the majority can act as tissue-specific enhancers when tested with a reporter gene in zebrafish (by co-injection) or in transgenic mice <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. We therefore reasoned that, if worm CNEs also function as enhancers, they should overlap multiple previously characterized enhancer sequences in the worm genome. Through literature searches, we compiled a list of 17 <it>C. elegans </it>genes with extensively dissected <it>cis</it>-regulatory sequences. Six of these genes are associated with wCNEs and, in five of these six cases, the wCNEs lie within the defined enhancer regions (Additional data file 6). For example, the gene <it>ser-2 </it>is associated with five wCNEs, each of which lies within one of four genomic regions that act as transcriptional enhancers in distinct tissues or cell types (Figure <figr fid="F4">4</figr>). This provides good evidence that wCNEs can act as transcriptional enhancers <it>in vivo</it>.</p>
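            <p>The containment criterion used above can be made explicit with a short sketch. The Python below is illustrative only: the function names, the (chromosome, start, end) tuple layout and the example coordinates are hypothetical, and intervals are assumed to be 1-based and inclusive, as in WormBase-style coordinates such as IV:3776258..3776298. A wCNE is counted only if it lies wholly inside a deletion-defined enhancer on the same chromosome; partial overlaps are deliberately excluded.</p>
```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end), 1-based and inclusive.
Interval = Tuple[str, int, int]


def is_contained(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    chrom_in, start_in, end_in = inner
    chrom_out, start_out, end_out = outer
    return chrom_in == chrom_out and start_in >= start_out and end_out >= end_in


def wcnes_within_enhancers(wcnes: List[Interval], enhancers: List[Interval]) -> List[Interval]:
    """Return the wCNEs wholly contained in at least one enhancer region."""
    return [w for w in wcnes if any(is_contained(w, e) for e in enhancers)]


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    enhancers = [("X", 1000, 1280), ("X", 5000, 5520)]
    wcnes = [("X", 1100, 1160), ("X", 5400, 5600), ("V", 1100, 1160)]
    print(wcnes_within_enhancers(wcnes, enhancers))
```
            <p>In this made-up example, the second wCNE overlaps an enhancer only partially and the third lies on a different chromosome, so only the first is reported.</p>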
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>CNEs identify previously characterized enhancer sequences and, when located in introns, are associated with alternative transcriptional start sites</p>
               </caption>
               <text>
                  <p>CNEs identify previously characterized enhancer sequences and, when located in introns, are associated with alternative transcriptional start sites. Five wCNEs are contained within four elements that regulate <it>ser-2</it>, the <it>C. elegans </it>ortholog of human serotonin receptor 1A. The products of <it>ser-2 </it>were identified as components of the AIY interneuron gene battery in <it>C. elegans </it>[60]. <it>ser-2 </it>has at least three alternative transcription start sites that produce a number of different gene products, thought to be expressed in different but overlapping patterns [61]. Remarkably, each of the alternative transcription start sites is marked by a wCNE in its proximal upstream region, with additional wCNEs lying farther away, highlighting the underlying <it>cis</it>-regulatory elements. The regulatory sequences upstream of each alternative transcription start site were defined by deletion analysis [61]. One wCNE lies within an approximately 280 bp element driving expression in the AIY and SIA neurons. A second wCNE lies within an approximately 520 bp element driving expression in the RME neurons and, consistently, in other unidentified neurons. A third wCNE lies within an approximately 1,150 bp element driving expression in the head muscles. The remaining two wCNEs are contained within a region driving expression in PVD and lateral OLL neurons. Only the experimentally tested constructs that overlap wCNEs are shown in this diagram.</p>
               </text>
               <graphic file="gb-2007-8-2-r15-4"/>
            </fig>
            <p>The simplest hypothesis for how CNEs function is that they encode arrays of transcription factor binding sites. If this were the case, then CNEs associated with genes known to be expressed in a particular tissue should be enriched for binding sites of the transcription factors that regulate the co-expression of these genes in that tissue. To test this hypothesis, we used DNA microarray data <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> to identify 54 wCNE-associated genes that are expressed in the <it>C. elegans </it>pharynx. These genes are associated with a total of 120 wCNEs (hereafter referred to as 'pharyngeal wCNEs'): 40 intronic and 80 intergenic, ranging in size from 31 bp to 216 bp (mean = 68.2 bp; median = 60 bp). Notably, many of the intergenic wCNEs in this set lie beyond the classical 'promoter region' (often described as the first 500 bp to 1,000 bp upstream of a gene): pharyngeal wCNEs lie between 27 bp and 9,577 bp (mean = 2,970 bp; median = 2,053 bp) from the associated pharyngeal gene. To identify putative transcription factor binding sites in the pharyngeal wCNEs, we searched for overrepresented sequence motifs using the Weeder motif discovery algorithm, which first searches for overrepresented motifs and then, in a post-processing step, identifies similar ('redundant') motifs among the highest scoring ones <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Weeder identified a single redundant motif that is significantly enriched in these sequences (<it>p </it>&lt; 0.002). Strikingly, this motif (Figure <figr fid="F5">5</figr>) is very similar to the consensus binding site of the pharyngeal transcription factor PHA-4. PHA-4 is the major specifier of pharyngeal cell identity in <it>C. elegans </it><abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, suggesting that occurrences of this motif in wCNEs represent genuine PHA-4 binding sites. 
Indeed, inspection of the seven highest scoring occurrences of the motif in the pharyngeal wCNEs (Table <tblr tid="T2">2</tblr>) revealed that one of the predicted sites lies 1.2 kb upstream of the gene <it>ceh-22</it>, within a 30 bp pharyngeal muscle enhancer previously shown to be bound by PHA-4 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> (annotated in WormBase as two overlapping PHA-4 binding sites with WormBase identifiers WBsf019089 and WBsf019090). Therefore, by searching for overrepresented motifs in a set of wCNEs associated with genes expressed in the pharynx, we recovered the binding site of the transcription factor that acts as the major specifier of pharyngeal cell identity. Together with the identification of other wCNEs within previously characterized enhancers, these results suggest that wCNEs represent enhancer sequences that function, at least in part, by encoding transcription factor binding sites.</p>
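            <p>To illustrate how a degenerate consensus such as the PHA-4 site TRTTKRY (R = A/G, K = T/G, Y = T/C; see the footnote to Table 2) can be matched against wCNE sequences, the sketch below scans both strands and reports windows with at most a given number of mismatches. This is plain IUPAC consensus matching, not the Weeder algorithm used in the study, and the function names are hypothetical. Allowing a mismatch matters here: the top-scoring occurrence in Table 2, TGTTTGGCAACT, agrees with the first six consensus positions but has G at the seventh position, where the consensus requires C or T.</p>
```python
from typing import List, Tuple

# IUPAC codes needed for the PHA-4 consensus quoted in the Table 2 footnote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine
    "K": "GT",  # keto
    "Y": "CT",  # pyrimidine
}
PHA4_CONSENSUS = "TRTTKRY"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(window: str, consensus: str) -> int:
    """Count positions where the base is not allowed by the IUPAC consensus."""
    return sum(1 for base, code in zip(window, consensus) if base not in IUPAC[code])


def scan(seq: str, consensus: str = PHA4_CONSENSUS,
         max_mismatches: int = 0) -> List[Tuple[str, int, str, int]]:
    """Return (strand, start, window, mismatches) for windows within max_mismatches.

    `start` is a 0-based offset on the forward strand for both strands.
    """
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - k + 1):
            window = s[i:i + k]
            mm = mismatches(window, consensus)
            if max_mismatches >= mm:
                # Map minus-strand windows back to forward-strand offsets.
                start = i if strand == "+" else len(s) - i - k
                hits.append((strand, start, window, mm))
    return hits
```
            <p>For example, scanning TGTTTGGCAACT with max_mismatches=0 returns no hits on either strand, whereas max_mismatches=1 reports, among its hits, the forward-strand window TGTTTGG at offset 0.</p>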
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Occurrences of a sequence motif overrepresented in wCNEs associated with pharyngeal genes</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>wCNE coordinates</p>
                     </c>
                     <c ca="center">
                        <p>Strand</p>
                     </c>
                     <c ca="center">
                        <p>Matching sequence</p>
                     </c>
                     <c ca="center">
                        <p>Position in wCNE</p>
                     </c>
                     <c ca="center">
                        <p>Weeder score</p>
                     </c>
                     <c ca="center">
                        <p>wCNE distance from TSS (bp)</p>
                     </c>
                     <c ca="center">
                        <p>Gene name</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IV:3776258..3776298</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>TATTTAGCATCT</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>85.59</p>
                     </c>
                     <c ca="center">
                        <p>9,435</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>vab-2</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IV:8369551..8369581</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>TTTTTTGCAACT</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>91.65</p>
                     </c>
                     <c ca="center">
                        <p>347</p>
                     </c>
                     <c ca="center">
                        <p>D2096.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>V:10673732..10673841</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>TGTTTGTCCACT</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>87.26</p>
                     </c>
                     <c ca="center">
                        <p>1,202</p>
                     </c>
                     <c ca="center">
                        <p><it>ceh-22</it>*</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>V:13217316..13217419</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>TGTTTGGCAACT</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>3,588</p>
                     </c>
                     <c ca="center">
                        <p>F57B1.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X:2215856..2215898</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>TGTTTTGAAATT</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>85.67</p>
                     </c>
                     <c ca="center">
                        <p>230</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>peb-1</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X:6621897..6621968</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>TTTATGGCAACT</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="center">
                        <p>88.99</p>
                     </c>
                     <c ca="center">
                        <p>826</p>
                     </c>
                     <c ca="center">
                        <p>C25B8.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X:7457940..7457992</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>TGTTTGACAATT</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>91.56</p>
                     </c>
                     <c ca="center">
                        <p>2,212</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>sox-2</it>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>We used the Weeder motif discovery program to search for overrepresented motifs in all wCNEs spatially associated with genes predicted to be expressed in the pharynx based on microarray data [31]. From this dataset, Weeder identified a motif very similar to the consensus binding site for PHA-4, the master specifier of pharyngeal cell identity (TRTTKRY, where R = A/G, K = T/G, and Y = T/C) [33, 34]. This table shows the coordinates (WormBase version WS140) of the wCNEs that contain matches to the overrepresented motif, the coordinates of the matches within the wCNEs, the Weeder scores of the matches to the motif, the distances (in bp) between the wCNEs and the transcription start site (TSS) of the associated genes and the names of the associated genes. The predicted site in the element 1.2 kb upstream of <it>ceh-22 </it>(marked with an asterisk) lies within a 30 bp pharyngeal muscle enhancer bound by PHA-4 [35].</p>
               </tblfn>
            </tbl>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Worm CNEs are enriched for transcription factor binding sites</p>
               </caption>
               <text>
                  <p>Worm CNEs are enriched for transcription factor binding sites. Sequence logo representation of the motif significantly overrepresented in wCNEs associated with pharyngeal genes, according to the Weeder motif discovery algorithm [32]. The first six bases of this motif (with consensus TGTTTGGCAACT) agree with the first six bases of the consensus binding site of the PHA-4 transcription factor (TRTTKRY, where R = A/G, K = T/G, and Y = T/C) [33, 34]. Note that the seventh position of the predicted motif has low information content, indicating that sites with differences in this position are still likely to represent variants of the same transcription factor binding site.</p>
               </text>
               <graphic file="gb-2007-8-2-r15-5"/>
            </fig>
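            <p>Sequence logos such as the one in Figure 5 plot, for each motif position, the information content 2 + sum over bases of p log2 p (in bits, assuming a uniform background). As a rough illustration only, the sketch below computes the majority consensus and per-position information content from the seven matching sequences listed in Table 2, taken as given in its 'Matching sequence' column. The published logo was produced by Weeder from its own set of motif occurrences, so values from these seven sites are not expected to reproduce it exactly; the sketch also applies no pseudocounts or small-sample correction, and the function name is hypothetical.</p>
```python
import math
from collections import Counter
from typing import List, Tuple

# 'Matching sequence' column of Table 2 (seven highest scoring occurrences).
SITES = [
    "TATTTAGCATCT", "TTTTTTGCAACT", "TGTTTGTCCACT", "TGTTTGGCAACT",
    "TGTTTTGAAATT", "TTTATGGCAACT", "TGTTTGACAATT",
]


def profile(sites: List[str]) -> Tuple[str, List[float]]:
    """Return the majority consensus and per-position information content (bits).

    Assumes equal-length, aligned sites and a uniform background (2 bits maximum);
    no pseudocounts or small-sample correction are applied.
    """
    consensus = []
    info = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        consensus.append(counts.most_common(1)[0][0])
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)
    return "".join(consensus), info


if __name__ == "__main__":
    cons, info = profile(SITES)
    print(cons, [round(x, 2) for x in info])
```
            <p>On these seven sites the majority consensus is TGTTTGGCAACT, matching the consensus quoted in the Figure 5 legend, and invariant positions such as the first base reach the maximum of 2 bits.</p>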
         </sec>
         <sec>
            <st>
               <p>Intronic wCNEs likely function as enhancers for downstream alternative transcription start sites</p>
            </st>
            <p>Almost a third of wCNEs (624/2,084, or 30%) are located in introns. To investigate whether intronic wCNEs represent a distinct type of element, we examined whether they are associated with particular classes of transcripts. There is a strong association between the presence of an intronic wCNE and genes known to produce multiple transcripts: 57% of genes containing intronic wCNEs have documented alternative transcripts, compared with 19% of all multi-exon genes. Moreover, in 70% of the alternatively spliced genes containing intronic wCNEs, the gene encodes an alternative first exon (compared with 35% of all genes with alternative transcripts). Intronic wCNEs are therefore strongly associated with genes that have alternative first exons, suggesting that they may act as enhancers for downstream alternative start sites. In support of this hypothesis, in 78% of cases the intronic wCNE lies upstream of the alternative first exon (see Figure <figr fid="F4">4</figr> for examples). We therefore think it unlikely that intronic wCNEs generally regulate alternative splicing. Rather, we suggest that, at least in <it>C. elegans</it>, most intronic wCNEs, like intergenic wCNEs, probably function as <it>cis</it>-regulatory transcriptional enhancers, but for downstream alternative transcriptional start sites.</p>
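            <p>The positional test behind the 78% figure (does an intronic wCNE lie upstream of an alternative first exon, in the direction of transcription?) depends on strand. The sketch below is a hypothetical simplification, not the exact procedure applied to the WormBase WS140 annotation: each gene is represented only by its strand and the transcription start sites (TSSs) of its annotated transcripts, the most upstream TSS is treated as the primary start and all others as alternative starts, and coordinates are assumed to increase along the chromosome. Function names and example coordinates are illustrative.</p>
```python
from typing import List, Tuple

Element = Tuple[int, int]  # (start, end) of an intronic wCNE in genomic coordinates


def upstream_of_alternative_tss(element: Element, strand: str, tss_list: List[int]) -> bool:
    """True if the element lies upstream (in transcription direction) of an alternative TSS.

    The most upstream TSS of the gene is treated as the primary start; every
    other TSS counts as an alternative, downstream start.
    """
    start, end = element
    if strand == "+":
        primary = min(tss_list)
        # Plus strand: upstream means smaller coordinates.
        return any(tss > end for tss in tss_list if tss != primary)
    if strand == "-":
        primary = max(tss_list)
        # Minus strand: transcription runs towards smaller coordinates.
        return any(start > tss for tss in tss_list if tss != primary)
    raise ValueError(f"unknown strand: {strand!r}")


def fraction_upstream(cases: List[Tuple[Element, str, List[int]]]) -> float:
    """Fraction of (element, strand, TSS list) cases upstream of an alternative TSS."""
    if not cases:
        return 0.0
    hits = sum(upstream_of_alternative_tss(e, s, t) for e, s, t in cases)
    return hits / len(cases)


if __name__ == "__main__":
    # Made-up genes: two on the plus strand, one on the minus strand.
    cases = [
        ((3000, 3060), "+", [100, 5000]),
        ((6000, 6050), "+", [100, 5000]),
        ((6000, 6060), "-", [10000, 4000]),
    ]
    print(fraction_upstream(cases))
```
            <p>In this made-up example, the second element lies downstream of the alternative start at 5,000 and is not counted, giving a fraction of two out of three.</p>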
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Shared properties of nematode and vertebrate CNEs</p>
            </st>
            <p>We have identified a set of highly conserved non-coding sequences (wCNEs) in the genome of <it>C. elegans</it>. Just as with CNEs in the human genome, these wCNEs are clustered around genes that encode regulators of development, especially transcription factors and signaling genes. Both human and worm CNEs share striking nucleotide frequency patterns at their boundaries and are similarly AT-rich, despite differences in the background A+T content of their genomes. Worm CNEs overlap many independently identified <it>cis</it>-regulatory elements, and vertebrate CNEs can act as tissue-specific enhancers in transient zebrafish assays. It seems likely, therefore, that human and worm CNEs function analogously as <it>cis</it>-elements that regulate the transcription of a core set of developmental regulatory genes. Consistent with this model, intronic wCNEs are very strongly associated with downstream alternative transcriptional start sites, suggesting that they too probably function as tissue-specific <it>cis</it>-regulatory elements.</p>
            <p>How do CNEs regulate gene expression? The simplest model is that CNEs encode transcription factor binding sites. In support of this model, we find that wCNEs associated with genes expressed in the pharynx are significantly enriched for a DNA motif that matches the binding site of PHA-4, the major pharynx-specifying transcription factor. However, it is still difficult to reconcile the length and level of sequence conservation of CNEs with the known sequence constraints on transcription factor binding sites, especially since conservation of individual transcription factor binding sites has been found to be unnecessary for conservation of enhancer function (for example, <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>). One possibility is that CNEs consist of very dense, potentially overlapping, transcription factor binding sites. If so, differences in the number of overlapping constraints between <it>cis</it>-elements would manifest as differences in their degree of sequence conservation, with CNEs representing the most extreme cases. Indeed, reducing the stringency of the conservation threshold (either by relaxing the similarity search criteria <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B27">27</abbr></abbrgrp> or by comparing less divergent species <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>) often reveals additional or longer non-coding elements. Alternatively, CNEs may encode some additional regulatory function. For example, one can envisage mechanisms involving sequence recognition between homologous chromosomes (for example, 'transvection' <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>) that would require sequence identity to be maintained between the maternal and paternal genomes.</p>
            <p>Remarkably, of the 190 genes that are associated with CNEs in humans and have orthologs in <it>C. elegans</it>, 60 have orthologs that are also associated with CNEs in <it>C. elegans</it>. This suggests that unrelated CNEs may be associated with a core set of regulatory genes in many divergent animal species. In support of this, 40 of these 60 genes are also orthologous to CNE-associated genes in <it>D. melanogaster</it>. Such an overlap between the sets of CNE-associated genes from three animal phyla is very unlikely to have arisen by chance, and suggests that a core set of developmental regulatory genes may be associated with CNEs across all animal lineages.</p>
            <p>Because of its genetic tractability and short intergenic distances, we propose that <it>C. elegans </it>will serve as an excellent model organism for further understanding the mechanism by which CNEs regulate gene expression. Dissecting CNEs in parallel in different animals, using both computational and experimental approaches, should provide valuable insight into the evolution of the regulatory networks that control the development of the metazoan body plan.</p>
            <p>Since many of the genes associated with CNEs encode transcription factors that control early development, it is possible that CNEs themselves are bound by these transcription factors. Orthologous transcription factors are not only present in most metazoan lineages but also often have highly conserved DNA-binding domains (for example, the DNA-binding domains of orthologous HOX <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, FOXA <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> and Brachyury T-box <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> proteins). It is tempting, therefore, to speculate that CNEs might function as enhancers even when tested in a different animal lineage. A small number of reporter gene assays testing vertebrate enhancers of regulatory genes in flies have given positive results (for example, <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>), indicating that the regulators of these vertebrate enhancers are also present in flies. However, given the differences in body plan between animal lineages, we would not expect the distinct CNEs of different animals to drive the same expression patterns.</p>
         </sec>
         <sec>
            <st>
               <p>CNEs and the evolution of animal body plans</p>
            </st>
            <p>The evolution of <it>cis</it>-regulatory elements is an important driving force in the evolution of gene regulatory networks (GRNs). In multicellular animals, the initial assembly and subsequent modification of <it>cis</it>-elements for key developmental control genes probably allowed the 're-wiring' of developmental GRNs and, hence, the evolution of new animal body plans <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. In this way, regulatory genes became associated with different sets of <it>cis</it>-elements in different animal lineages, and these <it>cis</it>-elements now define the core GRNs of each animal body plan. We propose that CNEs represent the 'hard-wired' sequence traces of these lineage-specific core GRNs. The distinct core GRNs of different animal lineages are reflected in their distinct sets of CNEs. However, because they co-evolved from a common metazoan ancestor, the core GRNs of different animal groups often use the same regulatory genes. As a result, distinct yet parallel sets of CNEs have become irreversibly associated with the same genes that coordinate core developmental networks in diverse animal groups. This evolution of regulatory elements may underlie the astounding diversification of animal body plans during the Cambrian period, approximately 550 million years ago.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Identification of conserved non-coding elements in <it>C. elegans</it></p>
            </st>
            <p>DNA sequences and annotation files for the <it>C. elegans </it>genome (release WS140), the DNA sequence of the <it>C. briggsae </it>genome (release cb25) and the repeat-masked sequence of the <it>C. remanei </it>genome (downloaded on 30 October 2005) were retrieved from WormBase <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Each <it>C. elegans </it>chromosome was split into 500 kb fragments overlapping by 200 bp. We searched each 500 kb <it>C. elegans </it>fragment for local similarity against the <it>C. briggsae </it>genome using MegaBlast (version 2.2.6) <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, with soft masking, an e-value threshold of 0.001 and word seed sizes of 100 bp (W100), 75 bp (W75), 50 bp (W50), 40 bp (W40) and 30 bp (W30). Where overlapping regions of the query (<it>C. elegans</it>) sequence matched more than one location in the <it>C. briggsae </it>genome, these regions of the query were merged, resulting in non-overlapping elements. Conserved elements were annotated according to the set of WormBase features provided in Additional data file 7. Elements not overlapping any of these features were marked as 'unannotated', and elements within introns of protein-coding genes were annotated as 'intronic' only if they did not overlap any type of exon or repeat (Additional data file 7). Our definition of unannotated and intronic conserved elements is very conservative: any amount of overlap between a conserved element and a genomic feature, such as any type of exon, a match to an expressed sequence, a predicted gene or a repeat, is sufficient to mask the element as exonic or repetitive. Unannotated and intronic elements were further scanned for missed repeats using RepeatMasker (version 3.0.8, slow/sensitive option, using Crossmatch) <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> with both the <it>C. elegans </it>repeat library distributed with the program and the <it>C. briggsae </it>library downloaded from WormBase <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The remaining elements were scanned for missed tRNAs using tRNAscan-SE (v.1.11) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. In addition, 36 elements with a BLAST match (e-value threshold of 0.0001) in Rfam <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, the microRNA registry database <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> or EMBL expressed sequence tags (downloaded on 21 April 2005) were removed. We then checked whether the remaining unannotated or intronic conserved elements found in <it>C. elegans </it>and <it>C. briggsae </it>were similarly conserved in the genome of <it>C. remanei </it>using MegaBlast, with soft masking, a word seed length of 30 bp and an e-value threshold of 0.0001. Of the elements conserved between <it>C. elegans </it>and <it>C. briggsae</it>, 69% were also found in <it>C. remanei </it>using the same similarity search criteria. The sequences of the final set of 2,084 wCNEs are provided in Additional data file 8. Finally, we searched the set of 2,084 wCNEs for sequence similarity against the human CNEs (1,373 sequences from Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>) using BlastN (version 2.2.6) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> and found no significant hits (e-value threshold = 0.0001).</p>
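            <p>Two bookkeeping steps in this pipeline, splitting each chromosome into 500 kb fragments that overlap by 200 bp and merging overlapping query hits into non-overlapping elements, can be sketched as follows. The function names are hypothetical, coordinates are assumed to be 0-based and half-open, and hits are assumed to have already been converted from fragment-relative to chromosome coordinates (by adding the fragment start); the MegaBlast search itself is not reproduced. Touching intervals are also merged here, which is one reasonable reading of 'non-overlapping elements'.</p>
```python
from typing import List, Tuple


def fragment_windows(chrom_length: int, size: int = 500_000,
                     overlap: int = 200) -> List[Tuple[int, int]]:
    """Split [0, chrom_length) into windows of `size` bp overlapping by `overlap` bp."""
    step = size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + size, chrom_length)
        windows.append((start, end))
        if end >= chrom_length:
            return windows
        start += step


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open intervals into disjoint elements."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] >= start:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(fragment_windows(1_000_000))
    print(merge_intervals([(10, 50), (40, 80), (100, 120), (80, 90)]))
```
            <p>A hit that falls in the 200 bp shared by two adjacent fragments is reported once per fragment; after conversion to chromosome coordinates, the merging step collapses such duplicates into a single element.</p>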
            <p>We compared the final set of elements conserved in all three <it>Caenorhabditis </it>species with the elements identified as conserved by WABA, a sensitive alignment method designed to find homologous regions between the <it>C. elegans </it>and <it>C. briggsae </it>genomes and to classify them as 'strongly conserved', 'weakly conserved' or 'coding' using conservation and the third-base 'wobble' of protein-coding sequences <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Of all base pairs in wCNEs, 97% are contained within alignments classified as strongly conserved, 0.6% within alignments classified as coding and 6% within alignments classified as weakly conserved.</p>
            <p>We also calculated the overlap between wCNEs and elements predicted as conserved using PhastCons. PhastCons <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> is a statistical method that scores sequences in alignments according to how much more likely it is that they are conserved than that they are evolving neutrally, based on a phylogenetic hidden Markov model. Elements predicted to be conserved by PhastCons based on <it>C. elegans</it>-<it>C. briggsae </it>BLASTZ alignments <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> were retrieved through the UCSC Genome Browser (table PhastConsElements).</p>
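Base-pair overlaps such as the WABA and PhastCons comparisons above reduce to intersecting two sets of genomic intervals. The sketch below assumes both element sets are given per chromosome as sorted, non-overlapping, half-open intervals (for example after merging); the helper names are hypothetical and this is not the code used in the study.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) (assumed)


def covered_bp(query: List[Interval], reference: List[Interval]) -> int:
    """Base pairs of `query` lying inside `reference`; both must be sorted
    and internally non-overlapping."""
    total, j = 0, 0
    for q_start, q_end in query:
        # Reference intervals ending before this query starts cannot
        # overlap any later query either, so skip them permanently.
        while j < len(reference) and reference[j][1] <= q_start:
            j += 1
        k = j
        while k < len(reference) and reference[k][0] < q_end:
            total += min(q_end, reference[k][1]) - max(q_start, reference[k][0])
            k += 1
    return total


def fraction_covered(query_by_chrom: Dict[str, List[Interval]],
                     reference_by_chrom: Dict[str, List[Interval]]) -> float:
    """Fraction of all query base pairs that fall within reference elements."""
    query_bp = sum(e - s for ivs in query_by_chrom.values() for s, e in ivs)
    overlap_bp = sum(
        covered_bp(sorted(ivs), sorted(reference_by_chrom.get(chrom, [])))
        for chrom, ivs in query_by_chrom.items()
    )
    return overlap_bp / query_bp if query_bp else 0.0
```

The sweep visits each reference interval a bounded number of times per overlapping query, so it scales to genome-wide element sets without an interval tree.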
         </sec>
         <sec>
            <st>
               <p>Genomic distribution and sequence analysis of wCNEs</p>
            </st>
            <p>The clustering of wCNEs along the <it>C. elegans </it>chromosomes and the comparison of the distances between CNEs in the human and <it>C. elegans </it>genomes were carried out as described in Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. We assessed the enrichment of wCNEs on chromosome X using a randomization test. We generated 1,000 sets of 2,084 (that is, the same number as the wCNEs) random locations in the <it>C. elegans </it>genome, restricting the random locations to non-coding and non-repetitive regions. The random sets had, on average, 487.3 locations on chromosome X (minimum = 432; maximum = 550). Therefore, the enrichment of wCNEs on chromosome X is highly significant (<it>p </it>value &lt; 0.001).</p>
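The chromosome X randomization test can be sketched as below, assuming the non-coding, non-repetitive part of the genome is supplied as a dictionary of allowed intervals per chromosome (a hypothetical input format). Positions are drawn uniformly over the total allowed length, and the empirical <it>p </it>value uses the (exceed + 1)/(n + 1) convention, which is our assumption since the text does not state how the <it>p </it>value was computed.

```python
import bisect
import random
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) (assumed)


def sample_locations(allowed: Dict[str, List[Interval]], n: int,
                     rng: random.Random) -> List[Tuple[str, int]]:
    """Draw n positions uniformly from the union of the allowed intervals."""
    spans: List[Tuple[str, int, int]] = []
    cumulative: List[int] = []
    total = 0
    for chrom, intervals in allowed.items():
        for start, end in intervals:
            total += end - start
            spans.append((chrom, start, end))
            cumulative.append(total)
    sites = []
    for _ in range(n):
        r = rng.randrange(total)
        i = bisect.bisect_right(cumulative, r)  # first span whose cumulative end exceeds r
        chrom, start, end = spans[i]
        sites.append((chrom, start + r - (cumulative[i] - (end - start))))
    return sites


def chromosome_enrichment(observed: int, allowed: Dict[str, List[Interval]],
                          n_sites: int, chrom: str = "X", n_iter: int = 1000,
                          seed: int = 0) -> Tuple[float, List[int]]:
    """Empirical upper-tail p value for seeing `observed` of n_sites on `chrom`."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        sites = sample_locations(allowed, n_sites, rng)
        counts.append(sum(1 for c, _ in sites if c == chrom))
    at_least = sum(1 for c in counts if c >= observed)
    return (at_least + 1) / (n_iter + 1), counts
```

With 1,000 iterations and no random set reaching the observed count, this convention gives 1/1001, just under 0.001.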
            <p>The A+T nucleotide content of the 200 bp flanking each CNE and of the first 15 bp of each CNE was calculated according to Walter <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. In brief, for each CNE, 215 bp of sequence from one end (200 bp of flank plus 15 bp of CNE) and the reverse complement of the corresponding 215 bp at the other end were aligned on the first position of each CNE. The percentage of A+T nucleotides was then calculated and plotted for each position along this 215 bp alignment.</p>
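A minimal sketch of the end-aligned A+T profile, assuming CNEs are half-open (start, end) coordinates on a single chromosome string and that CNEs too close to a chromosome end to yield a full 215 bp window are skipped (an assumption). Both windows are oriented from the flank into the CNE, so index 200 is always the first CNE base.

```python
from typing import List, Tuple

FLANK = 200  # bp of flanking sequence
INNER = 15   # bp taken from inside the CNE
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def end_windows(genome: str, start: int, end: int) -> List[str]:
    """Both 215 bp windows for a CNE at [start, end), oriented flank -> CNE."""
    windows = []
    if start >= FLANK and start + INNER <= len(genome):
        windows.append(genome[start - FLANK:start + INNER])
    if end - INNER >= 0 and end + FLANK <= len(genome):
        # Reverse-complementing the other end puts its flank first as well.
        windows.append(revcomp(genome[end - INNER:end + FLANK]))
    return windows


def at_profile(genome: str, cnes: List[Tuple[int, int]]) -> List[float]:
    """Percentage of A or T at each of the 215 aligned positions."""
    counts = [0] * (FLANK + INNER)
    n_windows = 0
    for start, end in cnes:
        for window in end_windows(genome, start, end):
            n_windows += 1
            for i, base in enumerate(window.upper()):
                if base in "AT":
                    counts[i] += 1
    return [100.0 * c / n_windows for c in counts] if n_windows else []
```

Ambiguous bases (N) are counted as non-A+T here; excluding them from the denominator instead would be an equally reasonable choice.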
         </sec>
         <sec>
            <st>
               <p>Analysis of CNE-associated genes in <it>C. elegans</it>, <it>D. melanogaster </it>and <it>H. sapiens</it></p>
            </st>
            <p>For each of the 2,084 wCNEs, we identified the protein-coding gene with the nearest transcription start site (TSS) according to the WormBase annotation (WS140; Additional data file 9). In total, 1,241 genes were assigned to the wCNEs. As fly CNEs, we used the 20,301 intergenic and intronic 'ultraconserved' (uc) elements of size &#8805;50 bp conserved between <it>D. melanogaster </it>and <it>D. pseudoobscura </it>from Glazov <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. We associated each uc-element with the gene with the nearest TSS (according to the fly genome annotation release dm1), assigning 3,750 genes to fly uc-elements in total. Similarly, we identified the nearest protein-coding genes to the 1,373 human elements conserved between human and Fugu from Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> according to the human genome assembly NCBI35 accessed via Ensembl <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> v35, which yielded 397 genes (Additional data file 10).</p>
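The nearest-TSS assignment can be sketched with a sorted list of TSS positions per chromosome. This sketch assumes each TSS is a single coordinate, measures the distance from a TSS to the closest base of the element (zero if the TSS falls inside it) and breaks ties in favour of the upstream (lower-coordinate) TSS; these conventions and the helper names are assumptions, not the procedure used with the WormBase, fly and Ensembl annotations.

```python
import bisect
from typing import Dict, Iterable, List, Optional, Tuple

TssIndex = Dict[str, Tuple[List[int], List[str]]]


def build_tss_index(genes: Iterable[Tuple[str, str, int]]) -> TssIndex:
    """genes: (gene_id, chrom, tss) records; returns chrom -> (sorted TSSs, gene ids)."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = {}
    for gene_id, chrom, tss in genes:
        by_chrom.setdefault(chrom, []).append((tss, gene_id))
    index: TssIndex = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([tss for tss, _ in items], [gid for _, gid in items])
    return index


def nearest_gene(index: TssIndex, chrom: str, start: int, end: int) -> Optional[str]:
    """Gene whose TSS is closest to the element [start, end) on the same chromosome."""
    if chrom not in index:
        return None
    positions, ids = index[chrom]
    i = bisect.bisect_left(positions, start)
    best_id, best_dist = None, None
    # Only the last TSS before the element and the first TSS at or after its
    # start can be the nearest one.
    for j in (i - 1, i):
        if 0 <= j < len(positions):
            tss = positions[j]
            if tss < start:
                dist = start - tss
            elif tss >= end:
                dist = tss - (end - 1)
            else:
                dist = 0  # TSS lies inside the element
            if best_dist is None or dist < best_dist:
                best_id, best_dist = ids[j], dist
    return best_id
```

Collecting the returned identifiers into a set gives the number of distinct CNE-associated genes, which is smaller than the number of elements whenever several CNEs share a nearest gene.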
            <p>The GO annotation files for <it>C. elegans </it>(revision 1.55) and human (revision 1.22) were downloaded directly from the GO website <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Only GO terms inferred automatically (GO evidence code IEA) were used in our analysis, because the GO annotations of <it>C. elegans </it>genes are heavily biased by RNAi phenotypes (our unpublished observation). To increase the signal, GO terms were mapped to the higher-level GOslim terms, and these are the terms used throughout this paper. GOslim term associations and counts for <it>C. elegans </it>and human genes were calculated using the Perl script map2slim from the go-perl package and the generic GOslim (revision 1.116) <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Of the 397 human and 1,241 <it>C. elegans </it>CNE-associated genes, 274 and 638, respectively, were assigned a GOslim term. We retrieved protein domains for the <it>C. elegans </it>and human genes from Ensembl. Of the 397 human and 1,241 <it>C. elegans </it>genes spatially associated with CNEs, 316 and 877, respectively, were annotated with at least one InterPro domain. To increase the signal, each domain was converted to its top-level parent domain according to the InterPro protein domain annotation using a custom Perl script. The following analysis was carried out for all top-level InterPro domains found in at least ten genes in the human and <it>C. elegans </it>genomes. For each annotation (that is, each GOslim term and each InterPro parent domain), we calculated the log odds ratio log((a &#215; b)/(c &#215; d)), where a is the number of genes in the CNE-associated gene set with the annotation, b is the number of annotated genes that neither carry the annotation nor belong to the CNE-associated gene set, c is the number of genes with the annotation that are not in the CNE-associated gene set, and d is the number of annotated genes in the CNE-associated gene set without the annotation. The log odds ratios, confidence intervals (CI) and <it>p </it>values were calculated using the R statistical package <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> (two-tailed tests). The <it>p </it>value threshold corresponding to a 5% false discovery rate (FDR) was calculated using the method of Benjamini and Hochberg <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. We also carried out a randomization test to estimate how likely it is to obtain by chance a proportion of genes annotated with the GO terms 'transcription factor activity' and 'development' as high as that observed for the wCNE set. To do this, we generated 1,000 sets of 2,084 (that is, the same number as the wCNEs) random locations in the <it>C. elegans </it>genome, restricting the random locations to non-coding and non-repetitive regions. For each random location, we then retrieved the gene with the nearest transcription start site. Finally, for each set, we counted the proportion of genes annotated with the GO terms 'transcription factor activity' and 'development'.</p>
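For illustration, the log odds ratio defined above and a Benjamini-Hochberg cut-off can be computed as follows. The study used R; this Python sketch swaps in a Wald (Woolf) 95% confidence interval and a normal-approximation <it>p </it>value, which may differ from the test actually used, and it rejects tables with an empty cell rather than guessing a continuity correction. The cell names a, b, c and d follow the definitions in the text, and the natural logarithm is assumed.

```python
import math
from typing import List, Optional, Tuple


def log_odds_ratio(a: int, b: int, c: int, d: int,
                   z: float = 1.959963984540054) -> Tuple[float, Tuple[float, float], float]:
    """log((a*b)/(c*d)) with a Wald 95% CI and two-sided normal p value.

    a: CNE-associated genes with the annotation
    b: other annotated genes, not CNE-associated, without the annotation
    c: genes with the annotation that are not CNE-associated
    d: CNE-associated annotated genes without the annotation
    """
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: use an exact test or a continuity correction")
    lor = math.log((a * b) / (c * d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    p = math.erfc(abs(lor / se) / math.sqrt(2))    # two-tailed
    return lor, (lor - z * se, lor + z * se), p


def bh_threshold(pvalues: List[float], fdr: float = 0.05) -> Optional[float]:
    """Largest p value p_(k) with p_(k) <= k/m * fdr (Benjamini-Hochberg), or None."""
    m = len(pvalues)
    threshold = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k / m * fdr:
            threshold = p
    return threshold
```

Every annotation whose <it>p </it>value is at or below the returned threshold would be called significant at the chosen FDR.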
            <p>Orthologous gene clusters were retrieved from Inparanoid (version 4.0) <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. The Inparanoid dataset contains clusters of orthologous proteins between pairs of genomes. There are 4,558 Inparanoid clusters of orthologous proteins from <it>C. elegans </it>and human, containing 8,846 human proteins and 5,614 <it>C. elegans </it>proteins. Of the human protein-coding genes with <it>C. elegans </it>orthologs and the <it>C. elegans </it>protein-coding genes with human orthologs, 190 and 424, respectively, are associated with CNEs. For <it>D. melanogaster </it>and <it>H. sapiens</it>, there are 5,497 Inparanoid clusters of orthologs, containing 8,960 human proteins and 6,170 <it>D. melanogaster </it>proteins. Of the human protein-coding genes with <it>D. melanogaster </it>orthologs and the <it>D. melanogaster </it>protein-coding genes with human orthologs, 215 and 1,254, respectively, are associated with CNEs/uc-elements. To evaluate the significance of the overlap between CNE-associated genes in human and <it>C. elegans</it>, we performed 1,000 randomizations, each time randomly picking 424 <it>C. elegans </it>genes from those in the Inparanoid clusters and counting how many of them have an ortholog among the 190 human CNE-associated genes. An overlap as large as that observed between the CNE-associated genes of the two genomes was never obtained in the 1,000 randomizations. Similarly, we performed 1,000 randomizations, each time randomly picking 1,254 <it>D. melanogaster </it>genes and counting how many of them have an ortholog among the 215 human CNE-associated genes. Again, an overlap as large as that observed was never obtained in the 1,000 randomizations.</p>
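The ortholog-overlap randomization can be sketched as below. The dictionary mapping each <it>C. elegans </it>gene to its set of human orthologs is a hypothetical stand-in for the Inparanoid clusters; the sketch counts sampled worm genes with at least one ortholog among the human CNE-associated genes and reports how many randomizations reach the observed overlap.

```python
import random
from typing import Dict, List, Set


def overlap_count(worm_genes: List[str], orthologs: Dict[str, Set[str]],
                  human_cne_genes: Set[str]) -> int:
    """Number of worm genes with at least one ortholog among the human CNE-associated genes."""
    return sum(1 for g in worm_genes if orthologs.get(g, set()) & human_cne_genes)


def overlap_randomization(worm_pool: List[str], orthologs: Dict[str, Set[str]],
                          human_cne_genes: Set[str], observed: int, n_pick: int,
                          n_iter: int = 1000, seed: int = 0) -> int:
    """How many random draws of n_pick worm genes (without replacement)
    reach at least the observed overlap."""
    rng = random.Random(seed)
    reached = 0
    for _ in range(n_iter):
        sample = rng.sample(worm_pool, n_pick)
        if overlap_count(sample, orthologs, human_cne_genes) >= observed:
            reached += 1
    return reached
```

With the real data, `n_pick` would be 424, `worm_pool` the <it>C. elegans </it>genes in the Inparanoid clusters and `human_cne_genes` the 190 human CNE-associated genes; a return value of 0 corresponds to the result reported above.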
         </sec>
         <sec>
            <st>
               <p>Motif discovery in pharyngeal gene-associated CNEs</p>
            </st>
            <p>Genes expressed in the pharynx were taken from a previous microarray analysis <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The 120 wCNEs associated with pharyngeal genes were submitted to a local installation of Weeder (version 1.3) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. We performed an 'extra' mode search (that is, looking for motifs 6 bp long with 1 mismatch, 8 bp long with 3 mismatches, 10 bp long with 4 mismatches and 12 bp long with 4 mismatches) on both strands, reporting 50 motifs. Weeder's post-processing of the identified motifs returned one 'redundant' motif in the form of a position weight matrix (PWM) and seven high-scoring matches of this PWM in the input set of sequences. The significance of this motif was estimated using the Weeder <it>p </it>value calculator <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. The high-scoring matches of the PWM in the pharyngeal wCNEs are shown in Table <tblr tid="T2">2</tblr>. The sequence logo for this PWM was created using WebLogo <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
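To show how a position weight matrix like the one returned by Weeder can be matched against sequences on both strands, here is a generic log-odds scan. This is not Weeder's own scoring: the pseudocount, uniform background, base-count input format and score cut-off are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, float]],
                 background: Optional[Dict[str, float]] = None,
                 pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Convert per-position base counts into a log2-odds PWM."""
    bg = background or {b: 0.25 for b in BASES}
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / bg[b])
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         min_score: float) -> List[Tuple[int, str, float]]:
    """Report (position, strand, score) for windows scoring >= min_score on either strand."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", window.translate(_COMPLEMENT)[::-1])):
            score = sum(pwm[k][site[k]] for k in range(width))
            if score >= min_score:
                hits.append((i, strand, score))
    return hits
```

Positions are reported as the forward-strand start of the matching window, so a minus-strand hit at position i means the reverse complement of seq[i:i + width] matches the motif.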
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a figure showing the distribution of wCNEs along each chromosome. Additional data file <supplr sid="S2">2</supplr> is a figure showing the distribution of distances between CNEs in both the <it>C. elegans </it>and the <it>H. sapiens </it>genomes and for randomized CNE locations. Additional data file <supplr sid="S3">3</supplr> is a figure showing the distribution of distances between intergenic wCNEs and their nearest genes. Additional data file <supplr sid="S4">4</supplr> is a table showing all the top-level InterPro protein domains significantly enriched in CNE-associated genes compared to the rest of the genes in the <b>(a) </b><it>H. sapiens </it>and <b>(b) </b><it>C. elegans </it>genomes. Additional data file <supplr sid="S5">5</supplr> is a table showing the enrichment for transcription factors among wCNE-associated genes, using two different collections of predicted transcription factors from the <it>C. elegans </it>genome. Additional data file <supplr sid="S6">6</supplr> is a table showing wCNEs overlapping known <it>cis</it>-regulatory elements. Additional data file <supplr sid="S7">7</supplr> is a table showing the annotation features from WormBase that were used to annotate wCNEs. Additional data file <supplr sid="S8">8</supplr> contains the sequences of the 2,084 wCNEs. Additional data file <supplr sid="S9">9</supplr> is a table with the coordinates of the wCNEs in the <it>C. elegans </it>genome, their nearest genes and their human orthologs. Additional data file <supplr sid="S10">10</supplr> is a table with the coordinates of the human CNEs from Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> with their nearest genes assigned using the same method as for the wCNEs.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Distribution of wCNEs along each chromosome</p>
            </caption>
            <text>
               <p>Distribution of wCNEs along each chromosome.</p>
            </text>
            <file name="gb-2007-8-2-r15-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Distribution of distances between CNEs in both the <it>C. elegans </it>and the <it>H. sapiens </it>genomes and for randomized CNE locations</p>
            </caption>
            <text>
               <p>Distribution of distances between CNEs in both the <it>C. elegans </it>and the <it>H. sapiens </it>genomes and for randomized CNE locations.</p>
            </text>
            <file name="gb-2007-8-2-r15-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Distribution of distances between intergenic wCNEs and their nearest genes</p>
            </caption>
            <text>
               <p>Distribution of distances between intergenic wCNEs and their nearest genes.</p>
            </text>
            <file name="gb-2007-8-2-r15-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Top-level InterPro protein domains significantly enriched in CNE-associated genes compared to the rest of the genes in the <it>H. sapiens </it>and <it>C. elegans </it>genomes</p>
            </caption>
            <text>
               <p>Top-level InterPro protein domains significantly enriched in CNE-associated genes compared to the rest of the genes in the (a) <it>H. sapiens </it>and (b) <it>C. elegans </it>genomes.</p>
            </text>
            <file name="gb-2007-8-2-r15-S4.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Enrichment for transcription factors among wCNE-associated genes, using two different collections of predicted transcription factors from the <it>C. elegans </it>genome</p>
            </caption>
            <text>
               <p>Enrichment for transcription factors among wCNE-associated genes, using two different collections of predicted transcription factors from the <it>C. elegans </it>genome.</p>
            </text>
            <file name="gb-2007-8-2-r15-S5.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>wCNEs overlapping known <it>cis</it>-regulatory elements</p>
            </caption>
            <text>
               <p>wCNEs overlapping known <it>cis</it>-regulatory elements.</p>
            </text>
            <file name="gb-2007-8-2-r15-S6.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Annotation features from WormBase that were used to annotate wCNEs</p>
            </caption>
            <text>
               <p>Annotation features from WormBase that were used to annotate wCNEs.</p>
            </text>
            <file name="gb-2007-8-2-r15-S7.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>Sequences of the 2,084 wCNEs</p>
            </caption>
            <text>
               <p>Sequences of the 2,084 wCNEs.</p>
            </text>
            <file name="gb-2007-8-2-r15-S8.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional data file 9</p>
            </title>
            <caption>
               <p>Coordinates of the wCNEs in the <it>C. elegans </it>genome, their nearest genes and their human orthologs</p>
            </caption>
            <text>
               <p>Coordinates of the wCNEs in the <it>C. elegans </it>genome, their nearest genes and their human orthologs.</p>
            </text>
            <file name="gb-2007-8-2-r15-S9.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional data file 10</p>
            </title>
            <caption>
               <p>Coordinates of the human CNEs from Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> with their nearest genes assigned using the same method as for the wCNEs</p>
            </caption>
            <text>
               <p>Coordinates of the human CNEs from Woolfe <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> with their nearest genes assigned using the same method as for the wCNEs.</p>
            </text>
            <file name="gb-2007-8-2-r15-S10.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We acknowledge Giulio Pavesi and Chris Mungall for assistance with Weeder and GOslim terms, respectively. BL thanks Andrew Fraser for support and many interesting discussions. TV is supported by an MRC Predoctoral Fellowship.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Comparative genomics at the vertebrate extremes.</p>
            </title>
            <aug>
               <au>
                  <snm>Boffelli</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nobrega</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>456</fpage>
            <lpage>465</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1350</pubid>
                  <pubid idtype="pmpid" link="fulltext">15153998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Ultraconserved elements in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makunin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stephen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>1321</fpage>
            <lpage>1325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1098119</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131266</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bruce</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Klos</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Ericson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>99</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">544600</pubid>
                  <pubid idtype="pmpid" link="fulltext">15613238</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-5-99</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Highly conserved non-coding sequences are associated with vertebrate development.</p>
            </title>
            <aug>
               <au>
                  <snm>Woolfe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goodson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goode</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Snell</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McEwen</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Vavouri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>North</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Callaway</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kelly</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">526512</pubid>
                  <pubid idtype="pmpid" link="fulltext">15630479</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Conserved non-genic sequences - an unexpected feature of mammalian genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Antonarakis</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>151</fpage>
            <lpage>157</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1527</pubid>
                  <pubid idtype="pmpid" link="fulltext">15716910</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Defining a genomic radius for long-range enhancer action: duplicated conserved non-coding elements hold the key.</p>
            </title>
            <aug>
               <au>
                  <snm>Vavouri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>McEwen</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Woolfe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gilks</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Elgar</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>5</fpage>
            <lpage>10</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.10.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">16290136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>McEwen</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Woolfe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goode</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Vavouri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Callaway</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Elgar</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>451</fpage>
            <lpage>465</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457030</pubid>
                  <pubid idtype="pmpid" link="fulltext">16533910</pubid>
                  <pubid idtype="doi">10.1101/gr.4143406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Scanning human gene deserts for long-range enhancers.</p>
            </title>
            <aug>
               <au>
                  <snm>Nobrega</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Ovcharenko</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Afzal</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>413</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1088328</pubid>
                  <pubid idtype="pmpid" link="fulltext">14563999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts.</p>
            </title>
            <aug>
               <au>
                  <snm>de la Calle-Mustienes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Feijoo</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Manzanares</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tena</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Seguel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Letizia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Allende</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Gomez-Skarmeta</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1061</fpage>
            <lpage>1072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182218</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024824</pubid>
                  <pubid idtype="doi">10.1101/gr.4004805</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Highly conserved regulatory elements around the SHH gene may contribute to the maintenance of conserved synteny across human chromosome 7q36.3.</p>
            </title>
            <aug>
               <au>
                  <snm>Goode</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Snell</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Elgar</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>86</volume>
            <fpage>172</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2005.04.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15939571</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A distal enhancer and an ultraconserved exon are derived from a novel retroposon.</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Ahituv</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Salama</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>441</volume>
            <fpage>87</fpage>
            <lpage>90</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04696</pubid>
                  <pubid idtype="pmpid" link="fulltext">16625209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Conservation, regulation, synteny, and introns in a large-scale <it>C. briggsae</it>-<it>C. elegans </it>genomic alignment.</p>
            </title>
   