<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-4-r53</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Comparative genomics using <it>Fugu </it>reveals insights into regulatory subfunctionalization</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Woolfe</snm>
               <fnm>Adam</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>woolfea@mail.nih.gov</email>
            </au>
            <au id="A2">
               <snm>Elgar</snm>
               <fnm>Greg</fnm>
               <insr iid="I1"/>
               <email>g.elgar@qmul.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK</p>
            </ins>
            <ins id="I2">
               <p>Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, MD 20870, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>4</issue>
         <fpage>R53</fpage>
         <url>http://genomebiology.com/2007/8/4/R53</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17428329</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-4-r53</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>1</day>
               <month>12</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>6</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>11</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>11</day>
               <month>04</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Woolfe and Elgar; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Regulatory subfunctionalization in Fugu</p>
      </shorttitle>
      <shortabs>
         <p>Fish-mammal genomic alignments were used to compare over 800 conserved non-coding elements that associate with genes that have undergone fish-specific duplication and retention, revealing a pattern of element retention and loss between paralogs indicative of subfunctionalization.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A major mechanism for the preservation of gene duplicates in the genome is thought to be mediated via loss or modification of <it>cis</it>-regulatory subfunctions between paralogs following duplication (a process known as regulatory subfunctionalization). Despite a number of gene expression studies that support this mechanism, no comprehensive analysis of regulatory subfunctionalization has been undertaken at the level of the distal <it>cis</it>-regulatory modules involved. We have exploited fish-mammal genomic alignments to identify and compare more than 800 conserved non-coding elements (CNEs) that associate with genes that have undergone fish-specific duplication and retention.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Using the abundance of duplicated genes within the <it>Fugu </it>genome, we selected seven pairs of teleost-specific paralogs involved in early vertebrate development, each containing clusters of CNEs in their vicinity. CNEs present around each <it>Fugu </it>duplicated gene were identified using multiple alignments of orthologous regions between single-copy mammalian orthologs (representing the ancestral locus) and each fish duplicated region in turn. Comparative analysis reveals a pattern of element retention and loss between paralogs indicative of subfunctionalization, the extent of which differs between duplicate pairs. In addition to complete loss of specific regulatory elements, a number of CNEs have been retained in both regions but may be responsible for more subtle levels of subfunctionalization through sequence divergence.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Comparative analysis of conserved elements between duplicated genes provides a powerful approach for studying regulatory subfunctionalization at the level of the regulatory elements involved.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Gene duplication is thought to be a major driving force in evolutionary innovation by providing material from which novel gene functions and expression patterns may arise. Duplicated genes have been shown to be present in all eukaryotic genomes currently sequenced <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and are thought to arise by tandem, chromosomal or whole genome duplication events. Unless the duplication event is immediately advantageous (for example, by gene dosage increasing evolutionary fitness), the gene pair will exhibit functional redundancy, allowing one of the pair to accumulate mutations without affecting key functions. Because deleterious mutations are thought to occur much more commonly than neutral or advantageous ones, the classic model for the evolutionary fate of duplicated genes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> predicts the degeneration of one of the copies to a pseudogene as the most likely outcome (a process known as non-functionalization). Less commonly, a mutation will be advantageous, allowing one of the gene duplicates to evolve a new function (a process known as neo-functionalization). Therefore, the classic model predicts that these two competing outcomes will result in the elimination of most duplicated genes. However, several studies suggest that the proportion of duplicated genes retained in vertebrate genomes is much higher than is predicted by this model <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. This has led to the suggestion of an alternative model whereby complementary degenerative mutations in independent subfunctions of each gene copy permit their preservation in the genome, as both copies of the gene are now required to recapitulate the full range of functions present in the single ancestral gene. 
This was formalized in the Duplication-Degeneration-Complementation (DDC) model <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> in a process referred to as subfunctionalization.</p>
         <p>The key novelty of the DDC model is that, rather than attributing different expression patterns of duplicated genes to the acquisition of novel functions, they are attributed to a partial (complementary) loss of function in each duplicate. In combination they retain the complete function of the pleiotropic original gene, but neither of them alone is sufficient to provide full functionality. For this model to be viable, the subfunctions of the gene are required to be independent so that mutations in one subfunction will not affect the other. The modular nature of many eukaryotic protein-coding sequences as well as <it>cis</it>-regulatory modules (CRMs), such as enhancers or silencers <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, means both can act as subfunctions or components of subfunctions of the gene in subfunctionalization. CRMs are <it>cis</it>-acting DNA sequences, up to several hundred bases in length, thought to be composed of clustered combinatorial binding sites for large numbers of transcription factors that together actuate a regulatory response for one or more genes <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The larger number of independently mutable units represented by CRMs, the small size and rapid turnover of transcription factor binding sites, as well as observations that, for many gene duplicates, changes that occur between paralogs are due to changes in expression rather than protein function, have led a number of researchers to emphasize that important evolutionary changes might occur primarily at the level of gene regulation <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Consequently, subfunctionalization is thought most likely to occur by complementary degenerative mutations within regulatory elements.</p>
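         <p>As a minimal sketch, assuming each subfunction can be represented by a simple label (an idealization that is not part of the DDC model itself), the outcomes described above can be told apart by set comparison. The function name classify_fate, the category strings and the example labels are hypothetical and chosen for illustration only.</p>
         <p>
```python
def classify_fate(ancestral, copy_a, copy_b):
    """Classify a duplicate pair from the subfunctions each copy retains.

    All three arguments are iterables of hypothetical subfunction labels.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)

    # One copy has lost every subfunction: it has degenerated.
    if not copy_a or not copy_b:
        return "non-functionalization"

    # A copy carries a subfunction absent from the ancestral gene.
    if (copy_a | copy_b) - ancestral:
        return "neo-functionalization"

    # Both copies still perform the full ancestral repertoire.
    if copy_a == ancestral and copy_b == ancestral:
        return "redundant"

    # Each copy has lost something the other retains, and together they
    # still cover the ancestral repertoire (the DDC outcome).
    complementary = bool(copy_a - copy_b) and bool(copy_b - copy_a)
    if complementary and (copy_a | copy_b) == ancestral:
        return "subfunctionalization"

    return "unclassified"


# Toy labels only; they are not measured expression data.
ancestral = {"domain_1", "domain_2", "domain_3"}
print(classify_fate(ancestral, {"domain_1", "domain_2"}, {"domain_2", "domain_3"}))
```
         </p>
         <p>In this toy case each copy has lost one domain that the other retains, so both are needed to cover the ancestral set. Real subfunctions need not be fully independent, which this sketch does not attempt to model.</p>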
         <p>Teleost fish provide an excellent system to study the DDC model in vertebrates due to the presence of extra gene duplicates that derive from a whole genome duplication event early in the evolution of ray-finned fishes 300-350 million years ago <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. This provides the opportunity for comparative analyses of gene duplicates in fish against a single ortholog in tetrapod lineages such as mammals. In particular, for analyses involving important developmentally associated genes, these 'single copies' represent as close as possible the ancestral gene from which the fish duplicates descended, since such genes are often highly conserved in sequence and function throughout vertebrates. We therefore refer to fish-specific duplicate genes as 'co-orthologs' (a term previously used in <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>) as each copy is co-orthologous to the single homolog in tetrapods.</p>
         <p>A number of studies on fish duplicated genes have identified cases of subfunctionalization at both the regulatory and protein level. For instance, analysis of the <it>synapsin-Timp </it>genes in the pufferfish <it>Fugu rubripes </it>identified a case of protein subfunctionalization where two isoforms of the <it>SYN </it>gene expressed in human are expressed as two separate genes in <it>Fugu </it><abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. A number of functional studies on the shared and divergent expression patterns of developmental co-orthologs in fish have also been carried out, for example, <it>eng2 </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, <it>sox9 </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and <it>runx2 </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. In each case, partitioning of ancestral expression domains for each co-ortholog compared to the single (ancestral representative) gene in mammals was observed via gene expression studies, supporting a process of regulatory subfunctionalization along the lines of the DDC model. Work on identifying the regulatory elements involved has so far been limited to those responsible for divergent expression within the well-studied Hox genes. Santini <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, through comparison to the single tetrapod Hox cluster, identified a number of conserved elements in fish-specific Hox clusters. These appeared to be partitioned between clusters, suggesting they may be responsible for their divergent expression. In addition, the zebrafish <it>hoxb1a </it>and <it>hoxb1b </it>genes, co-orthologs of the <it>HOXB1 </it>gene in mammals and birds, were found to exhibit complementary degeneration of two <it>cis</it>-regulatory elements identified upstream and downstream of the gene, consistent with the DDC model <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Similarly, Postlethwait <it>et al</it>. 
<abbrgrp><abbr bid="B24">24</abbr></abbrgrp> carried out a comparative genomic analysis of the regions surrounding two zebrafish co-orthologs, <it>eng2a </it>and <it>eng2b</it>, against the single human ortholog <it>EN2 </it>and found one conserved non-coding element partitioned in each copy, together with a number of elements conserved in both. Both co-orthologs have overlapping expression in the midbrain-hindbrain border and jaw muscles, but <it>eng2a </it>is expressed in the somites and <it>eng2b </it>is expressed in the anterior hindbrain (both of which are expression domains found in the single mammalian ortholog). Hence, according to the DDC model, they hypothesized that sequences conserved in both co-orthologs represent regulatory elements responsible for overlapping expression domains, whilst conserved sequences specific to each gene are candidates for regulatory elements that drive expression to domains present in the single mammalian ortholog but now partitioned between co-orthologs. Despite these isolated examples, evidence for the DDC model, by way of identifying the regulatory elements responsible, remains limited.</p>
         <p>Comparison of non-coding genomic sequence across extreme evolutionary distances such as that between fish and mammals to identify regions that remain conserved has proved powerful in identifying sequences likely to be vertebrate-specific distal CRMs (see <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> for a review). <it>Fugu</it>-mammal conserved non-coding elements (CNEs), identified genome-wide, cluster almost exclusively in the vicinity of genes implicated in transcriptional regulation and early development (termed <it>trans-dev </it>genes) with little or no conservation in non-coding sequence outside of these regions; a finding confirmed by a number of recent studies <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. Furthermore, a majority of those CNEs tested <it>in vivo </it>drive expression of a reporter gene in a temporally and spatially specific manner that often overlaps the endogenous expression pattern of the nearby <it>trans-dev </it>gene, confirming this association and their likely role as critical CRMs for these genes <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B29">29</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. The tight association of CNEs with <it>trans-dev </it>genes is likely the result of the fundamental nature of developmental gene regulatory networks involved in correct spatial-temporal patterning of the vertebrate body plan <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B37">37</abbr></abbrgrp>.</p>
         <p><it>Fugu</it>-mammal CNEs, enriched for putative CRMs, therefore provide an excellent class of sequences through which to test the DDC model further. In addition, a study has found that at least 6.6% of the <it>Fugu </it>genome is represented by fish-specific duplicate genes <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, making <it>Fugu </it>an attractive genome in which to identify and analyze regulatory elements involved in subfunctionalization of fish co-orthologs. Transcription factors and genes involved in development and cellular differentiation appear to be overrepresented within duplicated genes in fish genomes <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, improving the chances of identifying suitable candidates. Here, by taking an approach similar to Postlethwait <it>et al</it>. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, we carried out alignments of genomic sequence around seven pairs of <it>Fugu </it>developmental co-orthologs against a number of single mammalian orthologous regions in order to investigate whether differential presence of conserved elements between co-orthologs is consistent with the DDC model of regulatory subfunctionalization.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Identification of co-orthologs in the <it>Fugu </it>genome</p>
            </st>
            <p>Studies into fish-specific duplicated genes have identified a number of examples in the <it>Fugu </it>genome (for example, <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B39">39</abbr></abbrgrp>). As with most genes in general, few of these <it>Fugu</it>-specific duplicates have CNEs in their vicinity. Suitable gene candidates for study of CNE evolution between teleost-specific gene paralogs were initially identified using 2,330 CNEs derived from a whole-genome comparison of the non-coding portions of the human and <it>Fugu </it>genomes <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. CNE clusters that mapped to the vicinity of a single human genomic region but were derived from two non-contiguous <it>Fugu </it>scaffolds were considered further. We selected seven genomic regions in human that fitted this criterion, each containing clusters of CNEs in the vicinity of a single gene implicated in developmental regulation: <it>BCL11A </it>(transcription factor B-cell lymphoma/leukemia 11A), <it>EBF1 </it>(early B-cell factor 1), <it>FIGN </it>(fidgetin), <it>PAX2 </it>(paired box transcription factor Pax2), <it>SOX1 </it>(HMG box transcription factor Sox1), <it>UNC4.1 </it>(homeobox gene Unc4.1) and <it>ZNF503 </it>(zinc-finger gene Znf503). Some of these genes have relatively well characterized roles in early development, such as <it>PAX2 </it>(which plays critical roles in eye, ear, central nervous system and urogenital tract development <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>), <it>SOX1 </it>(involved in neural and lens development <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>), <it>BCL11A </it>(thought to play important roles in leukaemogenesis and haematopoiesis <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>) and <it>EBF1 </it>(important for B-cell, neuronal and adipocyte development <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>). 
<it>FIGN</it>, <it>UNC4.1 </it>and <it>ZNF503 </it>are less well characterized, although studies of their orthologs in mouse or rat indicate important roles in retinal, skeletal and neuronal development <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>.</p>
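            <p>The selection criterion described above (CNE clusters that map to one human region but derive from two separate <it>Fugu </it>scaffolds) can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the input format, the function name candidate_regions and the min_cnes_per_scaffold threshold are assumptions, and the hit list is toy data.</p>
            <p>
```python
from collections import defaultdict


def candidate_regions(cne_hits, min_cnes_per_scaffold=2):
    """Return human regions whose CNEs cluster on exactly two Fugu scaffolds.

    cne_hits: iterable of (human_region, fugu_scaffold) pairs, one per CNE.
    min_cnes_per_scaffold: hypothetical cut-off for calling a 'cluster';
    the source text does not state a threshold.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for human_region, fugu_scaffold in cne_hits:
        counts[human_region][fugu_scaffold] += 1

    candidates = {}
    for region, per_scaffold in counts.items():
        clustered = sorted(
            scaffold
            for scaffold, n in per_scaffold.items()
            if n >= min_cnes_per_scaffold
        )
        # Distinct scaffold names stand in for 'non-contiguous' here;
        # real scaffolds would need an explicit contiguity check.
        if len(clustered) == 2:
            candidates[region] = clustered
    return candidates


# Toy hits: region labels and CNE counts are invented for illustration.
hits = [
    ("region_A", "scaffold_86"), ("region_A", "scaffold_86"),
    ("region_A", "scaffold_59"), ("region_A", "scaffold_59"),
    ("region_B", "scaffold_1"), ("region_B", "scaffold_1"),
    ("region_B", "scaffold_1"),
]
print(candidate_regions(hits))
```
            </p>
            <p>Only region_A is returned, because its CNEs form clusters on two different scaffolds, whereas all region_B CNEs lie on a single scaffold. Candidates found this way would still need confirmation as true co-orthologs.</p>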
            <p>For each CNE cluster region in the human genome, we identified homologs to the human <it>trans-dev </it>protein on each <it>Fugu </it>scaffold, suggesting the presence of co-orthologous genes. To confirm this, we carried out a phylogenetic analysis of these protein sequences together with tetrapod orthologs and all available co-orthologs from the zebrafish genome. In addition, two outgroups utilizing the closest in-paralog as well as an invertebrate ortholog were included in each alignment to help resolve the phylogeny (Figure <figr fid="F1">1</figr>). In all cases where a close paralog could be identified, the <it>Fugu </it>co-ortholog candidates branch with strong bootstrap values with tetrapod orthologs of the target <it>trans-dev </it>gene, rather than the closest paralog, confirming these genes are true co-orthologs. Furthermore, for all phylogenies, the <it>Fugu </it>and zebrafish/medaka sequences branch together after the split with tetrapods, confirming they derive from a fish-specific duplication event. In only one out of three cases (<it>pax2</it>) where two co-orthologous proteins could also be identified in zebrafish does each <it>Fugu </it>copy branch directly with each zebrafish copy, indicating their proteins have followed similar evolutionary paths (Figure <figr fid="F1">1d</figr>). In contrast, the other two cases (<it>sox1 </it>and <it>unc4.1</it>) exhibit a different topology in that both zebrafish co-orthologs are more similar to one of the <it>Fugu </it>co-orthologs than the other (although weak bootstrap values for the fish <it>unc4.1 </it>may suggest alternative phylogenies). This is most likely due to species-specific asymmetrical rates of evolution seen between many genes in teleost fish <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, as well as elevated rates of evolution in duplicated genes in general, and pufferfish in particular <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, which may have obscured the true phylogenies in these cases. 
The given names of the <it>Fugu </it>co-orthologs used in this study (see Materials and methods for more details on nomenclature), their location in the <it>Fugu </it>genome and protein sequence accession codes can be found in Table <tblr tid="T1">1</tblr>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Phylogenies of seven <it>Fugu </it>co-orthologs</p>
               </caption>
               <text>
                  <p>Phylogenies of seven <it>Fugu </it>co-orthologs. <it>Fugu </it>(fr) co-ortholog protein sequences are highlighted by red boxes and named according to the scaffold number on which they were located (for example, frS86 = scaffold_86). Zebrafish (dr) or stickleback (ga) sequences are highlighted by green boxes and uncharacterized proteins named after the SwissProt ID or the chromosome they are located on. Bootstrap values are indicated at each node. Other tetrapod sequences included: human (hs), mouse (mm), rat (rn), dog (cf) and chicken (gg). Invertebrate outgroups are shaded orange and contain sequences from the following species: <it>Ciona intestinalis </it>(ci), <it>Drosophila melanogaster </it>(dm) and <it>Caenorhabditis elegans </it>(ce). Trees: <b>(a) </b><it>BCL11A </it>using the closest paralog <it>BCL11B </it>as a comparator. <b>(b) </b><it>EBF1 </it>using the closest paralog <it>EBF3 </it>as a comparator. <b>(c) </b><it>FIGN </it>using the closest paralog <it>FIGNL1 </it>as a comparator. <b>(d) </b><it>PAX2 </it>using one of its two closest paralogs <it>PAX5 </it>as a comparator. <b>(e) </b><it>SOX1 </it>using its closest paralog <it>SOX3 </it>as a comparator. <b>(f) </b><it>UNC4.1 </it>has no known closely related paralogs. <b>(g) </b><it>ZNF503 </it>using its closest paralog <it>ZNF703 </it>as a comparator.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-1"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Co-ortholog nomenclature and genomic locations in the <it>Fugu </it>genome</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Human gene*</p>
                     </c>
                     <c ca="left">
                        <p>Co-ortholog name<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p><it>Fugu </it>scaffold (S) location (kb)<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Length (kb)<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Prop 'N's (%)<sup>&#182;</sup></p>
                     </c>
                     <c ca="left">
                        <p><it>Fugu </it>protein accession code<sup>&#165;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>BCL11A</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>bcl11a.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S113</b>: 140.8-518.9</p>
                     </c>
                     <c ca="center">
                        <p>378.1</p>
                     </c>
                     <c ca="center">
                        <p>2.98</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000142044</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>bcl11a.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S62</b>: 603.7-740.4</p>
                     </c>
                     <c ca="center">
                        <p>136.7</p>
                     </c>
                     <c ca="center">
                        <p>0.18</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000144873</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EBF1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ebf1.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S97</b>: 400.4-483.3</p>
                     </c>
                     <c ca="center">
                        <p>82.9</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000127762</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ebf1.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S71</b>: 999.3-1,091.7</p>
                     </c>
                     <c ca="center">
                        <p>92.4</p>
                     </c>
                     <c ca="center">
                        <p>1.90</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000148373</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>FIGN</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fign.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S36</b>: 382.6-486.8</p>
                     </c>
                     <c ca="center">
                        <p>104.2</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000153680</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fign.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S46</b>: 126.9-219.9</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000177971</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>PAX2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>pax2.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S86</b>: 541.7-669.8</p>
                     </c>
                     <c ca="center">
                        <p>128.1</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>pax2.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S59</b>: 768.9-898.3</p>
                     </c>
                     <c ca="center">
                        <p>129.4</p>
                     </c>
                     <c ca="center">
                        <p>3.59</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>SOX1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>sox1.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S42</b>: 1,020-1,105</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>1.49</p>
                     </c>
                     <c ca="left">
                        <p>[Swiss-Prot: Q6WNU3_FUGRU]</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>sox1.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S313</b>: 107.2-174.9</p>
                     </c>
                     <c ca="center">
                        <p>67.7</p>
                     </c>
                     <c ca="center">
                        <p>8.9</p>
                     </c>
                     <c ca="left">
                        <p>[Swiss-Prot: Q6WNU2_FUGRU]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>UNC4.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc4.1.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S15</b>: 761.1-825.5</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000154395</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>unc4.1.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S40</b>: 1,435-1,537</p>
                     </c>
                     <c ca="center">
                        <p>102</p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUG00000161008</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>ZNF503</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>znf503.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S86</b>: 7-220</p>
                     </c>
                     <c ca="center">
                        <p>213</p>
                     </c>
                     <c ca="center">
                        <p>3.64</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000181530</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>znf503.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p><b>S59</b>, <b>S29 </b>(all)</p>
                     </c>
                     <c ca="center">
                        <p>148.5</p>
                     </c>
                     <c ca="center">
                        <p>3.22</p>
                     </c>
                     <c ca="left">
                        <p>NEWSINFRUP00000181454</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Name of human gene ortholog. <sup>&#8224;</sup>Nomenclature of novel <it>Fugu </it>co-orthologs. <sup>&#8225;</sup>Location and extent of <it>Fugu </it>genomic scaffold used in multiple alignment. <sup>&#167;</sup>Length of <it>Fugu </it>genomic region used in multiple alignment. <sup>&#182;</sup>Proportion of <it>Fugu </it>genomic region that is made up of unfinished sequence (that is, runs of 'N's). <sup>&#165;</sup>The protein accession code for each co-ortholog. These were derived either from Ensembl (v40.4b) or from SwissProt. Protein sequences for <it>pax2.1 </it>and <it>pax2.2 </it>were incomplete in both Ensembl and SwissProt and were reconstructed using alignments of full-length amino acid sequences from other species.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>CNE distribution and changes in genomic environment around <it>Fugu </it>co-orthologs</p>
            </st>
            <p>CNEs were independently identified within each <it>Fugu </it>co-orthologous region by carrying out a combination of multiple and pairwise alignment with the same orthologous sequence from human, mouse and rat (the entire dataset from this study can be accessed and queried through the web-based CONDOR database <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>). The regions in which CNEs were located for each co-ortholog together with surrounding gene environment can be seen in Figure <figr fid="F2">2</figr>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Genomic environment around <it>Fugu </it>co-orthologs in comparison to the human ortholog</p>
               </caption>
               <text>
                  <p>Genomic environment around <it>Fugu </it>co-orthologs in comparison to the human ortholog. Diagrammatic representation of the genomic environment around <it>Fugu </it>co-orthologs and human orthologs of: <b>(a) </b><it>BCL11A</it>, <b>(b) </b><it>EBF1</it>, <b>(c) </b><it>FIGN</it>, <b>(d) </b><it>PAX2</it>, <b>(e) </b><it>SOX1</it>, <b>(f) </b><it>UNC4.1 </it>and <b>(g) </b><it>ZNF503</it>. For each gene, the top two lines represent the genic environment around each of the <it>Fugu </it>co-orthologs whilst the third line represents the genic environment around the human ortholog. Regions are not drawn to scale and are representative only. Human chromosome locations and <it>Fugu </it>scaffold IDs are stated to the left of each graphic. <it>Fugu </it>scaffold IDs can be cross-referenced for their exact location through Table 1. All annotation was retrieved from Ensembl <it>Fugu </it>(v36.4) and Human (v.36.35i). Only genes that are conserved in both <it>Fugu </it>and human are shown. Reference <it>trans-dev </it>genes are colored in red and are always drawn in the 5'&#8594;3' orientation. Surrounding genes in <it>Fugu </it>are marked in blue and in human in green. The names of neighboring <it>Fugu </it>homologs that share conserved synteny with human (but not necessarily the same relative order or orientation) are highlighted in an orange box. Genes orientated in the same direction as the reference <it>trans-dev </it>gene are located above the line and those orientated in the opposite direction are below the line. Yellow triangles represent the positions of the furthest CNEs upstream and downstream in each genomic sequence and delineate the region in which CNEs were identified.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-2"/>
            </fig>
            <p>All but one of the CNE regions in human are located in gene-poor regions termed 'gene deserts' that flank or surround the <it>trans-dev </it>gene and are characteristic of regions thought to contain large numbers of <it>cis</it>-regulatory elements <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. These gene deserts appear to have been conserved to some degree in both <it>Fugu </it>copies (albeit in a highly compact form). For example, a large gene desert of approximately 2.2 Mb is located downstream of <it>BCL11A </it>up to the ubiquitin ligase gene <it>FANCL </it>in human, and similar (compacted) versions of this gene desert are present in both <it>Fugu </it>regions, although the desert downstream of <it>bcl11a.2 </it>is roughly a quarter of the size of the same region in <it>bcl11a.1 </it>(98 kb versus 380 kb). In the majority of regions under study (five out of seven), CNEs lie entirely within the large intergenic regions directly flanking the <it>trans-dev </it>gene or within its introns. In those regions in which CNEs extend beyond or within the genes neighboring the <it>trans-dev </it>gene (that is, <it>bcl11a.1</it>, <it>znf503.1 </it>and <it>znf503.2</it>) the gene order and orientation between <it>Fugu </it>and human has remained largely conserved, spanning three to five genes, something that is relatively rare within the <it>Fugu </it>genome <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr></abbrgrp>. This may be due to functional constraints on these regions whereby it is necessary to maintain the CRM and associated gene in <it>cis </it><abbrgrp><abbr bid="B34">34</abbr><abbr bid="B56">56</abbr></abbrgrp>. For the remaining co-orthologous regions the degree of synteny varies widely. For instance, neither <it>Fugu pax2 </it>region has conserved gene order with the human genome. The orthologs of <it>NDUFB8 </it>and <it>HIF1AN </it>(both upstream of human <it>PAX2</it>) have been partitioned and rearranged so that <it>hif1an </it>is downstream of <it>pax2.1 </it>and <it>ndufb8 </it>is downstream of <it>pax2.2 </it>(Figure <figr fid="F2">2</figr>).</p>
            <p>The preservation of 98.5% of the CNEs (796/811) as well as both <it>trans-dev </it>genes in the same orientation and order along the sequence between human and <it>Fugu</it>, in contrast to the rearrangement of surrounding genes, supports the view that the CNEs and <it>trans-dev </it>genes identified are associated with each other.</p>
         </sec>
         <sec>
            <st>
               <p>Pattern of CNE retention/partitioning between co-orthologs</p>
            </st>
            <p>The DDC model for the retention of gene duplicates over evolution states that, following duplication, genes undergo complementary degenerative loss of subfunctions or, on the regulatory level, expression domains. Based on the assumption that CNEs represent putative autonomous CRMs that direct gene expression to one or more specific expression domains, we would predict that this process of regulatory subfunctionalization would involve the degeneration or loss of these elements between gene duplicates so that the ancestral CRMs were to some degree partitioned between the two genes. We identified 811 CNEs in total across all 14 regions in <it>Fugu</it>, with lengths ranging from 30 to 562 bp (mean = 117 bp, median = 85 bp) and human-<it>Fugu </it>percent identities ranging from 60% to 94% (mean = 74%). CNEs from each co-ortholog were defined as 'overlapping' if both were conserved to at least part of the same sequence in human. CNEs that were conserved between human and only one <it>Fugu </it>co-ortholog, with no significant overlap to CNEs in the counterpart co-ortholog, were defined as 'distinct'. Figure <figr fid="F3">3</figr> illustrates the definition of overlapping and distinct CNEs identified in a multiple alignment between <it>Fugu </it>regions around <it>pax2.1 </it>and <it>pax2.2</it>, against the reference human <it>PAX2 </it>region.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>VISTA plot of an MLAGAN alignment of orthologous regions surrounding two <it>pax2 </it>co-orthologs in <it>Fugu </it>(Fr) and <it>Pax2 </it>in chicken (Gg), rat (Rn) and human</p>
               </caption>
               <text>
                  <p>VISTA plot of an MLAGAN alignment of orthologous regions surrounding two <it>pax2 </it>co-orthologs in <it>Fugu </it>(Fr) and <it>Pax2 </it>in chicken (Gg), rat (Rn) and human. The baseline is 268 kb of human sequence. Conservation between human and each sequence is shown as a peak. Peaks that represent conservation in a non-coding region of at least 65% over 40 bp are shaded pink with coding exons shaded purple and peaks located within untranslated regions shaded light-blue. All CNEs conserved in at least one of the <it>Fugu </it>co-orthologs are color-coded. CNEs in both <it>Fugu </it>co-orthologs that overlap the same region in human are shaded yellow while CNEs that are 'distinct' (or conserved solely) in <it>pax2.1 </it>are shaded red and CNEs distinct to <it>pax2.2 </it>are shaded green. Peaks marked with a double-headed arrow are conserved in <it>Fugu </it>in the opposite orientation (and therefore do not show up in the VISTA plot). A number of the CNEs around <it>PAX2 </it>are also duplicated CNEs (dCNEs) that are located elsewhere in the genome in the vicinity of <it>PAX2 </it>paralogs. CNEs marked with an orange box have another dCNE family member in the vicinity of <it>PAX5 </it>and the CNE marked with a blue box has a dCNE family member conserved upstream of <it>PAX8</it>.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-3"/>
            </fig>
            <p>As with other <it>trans-dev </it>gene regions identified previously (for example, <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>), the co-orthologs under study have highly variable numbers of CNEs conserved in their vicinity, ranging from 11 CNEs in <it>sox1.2 </it>to 156 in <it>znf503.1 </it>(Figure <figr fid="F4">4</figr>). Comparison of the overall number of CNEs conserved between co-orthologous copies revealed three sets, <it>bcl11a.1/2</it>, <it>ebf1.1/2 </it>and <it>znf503.1/2</it>, that have notably different overall numbers of CNEs located in their vicinity, indicating a large-scale loss of elements in one co-ortholog compared to its counterpart since duplication (Figure <figr fid="F4">4</figr>). In the cases of <it>bcl11a.1/2 </it>and <it>znf503.1/2</it>, this large-scale asymmetrical loss of elements in one co-ortholog copy correlates with a large decrease in genomic sequence within the same region (Additional data file 2). Many of the co-orthologs have also undergone substantial partitioning of elements, as indicated by the large proportion of the identified CNEs classified as 'distinct' in each co-ortholog. For example, <it>fign.1 </it>and <it>fign.2 </it>have a similar number of CNEs in their vicinity (47 and 50, respectively) but 42% and 56% of these CNEs, respectively, are distinct to each co-ortholog. The proportion of CNEs that are distinct varies widely among co-orthologs, ranging from 24.5% (13/53) in <it>pax2.1 </it>to 83% (34/41) in <it>ebf1.1 </it>(Figure <figr fid="F4">4</figr>). For co-orthologs of <it>BCL11A </it>and <it>EBF1 </it>the majority of CNEs in both genes are distinct. Only in co-orthologs of <it>PAX2 </it>are the majority of CNEs in both genes found to be overlapping (Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr>), suggesting a high level of retention of regulatory domains in both genes since duplication. In the majority of gene pairs, namely co-orthologs of <it>FIGN</it>, <it>SOX1</it>, <it>UNC4.1 </it>and <it>ZNF503</it>, one copy has the majority of its CNEs as distinct while the other has a majority of its CNEs overlapping with those of its counterpart co-ortholog, suggesting an asymmetrical rate of element partitioning.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Proportion of CNEs around each <it>Fugu </it>co-ortholog that overlap or are distinct to sequences in mammals compared to CNEs identified in its counterpart co-ortholog</p>
               </caption>
               <text>
                  <p>Proportion of CNEs around each <it>Fugu </it>co-ortholog that overlap or are distinct to sequences in mammals compared to CNEs identified in its counterpart co-ortholog. Each bar represents the total number of CNEs identified around each co-ortholog with a proportion of that total colored as overlapping (light purple) or distinct (maroon) CNEs.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-4"/>
            </fig>
            <p>The accuracy of these results depends heavily on ensuring that the loss of elements in one co-ortholog is the result of subfunctionalization rather than lack of sequence coverage in the genomic sequence. The proportion of 'N's (sections of unfinished sequence) within each <it>Fugu </it>genomic sequence can be seen in Table <tblr tid="T1">1</tblr>. We found that only one of the gene regions, <it>sox1.2</it>, contains a significant proportion of unfinished sequence (8.9%), suggesting some of the CNEs defined as 'distinct' in <it>sox1.1 </it>may have overlapping counterparts in <it>sox1.2</it>. However, closer examination of the positioning of the unfinished sequence reveals that the vast majority occurs in a region easily defined by two flanking overlapping CNEs that contains just a single distinct CNE in its counterpart co-ortholog. The region in <it>sox1.2 </it>potentially containing counterparts to most of the distinct CNEs in <it>sox1.1 </it>contains less than 3% unfinished sequence, suggesting most, if not all, of these distinct CNEs are defined correctly. Without 100% finished sequence in all cases it is, of course, possible that a small proportion of the CNEs identified as distinct in these co-orthologs may have an overlapping counterpart within unfinished sequence, but given the high levels of finished sequence in most of the gene regions, this is unlikely to account for a significant number.</p>
         </sec>
         <sec>
            <st>
               <p>Evolution of overlapping CNEs since duplication</p>
            </st>
            <p>Overlapping CNEs comprise a large proportion and, in some cases, the majority of CNEs identified around many of the gene pairs and have, therefore, remained to some extent under positive selection in both co-orthologs. The 381 overlapping CNEs and the 430 distinct CNEs differ significantly in their distributions of both length (<it>p </it>&lt; 1 &#215; 10<sup>-16</sup>) and percent identity (<it>p </it>= 1.1 &#215; 10<sup>-8</sup>). Overlapping CNEs have significantly higher average lengths (mean = 149.6 bp, median = 116.1 bp) than distinct CNEs (mean = 87.6 bp, median = 62 bp) as well as slightly higher percent identities (mean = 75.2% and median = 75% for overlapping versus mean = 72.4% and median = 71.7% for distinct). Only four of the distinct CNEs overlap to some degree, but by less than the arbitrary 20 bp cut-off required for CNEs to be defined as overlapping. Removing these leaves the mean lengths and percent identities virtually unchanged, confirming that the cut-off did not significantly bias the distribution of distinct elements towards smaller elements.</p>
            <p>We studied two aspects to gauge evolutionary changes occurring in these elements since duplication: changes in element length and changes in substitution rate between overlapping CNEs in <it>Fugu</it>.</p>
         </sec>
         <sec>
            <st>
               <p>CNE length</p>
            </st>
            <p>A total of 182 pairs of overlapping CNEs were identified across all co-ortholog pairs with a one-to-one relationship. The length of the overlap in the human sequence between co-orthologous CNEs ranged from 24 to 460 bp (mean = 107.5 bp &#177; 2.27 standard error of the mean). For each overlapping pair, we calculated the overlapping sequence as a proportion of the full-length <it>Fugu</it>-human conserved sequence in each co-ortholog. We found 62% of the pairs to have undergone significant degeneration in element length in one of the copies compared to its counterpart (Figures <figr fid="F5">5</figr> and <figr fid="F6">6</figr>); 30% of pairs overlapped over the majority of both elements, suggesting little evolution of element length since duplication, and approximately 8% have undergone a significant level of degeneration in element length in both copies at their edges. These results suggest the process of subfunctionalization may also be occurring, at least in some of these cases, through the partial loss of function in both copies, allowing gene preservation through quantitative complementation (as suggested in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>). It is also possible that sequence loss could cause changes in module function by altering the combination of binding sites present. In genes such as <it>pax2.1 </it>and <it>pax2.2 </it>that have the majority of their CNEs overlapping in both genes, this presents an additional mechanism by which both copies may be preserved. In addition to overlapping CNEs that have undergone evolution at their edges, 29 overlapping CNEs have undergone evolution at the centre of the element, essentially creating a split element (that is, a CNE in one co-ortholog overlaps two or more CNEs from the other co-ortholog).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Proportion of each CNE sequence that overlaps the counterpart co-ortholog CNE</p>
               </caption>
               <text>
                  <p>Proportion of each CNE sequence that overlaps the counterpart co-ortholog CNE. Main graph: for each overlapping pair of co-orthologous CNEs (involving just two sequences), the proportion of the full length of each CNE (P1-P2) made up by the overlap was calculated using the human sequence as the reference. The larger of the two proportions was always plotted as P1 to simplify analysis. Inset bar chart: summary of the number of overlapping CNE pairs falling into three main proportion categories: P1 &#8805; 0.8, P2 &#8805; 0.8 - pairs that overlapped over the majority of both elements, suggesting little evolution of element length since duplication; P1 &#8805; 0.8, P2 &lt; 0.8 - pairs that have undergone significant degeneration in element length in one of the copies compared to its counterpart; P1 &lt; 0.8, P2 &lt; 0.8 - pairs that have undergone a level of degeneration in element length in both copies at their edges.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Significant change in element length and substitution rate in overlapping CNEs upstream of <it>unc4.1.1 </it>and <it>unc4.1.2</it></p>
               </caption>
               <text>
                  <p>Significant change in element length and substitution rate in overlapping CNEs upstream of <it>unc4.1.1 </it>and <it>unc4.1.2</it>. <b>(a) </b>CNEs (filled blue boxes) were identified around each <it>Fugu </it>co-ortholog <it>unc4.1.1 </it>(A1, top) and <it>unc4.1.2 </it>(A2, bottom) (gene exons are shown in the coding sequence (CDS) track as filled red boxes). The scale at the top represents positions along the <it>Fugu </it>sequence used in the multiple alignment. Two CNEs, highlighted in pink boxes, one upstream of <it>Fugu unc4.1.1 </it>(CRCNEAC00031954 [53], referred to as CNE_A1) and one upstream of <it>unc4.1.2 </it>(CRCNEAC00032205 [53], referred to as CNE_A2) are conserved to part of the same sequence in human upstream of <it>UNC4.1</it>. The overlap region is 126 bp in length and encompasses all of CNE_A2 but only 35% of CNE_A1 (which is 360 bp long), indicating a significant loss of element length in CNE_A2. <b>(b) </b>A relative rate test of the <it>Fugu </it>CNEs across the overlapping region using human as the outgroup reveals a highly significant number of independent substitutions (26) in CNE_A2 with no independent substitutions in CNE_A1 (<it>p </it>&lt; 0.001). This suggests CNE_A1 is likely to have retained the ancestral function while CNE_A2 may have evolved to have a different function.</p>
               </text>
               <graphic file="gb-2007-8-4-r53-6"/>
            </fig>
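            <p>The three length categories summarized in Figure 5 can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the analysis pipeline used in this study; the 0.8 threshold is the one given in the Figure 5 legend, and all lengths are assumed to be measured on the human reference sequence.</p>

```python
def overlap_proportions(overlap_bp, len_a_bp, len_b_bp):
    """Return (P1, P2): the overlap as a fraction of each CNE, larger value first."""
    if 0 >= overlap_bp or overlap_bp > min(len_a_bp, len_b_bp):
        raise ValueError("overlap must be positive and no longer than either CNE")
    prop_a = overlap_bp / len_a_bp
    prop_b = overlap_bp / len_b_bp
    # The larger proportion is always reported as P1, as in Figure 5.
    return max(prop_a, prop_b), min(prop_a, prop_b)


def classify_pair(p1, p2, threshold=0.8):
    """Assign an overlapping pair to one of the three Figure 5 categories."""
    if p2 >= threshold:
        # P1 >= P2, so both proportions meet the threshold here.
        return "little change in either copy"
    if p1 >= threshold:
        return "degeneration in one copy"
    return "degeneration in both copies"


# Values reported for the Figure 6 pair: 126 bp overlap,
# CNE_A1 = 360 bp, CNE_A2 = 126 bp.
p1, p2 = overlap_proportions(126, 360, 126)
print(round(p1, 2), round(p2, 2), classify_pair(p1, p2))
```

            <p>With the values reported for the pair in Figure 6, the overlap covers all of one element and 35% of the other, so the pair falls into the category of degeneration in one copy.</p>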
         </sec>
         <sec>
            <st>
               <p>CNE sequence evolution</p>
            </st>
            <p>Overlapping CNEs are conserved to the same human sequence across the length of the overlap. However, it is possible that elements have undergone differential evolution, with one element containing a significantly greater number of independent substitutions than the other, indicative of either subfunctionalization or neofunctionalization. To measure whether the sequence of one CNE has diverged faster than its counterpart, we used the Tajima relative rate test <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> with the human sequence as the outgroup (or ancestral) sequence. The Tajima relative rate test uses a chi-squared statistic to assess whether the numbers of independent substitutions in the two sequences, relative to the outgroup sequence, differ significantly (see Additional data file 3 for the results of relative rate tests for all overlapping CNEs). The percentages of overlapping CNEs that show a statistically significant difference in substitution rate in one copy over another range from 17% in <it>sox1 </it>to 26% in <it>znf503 </it>(Table <tblr tid="T2">2</tblr>). One of the most significant examples within this set was found in a pair of CNEs upstream of co-orthologs of <it>UNC4.1 </it>and can be seen in Figure <figr fid="F6">6</figr>. These results suggest that a substantial number of the elements appear to have undergone an asymmetrical rate of evolution since duplication, something we would expect under the DDC model. Alternatively, if these changes were positively selected it may indicate a process of neofunctionalization whereby co-orthologs have evolved regulatory patterns that differ from those of the ancestral copy.</p>
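            <p>As an illustration of the relative rate test described above, the following minimal sketch assumes the single-degree-of-freedom form of Tajima's test, in which the statistic is (m1 - m2)^2 / (m1 + m2) and m1 and m2 are the numbers of sites at which only one of the two <it>Fugu </it>sequences differs from the other two sequences in the alignment. The function names and the handling of gaps and ambiguous bases are assumptions made for this example; they do not describe the software used in this study.</p>

```python
import math


def count_unique_substitutions(seq1, seq2, outgroup):
    """Count sites where exactly one ingroup sequence differs from the other two."""
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if any(base not in "ACGT" for base in (a, b, o)):
            continue  # assumption: skip columns with gaps or ambiguous bases
        if b == o and a != o:
            m1 += 1  # substitution unique to seq1
        elif a == o and b != o:
            m2 += 1  # substitution unique to seq2
    return m1, m2


def relative_rate_test(m1, m2):
    """Return (chi-squared statistic, p value) for one degree of freedom."""
    if m1 + m2 == 0:
        return 0.0, 1.0  # no informative sites, so no evidence of a rate difference
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts reported for the Figure 6 pair: 0 independent substitutions
# in CNE_A1 and 26 in CNE_A2.
chi2, p_value = relative_rate_test(0, 26)
print(chi2, p_value)
```

            <p>For the substitution counts reported in Figure 6 (0 and 26), this form of the test gives a statistic of 26 and a <it>p </it>value well below 0.001, in line with the significance reported there.</p>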
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Tajima relative rate tests of overlapping co-orthologous CNE in <it>Fugu</it></p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Gene region*</p>
                     </c>
                     <c ca="center">
                        <p>No. of overlapping pairs<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>No. of CNE pairs with <it>p </it>> 0.05<sup>&#8225;</sup></p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>No. of CNE pairs with <it>p </it>&#8804; 0.05<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>% of CNE pairs with <it>p </it>&#8804; 0.05<sup>&#182;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Co-ortholog 1</p>
                     </c>
                     <c ca="center">
                        <p>Co-ortholog 2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>BCL11A</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EBF1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>FIGN</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>PAX2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>SOX1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>UNC4.1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>ZNF503</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Gene region. <sup>&#8224;</sup>Total number of overlapping CNEs within gene region. <sup>&#8225;</sup>Numbers of overlapping CNE pairs with no significant difference in substitution rates (that is, <it>p </it>values of > 0.05). <sup>&#167;</sup>The number of overlapping CNEs that exhibit a significant difference in substitution rate (that is, <it>p </it>value &#8804; 0.05) in the CNE sequence in the vicinity of one co-ortholog over that in the other. <sup>&#182;</sup>The percentage of overlapping CNEs with significantly different substitution rates in either co-ortholog as a proportion of the total number of overlapping CNEs.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>A history of duplications: some co-orthologous CNEs were duplicated in ancient events at the origin of vertebrates</p>
            </st>
            <p>In addition to being involved in a teleost-specific duplication event, a number of the CNEs identified around the <it>trans-dev </it>genes in this study have been previously retained from ancient duplications thought to have occurred at the origin of vertebrates. While the majority of CNEs are single copy in the human genome, a recent study identified 124 families of CNEs genome-wide that have more than one copy across all available vertebrate genomes and are referred to as 'duplicated CNEs' (dCNEs) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. dCNEs are associated with nearby <it>trans-dev </it>paralogs and a number have been shown to act as enhancers that drive <it>in vivo </it>reporter-gene expression to similar domains <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The absence of these sequences in non-vertebrate chordate genomes and their association with paralogs that arose from whole-genome duplication events at the origin of vertebrates <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> places their origins sometime prior to this event more than 550 million years ago. The conservation of these elements over such extreme evolutionary distances suggests they play critical roles in the regulation of paralogs that have since undergone neofunctionalization. We found 30 non-redundant human CNEs (conserved to 52 co-orthologous CNEs in <it>Fugu</it>) to be dCNEs in the vicinity of one or more paralogs of the nearby <it>trans-dev </it>gene (Table <tblr tid="T3">3</tblr>). This further confirms the tight association of these CNEs with their nearby <it>trans-dev </it>genes as dCNEs resolve the CNE-gene association more clearly <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. 
These dCNEs were identified in five of the seven co-orthologous regions, with some dCNEs associated with more than one paralog (for example, <it>PAX2</it>-associated dCNEs located in the vicinity of <it>PAX5 </it>and <it>PAX8</it>; Table <tblr tid="T3">3</tblr>; Figure <figr fid="F3">3</figr>). Approximately 80% of the co-ortholog CNEs identified as dCNEs (42/52) are conserved in both co-ortholog regions in <it>Fugu</it>, a two-fold enrichment (<it>p </it>&lt; 0.001) over the expected number given the overall proportions of overlapping and distinct elements in the CNE dataset.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Co-ortholog CNEs that are also conserved in the vicinity of <it>trans-dev </it>paralogs in the human genome</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Gene region</p>
                     </c>
                     <c ca="left">
                        <p>Co-ortholog 1</p>
                     </c>
                     <c ca="left">
                        <p>Co-ortholog 2</p>
                     </c>
                     <c ca="left">
                        <p>Gene paralog in the vicinity of the dCNE(s)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>BCL11A</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00002445</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004614</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>BCL11B</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00002557</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00002548</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004648</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00002544</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004643</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004644</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00002540</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EBF1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010771</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>EBF3</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010818</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000027</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010823</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010778</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010827</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010787</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>EBF1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010772</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010820</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>EBF1/2/3/4</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>PAX2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000064</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000133</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PAX8</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>PAX2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000071</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000147</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PAX5</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000090</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000165</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000092</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000167</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000099</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00000174</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>SOX1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00001926</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>SOX2</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>ZNF503</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010112</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004977</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ZNF703</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010147</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004994</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010126</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005024</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010167</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010170</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010187</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010180</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010176</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005013</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005015</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010165</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005011</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010161</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005008</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010046</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004906</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010156</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00005003</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CRCNE00010120</p>
                     </c>
                     <c ca="left">
                        <p>CRCNE00004986</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>CNEs in each co-ortholog are referred to by their CONDOR database identifiers [53]. A CNE was considered duplicated if the human sequence to which it is conserved shows a significant BLAST hit to a sequence elsewhere in the genome. Any gene in the vicinity (&lt;1.5 Mb away) of the BLAST hit that is paralogous to a gene within a window of the same size around the query CNE is shown in the final column.</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Recent studies show that the genomes of all organisms contain a surprisingly large number of duplicated genes whose retention cannot be accounted for by the classic models of nonfunctionalization and neofunctionalization. The presence of large numbers of duplicated genes within the genomes of teleost fish, now widely presumed to have undergone a whole genome duplication event around 300-350 million years ago, provides an excellent opportunity for comparative studies to test the DDC model. Prior to the availability of large-scale genomic sequences, the ability to study regulatory subfunctionalization by identifying the regulatory elements responsible was limited by a lack of appropriate identification strategies. The discovery of thousands of CNEs conserved across the vertebrate lineage, highly enriched for sequences likely to be distal <it>cis</it>-regulatory modules, allowed us to develop a strategy to begin to identify these elements. Using data from the initial whole genome comparison of the <it>Fugu </it>and human genomes, we identified potential gene candidates that both contain CNEs in their vicinity and are likely to derive from fish-specific duplication events. CNEs that cluster in the same location in human but derive from two separate locations in the <it>Fugu </it>genome strongly indicate the presence of co-orthologous regions. We selected seven clusters of CNEs in the human genome, each in the vicinity of a single <it>trans-dev </it>gene that fulfilled these criteria. For each of these genes, we reconstructed a phylogeny using protein sequences identified in each <it>Fugu </it>region, confirming that the two genes are both orthologs (co-orthologs) of the single mammalian copy; all topologies confirmed that the genes derive from a duplication event following the split between ray-finned and lobe-finned fishes. 
This relationship was further confirmed by comparison of the genic environment around the <it>trans-dev </it>gene in both <it>Fugu </it>regions to that of the single region in human. Conserved gene order extends, in many cases, to one or more genes upstream and/or downstream of each co-ortholog, indicating a shared ancestral origin, although in several instances the neighboring genes have been partitioned between the co-orthologous regions, undergone rearrangement or have been lost.</p>
         <p>The process of subfunctionalization, as described in the DDC model, is defined as the fixation of complementary losses of subfunctions, resulting in the joint preservation of duplicate loci <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. For regulatory subfunctionalization, a subfunction may represent expression of a gene in a specific tissue, cell lineage or temporal stage. For genes with complex regulation these subfunctions are controlled by one or a combination of <it>cis</it>-regulatory modules. A proportion of these can be predicted through the comparative genomic approaches outlined in this study and for the purposes of this discussion are assumed to be represented by CNEs. Under the DDC model, subfunctionalization is thought to occur by two different routes: qualitative and quantitative <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Under qualitative subfunctionalization, one duplicate copy undergoes one or more complete loss-of-subfunction mutations, with the second copy subsequently acquiring null-mutations for a different set of subfunctions. Thus, both copies are required to recapitulate the full set of ancestral subfunctions. Under this route, a conserved element representing an independent <it>cis</it>-regulatory module that undergoes a null-mutation in one of the gene copies will no longer be under selective constraint, and will be 'lost' (that is, will not be detectable by sequence conservation) through the accumulation of degenerative mutations over evolutionary time. This process should, therefore, be evident from the effective partitioning of conserved elements between co-orthologs. In contrast, quantitative subfunctionalization is more subtle and results from the fixation of reduction-of-expression mutations in both duplicates <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. 
Here, both regulatory modules must be maintained in the genome once the summed activity for a particular subfunction in both copies has been reduced to the original level in the single ancestral gene.</p>
         <p>By comparing each <it>Fugu </it>co-ortholog with its single orthologous region in mammals, we attempted to identify those 'ancestral' <it>cis</it>-regulatory modules present in the mammalian copy that are retained in only one of the <it>Fugu </it>copies and those that have to some extent been retained in both copies. This approach is particularly appropriate for early developmental regulators, for which function and regulation are likely to be highly constrained across all vertebrates and the mammalian gene represents the ancestral pre-duplication state as closely as possible. The probability of the preservation of gene duplicates through subfunctionalization is also assumed to be higher in genes with complex regulation that contain large numbers of independently mutable CRMs <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>, a view reinforced by the overrepresentation of genes involved in development and cellular differentiation found in fish-specific gene duplicates <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. As with <it>trans-dev </it>genes across the genome, the overall numbers of CNEs differ substantially between gene pairs and are likely to reflect differences in regulatory complexity or the extent to which regulation in these genes has been conserved across vertebrate evolution. All seven pairs of co-orthologs contained a number of CNEs that have been partitioned between the co-orthologs along the lines of the qualitative subfunctionalization model. The extent to which CNEs have been partitioned between co-orthologs, however, varies widely. For some co-ortholog pairs, such as <it>bcl11a </it>and <it>ebf1</it>, the majority of CNEs appear to be completely partitioned between the two genes, whilst for other pairs, such as <it>pax2</it>, only a relatively small proportion is. 
In the DDC model, following initial subfunctionalization, the process of null-mutation fixation in persisting redundant subfunctions is thought to be random, leaving a roughly equal number of subfunctions in each gene copy. Although this appears to be true for the <it>fign</it>, <it>pax2 </it>and <it>unc4.1 </it>co-orthologs, partitioning in <it>bcl11a.1/.2 </it>and <it>ebf1.1/.2 </it>is highly asymmetrical. In both of these cases there is relatively little overlap between the complements of CNEs associated with the two co-orthologs of each pair. It is possible, therefore, that the loss of some CNEs in one co-ortholog may have consequences for further loss of elements in that gene. CRMs may not all be functionally autonomous and may interact together to actuate their regulatory role <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. The degeneration of one or more integral CRMs from a co-ortholog could accelerate further degeneration of other CRMs that are functionally dependent on them. Under this scenario, a gene duplicate may undergo substantial loss of elements, possibly influencing further asymmetrical loss.</p>
         <p>In addition to CNEs that have undergone full partitioning between co-orthologs, some pairs have also retained a number of overlapping CNEs that have been preserved to some extent in both copies. For co-orthologs such as <it>pax2.1/pax2.2</it>, this type of CNE constitutes the majority of CNEs located around these genes, and is a common feature in most of the other co-ortholog pairs. Overlapping CNEs appear, in general, to be longer than distinct CNEs, although at this stage the relationship (if any) between element size/conservation and its functional importance/regulatory complexity is still unknown. While some of these elements have remained virtually identical in length since duplication, others have undergone major changes both at the edges and at the core of one of the elements, suggesting information loss in one or both copies. A significant proportion of overlapping CNEs have also undergone asymmetrical rates of substitution, suggesting one copy retained the ancestral function while the other was free to evolve, possibly to a novel function.</p>
         <p>What explanations could account for this high level of CNE retention observed between co-orthologs? The first is the possibility that some CNEs have undergone quantitative subfunctionalization. Here, degenerative mutations in both CRMs lead to a partial loss of subfunction in each element (such as a reduction in the level of expression in a specific tissue) rather than complete loss. Therefore, both elements must be maintained in the genome once the summed activity for a particular subfunction in both copies has been reduced to the original level of the ancestral gene <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B60">60</abbr></abbrgrp>. Alternatively, some of these elements may function as silencers or insulators or play roles in chromatin remodeling, ensuring correct regulatory compartmentalization and control of both gene duplicates. Another explanation could be the possible interrelations of <it>cis</it>-regulatory modules. As previously mentioned, there is a possibility that some CRMs are interrelated and act in concert to perform their function. It is possible, therefore, that the loss of a CRM critical for the function of other CRMs could lead to large-scale loss in one gene copy. For example, the partitioning of two CRMs that are functionally independent of each other but both dependent on another CRM for correct function would lead to the retention of that critical CRM in both gene copies. Finally, it is possible that although both elements have retained general sequence identity, small nucleotide changes between the elements (such as those seen in asymmetrically evolving CNEs) may have substantial consequences for element function. Indeed, in a recent pioneering study, T&#252;mpel <it>et al</it>. <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> pinpointed subtle sequence changes within well-defined enhancers responsible for divergent expression of <it>hoxa2 </it>co-orthologs (<it>hoxa2</it>(<it>a</it>) and <it>hoxa2</it>(<it>b</it>)) in <it>Fugu</it>. 
Sequences of the enhancers responsible for full expression of <it>HOXA2 </it>in mice and chicken within the hindbrain were found to be generally conserved in both <it>Fugu </it>copies. Nevertheless, it was demonstrated that a small number of base-pair differences between <it>hoxa2</it>(<it>a</it>) and <it>hoxa2</it>(<it>b</it>) enhancers within several known transcription factor binding sites was sufficient to erase enhancer activity and was shown to be responsible for the lack of expression of <it>hoxa2</it>(<it>a</it>) within certain expression domains. However, the authors could not explain why the non-functional enhancers in <it>hoxa2</it>(<it>a</it>) had remained partially conserved, although they postulated they may have regulatory roles in expression domains not covered by the survey. This study highlights the power of correlating known expression differences between co-orthologs with comparative sequence analysis, especially with previous knowledge of the binding sites involved. It also highlights, as functional assays on more ancient duplicated CNEs have demonstrated <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, that sequence similarity may not always extend to functional similarity. Indeed, it is equally plausible that some of the sequence evolution seen between some overlapping CNEs is indicative of neofunctionalization. It would be of great interest for future studies to correlate any novel expression of teleost co-orthologs compared to other vertebrate homologs with changes in these elements. Finally, CNEs may represent several independent or overlapping CRMs and the loss of sequences within the CNE may be due to loss of just one of the CRMs, constituting a form of qualitative subfunctionalization.</p>
         <p>The pattern of CNE retention and evolution in these <it>Fugu </it>co-orthologs is certainly consistent with both mechanisms of subfunctionalization inherent in the DDC model. However, the extent of the contribution of each mechanism to subfunctionalization is different for each gene pair. This could be a consequence of each co-ortholog pair having followed a different evolutionary path after duplication, each under different selective pressures depending on its expression and/or function, as well as the influence of stochastic evolutionary events following a relaxation of evolutionary constraint due to genetic redundancy. It is clear, therefore, that confirmation of regulatory subfunctionalization in these gene pairs will require both the characterization of expression patterns for both co-orthologs (through approaches such as <it>in situ </it>hybridization) and confirmation of the regulatory potential of their surrounding CNEs through rapid <it>in vivo </it>reporter-gene assays. Currently, due to the limitations of <it>Fugu </it>as an experimental model, none of the genes in this study have characterized expression profiles that could be used to assess the extent and type of regulatory change these gene duplicates have undergone. In the more commonly used zebrafish experimental model organism, however, gene expression profiles of two gene pairs from this study, <it>pax2 </it>and <it>sox1</it>, have been characterized. Expression patterns of the zebrafish co-orthologs of <it>PAX2 </it>(<it>Pax2a </it>and <it>Pax2b</it>) <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> are highly similar, although the absence of <it>Pax2b </it>expression in the developing kidney, as well as differences in temporal expression, confirms they have undergone a level of regulatory differentiation. 
This appears to corroborate the pattern of element retention/partitioning seen in <it>Fugu pax2 </it>co-orthologs, where the majority of CNEs are largely conserved in both copies with a smaller number of elements partitioned between each gene. This suggests that, similar to their zebrafish homologs, they may have largely overlapping expression domains and have undergone a more subtle form of quantitative subfunctionalization through changes in their temporal expression. A recent survey of expression of the SOX B family of genes identified a level of regulatory differentiation between the zebrafish <it>sox1a </it>and <it>sox1b </it>co-orthologs <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>. The main differences in expression are temporal (for example, <it>sox1a </it>expressed in the lens a number of hours before <it>sox1b</it>), although there are also spatial differences with <it>sox1a </it>expression initiated in hindbrain and forebrain whereas <it>sox1b </it>initiates only in the forebrain. The overall expression patterns of these co-orthologs correspond closely to <it>SOX1 </it>expression in other vertebrates, indicating that changes in their expression are due to subfunctionalization rather than neofunctionalization. Our study reveals that at least half of all CNEs identified around <it>sox1 </it>co-orthologs have been partitioned, indicative of a level of subfunctionalization, while only one of the overlapping CNEs has undergone a significant level of substitution; a possible reflection on the lack of neofunctionalization in these genes. As patterns of subfunctionalization are known to occur differently between fish species, it remains for further studies of the <it>pax2 </it>and <it>sox1 </it>co-orthologs in <it>Fugu </it>to discover whether expression differences are similar to those observed in zebrafish.</p>
         <p>The majority of regions in this study, in addition to containing CNEs derived from a teleost-specific duplication, have counterparts elsewhere in the genome that derive from more ancient vertebrate-specific duplications, reflecting the complex duplication histories of their associated genes. The fact that most of these sequences are retained not only between co-orthologs (for example, <it>bcl11a.1 </it>and <it>bcl11a.2</it>) but also between out-paralogs (for example, <it>BCL11A </it>and <it>BCL11B</it>) spanning over half a billion years of evolution is an indicator of the potentially critical nature of these sequences to the regulation of these genes. In addition, this dataset provides many candidates for further functional studies on the evolution of these sequences and the implications of these changes on their neofunctionalized paralogs and subfunctionalized co-ortholog targets.</p>
         <p>The role the teleost-specific genome duplication has played in the evolution of this lineage remains unclear. It is now generally accepted that the genome duplication event(s) that occurred at the origin of vertebrates played a major role in species diversity and, in particular, the huge increase in vertebrate morphological complexity <abbrgrp><abbr bid="B65">65</abbr><abbr bid="B66">66</abbr></abbrgrp>. In contrast, the more recent teleost-specific genome duplication does not appear to have had the same effect, with arguably less complexity in the teleost anterior-posterior axis than in tetrapods <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Speciation in teleosts, though, is unmatched among descendants of other vertebrate lineages, with over 22,000 known species, making up over half of all extant vertebrate species <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. This has led to suggestions that the genome duplication event may be directly responsible <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B68">68</abbr></abbrgrp>. Indeed, evidence presented in a review by Taylor <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> indicates that polyploidized members of the Salmon family and Catostomidae (sucker fish) exhibit higher degrees of speciation than members of the same family that remain diploid. Subfunctionalization has been proposed as a likely mechanism for this increased rate of speciation since differential resolution of subfunctions in multiple gene pairs would lead to reproductive incompatibility due to a reduction in hybrid fitness <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>. 
Evidence for such lineage-specific subfunction partitioning has been demonstrated for a small number of genes (for example, divergent expression of <it>sox9 </it>in stickleback compared to zebrafish <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>), but large-scale studies will be required to resolve the degree of subfunctionalization that took place before and after divergence within the teleost lineage. Furthermore, if lineage speciation is driven by differential subfunctionalization, we might expect the pattern of CRM evolution and partition/retention for the <it>Fugu </it>genes discussed here to be different to those in other fish species. The recent release of a number of divergent draft teleost genomes, including those of zebrafish, medaka and stickleback, should allow further studies in this direction. More generally, the approach and analysis used in this study can be extended to any situation where genomic regions surrounding duplicated genes can be compared to an orthologous region that has remained single copy. This may be particularly useful for inter-teleost comparisons, where co-ortholog genes have been differentially retained since the whole-genome duplication prior to the teleost radiation.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Regulatory subfunctionalization is considered to be a major mechanism for the retention of gene duplicates in the genome. This work provides the first large-scale identification and analysis of putative <it>cis</it>-regulatory elements through comparative genomics between duplicated genes using the <it>Fugu </it>genome as a model. Using seven pairs of fish-specific gene duplicates, we showed that all pairs have undergone a level of element partition consistent with one of the main mechanisms proposed for regulatory subfunctionalization. The regulatory elements in this study may also have undergone more subtle levels of subfunctionalization through differential loss of element content and asymmetrical rates of substitution. Beyond this analysis in its own right, the methods in this study can be extended to any similar study in which regions derived from an intra-genomic duplication can be compared to one or more related genomes in which the orthologous region has remained single-copy.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Identification of CNE-containing co-orthologous regions in the <it>Fugu </it>genome</p>
            </st>
            <p>An initial set of 2,330 CNEs with little or no evidence of transcription or RNA secondary structure were identified using a whole-genome comparison of the <it>Fugu </it>(assembly v4) and human genomes (assembly v.36) as described in <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. CNEs in the human genome were grouped into clusters so that each CNE was no more than 400 kb in distance from another CNE in the cluster. Clusters of CNEs in the human genome made up of hits from two non-contiguous <it>Fugu </it>scaffolds (that is, two separate locations in the <it>Fugu </it>genome) were considered further. Previously, we reported that the genes found closest to CNEs are statistically over-represented for Gene Ontology (GO) annotations <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> relating to transcriptional regulation and/or development (<it>trans-dev</it>) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Genes within each of these clusters in the human genome (including the closest gene either side of the cluster) were considered to be <it>trans-dev </it>if they contained any of these over-represented GO annotations. To avoid ambiguities in associating CNEs to genes, we selected only those regions containing a single <it>trans-dev </it>gene within the CNE cluster. Ten pairs of <it>Fugu </it>scaffolds conformed to these criteria. Seven regions containing the largest number of CNEs in addition to well defined orthologous sequence within each <it>Fugu </it>region (that is, those that contain genes neighboring the CNE cluster) were selected for further analysis. These scaffolds contain CNEs that are conserved in the vicinity of the following <it>trans-dev </it>genes in the human genome: <it>BCL11A</it>, <it>EBF1</it>, <it>FIGN</it>, <it>PAX2</it>, <it>SOX1</it>, <it>UNC4.1 </it>and <it>ZNF503</it>. 
<it>Fugu </it>protein sequences from the corresponding orthologs were obtained from Ensembl Compara (v36), except in the case of <it>PAX2</it>, where only partial sequences were present. In this case, tBLASTn searches using known protein sequences from the zebrafish <it>pax2 </it>co-orthologs <it>pax2.1 </it>(SPTR: PAX2_BRARE) and <it>pax2.2 </it>(SPTR: O93370) were used to identify the <it>Fugu </it>protein sequences.</p>
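            <p>As an illustration of the clustering rule above, the following Python sketch groups CNEs so that each lies no more than 400 kb from another CNE in the same cluster. It is a minimal sketch rather than the pipeline used in this study: the function name cluster_cnes, the (start, end) tuple layout and the choice to measure distance from the right-most end of the open cluster to the start of the next CNE are all assumptions made for illustration.</p>

```python
def cluster_cnes(cnes, max_gap=400_000):
    """Group CNEs from one chromosome into clusters by proximity.

    cnes    : iterable of (start, end) coordinates in bp (hypothetical layout).
    max_gap : largest distance in bp allowed between neighbouring CNEs
              of the same cluster (400 kb in the text).
    """
    clusters = []
    furthest_end = None  # right-most coordinate seen in the open cluster
    for start, end in sorted(cnes):
        # Open a new cluster for the first CNE or when the gap is too large.
        if furthest_end is None or start - furthest_end > max_gap:
            clusters.append([])
            furthest_end = end
        clusters[-1].append((start, end))
        furthest_end = max(furthest_end, end)
    return clusters


if __name__ == "__main__":
    example = [(100, 200), (300_000, 300_100), (900_000, 900_050)]
    for cluster in cluster_cnes(example):
        print(cluster)
```

            <p>Sorting the coordinates first means each CNE only needs to be compared with the right-most coordinate of the open cluster, so the grouping completes in a single pass.</p>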
            <p>To verify that these genes were phylogenetically co-orthologous to mammalian copies, we carried out multiple alignments of each pair of <it>Fugu </it>proteins together with available orthologs from human, mouse, rat, dog and chicken using CLUSTALW (v1.83) <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. Ortholog sequences were downloaded from Ensembl Compara (v36) unless otherwise stated. In addition, we used all available orthologs from zebrafish. Two pairs of these are previously experimentally characterized co-orthologs and were downloaded from the SwissProt protein database (<it>pax2.1</it>, PAX2_BRARE; <it>pax2.2</it>, O93370; <it>sox1a</it>, Q4V997; <it>sox1b</it>, Q2Z1R2). The remaining novel zebrafish orthologs were downloaded using Ensembl Compara. In the cases of <it>BCL11A</it>, <it>FIGN </it>and <it>ZNF503</it>, only a single ortholog was identified by Ensembl. In the case of <it>EBF1</it>, no zebrafish ortholog could be identified, and a single ortholog from the stickleback genome was used instead. The closest available invertebrate ortholog of each gene in either <it>Ciona </it>(ci), <it>Drosophila </it>(dm) or <it>Caenorhabditis elegans </it>(ce) was used as an outgroup (<it>BCL11A</it>, <it>LD11946p </it>(dm); <it>EBF1</it>, coe (ci); <it>FIGN</it>, <it>CBG21866 </it>(ce); <it>PAX2</it>, <it>pax258 </it>(ci); <it>SOX1</it>, <it>soxNRA </it>(dm); <it>UNC4.1</it>, <it>unc4 </it>(ci); and <it>ZNF503</it>, <it>noc </it>(dm)).</p>
            <p>The closest paralog from human, mouse, rat, dog, chicken and <it>Fugu </it>in each case was also included as an outgroup in each alignment (that is, <it>BCL11A:BCL11B</it>, <it>FIGN:FIGNL1</it>, <it>PAX2:PAX5</it>, <it>SOX1:SOX3</it>, <it>EBF1:EBF3</it>, <it>ZNF503:ZNF703</it>). The closest related paralog was defined as the highest-scoring significant non-ortholog high-scoring pair (HSP) in a BLASTp search of the human protein sequence against the SwissProt/trEMBL nr protein database. No closely related paralog could be identified for <it>UNC4.1</it>. A phylogenetic tree was created from each alignment using the neighbor-joining (NJ) method with 1,000 bootstrap replicates in MEGA v3.1 <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p><it>Fugu </it>co-ortholog gene nomenclature</p>
            </st>
            <p><it>F. rubripes </it>is not a good experimental model organism due to difficulties in captive breeding and experimental manipulation. Consequently, few of its genes have been experimentally validated. Most gene predictions in the <it>Fugu </it>genome therefore remain novel and uncharacterized, and have no current gene name. For the purposes of this study we decided upon a naming scheme for the <it>Fugu </it>co-orthologs that uses the same name as the human ortholog (for example, <it>SOX1 </it>= <it>sox1</it>) together with a number that denotes the specific co-ortholog (for example, <it>sox1.1/sox1.2</it>). This naming convention is similar to that used in early studies of zebrafish co-orthologs (for example, <it>pax2.1 </it><abbrgrp><abbr bid="B63">63</abbr></abbrgrp>), which has now been superseded by a naming convention using letters (for example, <it>pax2a</it>) (ZFIN gene nomenclature guidelines <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>). We therefore used a number-based nomenclature to distinguish <it>Fugu </it>co-orthologs from zebrafish co-orthologs. For those genes in which zebrafish co-orthologs had previously been characterized (<it>pax2a/pax2b</it>, <it>sox1a/sox1b</it>), we named the <it>Fugu </it>equivalents according to the characterized zebrafish gene to which each was most closely related in the phylogeny. As an example, <it>PAX2 </it>co-orthologs were identified on <it>Fugu </it>scaffolds 86 and 59 (assembly v4; Figure <figr fid="F1">1d</figr>). The phylogeny identified the protein encoded on S86 as closest to zebrafish <it>pax2a </it>and that encoded on S59 as closest to <it>pax2b </it>(Figure <figr fid="F1">1c</figr>); therefore, the gene on S86 was named <it>pax2.1 </it>and the gene on S59 was named <it>pax2.2</it>. The remaining co-orthologous sets, which did not have characterized zebrafish equivalents, were assigned names randomly. 
It is important to note this nomenclature is used purely to distinguish the genes and has no functional significance.</p>
         </sec>
         <sec>
            <st>
               <p>Identification of CNEs in <it>Fugu </it>co-orthologous regions</p>
            </st>
            <p>CNE clusters derived from the whole-genome alignment were used to define the extent of sequence in both human and <it>Fugu </it>for use in more sensitive multiple alignments. Regions up to the next known gene from the most peripheral CNEs in each cluster were extracted in both human and <it>Fugu </it>using the Ensembl API <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>. Special attention was paid to include the same orthologous region between co-orthologous pairs to ensure equivalent comparison. In situations where the full extent of the region could not be identified in one of the co-orthologs due to the location of the region at the end of a scaffold (for example, scaffold_86, <it>znf503.1</it>; Additional data file 1), only CNEs identified up to the same orthologous region (estimated by the presence of nearby genes) in the second co-ortholog were used for comparative analyses. Orthologous sequences corresponding to each human region were similarly extracted in mouse (assembly v34) and rat (assembly v3.4). All genomic sequences were orientated prior to alignment so that the <it>trans-dev </it>gene was in positive orientation and masked for known repeats and low complexity regions using RepeatMasker and the relevant species-specific repeat library. Multiple alignments for the discovery of conserved sub-sequences located in the same relative order and orientation were carried out using the MLAGAN alignment toolkit <abbrgrp><abbr bid="B75">75</abbr></abbrgrp> with translated anchoring and the phylogenetic guide tree '((human (mouse rat)) fugu)'. Pairwise glocal alignments between <it>Fugu </it>and each of the other organisms used in the MLAGAN alignment, intended to uncover conserved elements that may have undergone rearrangements (and are, therefore, no longer in the same relative order along the sequence) or inversions, were carried out using Shuffle-LAGAN <abbrgrp><abbr bid="B76">76</abbr></abbrgrp> with default parameters. 
Each pair of <it>Fugu </it>co-orthologous regions was aligned to the same orthologous mammalian sequence. CNEs were identified from the alignments using the VISTA program <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> as regions with at least 65% identity over 40 bp using <it>Fugu </it>as the baseline sequence. CNEs were filtered further to include only those that were conserved in human and at least one rodent.</p>
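            <p>The conservation criterion above (at least 65% identity over 40 bp) can be illustrated with a sliding window over a pairwise alignment. The Python sketch below is a simplified stand-in and not the VISTA implementation: the function name conserved_regions, the treatment of gap columns as mismatches and the merging of overlapping windows are assumptions made for illustration.</p>

```python
def conserved_regions(aln_a, aln_b, window=40, min_identity_pct=65):
    """Return merged (start, end) alignment-column ranges, end exclusive,
    covered by at least one window meeting the identity threshold.

    aln_a, aln_b : two rows of a pairwise alignment ('-' marks a gap).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the column is an identical, ungapped base; 0 otherwise.
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    if window > len(match):
        return regions
    count = sum(match[:window])
    for start in range(len(match) - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            count += match[start + window - 1] - match[start - 1]
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if count * 100 >= min_identity_pct * window:
            end = start + window
            if regions and regions[-1][1] >= start:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    seq_a = "A" * 40 + "C" * 40
    seq_b = "A" * 40 + "G" * 40
    print(conserved_regions(seq_a, seq_b))
```

            <p>With the default parameters a window qualifies when it contains 26 or more identical columns out of 40.</p>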
         </sec>
         <sec>
            <st>
               <p>Identification of overlapping and distinct CNEs between <it>Fugu </it>co-orthologous regions</p>
            </st>
            <p>The human sequence was used as a reference in order to ascertain whether CNEs identified from each co-ortholog region overlapped the same human sequence (termed 'overlapping') or were conserved in only one co-ortholog (termed 'distinct'). CNEs between co-orthologs were considered overlapping if the conserved sequence overlapped the same position in the human genome by at least 20 bp. <it>Fugu </it>CNEs that were defined as 'distinct' to one co-ortholog were used as query sequences against the alternative <it>Fugu </it>co-orthologous genomic region using the CHAOS local aligner <abbrgrp><abbr bid="B75">75</abbr></abbrgrp> on both strands with the following parameters: word length 10, score cut-off 10, degeneracy tolerance 1, rescoring cut-off 1,000, and BLAST-like extension on. Resulting alignments were filtered to retain only those with at least 65% identity over 40 bp.</p>
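            <p>The overlapping/distinct classification described above can be sketched as an interval comparison in human coordinates. The following Python example is illustrative only: the function names, the half-open (start, end) interval convention and the assumption that all CNEs lie on the same human chromosome are ours, and the subsequent CHAOS search of 'distinct' elements is not reproduced.</p>

```python
def overlap_bp(a, b):
    """Length in bp shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_cnes(copy1, copy2, min_overlap=20):
    """Label the CNEs of each co-ortholog by their human coordinates.

    A CNE is 'overlapping' if it shares at least min_overlap bp of human
    sequence with any CNE from the other co-ortholog, else 'distinct'.
    """
    def label(query, others):
        hit = any(overlap_bp(query, other) >= min_overlap for other in others)
        return "overlapping" if hit else "distinct"

    labels1 = [label(cne, copy2) for cne in copy1]
    labels2 = [label(cne, copy1) for cne in copy2]
    return labels1, labels2


if __name__ == "__main__":
    # Hypothetical human coordinates for CNEs found with each Fugu copy.
    copy1 = [(100, 200), (500, 560)]
    copy2 = [(150, 260), (545, 600)]
    print(classify_cnes(copy1, copy2))
```

            <p>In this hypothetical input the first pair shares 50 bp and is labelled overlapping, whereas the second pair shares only 15 bp and both elements are labelled distinct.</p>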
         </sec>
         <sec>
            <st>
               <p>Evolution of overlapping CNEs</p>
            </st>
            <sec>
               <st>
                  <p>Element length</p>
               </st>
               <p>To ensure equivalent comparison, the length of the human CNE was used when measuring changes in element length between CNEs conserved in both co-orthologs. For each pair of overlapping CNEs with a one-to-one relationship (that is, each CNE overlapped one other CNE), the proportion (<it>P</it>) of the length of the overlap compared to the full length of each CNE was calculated using:</p>
               <p>
                  <display-formula><it>P </it>= <it>ov/len</it></display-formula>
               </p>
               <p>where <it>ov </it>is the length of the overlap between co-orthologous CNE (in bp) and <it>len </it>is the full length of the CNE. Values of <it>P </it>that tend towards 1 indicate all or the majority of the element is contained within the overlap while those tending towards 0 indicate only a small proportion of the element is contained within the overlapping region.</p>
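                <p>As a worked example of the proportion defined above, the Python sketch below computes P for both members of a one-to-one overlapping pair from their human coordinates. The function name and the half-open (start, end) coordinate convention are assumptions made for illustration, and the coordinates are hypothetical.</p>

```python
def overlap_proportions(cne1, cne2):
    """Return (P1, P2), where P = ov / len for each CNE of an overlapping pair.

    cne1, cne2 : half-open (start, end) human coordinates in bp.
    """
    ov = max(0, min(cne1[1], cne2[1]) - max(cne1[0], cne2[0]))
    len1 = cne1[1] - cne1[0]
    len2 = cne2[1] - cne2[0]
    return ov / len1, ov / len2


if __name__ == "__main__":
    # A 200 bp element and a 50 bp element lying entirely inside it.
    p1, p2 = overlap_proportions((1000, 1200), (1100, 1150))
    print(p1, p2)  # 0.25 1.0
```

                <p>Here the shorter element is wholly contained in the overlap (P = 1), whereas only a quarter of the longer element is (P = 0.25), the pattern expected if one copy has lost part of its sequence.</p>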
            </sec>
            <sec>
               <st>
                  <p>Sequence evolution</p>
               </st>
               <p>To compare the evolutionary rates of <it>Fugu </it>co-orthologous copies against the single human copy (representing the ancestral sequence) we used all 'overlapping' co-orthologous CNEs. For those CNEs that did not have a one-to-one relationship (for example, in cases where two or more CNEs in one region overlapped a single CNE in another) we treated each individual overlap region independently. Multiple alignments were created for each co-ortholog CNE individually together with orthologous sequence from human, mouse, rat, dog and chicken (where available) to produce the best mapping of orthologous bases. The human sequence from the overlap detected was used to extract corresponding sequence within each multiple alignment for each co-orthologous <it>Fugu </it>copy together with those of orthologous sequences from the other vertebrates. These sequences were then realigned together using DIALIGN (v2.2) <abbrgrp><abbr bid="B78">78</abbr></abbrgrp> and all gapped positions removed using the Gblocks program (v0.91b) <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. A Tajima relative rate test of each pair of <it>Fugu </it>co-orthologous copies against the single human sequence was carried out as described in <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. Only sequences that showed at least 4 independent changes in one of the elements and a <it>p </it>value &#8804; 0.05 were considered to have undergone significant change.</p>
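                <p>The relative rate comparison can be illustrated with the following Python sketch. It assumes the commonly described one-degree-of-freedom form of the Tajima test, in which only sites where exactly one ingroup sequence differs from the other two sequences are counted; the exact procedure used here is the one given in the cited reference. The function names and the toy sequences are hypothetical, and the input is assumed to be aligned with gapped columns already removed.</p>

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for two ingroup sequences and an outgroup.

    m1 counts sites unique to seq1 (seq2 and outgroup agree);
    m2 counts sites unique to seq2 (seq1 and outgroup agree).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a != b:
            if b == o:
                m1 += 1
            elif a == o:
                m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


def is_significant(m1, m2, p, min_changes=4, alpha=0.05):
    """Apply the two thresholds stated in the text."""
    return max(m1, m2) >= min_changes and alpha >= p


if __name__ == "__main__":
    human = "AAAAAAAAAA"   # toy reference sequence
    copy1 = "CCCCCCAAAA"   # six lineage-specific changes
    copy2 = "AAAAAAAAAA"   # unchanged
    m1, m2, chi2, p = tajima_relative_rate(copy1, copy2, human)
    print(m1, m2, chi2, round(p, 4), is_significant(m1, m2, p))
```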
            </sec>
            <sec>
               <st>
                  <p>Identification of CNEs duplicated at the origin of vertebrates</p>
               </st>
               <p>All human CNE sequences were searched against the human genome using BLAST with sensitive parameters (word size 8, mismatch penalty -1) to identify CNEs that have more than a single match (e-value &#8804; 1 &#215; 10<sup>-4</sup>) in the human genome. Paralogs were identified within 1.5 Mb of each hit using the method set out in <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The probability of the enrichment for overlapping CNEs within the dCNE set was calculated using a &#967;<sup>2 </sup>test with expected numbers for each type of CNE (overlapping versus distinct) calculated from the proportion of each within the whole CNE dataset (381:430 = 0.469:0.531).</p>
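                <p>The enrichment calculation can be sketched as a two-class goodness-of-fit test in which the expected counts follow the background ratio of overlapping to distinct CNEs (381:430). The Python example below is a minimal sketch under that reading; the function name is an assumption, and the observed counts in the usage example are hypothetical and are not results from this study.</p>

```python
import math


def chi_square_two_class(obs_overlapping, obs_distinct,
                         bg_overlapping=381, bg_distinct=430):
    """Chi-square test (1 degree of freedom) of observed counts against
    expectations derived from the background proportions."""
    total = obs_overlapping + obs_distinct
    bg_total = bg_overlapping + bg_distinct
    exp_overlapping = total * bg_overlapping / bg_total
    exp_distinct = total * bg_distinct / bg_total
    chi2 = ((obs_overlapping - exp_overlapping) ** 2 / exp_overlapping
            + (obs_distinct - exp_distinct) ** 2 / exp_distinct)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Hypothetical dCNE counts chosen only to demonstrate the calculation.
    chi2, p = chi_square_two_class(60, 20)
    print(round(chi2, 2), p)
```

                <p>Counts that match the background ratio exactly give a statistic of zero, so a large value indicates that one class is over-represented among the duplicated CNEs.</p>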
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a comparison of the CNEs and genic environment between <it>Fugu </it>co-orthologs of <it>znf503.1 </it>and <it>znf503.2</it>. Additional data file <supplr sid="S2">2</supplr> is a bar chart showing that changes in the number of CNEs between co-orthologs correlate with changes in the size of the genomic region in which they are identified. Additional data file <supplr sid="S3">3</supplr> is a full table of results of the relative rate tests for all overlapping co-orthologous CNEs.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Comparison of the CNEs and genic environment between <it>Fugu </it>co-orthologs of <it>znf503.1 </it>and <it>znf503.2</it></p>
            </caption>
            <text>
               <p>Comparison of the CNEs and genic environment between <it>Fugu </it>co-orthologs of <it>znf503.1 </it>and <it>znf503.2</it></p>
            </text>
            <file name="gb-2007-8-4-r53-S1.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Changes in the number of CNEs between co-orthologs correlate with changes in the size of the genomic region in which they are identified</p>
            </caption>
            <text>
               <p>Changes in the number of CNEs between co-orthologs correlate with changes in the size of the genomic region in which they are identified</p>
            </text>
            <file name="gb-2007-8-4-r53-S2.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Results of the relative rate tests for all overlapping co-orthologous CNEs</p>
            </caption>
            <text>
               <p>Results of the relative rate tests for all overlapping co-orthologous CNEs</p>
            </text>
            <file name="gb-2007-8-4-r53-S3.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank Debbie Goode, Gayle McEwen and Wally Gilks for useful discussions during the writing of this manuscript, Lucinda Fell for help in formatting the figures and Remo Sanges for advice and use of his CHAOS parser. This work was funded by the Medical Research Council.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The evolutionary fate and consequences of duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>290</volume>
            <fpage>1151</fpage>
            <lpage>1155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.290.5494.1151</pubid>
                  <pubid idtype="pmpid" link="fulltext">11073452</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <aug>
               <au>
                  <snm>Ohno</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Evolution by Gene Duplication</source>
            <publisher>Heidelberg: Springer-Verlag</publisher>
            <pubdate>1970</pubdate>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Evolution of genetic redundancy.</p>
            </title>
            <aug>
               <au>
                  <snm>Nowak</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Boerlijst</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>388</volume>
            <fpage>167</fpage>
            <lpage>171</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/40618</pubid>
                  <pubid idtype="pmpid" link="fulltext">9217155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Comparable rates of gene loss and functional divergence after genome duplications early in vertebrate evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Nadeau</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Sankoff</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1997</pubdate>
            <volume>147</volume>
            <fpage>1259</fpage>
            <lpage>1266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1208249</pubid>
                  <pubid idtype="pmpid" link="fulltext">9383068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Zebrafish hox clusters and vertebrate genome evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Amores</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Force</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Joly</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Amemiya</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fritz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Langeland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Prince</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>YL</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>1711</fpage>
            <lpage>1714</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5394.1711</pubid>
                  <pubid idtype="pmpid" link="fulltext">9831563</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Evolution of duplicate genes in a tetraploid animal, <it>Xenopus laevis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1993</pubdate>
            <volume>10</volume>
            <fpage>1360</fpage>
            <lpage>1369</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8277859</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Preservation of duplicate genes by complementary, degenerative mutations.</p>
            </title>
            <aug>
               <au>
                  <snm>Force</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pickett</snm>
                  <fnm>FB</fnm>
               </au>
               <au>
                  <snm>Amores</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Postlethwait</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1999</pubdate>
            <volume>151</volume>
            <fpage>1531</fpage>
            <lpage>1545</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460548</pubid>
                  <pubid idtype="pmpid" link="fulltext">10101175</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Logic functions of the genomic cis-regulatory code.</p>
            </title>
            <aug>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Davidson</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>4954</fpage>
            <lpage>4959</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">555989</pubid>
                  <pubid idtype="pmpid" link="fulltext">15788531</pubid>
                  <pubid idtype="doi">10.1073/pnas.0409624102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Cis-regulatory control circuits in development.</p>
            </title>
            <aug>
               <au>
                