<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-9-174</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Non-random retention of protein-coding overlapping genes in Metazoa</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Sold&#224;</snm>
               <fnm>Giulia</fnm>
               <insr iid="I1"/>
               <email>giulia.solda@unimi.it</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Suyama</snm>
               <fnm>Mikita</fnm>
               <insr iid="I2"/>
               <email>mikita@genome.med.kyoto-u.ac.jp</email>
            </au>
            <au id="A3">
               <snm>Pelucchi</snm>
               <fnm>Paride</fnm>
               <insr iid="I3"/>
               <email>paride.pelucchi@itb.cnr.it</email>
            </au>
            <au id="A4">
               <snm>Boi</snm>
               <fnm>Silvia</fnm>
               <insr iid="I1"/>
               <email>boi.silvia@gmail.com</email>
            </au>
            <au id="A5">
               <snm>Guffanti</snm>
               <fnm>Alessandro</fnm>
               <insr iid="I3"/>
               <email>alessandro.guffanti@itb.cnr.it</email>
            </au>
            <au id="A6">
               <snm>Rizzi</snm>
               <fnm>Ermanno</fnm>
               <insr iid="I3"/>
               <email>ermanno.rizzi@itb.cnr.it</email>
            </au>
            <au id="A7">
               <snm>Bork</snm>
               <fnm>Peer</fnm>
               <insr iid="I4"/>
               <email>bork@embl.de</email>
            </au>
            <au id="A8">
               <snm>Tenchini</snm>
               <mnm>Luisa</mnm>
               <fnm>Maria</fnm>
               <insr iid="I1"/>
               <email>marialuisa.tenchini@unimi.it</email>
            </au>
            <au id="A9" ca="yes">
               <snm>Ciccarelli</snm>
               <mi>D</mi>
               <fnm>Francesca</fnm>
               <insr iid="I5"/>
               <insr iid="I6"/>
               <email>francesca.ciccarelli@ifom-ieo-campus.it</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biology and Genetics for Medical Sciences, University of Milan, 20133 Milan, Italy</p>
            </ins>
            <ins id="I2">
               <p>Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Konoe-cho, Yoshida, Sakyo-ku, 606-8501 Kyoto, Japan</p>
            </ins>
            <ins id="I3">
               <p>Institute of Biomedical Technologies, National Research Council, Via Fantoli 16/15, 20138 Milan, Italy</p>
            </ins>
            <ins id="I4">
               <p>European Molecular Biology Laboratory, Meyerhofstr.1, 69012 Heidelberg, Germany</p>
            </ins>
            <ins id="I5">
               <p>Department of Experimental Oncology, European Institute of Oncology, Via Ripamonti 435, 20141 Milan, Italy</p>
            </ins>
            <ins id="I6">
               <p>FIRC Institute of Molecular Oncology Foundation, Via Adamello 16, 20139 Milan, Italy</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>174</fpage>
         <url>http://www.biomedcentral.com/1471-2164/9/174</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18416813</pubid>
               <pubid idtype="doi">10.1186/1471-2164-9-174</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Sold&#224; et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Although the overlap of transcriptional units occurs frequently in eukaryotic genomes, its evolutionary and biological significance remains largely unclear. Here we report a comparative analysis of overlaps between genes coding for well-annotated proteins in five metazoan genomes (human, mouse, zebrafish, fruit fly and worm).</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>In all analyzed species, the observed number of overlapping genes is lower than expected under functional neutrality, suggesting that gene overlap is negatively selected. Comparison with the random distribution also shows that retained overlaps have non-random features: antiparallel overlaps are significantly enriched, while overlaps lying on the same strand and those involving coding sequences are strongly underrepresented. We confirm that overlap is mostly species-specific and provide evidence that it frequently originates through the acquisition of terminal, non-coding exons. Finally, we show that overlapping genes tend to be significantly co-expressed in a breast cancer cDNA library obtained by 454 deep sequencing, and that different overlap types display different patterns of reciprocal expression.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our data suggest that overlap between protein-coding genes is selected against in Metazoa. However, when retained it may be used as a species-specific mechanism for the reciprocal regulation of neighboring genes. The tendency of overlaps to involve non-coding regions of the genes leads to the speculation that the advantages achieved by an overlapping arrangement may be optimized by evolving regulatory non-coding transcripts.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>The occurrence of overlapping genes in higher eukaryotes has long been considered a rare event <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, but completed genome sequences and whole-transcriptome analyses have revealed that mammalian genomes harbor a large number of overlapping transcriptional units <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Most detected overlaps occur between genes transcribed from opposite strands of the same genomic locus and often involve non-coding RNAs <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. These antisense transcripts participate in a number of cellular processes, such as genomic imprinting, X chromosome inactivation, alternative splicing, gene silencing and methylation, RNA editing and translation <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. By comparison, very little is known about overlapping genes lying on the same DNA strand, apart from a few cases reported in the literature <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Overlap is estimated to involve around 10% of protein-coding genes <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B25">25</abbr></abbrgrp>, rising to 20%&#8211;60% when non-coding RNAs are included <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. 
Despite their abundance, the origin and evolution of overlapping genes in eukaryotes remain unclear, and comparative studies have often reached discordant conclusions <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B25">25</abbr></abbrgrp>. Including non-coding RNAs and poorly annotated transcripts alongside protein-coding genes may have contributed to these conflicting results, because protein-coding genes and functional non-coding RNAs evolve differently <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. To investigate the evolution of gene overlap in Metazoa, we therefore restricted our dataset to well-annotated protein-coding genes. We retrieved overlapping protein-coding genes in five representative species (<it>Homo sapiens</it>, <it>Mus musculus</it>, <it>Danio rerio</it>, <it>Drosophila melanogaster</it> and <it>Caenorhabditis elegans</it>) and compared the observed cases with the random distribution expected under functional neutrality. We characterized the features and conservation of protein-coding overlapping genes and inferred possible mechanisms of overlap formation. Finally, to evaluate the possible relationship between overlap and gene expression, we analyzed the expression of our set of overlapping genes in a human breast cancer cDNA library obtained by 454 deep sequencing.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Non-random retention of protein-coding overlapping genes in Metazoa</p>
            </st>
             <p>The sequences of known protein-coding genes for five fully sequenced metazoan genomes (<it>H. sapiens, M. musculus, D. rerio, D. melanogaster, C. elegans</it>) were retrieved from several sources (RefSeq v.10, UCSC mm7 assembly, WormBase WS140, Flybase r4.2, Riken Fantom 3.0). From each dataset, we filtered splice variants, removed non-coding transcripts, pseudogenes and purely computational gene predictions, and mapped each cDNA onto the corresponding genome to extract the Overlapping Gene Clusters (OGCs). An OGC was defined as a group of two or more genes whose genomic coordinates overlap partially or totally. Gene boundaries were defined as the start and the end of the longest transcript (the complete list and features of OGCs are provided in Additional files <supplr sid="S1">1</supplr> and <supplr sid="S2">2</supplr>). Our selection criteria allowed the detection of OGCs lying both on the same (parallel) and on opposite (antiparallel) DNA strands (Figure <figr fid="F1">1</figr>). Although we started from restrictive datasets, our estimates of overlapping protein-coding genes (Table <tblr tid="T1">1</tblr>) were consistent with previous analyses in human, mouse and Drosophila <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B27">27</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. According to our results, overlap involves 4&#8211;8% of protein-coding genes, with the exception of Drosophila, where the percentage of overlapping genes is higher (26.2%, Table <tblr tid="T1">1</tblr>).</p>
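             <p>The clustering step can be illustrated with a short Python sketch. It assumes that each gene has already been reduced to the span of its longest transcript on one chromosome, uses inclusive coordinates, and groups genes by chaining, so a gene joins a cluster if it overlaps any current member regardless of strand. These choices, the toy input and the names Gene, longest_transcript_span and overlapping_gene_clusters are illustrative assumptions, not the pipeline used in this study.</p>
             <p>
```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """A gene reduced to the span of its longest transcript (inclusive coordinates)."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"; not used for clustering, since OGCs span both strands


def longest_transcript_span(transcripts: list[tuple[int, int]]) -> tuple[int, int]:
    """Use the (start, end) of the longest transcript as the gene boundaries."""
    return max(transcripts, key=lambda span: span[1] - span[0])


def overlapping_gene_clusters(genes: list[Gene]) -> list[list[str]]:
    """Return clusters of two or more genes whose spans overlap partially or totally."""
    by_chrom: dict[str, list[Gene]] = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters: list[list[str]] = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: (g.start, g.end))  # sweep left to right
        current: list[str] = []
        current_end = None
        for gene in chrom_genes:
            if current_end is None or gene.start > current_end:
                # Gene starts after the running cluster ends: close it, open a new one.
                if len(current) >= 2:
                    clusters.append(current)
                current, current_end = [gene.gene_id], gene.end
            else:
                # Gene overlaps the running cluster (on either strand): extend it.
                current.append(gene.gene_id)
                current_end = max(current_end, gene.end)
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical toy annotation.
start, end = longest_transcript_span([(100, 400), (120, 500)])
genes = [
    Gene("A", "chr1", start, end, "+"),
    Gene("B", "chr1", 450, 900, "-"),    # partial antiparallel overlap with A
    Gene("C", "chr1", 2000, 3000, "+"),  # overlaps nothing
    Gene("D", "chr2", 100, 1000, "+"),
    Gene("E", "chr2", 300, 400, "+"),    # lies entirely within D
]
clusters = overlapping_gene_clusters(genes)
print(clusters)
```
             </p>
             <p>On this toy input the function returns two clusters, A with B and D with E; C is left out because it overlaps no other gene. With chaining, three genes form a single cluster whenever they are linked by successive overlaps, even if the first and the last do not overlap each other.</p>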
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Features of the unique RefSeq genes used for the analysis.</p>
               </text>
               <file name="1471-2164-9-174-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                   <p><b>Datasets of overlapping gene clusters in five metazoan genomes</b>. The file contains one worksheet for each species analyzed. For each dataset, the cluster number, the number of components and the RefSeq accession numbers of all components are reported.</p>
               </text>
               <file name="1471-2164-9-174-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                   <p>Overlapping genes in five metazoan species.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Species</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total Genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Unique Genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Observed OGCs</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected OGCs (SD)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Observed OG Pairs</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected OG Pairs (SD)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Observed OGs</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Observed OGs (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected OGs (SD)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Expected OGs (%)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Hs</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>23073</p>
                     </c>
                     <c ca="center">
                        <p>17794</p>
                     </c>
                     <c ca="center">
                        <p>663</p>
                     </c>
                     <c ca="center">
                        <p>2374.1</p>
                        <p>(27.7)</p>
                     </c>
                     <c ca="center">
                        <p>749</p>
                     </c>
                     <c ca="center">
                        <p>4954.0</p>
                        <p>(65.2)</p>
                     </c>
                     <c ca="center">
                        <p>1409</p>
                     </c>
                     <c ca="center">
                        <p>7.9</p>
                     </c>
                     <c ca="center">
                        <p>6630.8</p>
                        <p>(47.2)</p>
                     </c>
                     <c ca="center">
                        <p>37.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Mm</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>17970</p>
                     </c>
                     <c ca="center">
                        <p>17040</p>
                     </c>
                     <c ca="center">
                        <p>656</p>
                     </c>
                     <c ca="center">
                        <p>2112.9</p>
                        <p>(27.9)</p>
                     </c>
                     <c ca="center">
                        <p>662</p>
                     </c>
                     <c ca="center">
                        <p>4293.7</p>
                        <p>(71.3)</p>
                     </c>
                     <c ca="center">
                        <p>1400</p>
                     </c>
                     <c ca="center">
                        <p>8.2</p>
                     </c>
                     <c ca="center">
                        <p>5873.7</p>
                        <p>(68.8)</p>
                     </c>
                     <c ca="center">
                        <p>34.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Dr</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6672</p>
                     </c>
                     <c ca="center">
                        <p>6506</p>
                     </c>
                     <c ca="center">
                        <p>108</p>
                     </c>
                     <c ca="center">
                        <p>396.9</p>
                        <p>(14.7)</p>
                     </c>
                     <c ca="center">
                        <p>155</p>
                     </c>
                     <c ca="center">
                        <p>524.2</p>
                        <p>(19.5)</p>
                     </c>
                     <c ca="center">
                        <p>262</p>
                     </c>
                     <c ca="center">
                        <p>4.0</p>
                     </c>
                     <c ca="center">
                        <p>899.1</p>
                        <p>(30.6)</p>
                     </c>
                     <c ca="center">
                        <p>13.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Dm</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>18768</p>
                     </c>
                     <c ca="center">
                        <p>13416</p>
                     </c>
                     <c ca="center">
                        <p>1505</p>
                     </c>
                     <c ca="center">
                        <p>2172.3</p>
                        <p>(8.1)</p>
                     </c>
                     <c ca="center">
                        <p>2022</p>
                     </c>
                     <c ca="center">
                        <p>7483.1</p>
                        <p>(44.0)</p>
                     </c>
                     <c ca="center">
                        <p>3514</p>
                     </c>
                     <c ca="center">
                        <p>26.2</p>
                     </c>
                     <c ca="center">
                        <p>7876.0</p>
                        <p>(34.9)</p>
                     </c>
                     <c ca="center">
                        <p>58.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>Ce</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>21124</p>
                     </c>
                     <c ca="center">
                        <p>19359</p>
                     </c>
                     <c ca="center">
                        <p>404</p>
                     </c>
                     <c ca="center">
                        <p>3615.6</p>
                        <p>(32.2)</p>
                     </c>
                     <c ca="center">
                        <p>494</p>
                     </c>
                     <c ca="center">
                        <p>8653.1</p>
                        <p>(80.7)</p>
                     </c>
                     <c ca="center">
                        <p>898</p>
                     </c>
                     <c ca="center">
                        <p>4.6</p>
                     </c>
                     <c ca="center">
                        <p>10442.9</p>
                        <p>(54.3)</p>
                     </c>
                     <c ca="center">
                        <p>53.9</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                   <p>Unique Genes refers to the number of sequences actually used in the analysis, after filtering for splice variants. For each species, the counts of overlapping genes (OGs), overlapping gene pairs (OG pairs), and overlapping gene clusters (OGCs) from both real data and random simulations are shown. For simulated data, the mean over ten simulations is reported together with the standard deviation (SD). OG percentages are calculated relative to Unique Genes. Abbreviations: <it>Hs</it>, <it>Homo sapiens</it>; <it>Mm</it>, <it>Mus musculus</it>; <it>Dr</it>, <it>Danio rerio</it>; <it>Dm</it>, <it>Drosophila melanogaster</it>; <it>Ce</it>, <it>Caenorhabditis elegans</it>.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Classes of overlapping genes</p>
               </caption>
               <text>
                  <p><b>Classes of overlapping genes</b>. OGC classification was based on the overlap extent (complete or partial) and on the reciprocal direction of transcription of the involved genes (same or opposite strand). Convergent overlaps involve the 3' termini of both genes, while divergent overlaps involve the 5' ends (UTR and/or CDS). Complete overlap occurs when the entire sequence of one gene is contained within another gene. In nested OGCs one gene lies completely within an intron of the other, while embedded genes can share more than one intron or exon.</p>
               </text>
               <graphic file="1471-2164-9-174-1"/>
            </fig>
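             <p>The classification in Figure 1 can be sketched for a single gene pair as below. The sketch assumes inclusive genomic coordinates and strand labels "+" and "-". Because telling nested from embedded genes requires exon and intron coordinates, both are reported simply as complete overlaps. The names Span and classify_overlap are hypothetical.</p>
             <p>
```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # inclusive genomic start
    end: int     # inclusive genomic end
    strand: str  # "+" or "-"


def classify_overlap(a: Span, b: Span) -> str:
    """Classify a gene pair by overlap extent and relative transcription direction.

    Nested and embedded genes cannot be told apart without exon/intron
    coordinates, so both are reported as 'complete'."""
    if a.start > b.end or b.start > a.end:
        return "no overlap"
    arrangement = "parallel" if a.strand == b.strand else "antiparallel"
    a_contains_b = b.start >= a.start and a.end >= b.end
    b_contains_a = a.start >= b.start and b.end >= a.end
    if a_contains_b or b_contains_a:
        return f"{arrangement} complete"
    if arrangement == "parallel":
        return "parallel partial"
    # Antiparallel partial overlap: take the gene that starts further left.
    left = a if b.start >= a.start else b
    # A "+" gene on the left ends (3') inside a "-" gene whose 3' end is its
    # leftmost coordinate: tail-to-tail, i.e. convergent. Otherwise the two
    # 5' ends overlap: head-to-head, i.e. divergent.
    return "antiparallel convergent" if left.strand == "+" else "antiparallel divergent"


for pair in [
    (Span(100, 500, "+"), Span(400, 900, "-")),
    (Span(100, 500, "-"), Span(400, 900, "+")),
    (Span(100, 900, "+"), Span(200, 300, "-")),
    (Span(100, 500, "+"), Span(400, 900, "+")),
]:
    print(classify_overlap(*pair))
```
             </p>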
             <p>We compared the observed data on overlapping genes with a null model that simulates the distribution of events expected under functional neutrality. For each species, we reassigned random positions to the individual genes within each chromosome and counted the resulting overlaps; the simulation was repeated ten times (Table <tblr tid="T1">1</tblr>).</p>
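             <p>A minimal sketch of such a null model follows. It assumes that gene lengths are preserved, that each gene receives a uniformly random start position on its own chromosome (so chromosome sizes are needed), and that overlap is scored as the number of genes overlapping at least one other gene. The uniform placement, the fixed seed, the toy genome and the function names are illustrative assumptions rather than details of the simulations reported here.</p>
             <p>
```python
import random
import statistics


def count_overlapping_genes(spans: list[tuple[int, int]]) -> int:
    """Count genes whose inclusive (start, end) span overlaps at least one other span."""
    overlapping = 0
    cluster_size = 0
    cluster_end = None
    for start, end in sorted(spans):
        if cluster_end is not None and cluster_end >= start:
            cluster_size += 1  # overlaps an earlier member of the running cluster
            cluster_end = max(cluster_end, end)
        else:
            if cluster_size >= 2:
                overlapping += cluster_size
            cluster_size, cluster_end = 1, end
    if cluster_size >= 2:
        overlapping += cluster_size
    return overlapping


def simulate_expected_overlaps(
    gene_lengths: dict[str, list[int]],
    chrom_sizes: dict[str, int],
    n_runs: int = 10,
    seed: int = 0,
) -> tuple[float, float]:
    """Place every gene at a uniformly random position on its own chromosome,
    keeping its length, and return the mean and SD of overlapping-gene counts."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0
        for chrom, lengths in gene_lengths.items():
            size = chrom_sizes[chrom]
            spans = []
            for length in lengths:
                # Assumes no gene is longer than its chromosome.
                start = rng.randint(1, size - length + 1)
                spans.append((start, start + length - 1))
            total += count_overlapping_genes(spans)
        totals.append(total)
    # statistics.stdev needs at least two runs.
    return statistics.mean(totals), statistics.stdev(totals)


# Hypothetical toy genome: gene lengths and chromosome sizes in bp.
toy_lengths = {"chr1": [3000, 1200, 800, 5000], "chr2": [2500, 2500]}
toy_sizes = {"chr1": 20000, "chr2": 10000}
mean, sd = simulate_expected_overlaps(toy_lengths, toy_sizes)
print(f"expected overlapping genes: {mean:.1f} (SD {sd:.1f})")
```
             </p>
             <p>Comparing the observed count of overlapping genes with the mean and SD returned by such a simulation gives the kind of observed versus expected contrast summarized in Table <tblr tid="T1">1</tblr>.</p>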
             <p>In all species the overall number of observed OGCs was significantly lower than randomly expected (Table <tblr tid="T1">1</tblr>), suggesting selection against the retention of overlap as a general mechanism of gene arrangement. At least two factors may explain the counterselection of gene overlap in Metazoa. First, any mutation within an overlapping region affects two or more sequences at once, which would likely reduce the ability of the involved genes to become optimally adapted <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Second, overlap can cause transcriptional <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> or translational <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> interference between overlapping reading frames. Both factors help explain why OGCs formed by several genes, as well as those involving coding sequences, are particularly selected against (see below).</p>
            <p>Although overlap of protein-coding genes is generally counterselected, some classes of overlap are preferentially retained. Comparison to random expectation showed that observed OGCs display a non-random distribution in terms of their abundance, reciprocal orientation, and overlap pattern (Table <tblr tid="T1">1</tblr> and Figure <figr fid="F2">2</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Comparative analysis of OGCs in Metazoa</p>
               </caption>
               <text>
                   <p><b>Comparative analysis of OGCs in Metazoa</b>. For each species, the bar for each analyzed feature of the observed overlapping gene set is followed by the bar for the random expectation. Random bars show the mean of ten simulations, with the corresponding standard deviation. <b>A</b>. OGC composition. OGCs were classified by the number of genes in each cluster. The numbers of OGCs with more than 4 components are 5 in human, 11 in mouse, 5 in zebrafish, 48 in fly and 7 in worm. <b>B</b>. Type of overlap. Occurrence of partial and complete overlaps in both 2-component and multicomponent OGCs. <b>C</b>. Gene reciprocal arrangement. Distribution of OGCs according to overlap type (see Figure 1). <b>D</b>. Features of the overlapping regions. The plot reports the number of overlaps involving the coding sequence of one gene (CDS/UTR or CDS/intron overlaps) or of both genes, and the number of overlaps involving only non-coding sequence (UTR/UTR and UTR/intron).</p>
               </text>
               <graphic file="1471-2164-9-174-2"/>
            </fig>
             <p>While the number of random OGCs varied with the gene density of each species (Table <tblr tid="T1">1</tblr> and Additional file <supplr sid="S2">2</supplr>), this trend was not maintained in the observed data. The observed numbers of overlapping genes were around 4&#8211;5 times lower than expected in human and mouse, ~2 times lower in fly and ~12 times lower in worm (Table <tblr tid="T1">1</tblr>). In agreement with our observation, a remarkable abundance of antisense transcripts in fly and a paucity in worm have recently been reported <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr></abbrgrp>. These different rates of overlap in fly and worm could reflect species-specific features. The higher proportion of overlapping genes in fly might be partly explained by its high gene density and long UTRs (Additional file <supplr sid="S1">1</supplr>). The low number of OGCs in worm may instead be a consequence of operons, which involve at least 15% of <it>C. elegans</it> genes <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Each operon contains two to eight genes that are cotranscribed from the same strand as a polycistronic RNA and then trans-spliced <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Such an organization might constrain the plasticity of the worm genome, disfavoring the retention of specific overlap types such as antiparallel and partial arrangements. A similar genomic constraint has recently been proposed to explain the paucity of duplicated genes in operons <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
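             <p>The fold depletions discussed above can be recomputed directly from Table 1. The Python sketch below divides expected by observed counts for both OGCs and overlapping genes (OGs) and also recomputes the observed percentage column; the numbers are transcribed from Table 1, while the dictionary layout and the function name depletion are illustrative.</p>
             <p>
```python
# Counts transcribed from Table 1:
# (unique genes, observed OGCs, expected OGCs, observed OGs, expected OGs).
# Expected values are means of ten random simulations.
TABLE1 = {
    "Hs": (17794, 663, 2374.1, 1409, 6630.8),
    "Mm": (17040, 656, 2112.9, 1400, 5873.7),
    "Dr": (6506, 108, 396.9, 262, 899.1),
    "Dm": (13416, 1505, 2172.3, 3514, 7876.0),
    "Ce": (19359, 404, 3615.6, 898, 10442.9),
}


def depletion(table: dict[str, tuple]) -> dict[str, tuple[float, float, float]]:
    """Per species: expected/observed fold for OGCs, the same for OGs,
    and the observed percentage of unique genes that overlap."""
    result = {}
    for species, (unique, obs_ogc, exp_ogc, obs_og, exp_og) in table.items():
        result[species] = (
            exp_ogc / obs_ogc,
            exp_og / obs_og,
            100 * obs_og / unique,
        )
    return result


for species, (ogc_fold, og_fold, og_pct) in depletion(TABLE1).items():
    print(f"{species}: {ogc_fold:.1f}x fewer OGCs, {og_fold:.1f}x fewer OGs "
          f"than expected; {og_pct:.1f}% of genes overlap")
```
             </p>
             <p>Run on the Table 1 values, the OG ratios come out at about 4.7 (human), 4.2 (mouse), 2.2 (fly) and 11.6 (worm), and the percentages match the Observed OGs (%) column; the corresponding OGC ratios are smaller (about 3.6, 3.2, 1.4 and 8.9).</p>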
             <p>In all genomes except zebrafish, OGCs formed by two genes occurred at a significantly higher frequency than expected (Figure <figr fid="F2">2A</figr>). In addition, OGCs in human, mouse, and fly were mostly formed by antiparallel convergent pairs that overlapped only partially, while in zebrafish and, more markedly, in worm nested overlaps prevailed (Figures <figr fid="F2">2B</figr> and <figr fid="F2">2C</figr>). However, the zebrafish results should be interpreted with caution, since they are probably affected by the poor coverage of the corresponding gene set. Likewise, the annotation of 5' and 3' untranslated regions appears particularly incomplete in worm (Additional file <supplr sid="S1">1</supplr>), which may lead to an underestimation of some overlap classes (<it>i.e.</it>, partial overlaps, CDS/UTR and UTR/UTR overlaps, Figure <figr fid="F2">2</figr>). In all species, overlaps between genes lying on the same strand and those sharing coding regions are strongly selected against (Figures <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>). Overlap between UTRs is preferentially retained in all organisms, while overlap between coding regions and introns is common in zebrafish, fly and worm (Figure <figr fid="F2">2D</figr>). The non-random features of observed OGCs suggest that different overlap types are under different selective pressures. Specific overlap classes might be retained when they provide a selective advantage; for genes on opposite strands, this advantage could be antisense regulation. Human, mouse and fly are significantly enriched in overlapping pairs with the potential for antisense regulation, which include all antiparallel overlaps sharing exons (<it>H. sapiens</it> 55%, p &lt; 0.001; <it>M. musculus</it> 58%, p &lt; 0.001; <it>D. melanogaster</it> 53%, p &lt; 0.001, chi-squared test). 
This result suggests that, at least in these species, positive selection might act to preserve antisense regulation. However, we cannot exclude that part of this enrichment results from negative selection against parallel and CDS/CDS overlaps.</p>
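             <p>The enrichment above is reported as chi-squared test results without the underlying contingency tables. As an illustration only, a Pearson chi-squared test on a 2 x 2 table can be written with the Python standard library. The table layout (observed versus simulated pairs, split by whether they share exons), the absence of a continuity correction, the function name chi2_2x2 and the counts in the example are all hypothetical.</p>
             <p>
```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test without continuity correction for the table

        [[a, b],
         [c, d]]

    Returns the statistic and its upper-tail p-value with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal, so
    # P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts (not from this study): pairs sharing exons vs. not,
# in observed (row 1) and simulated (row 2) data.
stat, p = chi2_2x2(30, 20, 20, 30)
print(f"chi-squared = {stat:.2f}, p = {p:.3g}")
```
             </p>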
         </sec>
         <sec>
            <st>
               <p>Poor evolutionary conservation of OGCs in Metazoa</p>
            </st>
             <p>We next evaluated the conservation of OGCs across metazoan evolution by verifying both the presence of orthologous genes and the conservation of overlap. For each pair of analyzed species, we assigned pairwise orthology for all sequence entries, extracted the orthologs involved in OGCs, and verified whether the overlapping arrangement was conserved (Figure <figr fid="F3">3A</figr>). Most overlapping genes in one species had orthologs in the others (Figure <figr fid="F3">3B</figr>), but very few overlaps were maintained (Figure <figr fid="F3">3C</figr>). In total, ~40% of human OGCs were also present in mouse, a higher percentage than previous estimates (6.6&#8211;17%) <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B13">13</abbr><abbr bid="B33">33</abbr><abbr bid="B38">38</abbr></abbrgrp>, but lower than the proportion of orthologous genes between the two species (75.6%).</p>
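             <p>The pairwise comparison outlined in Figure 3A can be sketched as follows, simplified to OGCs made of two genes and to one-to-one orthology; both simplifications are assumptions, and multi-gene clusters or one-to-many orthologs would need extra rules. Overlapping pairs are stored as unordered frozensets so that gene order does not matter. The gene identifiers and the function name conserved_overlaps are hypothetical.</p>
             <p>
```python
def conserved_overlaps(
    pairs_a: set[frozenset[str]],
    orthologs_a_to_b: dict[str, str],
    pairs_b: set[frozenset[str]],
) -> tuple[int, int, int]:
    """Return (overlapping pairs in A, pairs whose two members both have
    orthologs in B, pairs whose orthologs also overlap in B)."""
    with_orthologs = 0
    conserved = 0
    for pair in pairs_a:
        mapped = [orthologs_a_to_b.get(gene) for gene in pair]
        if None in mapped:
            continue  # at least one member has no ortholog in species B
        with_orthologs += 1
        if frozenset(mapped) in pairs_b:
            conserved += 1  # the overlapping arrangement is conserved
    return len(pairs_a), with_orthologs, conserved


# Hypothetical gene identifiers and orthology assignments.
human_pairs = {frozenset({"H1", "H2"}), frozenset({"H3", "H4"}), frozenset({"H5", "H6"})}
human_to_mouse = {"H1": "M1", "H2": "M2", "H3": "M3", "H4": "M4", "H5": "M5"}
mouse_pairs = {frozenset({"M1", "M2"}), frozenset({"M3", "M9"})}

total, orthologous, conserved = conserved_overlaps(human_pairs, human_to_mouse, mouse_pairs)
print(f"{conserved}/{total} pairs conserved ({orthologous} with orthologs for both genes)")
```
             </p>
             <p>In this toy case, two of the three human pairs have mouse orthologs for both members, but only one of them also overlaps in mouse, mirroring the pattern of Figures 3B and 3C in which orthologs are common while conserved overlaps are rare.</p>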
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Conservation of overlapping genes and OGCs within Metazoa</p>
               </caption>
               <text>
                  <p><b>Conservation of overlapping genes and OGCs within Metazoa</b>. <b>A</b>. Schematic representation of the procedure for detecting the conservation of overlapping genes (red spots) and OGCs (red pairs) between two species. The same pipeline was applied to each pair of species considered in the analysis. <b>B</b>. Pairwise conservation of overlapping genes within Metazoa. In the first column, the numbers in brackets represent the total number of overlapping genes for that species. <b>C</b>. Pairwise conservation of OGCs within Metazoa.</p>
               </text>
               <graphic file="1471-2164-9-174-3"/>
            </fig>
            <p>Among OGCs conserved between human and mouse, the antiparallel arrangement was the most represented (~88%), again highlighting the tendency to maintain possible sense-antisense regulation. Interestingly, convergent and nested antiparallel arrangements were significantly enriched in the conserved set when compared with divergent overlaps (chi-square = 22.47, p = 2.14e-6 and chi-square = 23.55, p = 1.2e-6, respectively; Table <tblr tid="T2">2</tblr>). This result supports previous observations that 3'-3' (convergent) overlapping pairs are significantly more conserved than 5'-5' (divergent) ones, and suggests a prevalent role for 3'UTRs in antisense regulation <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Conservation of overlapping genes between human and mouse.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>
                           <b>Total human OG pairs</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>Human-mouse conserved OG pairs</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>Conservation rate (%)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>OG Pairs</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>749</p>
                     </c>
                     <c ca="right">
                        <p>282</p>
                     </c>
                     <c ca="right">
                        <p>37.65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Partial</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>476</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>172</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>36.13</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>partial convergent</p>
                     </c>
                     <c ca="right">
                        <p>328</p>
                     </c>
                     <c ca="right">
                        <p>153</p>
                     </c>
                     <c ca="right">
                        <p>46.65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>partial divergent</p>
                     </c>
                     <c ca="right">
                        <p>115</p>
                     </c>
                     <c ca="right">
                        <p>14</p>
                     </c>
                     <c ca="right">
                        <p>12.17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>partial parallel</p>
                     </c>
                     <c ca="right">
                        <p>33</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>15.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Complete</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>273</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>110</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>40.29</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nested antiparallel</p>
                     </c>
                     <c ca="right">
                        <p>152</p>
                     </c>
                     <c ca="right">
                        <p>79</p>
                     </c>
                     <c ca="right">
                        <p>51.97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>nested parallel</p>
                     </c>
                     <c ca="right">
                        <p>75</p>
                     </c>
                     <c ca="right">
                        <p>22</p>
                     </c>
                     <c ca="right">
                        <p>29.33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>embedded antiparallel</p>
                     </c>
                     <c ca="right">
                        <p>16</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>18.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>embedded parallel</p>
                     </c>
                     <c ca="right">
                        <p>30</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>20.00</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Conservation of overlapping gene (OG) pairs according to their reciprocal arrangement.</p>
               </tblfn>
            </tbl>
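            <p>The conservation rates in Table 2 are ratios of conserved to total pairs within each arrangement class. The Python sketch below is illustrative only, and its dictionary layout and helper names are hypothetical. It recomputes the rates from the tabulated counts and checks that the class counts add up to the Partial, Complete and overall rows. It also derives the share of antiparallel arrangements (convergent, divergent, nested antiparallel and embedded antiparallel) among the 282 conserved pairs, quoted in the text as ~88%.</p>
```python
# Counts transcribed from Table 2: class -> (total human OG pairs, human-mouse conserved pairs)
TABLE2 = {
    "partial convergent": (328, 153),
    "partial divergent": (115, 14),
    "partial parallel": (33, 5),
    "nested antiparallel": (152, 79),
    "nested parallel": (75, 22),
    "embedded antiparallel": (16, 3),
    "embedded parallel": (30, 6),
}

# Convergent and divergent pairs lie on opposite strands, so they count as antiparallel.
ANTIPARALLEL = ("partial convergent", "partial divergent",
                "nested antiparallel", "embedded antiparallel")


def group_totals(classes):
    """Sum (total, conserved) over a collection of arrangement classes."""
    total = sum(TABLE2[c][0] for c in classes)
    conserved = sum(TABLE2[c][1] for c in classes)
    return total, conserved


def rate(total, conserved):
    """Conservation rate in percent, rounded to two decimals as in Table 2."""
    return round(100 * conserved / total, 2)


partial = group_totals([c for c in TABLE2 if c.startswith("partial")])
complete = group_totals([c for c in TABLE2 if not c.startswith("partial")])
overall = group_totals(TABLE2)
class_rates = {c: rate(*counts) for c, counts in TABLE2.items()}
antiparallel_share = 100 * group_totals(ANTIPARALLEL)[1] / overall[1]

print(partial, rate(*partial))       # (476, 172) 36.13
print(complete, rate(*complete))     # (273, 110) 40.29
print(overall, rate(*overall))       # (749, 282) 37.65
print(round(antiparallel_share, 1))  # 88.3
```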
            <p>Parallel OGCs did not show any significant enrichment in the conserved set (Table <tblr tid="T2">2</tblr>). Since same-strand overlaps are strongly selected against (Table <tblr tid="T1">1</tblr>), we investigated whether those that are conserved are more likely to be functional. Indeed, on the basis of the available literature, several parallel OGCs conserved between human and mouse might be functionally related (Additional file <supplr sid="S3">3</supplr>).</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Parallel overlaps conserved between human and mouse</b>. We manually reviewed the main literature on the genes involved in parallel OGCs conserved between human and mouse to look for possible functional links. We marked as 'Not Known' the cases in which either the transcripts correspond to not yet annotated genes or no functional link could be derived from the available literature.</p>
               </text>
               <file name="1471-2164-9-174-S3.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Although the vast majority of overlaps are not conserved over long evolutionary distances, we found evidence of a few ancient overlaps. Overall, three OGCs were conserved between Ecdysozoa (nematodes and arthropods) and Deuterostomia (vertebrates). Interestingly, the only OGC conserved from <it>C. elegans </it>to human was lost in arthropods, while two different OGCs are conserved from <it>D. melanogaster </it>to human. All of these OGCs are formed by two genes in a nested antiparallel arrangement. One of the two clusters conserved in <it>D. melanogaster </it>(Cluster 77, Additional file <supplr sid="S2">2</supplr>) involves the synapsin (<it>Syn</it>) and tissue inhibitor of metalloproteinases (<it>Timp</it>) genes. According to the model proposed for the evolution of the <it>Syn-Timp </it>cluster <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, the locus containing the ancestral nested genes underwent gene duplications and losses in vertebrates, followed by partitioning of function among the resulting paralogs. A similar succession of events is also compatible with the evolution of the only OGC conserved between vertebrates and worm (Cluster 371, Additional file <supplr sid="S2">2</supplr>). In this case, the ancestral OGC locus seems to have been duplicated after the split between Protostomia and Deuterostomia, followed by partitioning of function among the resulting paralogs (Additional file <supplr sid="S4">4</supplr>).</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Phylogenetic analysis of the OGC conserved between nematodes and vertebrates.</p>
               </text>
               <file name="1471-2164-9-174-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The poor evolutionary conservation of gene overlap in Metazoa suggests that its occurrence is species-specific. Contrary to previous suggestions <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B13">13</abbr><abbr bid="B32">32</abbr></abbrgrp>, this species-specificity was not due to a recent origin of the overlapping genes: most overlapping genes in one species had orthologs in the other species, although these orthologs did not overlap (Figures <figr fid="F3">3B</figr> and <figr fid="F3">3C</figr>). In addition, 30.2% of human overlapping gene pairs and 25.8% of mouse overlapping gene pairs remained physically adjacent in the compared genome, although the superimposition was lost (see below).</p>
            <p>For some functional processes, poor conservation during evolution is part of their functional role, alternative splicing being the most striking example <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Although approximately two-thirds of human genes are alternatively spliced <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, only 10&#8211;20% of them conserve the spliced exons in the orthologous mouse genes <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. We therefore propose that gene overlap, like alternative splicing, may be used in a species-specific manner <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Gene structure modifications associated with overlap formation</p>
            </st>
            <p>To infer possible mechanisms of overlap formation, we compared the gene structure (gene length and exon number) of conserved and non-conserved overlapping genes in human and mouse. In particular, we analyzed the gene structure of human and mouse overlapping genes whose orthologs lie adjacent (<it>i. e</it>. without any gene between them) but do not overlap in the other genome. We found that 226 human overlapping gene pairs (30.2% of the total) and 171 mouse overlapping gene pairs (25.8% of the total) had orthologs that do not overlap but remain adjacent in the genome of the other species (Table <tblr tid="T3">3</tblr>). Relative to the set of conserved overlapping genes, the 226 human overlapping gene pairs were significantly longer (<it>z</it>' = 2.53, <it>p </it>= 5.7e-3, Mann-Whitney <it>U</it>-test <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, Table <tblr tid="T3">3</tblr>) and had more exons (<it>z</it>' = 2.72, <it>p </it>= 3.3e-3) than their mouse orthologs (Table <tblr tid="T3">3</tblr>). Similarly, the 171 human orthologs of mouse overlapping gene pairs were shorter (z' = 2.95, p = 1.6e-3) and had fewer exons (z' = 2.28, p = 1.1e-2) than the conserved overlapping pairs. In addition, compared with the conserved set, non-conserved overlapping gene pairs were significantly more likely to overlap only in their UTRs, in both human (chi-square = 23.4, p = 1.3e-6) and mouse (chi-square = 24.2, p = 8.9e-7) (Table <tblr tid="T3">3</tblr>).</p>
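            <p>The three possible fates of an overlapping pair in the other genome are: overlap conserved, orthologs adjacent but not overlapping, or orthologs separated. They can be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline. Genes are hypothetical (name, chromosome, start, end) records with inclusive coordinates. Orthology is a simple name-to-name mapping. A third gene is taken to lie between two genes whenever any part of it falls in the intergenic gap separating them.</p>
```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive genomic coordinate
    end: int    # inclusive genomic coordinate


def overlaps(a: Gene, b: Gene) -> bool:
    """True if the two genes share at least one base on the same chromosome."""
    return a.chrom == b.chrom and min(a.end, b.end) >= max(a.start, b.start)


def adjacent(a: Gene, b: Gene, genome: List[Gene]) -> bool:
    """True if a and b share a chromosome, do not overlap, and no other gene touches the gap."""
    if a.chrom != b.chrom or overlaps(a, b):
        return False
    left, right = sorted((a, b), key=lambda g: g.start)
    gap_start, gap_end = left.end + 1, right.start - 1
    for g in genome:
        if g.name in (a.name, b.name) or g.chrom != a.chrom:
            continue
        if min(g.end, gap_end) >= max(g.start, gap_start):
            return False  # a third gene lies (at least partly) between a and b
    return True


def fate_in_other_genome(pair: Tuple[str, str], ortholog_of: Dict[str, str],
                         other_genome: List[Gene]) -> str:
    """Classify an overlapping pair by the arrangement of its orthologs in the other genome."""
    a, b = pair
    if a not in ortholog_of or b not in ortholog_of:
        return "no ortholog"
    by_name = {g.name: g for g in other_genome}
    oa, ob = by_name[ortholog_of[a]], by_name[ortholog_of[b]]
    if overlaps(oa, ob):
        return "overlap conserved"
    if adjacent(oa, ob, other_genome):
        return "adjacent, not overlapping"
    return "separated"
```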
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Gene structure comparison between human and mouse.</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>OG Pairs Conserved in Hs and Mm (282)</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Human OG Pairs Adjacent in Mm (226)</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Mouse OG Pairs Adjacent in Hs (171)</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Non-Overlapping Genes</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Human</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Mouse</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Human</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Mouse</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Human</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Mouse</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Human (16385)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Mouse (15640)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Average Gene Length</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>68.5 kb</p>
                     </c>
                     <c ca="left">
                        <p>58.6 kb</p>
                     </c>
                     <c ca="left">
                        <p>49.3 kb</p>
                     </c>
                     <c ca="left">
                        <p>31.4 kb</p>
                     </c>
                     <c ca="left">
                        <p>35.5 kb</p>
                     </c>
                     <c ca="left">
                        <p>31.4 kb</p>
                     </c>
                     <c ca="left">
                        <p>55.4 kb</p>
                     </c>
                     <c ca="left">
                        <p>39.6 kb</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Average Exon Number</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>12.4</p>
                     </c>
                     <c ca="left">
                        <p>12.2</p>
                     </c>
                     <c ca="left">
                        <p>11.3</p>
                     </c>
                     <c ca="left">
                        <p>10.8</p>
                     </c>
                     <c ca="left">
                        <p>11.45</p>
                     </c>
                     <c ca="left">
                        <p>11.4</p>
                     </c>
                     <c ca="left">
                        <p>10.2</p>
                     </c>
                     <c ca="left">
                        <p>9.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>UTR Overlap</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>174</p>
                     </c>
                     <c ca="left">
                        <p>200</p>
                     </c>
                     <c ca="left">
                        <p>184</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>154</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>CDS Overlap</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>108</p>
                     </c>
                     <c ca="left">
                        <p>82</p>
                     </c>
                     <c ca="left">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The number of overlapping gene pairs in each analyzed dataset is reported in parentheses. CDS overlaps refer to overlapping genes whose CDS coordinates are superimposed, while UTR overlaps refer to cases in which the gene coordinates (calculated from transcript start to transcript end) are superimposed but the CDS coordinates are not.</p>
               </tblfn>
            </tbl>
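            <p>The CDS/UTR distinction in the footnote of Table 3 can be expressed as a two-step coordinate test. The Python sketch below uses a hypothetical Transcript record of inclusive transcript and CDS coordinates. A pair is assigned to the CDS class when the CDS spans intersect, and to the UTR class when only the transcript spans intersect. Treating the two classes as mutually exclusive is an assumption. It is consistent with the UTR and CDS counts adding up to the pair total for the conserved human genes (174 + 108 = 282).</p>
```python
from typing import NamedTuple, Optional


class Transcript(NamedTuple):
    tx_start: int   # transcript start (inclusive)
    tx_end: int     # transcript end (inclusive)
    cds_start: int  # first coding base (inclusive)
    cds_end: int    # last coding base (inclusive)


def spans_overlap(s1: int, e1: int, s2: int, e2: int) -> bool:
    """True if the closed intervals [s1, e1] and [s2, e2] share at least one base."""
    return min(e1, e2) >= max(s1, s2)


def overlap_class(t1: Transcript, t2: Transcript) -> Optional[str]:
    """Return 'CDS', 'UTR' or None, comparing coordinate spans only (introns are not excluded)."""
    if not spans_overlap(t1.tx_start, t1.tx_end, t2.tx_start, t2.tx_end):
        return None  # the two genes do not overlap at all
    if spans_overlap(t1.cds_start, t1.cds_end, t2.cds_start, t2.cds_end):
        return "CDS"
    return "UTR"
```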
            <p>The structural analysis of the orthologs of human and mouse overlapping genes that remain adjacent but no longer overlap shows that overlap formation is frequently associated with an increase in gene size and exon number. We therefore suggest that overlap between adjacent genes may originate through the species-specific acquisition of additional non-coding exons. In agreement with our results, most of the loci analyzed by the ENCODE consortium were found to possess distal 5' non-coding exons that map into neighboring genes and tend to be tissue- or cell-line-specific <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Expression patterns of overlapping gene pairs</p>
            </st>
            <p>To evaluate whether overlap is a mechanism for regulating gene expression, we used the human OGC dataset to interrogate a human breast cancer transcriptome obtained by massive pyrosequencing <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. To detect the expression of transcripts normally expressed at low levels, we used a normalized cDNA library (see Methods). For this reason, our analysis is mostly qualitative and aims to detect the reciprocal expression of genes involved in overlaps. Although the tumoral condition may quantitatively alter global gene expression, it is unlikely to significantly modify the pattern of reciprocal expression between overlapping genes. We defined three patterns of reciprocal expression: co-expression, when both genes were represented in the library; discordant expression, for OG pairs in which only one gene was detected; and no expression, for OG pairs in which neither gene was detected. Figure <figr fid="F4">4</figr> shows the frequencies of these three expression patterns in the breast cancer library, with OG pairs grouped by overlap type.</p>
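            <p>Once the set of genes detected in the library is known, each OG pair can be assigned to one of the three reciprocal expression patterns. The Python sketch below uses a hypothetical input in which each sequence read is mapped to the set of genes it hits, and each gene is mapped to its OGC identifier. Reads hitting more than one gene of the same cluster are discarded, mirroring the filtering step described below. The frequency of each pattern is then reported as a percentage of OG pairs.</p>
```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

PATTERNS = ("co-expression", "discordant", "no expression")


def expressed_genes(read_hits: Dict[str, Set[str]], cluster_of: Dict[str, str]) -> Set[str]:
    """Genes supported by at least one read that is unambiguous within its OGC."""
    expressed: Set[str] = set()
    for genes in read_hits.values():
        clusters = [cluster_of[g] for g in genes if g in cluster_of]
        if len(clusters) != len(set(clusters)):
            continue  # read maps to two or more genes of the same cluster: discard it
        expressed.update(genes)
    return expressed


def expression_pattern(pair: Tuple[str, str], expressed: Set[str]) -> str:
    """Classify one OG pair by how many of its two genes were detected."""
    a, b = (gene in expressed for gene in pair)
    if a and b:
        return "co-expression"
    if a or b:
        return "discordant"
    return "no expression"


def pattern_frequencies(pairs: Iterable[Tuple[str, str]], expressed: Set[str]) -> Dict[str, float]:
    """Percentage of OG pairs falling in each reciprocal expression pattern."""
    counts = Counter(expression_pattern(p, expressed) for p in pairs)
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in PATTERNS}
    return {k: 100 * counts[k] / total for k in PATTERNS}
```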
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Analysis of the co-ordinate expression of human overlapping genes</p>
               </caption>
               <text>
                  <p><b>Analysis of the co-ordinate expression of human overlapping genes</b>. Expression patterns of human overlapping genes on the basis of their reciprocal arrangement.</p>
               </text>
               <graphic file="1471-2164-9-174-4"/>
            </fig>
            <p>The observed rate of co-expression in the whole dataset was 27.6%, while the percentage of discordantly expressed OGs was 42.5%. Taking into account the overall coverage of known genes in our cDNA library, the co-expression rate is nearly four times higher than expected from the random probability of any two genes being expressed at the same time in the library (7.3%). Therefore, OGs showed a significant tendency to be co-expressed (upper cumulative distribution function, p = 6.7e-102). Notably, co-expression was significant even though we removed all sequences mapping to more than one gene in the same cluster (see Methods). This filtering step likely led to an underestimation of the level of co-expression of overlapping genes, but it did not affect the final result. By contrast, the percentage of discordantly expressed genes is not significantly different from random expectation (upper cumulative distribution function, p = 0.043). Previous studies reported higher co-expression rates, ranging from 35.1% to 44.9% <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B47">47</abbr></abbrgrp>, the differences likely reflecting experimental design (<it>i. e</it>. differences in the starting dataset) and the number of tissues analyzed.</p>
            <p>Considering the different overlapping arrangements, we also observed that co-expression was significantly higher for both convergent (chi-square = 4.69, p = 3.03e-2) and divergent OGs (chi-square = 4.28, p = 3.85e-2) than for complete overlaps. By contrast, we observed no statistically significant differences among overlapping arrangements in the frequency of discordantly expressed OGs. Taken together, these results further support the hypothesis that gene overlap might be used to co-ordinate the expression of adjacent genes.</p>
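            <p>The text does not spell out how the "upper cumulative distribution function" test was set up. One plausible reading is a binomial upper tail, sketched below in Python as an assumption rather than the authors' exact procedure. Suppose a fraction c of known genes is detected in the library and genes are detected independently. Two given genes are then co-detected with probability c squared (c of about 0.27 gives roughly 7.3%). The p-value is the probability of observing at least the recorded number of co-expressed pairs among n OG pairs. The function works in log space so that very small tail probabilities, such as 6.7e-102, do not underflow.</p>
```python
import math


def expected_coexpression(coverage: float) -> float:
    """Chance that two independently detected genes are both present (assumed model)."""
    return coverage ** 2


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 and 1 excluded for p, summed in log space."""
    k = max(k, 0)
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)  # factor out the largest term before exponentiating
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)
```
            <p>Usage would be binomial_upper_tail(n, k, expected_coexpression(c)). Here n is the number of OG pairs analyzed and k the number of co-expressed pairs. Not all of these inputs are reported here, so the sketch does not attempt to reproduce the published p-value.</p>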
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our work shows for the first time that overlap between protein-coding genes, although widespread, is counterselected during metazoan evolution. We also show that overlap retention does not occur randomly, since it preferentially involves gene pairs lying on opposite DNA strands and sharing non-coding regions. The features of retained OGCs suggest a likely role for overlap in the reciprocal regulation of neighboring genes, a hypothesis further supported by the finding that OGs are significantly co-expressed in the breast cancer transcriptome. In addition, the poor conservation of overlap during evolution, and the fact that formation and loss of the overlapping arrangement are associated with changes in gene structure occurring mostly within non-coding regions, point to overlap as a species-specific mechanism. As non-coding regions generally have fewer constraints on their primary sequence, confining the overlap to non-coding regions may achieve co-regulation without forcing two functional protein-coding genes to co-evolve. We speculate that this tendency might ultimately result in the evolution of overlapping non-coding transcripts optimized for the regulation of their protein-coding partner.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Overlapping gene detection</p>
            </st>
            <p>The RefSeq cDNA sets <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> for five organisms (<it>H. sapiens</it>, <it>M. musculus</it>, <it>D. rerio</it>, <it>D. melanogaster</it>, and <it>C. elegans</it>) were downloaded from the UCSC ftp site (RefSeq v.10, March 2005) <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. We also retrieved mouse cDNAs from the RIKEN database (Fantom 3.0) and the UCSC collection of mouse cDNAs (Mm7 assembly), while for fly and worm we used Flybase (FlyBase r4.2) and Wormbase (WormBase WS140), respectively <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>.</p>
            <p>The genomic position of each sequence was mapped on the corresponding genome using BLAT <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> (human Build 35; mouse Build 34; zebrafish Zv4; fly Release 4; worm WS120). Pairs of genes whose genomic coordinates partially or totally overlap were extracted and grouped into OGCs. Filters were applied to exclude (a) splice variants of the same gene and (b) artifacts due to position mapping. For cDNAs with three or more exons, a pair was considered to represent splice variants of the same gene if more than 20% of their exons overlapped. For cDNAs with two or fewer exons, a pair was considered to represent splice variants if at least one residue overlapped at the exon level. For each group of predicted splice variants, only the longest one was retained as the gene representative. Artifacts such as the mapping of the mRNA poly(A) tail were avoided by excluding all 3' exons in which a single nucleotide accounted for more than 70% of the sequence.</p>
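            <p>The two filters can be sketched in Python as follows. Where the text leaves details open, the sketch makes assumptions, flagged in the comments:</p>
            <p>(1) exons are given as inclusive (start, end) genomic coordinates, listed 5' to 3'; (2) the 20% threshold is applied to the exons of the cDNA with fewer exons; (3) the one-residue rule is used whenever either cDNA has two or fewer exons; (4) the 3' exon sequence is checked for the most frequent single nucleotide.</p>
```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one residue."""
    return min(a[1], b[1]) >= max(a[0], b[0])


def is_splice_variant(exons1: List[Exon], exons2: List[Exon]) -> bool:
    """Decide whether two mapped cDNAs look like splice variants of the same gene."""
    if min(len(exons1), len(exons2)) >= 3:
        # Assumption: the 20% threshold refers to the cDNA with fewer exons.
        fewer, other = sorted((exons1, exons2), key=len)
        shared = sum(1 for e in fewer if any(exons_overlap(e, f) for f in other))
        return shared / len(fewer) > 0.20
    # Two or fewer exons: a single overlapping residue at the exon level suffices.
    return any(exons_overlap(e, f) for e in exons1 for f in exons2)


def is_polya_artifact(exon_seq: str, threshold: float = 0.70) -> bool:
    """True if one nucleotide makes up more than `threshold` of the exon sequence."""
    seq = exon_seq.upper()
    if not seq:
        return False
    most_common = max(seq.count(base) for base in "ACGT")
    return most_common / len(seq) > threshold


def drop_polya_exon(exons: List[Exon], seqs: List[str]) -> List[Exon]:
    """Remove the 3' exon (assumed last in the list) if it looks like a mapped poly(A) tail."""
    if exons and is_polya_artifact(seqs[-1]):
        return exons[:-1]
    return exons
```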
         </sec>
         <sec>
            <st>
               <p>Statistical null model for the overlap formation</p>
            </st>
            <p>For each of the five species analyzed, the gene positions of the unique gene sets were randomly reassigned within the corresponding chromosomes, with no constraints on the type of overlap, the reciprocal arrangement, or the number of genes per cluster. The randomization was repeated for 10 rounds, and the resulting numbers of overlapping genes, overlapping gene pairs, and overlapping gene clusters were counted at each round. The averages over the 10 rounds were compared with the observed dataset. Features of the OGCs, such as the reciprocal arrangement, the component distribution and the type of overlapping region, were also analyzed.</p>
            <p>The fraction of overlaps that result in sense/antisense complementarity at the mRNA level was calculated by extracting all overlaps that occur on opposite strands and involve exons of both genes. The statistical significance of the difference between the observed and the random sets was assessed by applying a chi-squared test (degrees of freedom = 1) to the resulting 2 &#215; 2 contingency table <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
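            <p>A minimal Python sketch of this randomization is given below. It assumes that each gene keeps its chromosome and genomic length, and that its new start position is drawn uniformly so the gene stays within the chromosome. Strand is ignored, and chromosome lengths are passed in as a hypothetical dictionary. Overlapping pairs are counted with a sweep over start-sorted intervals. Clusters (OGCs) are the connected groups of two or more genes chained together by overlaps.</p>
```python
import random
from statistics import mean
from typing import Dict, List, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def overlap_stats(genes: List[Locus]) -> Tuple[int, int, int]:
    """Return (overlapping genes, overlapping pairs, OGCs) for a set of gene loci."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end))
    n_genes = n_pairs = n_clusters = 0
    for intervals in by_chrom.values():
        intervals.sort()
        active_ends: List[int] = []  # ends of earlier genes that may still overlap
        size, cluster_end = 0, -1
        for start, end in intervals:
            active_ends = [e for e in active_ends if e >= start]
            n_pairs += len(active_ends)  # every still-active gene overlaps this one
            active_ends.append(end)
            if size and cluster_end >= start:
                size += 1  # this gene chains into the current cluster
                cluster_end = max(cluster_end, end)
            else:
                if size >= 2:
                    n_clusters, n_genes = n_clusters + 1, n_genes + size
                size, cluster_end = 1, end
        if size >= 2:
            n_clusters, n_genes = n_clusters + 1, n_genes + size
    return n_genes, n_pairs, n_clusters


def shuffle_positions(genes: List[Locus], chrom_len: Dict[str, int],
                      rng: random.Random) -> List[Locus]:
    """Place each gene at a uniformly random start on its own chromosome, keeping its length."""
    shuffled = []
    for chrom, start, end in genes:
        length = end - start + 1
        new_start = rng.randint(1, chrom_len[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length - 1))
    return shuffled


def null_expectation(genes: List[Locus], chrom_len: Dict[str, int],
                     rounds: int = 10, seed: int = 0):
    """Average overlap statistics over `rounds` randomizations (10 rounds in the text)."""
    rng = random.Random(seed)
    runs = [overlap_stats(shuffle_positions(genes, chrom_len, rng)) for _ in range(rounds)]
    return tuple(mean(run[i] for run in runs) for i in range(3))
```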
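            <p>The 2 &#215; 2 chi-squared test with one degree of freedom can be computed directly from the four cell counts, as in the Python sketch below. No continuity correction is applied, since the text does not state whether one was used. With one degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2 / 2)). This is a check on the procedure, not a reproduction of the authors' code. Applied to the UTR and CDS overlap counts of Table 3, it gives values matching those reported in the Results:</p>
            <p>human: 174 and 108 conserved versus 184 and 42 adjacent in mouse, giving about 23.4 and 1.3e-6; mouse: 200 and 82 versus 154 and 16, giving about 24.2 and 8.9e-7.</p>
```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (df = 1, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, upper-tail p-value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column of the table sums to zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom chi2 = Z**2, so P(chi2 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: conserved pairs, pairs adjacent (not overlapping) in the other species.
# Columns: UTR overlap, CDS overlap (counts from Table 3).
human = chi2_2x2(174, 108, 184, 42)
mouse = chi2_2x2(200, 82, 154, 16)
print(f"human: chi2 = {human[0]:.1f}, p = {human[1]:.1e}")  # human: chi2 = 23.4, p = 1.3e-06
print(f"mouse: chi2 = {mouse[0]:.1f}, p = {mouse[1]:.1e}")  # mouse: chi2 = 24.2, p = 8.9e-07
```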
         </sec>
         <sec>
            <st>
               <p>Benchmark</p>
            </st>
            <p>To test the specificity of the data produced, we performed a manual analysis of the <it>D. rerio </it>dataset (108 OGCs), which revealed no obvious false positives due to the methodology. The sensitivity of our method was assessed by benchmarking the derived set against an extensive collection of previously reported overlapping genes. We included 8 independent large-scale screenings of human antisense transcripts/nested genes <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B13">13</abbr><abbr bid="B27">27</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr></abbrgrp> and about 100 experimental studies on specific overlapping gene pairs (Additional files <supplr sid="S5">5</supplr> and <supplr sid="S6">6</supplr>). OGCs reported in the literature with no match in our dataset were checked manually. The lack of coverage was mainly due to our selection criteria (<it>i. e</it>. we deliberately excluded pseudogenes and non-coding RNAs, which were instead included in some large-scale screenings). Only 5 cases were found to be false negatives, giving an estimated sensitivity of 99%.</p>
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>Literature overview of human overlapping genes</b>. The numbering for the literature references refers to Additional file 6.</p>
               </text>
               <file name="1471-2164-9-174-S5.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p><b>Additional bibliographic references</b>. Document providing all the literature references cited in the Additional data files.</p>
               </text>
               <file name="1471-2164-9-174-S6.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Orthology assignment</p>
            </st>
            <p>The orthology relationships between the overlapping genes in the five analyzed species were assessed using a two-step procedure (Figure <figr fid="F3">3A</figr>). First, for all pairs of species, we carried out an all-against-all tBLASTx <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> search between the corresponding cDNA sets, and the reciprocal best hits between two species were assigned as orthologous genes. Second, we identified orthologous overlapping genes by extracting all overlapping genes conserved between each pair of species.</p>
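            <p>The two steps can be sketched in Python as follows. The sketch assumes the tBLASTx results have already been parsed into (query, subject, bit score) tuples, for example from columns 1, 2 and 12 of BLAST tabular output, and that overlapping gene pairs are given as tuples of gene identifiers. Ties in bit score keep the first hit seen, an arbitrary choice not described in the text. All function names are hypothetical.</p>

```python
# Illustrative sketch of reciprocal best hits (RBH) and conserved overlaps;
# function names and input formats are hypothetical.


def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return {gene_a: gene_b} for genes that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {a: b for a, b in ab.items() if ba.get(b) == a}


def conserved_overlaps(overlaps_a, overlaps_b, orthologs):
    """Keep overlapping pairs of species A whose orthologs also overlap in B.

    overlaps_a, overlaps_b: iterables of (gene, gene) tuples.
    orthologs: {gene_in_a: gene_in_b}, e.g. from reciprocal_best_hits().
    """
    pairs_b = {frozenset(pair) for pair in overlaps_b}
    conserved = []
    for x, y in overlaps_a:
        if x in orthologs and y in orthologs:
            if frozenset((orthologs[x], orthologs[y])) in pairs_b:
                conserved.append((x, y))
    return conserved


if __name__ == "__main__":
    human_vs_mouse = [("h1", "m1", 100.0), ("h2", "m2", 80.0)]  # hypothetical
    mouse_vs_human = [("m1", "h1", 95.0), ("m2", "h2", 70.0)]   # hypothetical
    rbh = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
    print(conserved_overlaps([("h1", "h2")], [("m1", "m2")], rbh))
```

            <p>Pairs are compared as unordered sets, so an overlap recorded as (m2, m1) in the second species still matches (h1, h2) in the first.</p>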
         </sec>
         <sec>
            <st>
               <p>Gene structure analysis</p>
            </st>
            <p>We compared the gene structure of the OGCs conserved between human and mouse with that of human and mouse overlapping genes whose orthologs do not overlap but are adjacent in the genome of the other species. The first set (overlapping genes conserved between human and mouse; first column in Table <tblr tid="T3">3</tblr>) comprised 282 pairs of overlapping genes, while the second set (overlapping in human but adjacent in mouse; second column in Table <tblr tid="T3">3</tblr>) and the third set (overlapping in mouse but adjacent in human; third column in Table <tblr tid="T3">3</tblr>) comprised 226 and 171 gene pairs, respectively. For each gene, we measured the gene length, defined as the span of its genomic coordinates on the corresponding chromosome, and the number of exons, as derived from the BLAT output. Using the Mann-Whitney U-test <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, we compared gene lengths between the first and second sets and between the first and third sets to assess the statistical significance of differences in gene structure.</p>
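            <p>The gene-length comparison can be sketched with a self-contained two-sided Mann-Whitney U test. This version uses the large-sample normal approximation, average ranks for ties and a tie-corrected variance, without continuity correction; the text does not state which variant was used, so these are assumptions. With sets of a few hundred genes the normal approximation is reasonable. Function names and the lengths in the usage lines are hypothetical.</p>

```python
# Illustrative sketch of a two-sided Mann-Whitney U test (normal approximation);
# function names are hypothetical.
import math
from itertools import groupby


def average_ranks(values):
    """Return (ranks, tie_sum): 1-based ranks with tied values averaged,
    and the sum of t**3 - t over groups of t tied values."""
    indexed = sorted(enumerate(values), key=lambda item: item[1])
    ranks = [0.0] * len(values)
    tie_sum = 0
    position = 1
    for _, group in groupby(indexed, key=lambda item: item[1]):
        members = list(group)
        t = len(members)
        avg_rank = position + (t - 1) / 2
        for index, _ in members:
            ranks[index] = avg_rank
        tie_sum += t ** 3 - t
        position += t
    return ranks, tie_sum


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, tie-corrected, no continuity correction.

    Returns (U statistic of x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sum = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if not var_u > 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    # two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


if __name__ == "__main__":
    conserved_lengths = [12000, 45000, 8000, 67000, 23000]  # hypothetical
    human_only_lengths = [5000, 9000, 3000, 15000, 7000]    # hypothetical
    print(mann_whitney_u(conserved_lengths, human_only_lengths))
```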
            <p>We also determined the type of region (UTR or coding) involved in the overlap for all OGCs in the three sets, by counting the number of overlaps still detectable after removing the UTRs. In this case, the statistical significance of the differences between the first and second sets and between the first and third sets was assessed by applying a chi-squared test (1 degree of freedom) to the resulting 2 &#215; 2 contingency table <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
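            <p>One way to express the UTR-removal step is sketched below. It makes the simplifying assumption that each gene is a hypothetical record holding its transcript span and its coding span (first to last coding base), so introns are ignored. An overlap is called "coding" if it remains detectable once both genes are trimmed to their coding spans, and "UTR" otherwise; the per-set counts would then enter the 2 &#215; 2 chi-squared test described above.</p>

```python
# Illustrative sketch; the Gene record and function names are hypothetical,
# and introns are ignored (coding span = first to last coding base).
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    # 1-based inclusive coordinates on the same chromosome
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int


def spans_overlap(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_end >= b_start and b_end >= a_start


def overlap_region_type(a: Gene, b: Gene) -> Optional[str]:
    """Return 'coding', 'UTR' or None (the transcripts do not overlap)."""
    if not spans_overlap(a.tx_start, a.tx_end, b.tx_start, b.tx_end):
        return None
    if spans_overlap(a.cds_start, a.cds_end, b.cds_start, b.cds_end):
        return "coding"  # overlap survives removal of the UTRs
    return "UTR"  # overlap disappears once the UTRs are removed


def count_region_types(pairs):
    """Count 'coding' and 'UTR' overlaps over an iterable of (Gene, Gene) pairs."""
    counts = {"coding": 0, "UTR": 0}
    for a, b in pairs:
        kind = overlap_region_type(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts
```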
         </sec>
         <sec>
            <st>
               <p>Analysis of OGC expression in breast cancer</p>
            </st>
            <p>cDNA was obtained from polyadenylated breast cancer RNA (purity 85&#8211;90%). After reverse transcription, the cDNA was normalized to obtain a balanced mix of low- and high-abundance mRNAs, as previously described <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. A total of 2.1 &#956;g of normalized, double-stranded cDNA was then converted into a single-stranded library using the 454 protocol <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Two independent cDNA libraries were generated, with average read lengths of 100 and 200 nt, respectively. A total of 198,658 sequence reads, non-redundant according to the NCBI non-redundant database, were sequenced from each breast cancer cDNA library. The entire library was mapped against the 249,953 sequences of the human "all_mrna" transcript dataset from the UCSC human genome. A total of 37,774 reads corresponding to a specific cDNA and its related isoforms were identified, requiring perfect BLAT matches with at least 95% of the read covered by the alignment. The reads were then aligned to the human RefSeq cDNA dataset from UCSC (25,922 sequences), requiring perfect coverage. This yielded 9,082 distinct matches, which were used for the subsequent calculations.</p>
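            <p>The read filter can be sketched as a check on BLAT output in PSL format. The sketch assumes one reading of "perfect match", namely no mismatches and no gaps in either sequence, and measures coverage as aligned bases (matches plus repMatches) divided by the read length (qSize). Field positions follow the standard 21-column PSL layout, and header lines are assumed to have been stripped. Function names are hypothetical.</p>

```python
# Illustrative sketch of filtering BLAT PSL alignments; function names and the
# exact definition of a "perfect" match are assumptions.


def psl_passes(line, min_coverage=0.95):
    """Return True if a PSL alignment is gap-free, mismatch-free and covers
    at least min_coverage of the query (read)."""
    fields = line.rstrip("\n").split("\t")
    matches = int(fields[0])
    mismatches = int(fields[1])
    rep_matches = int(fields[2])
    q_num_insert = int(fields[4])  # gaps in the query
    t_num_insert = int(fields[6])  # gaps in the target
    q_size = int(fields[10])
    perfect = mismatches == 0 and q_num_insert == 0 and t_num_insert == 0
    coverage = (matches + rep_matches) / q_size
    return perfect and coverage >= min_coverage


def reads_passing(psl_lines, min_coverage=0.95):
    """Collect read names (qName, 10th PSL field) with at least one passing alignment."""
    passing = set()
    for line in psl_lines:
        if line.strip() and psl_passes(line, min_coverage):
            passing.add(line.split("\t")[9])
    return passing
```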
            <p>Reads were assigned to genes by BLAST searches of the nucleotide sequences of all OGs against the library. Only reads showing 100% identity with a transcript were used in the analyses. To ensure that the 454 sequences were unambiguously matched to the assigned transcript, we removed reads mapping to more than one locus. Because 454 sequencing does not involve <it>in vivo </it>cloning and the cDNA is fragmented by nebulization, the strand of origin of a read cannot be determined in the resulting library, so reads from the region where two transcripts overlap cannot be assigned to either gene. We therefore removed all sequence reads mapping to more than one gene within the same cluster. In total, 36 of 3701 reads (0.97%) were removed; such a small loss is unlikely to have introduced a significant bias.</p>
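            <p>The removal of ambiguous reads can be sketched as follows, assuming each read's 100%-identity hits have been summarized as hypothetical (gene_id, locus_id) pairs. A read is kept only if all its hits fall in a single locus and a single gene, which discards both multi-locus reads and reads lying where two genes of the same cluster overlap. Reads without any hit are skipped rather than counted as removed. Function names are hypothetical.</p>

```python
# Illustrative sketch of ambiguous-read removal; function names and the
# (gene_id, locus_id) hit format are hypothetical.
from collections import Counter


def filter_ambiguous_reads(read_hits):
    """Split reads into unambiguous assignments and removed reads.

    read_hits: {read_id: iterable of (gene_id, locus_id) hits}.
    Returns (assigned, removed): assigned maps read_id -> gene_id,
    removed lists the discarded read_ids.
    """
    assigned, removed = {}, []
    for read_id, hits in read_hits.items():
        hits = list(hits)
        if not hits:
            continue  # unmapped reads are neither assigned nor removed
        genes = {gene for gene, _ in hits}
        loci = {locus for _, locus in hits}
        if len(genes) == 1 and len(loci) == 1:
            assigned[read_id] = genes.pop()
        else:
            # multi-locus read, or read in the shared region of two genes
            removed.append(read_id)
    return assigned, removed


def reads_per_gene(assigned):
    """Count unambiguously assigned reads per gene."""
    return Counter(assigned.values())
```

            <p>The fraction of reads lost is then len(removed) divided by the number of mapped reads, which is the quantity reported above as a check on possible bias.</p>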
            <p>The statistical significance of the enrichment of co-expression in overlapping gene pairs was evaluated using an upper cumulative distribution function, i.e. the probability of observing at least as many co-expressed pairs by chance.</p>
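            <p>The distribution behind the upper cumulative distribution function is not specified in the text. As one plausible choice, labeled here as an assumption, the sketch below models the number of co-expressed overlapping pairs as binomial: if a fraction f of genes is detected in the library and detection is independent between genes, a pair is co-expressed by chance with probability f squared, and the p-value is the probability of observing at least k co-expressed pairs among n. A hypergeometric model would be an equally reasonable alternative. Function names and the usage values are hypothetical.</p>

```python
# Illustrative sketch of an upper-tail (binomial) enrichment test; the binomial
# model and function names are assumptions, not the study's documented method.
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p > 1 or 0 > p:
        raise ValueError("p must lie in [0, 1]")
    if k > n:
        return 0.0
    if 0 >= k:
        return 1.0
    if p == 0:
        return 0.0  # k is at least 1 here
    if p == 1:
        return 1.0  # X == n, which is at least k
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def coexpression_p_value(n_pairs, n_coexpressed, fraction_expressed):
    """Upper-tail p-value for at least n_coexpressed co-expressed pairs among
    n_pairs, assuming genes are expressed independently with probability
    fraction_expressed (a modelling assumption)."""
    p_pair = fraction_expressed ** 2
    return binomial_upper_tail(n_coexpressed, n_pairs, p_pair)


if __name__ == "__main__":
    print(coexpression_p_value(n_pairs=200, n_coexpressed=40, fraction_expressed=0.35))  # hypothetical
```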
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GS contributed to the study concept and design (Gene structure analysis), the data collection (Features of human overlapping genes, Benchmark), and the analysis and interpretation of the data, and drafted the manuscript. MS was involved in the study design (Statistical null model of overlap formation), the data collection (Orthology assignment), and the analysis and interpretation of the data, and provided statistical expertise. PP built the cDNA library. SB contributed to the data interpretation and to the drafting of the manuscript. AG and ER performed the pyrosequencing and primary sequence analysis of the cDNA library. PB and MT provided critical revision of the manuscript for important intellectual content. FDC contributed to the study concept and design, the analysis and interpretation of the data, and the drafting of the manuscript, and supervised the entire study.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We wish to thank Davide Rambaldi (IEO, Milan) for his help in retrieving the data needed for the simulation of the random distribution. We also thank Raoul Bonnal and Michele Iacono of ITB-CNR for contributing to the generation, sequencing and analysis of the 454 cDNA library sequences. This work was supported by the Start Up grant of AIRC to FDC and by "Borsa di studio per il perfezionamento all'estero" of the University of Milan to GS.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Shedding light on the dark side of the genome: overlapping genes in higher eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Boi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sold&#224;</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tenchini</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Current Genomics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>509</fpage>
            <lpage>524</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2174/1389202043349020</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Overlapping genes in vertebrate genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Makalowska</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Comput Biol Chem</source>
            <pubdate>2005</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.compbiolchem.2004.12.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15680581</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project</p>
            </title>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>447</volume>
            <issue>7146</issue>
            <fpage>799</fpage>
            <lpage>816</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2212820</pubid>
                  <pubid idtype="pmpid">17571346</pubid>
                  <pubid idtype="doi">10.1038/nature05874</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The transcriptional landscape of the mammalian genome</p>
            </title>
            <aug>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Oyama</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <issue>5740</issue>
            <fpage>1559</fpage>
            <lpage>1563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112014</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brubaker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tammana</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <issue>5725</issue>
            <fpage>1149</fpage>
            <lpage>1154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1108625</pubid>
                  <pubid idtype="pmpid" link="fulltext">15790807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Complex Loci in human and mouse genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ninomiya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Akalin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sessa</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lavorgna</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brozzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Luzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>e47</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1449890</pubid>
                  <pubid idtype="pmpid" link="fulltext">16683030</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020047</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>7</issue>
            <fpage>987</fpage>
            <lpage>997</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1172043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998911</pubid>
                  <pubid idtype="doi">10.1101/gr.3455305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Antisense transcription in the mammalian transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Waki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakanishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yap</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <issue>5740</issue>
            <fpage>1564</fpage>
            <lpage>1566</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112009</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Over 20% of human transcripts might form sense-antisense pairs</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>16</issue>
            <fpage>4812</fpage>
            <lpage>4820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">519112</pubid>
                  <pubid idtype="pmpid" link="fulltext">15356298</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh818</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Sense-antisense pairs in mammals: functional/evolutionary considerations</p>
            </title>
            <aug>
               <au>
                  <snm>Galante</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>DO</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>3</issue>
            <fpage>R40</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1868933</pubid>
                  <pubid idtype="pmpid" link="fulltext">17371592</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-3-r40</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>In search of antisense</p>
            </title>
            <aug>
               <au>
                  <snm>Lavorgna</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sanderson</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2004</pubdate>
            <volume>29</volume>
            <issue>2</issue>
            <fpage>88</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tibs.2003.12.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15102435</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Evidence for variation in abundance of antisense transcripts between multicellular animals but no relationship between antisense transcription and organismic complexity</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>7</issue>
            <fpage>922</fpage>
            <lpage>933</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1484459</pubid>
                  <pubid idtype="pmpid" link="fulltext">16769979</pubid>
                  <pubid idtype="doi">10.1101/gr.5210006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Mammalian overlapping genes: the comparative perspective</p>
            </title>
            <aug>
               <au>
                  <snm>Veeramachaneni</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Makalowski</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Galdzicki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sood</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Makalowska</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>2</issue>
            <fpage>280</fpage>
            <lpage>286</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">327103</pubid>
                  <pubid idtype="pmpid" link="fulltext">14762064</pubid>
                  <pubid idtype="doi">10.1101/gr.1590904</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>QR</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>12</issue>
            <fpage>3465</fpage>
            <lpage>3475</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1524920</pubid>
                  <pubid idtype="pmpid" link="fulltext">16849434</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms</p>
            </title>
            <aug>
               <au>
                  <snm>Lapidot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pilpel</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>12</issue>
            <fpage>1216</fpage>
            <lpage>1222</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1794690</pubid>
                  <pubid idtype="pmpid" link="fulltext">17139297</pubid>
                  <pubid idtype="doi">10.1038/sj.embor.7400857</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Expression of alternatively spliced FGF-2 antisense RNA transcripts in the central nervous system: regulation of FGF-2 mRNA translation</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>PR</fnm>
               </au>
            </aug>
            <source>Mol Cell Endocrinol</source>
            <pubdate>2000</pubdate>
            <volume>170</volume>
            <issue>1-2</issue>
            <fpage>233</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0303-7207(00)00440-8</pubid>
                  <pubid idtype="pmpid">11162906</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Inhibition of c-erbA mRNA splicing by a naturally occurring antisense RNA</p>
            </title>
            <aug>
               <au>
                  <snm>Munroe</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Lazar</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1991</pubdate>
            <volume>266</volume>
            <issue>33</issue>
            <fpage>22083</fpage>
            <lpage>22086</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1657988</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>RNA editing and regulation of Drosophila 4f-rnp expression by sas-10 antisense readthrough mRNA transcripts</p>
            </title>
            <aug>
               <au>
                  <snm>Peters</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Rohrbach</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Zalewski</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Byrkett</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Vaughn</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2003</pubdate>
            <volume>9</volume>
            <issue>6</issue>
            <fpage>698</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370437</pubid>
                  <pubid idtype="pmpid" link="fulltext">12756328</pubid>
                  <pubid idtype="doi">10.1261/rna.2120703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The non-coding Air RNA is required for silencing autosomal imprinted genes</p>
            </title>
            <aug>
               <au>
                  <snm>Sleutels</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zwart</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barlow</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <issue>6873</issue>
            <fpage>810</fpage>
            <lpage>813</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11845212</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease</p>
            </title>
            <aug>
               <au>
                  <snm>Tufarelli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Stanley</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Garrick</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sharpe</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Ayyub</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Higgs</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>157</fpage>
            <lpage>165</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1157</pubid>
                  <pubid idtype="pmpid" link="fulltext">12730694</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A unique gene organization for two cholinergic markers, choline acetyltransferase and a putative vesicular transporter of acetylcholine</p>
            </title>
            <aug>
               <au>
                  <snm>Bejanin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cervini</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mallet</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Berrard</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1994</pubdate>
            <volume>269</volume>
            <issue>35</issue>
            <fpage>21944</fpage>
            <lpage>21947</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8071313</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript</p>
            </title>
            <aug>
               <au>
                  <snm>Martianov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ramadass</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Serra Barros</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chow</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Akoulitchev</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>445</volume>
            <issue>7128</issue>
            <fpage>666</fpage>
            <lpage>670</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05519</pubid>
                  <pubid idtype="pmpid" link="fulltext">17237763</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Oscillating evolution of a mammalian locus with overlapping reading frames: an XL&#945;s/ALEX relay</p>
            </title>
            <aug>
               <au>
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wadhawan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goetting-Minesky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Makova</snm>
                  <fnm>KD</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <issue>2</issue>
            <fpage>e18</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1186735</pubid>
                  <pubid idtype="pmpid" link="fulltext">16110341</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0010018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Regulating gene expression through RNA nuclear retention</p>
            </title>
            <aug>
               <au>
                  <snm>Prasanth</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Prasanth</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Xuan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Hearn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Freier</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>MQ</fnm>
               </au>
               <au>
                  <snm>Spector</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>123</volume>
            <issue>2</issue>
            <fpage>249</fpage>
            <lpage>263</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2005.08.033</pubid>
                  <pubid idtype="pmpid" link="fulltext">16239143</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Naturally occurring antisense: transcriptional leakage or real overlap?</p>
            </title>
            <aug>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Elroy-Stein</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>3</issue>
            <fpage>364</fpage>
            <lpage>368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">551562</pubid>
                  <pubid idtype="pmpid" link="fulltext">15710751</pubid>
                  <pubid idtype="doi">10.1101/gr.3308405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Antisense transcripts with FANTOM2 clone set and their implications for gene regulation</p>
            </title>
            <aug>
               <au>
                  <snm>Kiyosawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yamanaka</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Osato</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>6B</issue>
            <fpage>1324</fpage>
            <lpage>1334</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403655</pubid>
                  <pubid idtype="pmpid" link="fulltext">12819130</pubid>
                  <pubid idtype="doi">10.1101/gr.982903</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Widespread occurrence of antisense transcription in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Yelin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Levanon</snm>
                  <fnm>EY</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Shoshan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diber</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Biton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamir</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Khosravi</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2003</pubdate>
            <volume>21</volume>
            <issue>4</issue>
            <fpage>379</fpage>
            <lpage>386</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt808</pubid>
                  <pubid idtype="pmpid" link="fulltext">12640466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function</p>
            </title>
            <aug>
               <au>
                  <snm>Pang</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>5</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.10.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">16290135</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Antisense transcripts in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sanderson</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>2</issue>
            <fpage>63</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02598-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11818131</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Computational discovery of sense-antisense transcription in the human and mouse genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Shendure</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>9</issue>
            <fpage>RESEARCH0044</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">126869</pubid>
                  <pubid idtype="pmpid" link="fulltext">12225583</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-9-research0044</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Annotation of the Drosophila melanogaster euchromatic genome: a systematic review</p>
            </title>
            <aug>
               <au>
                  <snm>Misra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Millburn</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Prochnik</snm>
                  <fnm>SE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>12</issue>
            <fpage>RESEARCH0083</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151185</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537572</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Origins of genes: "big bang" or continuous creation?</p>
            </title>
            <aug>
               <au>
                  <snm>Keese</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <issue>20</issue>
            <fpage>9489</fpage>
            <lpage>9493</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">50157</pubid>
                  <pubid idtype="pmpid" link="fulltext">1329098</pubid>
                  <pubid idtype="doi">10.1073/pnas.89.20.9489</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Transcriptional interferences in cis natural antisense transcripts of humans and mice</p>
            </title>
            <aug>
               <au>
                  <snm>Osato</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ikeo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2007</pubdate>
            <volume>176</volume>
            <issue>2</issue>
            <fpage>1299</fpage>
            <lpage>1306</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1894591</pubid>
                  <pubid idtype="pmpid" link="fulltext">17409075</pubid>
                  <pubid idtype="doi">10.1534/genetics.106.069484</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Transcriptional collision between convergent genes in budding yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Prescott</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <issue>13</issue>
            <fpage>8796</fpage>
            <lpage>8801</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">124378</pubid>
                  <pubid idtype="pmpid" link="fulltext">12077310</pubid>
                  <pubid idtype="doi">10.1073/pnas.132270899</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>In-frame overlapping genes: the challenges for regulating gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Kokoska</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Khemici</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Steege</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>63</volume>
            <issue>4</issue>
            <fpage>1158</fpage>
            <lpage>1172</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2006.05572.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17238928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Caenorhabditis elegans operons: form and function</p>
            </title>
            <aug>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gleason</snm>
                  <fnm>KS</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>2</issue>
            <fpage>112</fpage>
            <lpage>120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg995</pubid>
                  <pubid idtype="pmpid" link="fulltext">12560808</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>On the paucity of duplicated genes in Caenorhabditis elegans operons</p>
            </title>
            <aug>
               <au>
                  <snm>Cavalcanti</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Stover</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Landweber</snm>
                  <fnm>LF</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <issue>6</issue>
            <fpage>765</fpage>
            <lpage>771</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-005-0203-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">16752214</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Comparative analysis of cis-encoded antisense RNAs in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Numata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Okada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kiyosawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tomita</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2007</pubdate>
            <volume>392</volume>
            <issue>1&#8211;2</issue>
            <fpage>134</fpage>
            <lpage>141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2006.12.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">17250976</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Evidence for a preferential targeting of 3'-UTRs by cis-encoded natural antisense transcripts</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>17</issue>
            <fpage>5533</fpage>
            <lpage>5543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1243798</pubid>
                  <pubid idtype="pmpid" link="fulltext">16204454</pubid>
                  <pubid idtype="doi">10.1093/nar/gki852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Duplication, degeneration and subfunctionalization of the nested synapsin-Timp genes in Fugu</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Venkatesh</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>4</issue>
            <fpage>180</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00048-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12683968</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Alternative splicing: new insights from global analyses</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2006</pubdate>
            <volume>126</volume>
            <issue>1</issue>
            <fpage>37</fpage>
            <lpage>47</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2006.06.023</pubid>
                  <pubid idtype="pmpid" link="fulltext">16839875</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Garrett-Engele</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Loerch</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5653</issue>
            <fpage>2141</fpage>
            <lpage>2144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090100</pubid>
                  <pubid idtype="pmpid" link="fulltext">14684825</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss</p>
            </title>
            <aug>
               <au>
                  <snm>Modrek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>177</fpage>
            <lpage>180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1159</pubid>
                  <pubid idtype="pmpid" link="fulltext">12730695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Biometry</p>
            </title>
            <aug>
               <au>
                  <snm>Sokal</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Rohlf</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <publisher>New York, USA: W.H. Freeman &amp; Company</publisher>
            <edition>3</edition>
            <pubdate>1995</pubdate>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions</p>
            </title>
            <aug>
               <au>
                  <snm>Denoeud</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ucla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Frankish</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Castelo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lagarde</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alioto</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Manzano</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chrast</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>6</issue>
            <fpage>746</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1891335</pubid>
                  <pubid idtype="pmpid" link="fulltext">17567994</pubid>
                  <pubid idtype="doi">10.1101/gr.5660607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Genome sequencing in microfabricated high-density picolitre reactors</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Egholm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Attiya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Bemben</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Berka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Braverman</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <issue>7057</issue>
            <fpage>376</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1464427</pubid>
                  <pubid idtype="pmpid" link="fulltext">16056220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>6</issue>
            <fpage>326</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.04.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15922830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>Database issue</issue>
            <fpage>D501</fpage>
            <lpage>D504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539979</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608248</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>The UCSC ftp Web site</p>
            </title>
            <url>ftp://hgdownload.cse.ucsc.edu/</url>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The C. elegans Genome Database</p>
            </title>
            <url>http://www.wormbase.org/</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>RIKEN Mouse Genome Project database</p>
            </title>
            <url>http://fantom.gsc.riken.go.jp/</url>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The Drosophila melanogaster genome database</p>
            </title>
            <url>http://flybase.net/</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>UCSC Genome Bioinformatics Site</p>
            </title>
            <url>http://genome.ucsc.edu/</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>BLAT&#8211;the BLAST-like alignment tool</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>4</issue>
            <fpage>656</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187518</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932250</pubid>
                  <pubid idtype="doi">10.1101/gr.229202</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression</p>
            </title>
            <aug>
               <au>
                  <snm>Quere</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Manchon</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lejeune</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clement</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Pierrat</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bonafoux</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Commes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Piquemal</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Marti</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>20</issue>
            <fpage>e163</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534641</pubid>
                  <pubid idtype="pmpid" link="fulltext">15561998</pubid>
                  <pubid idtype="doi">10.1093/nar/gnh161</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Human chromosome 7: DNA sequence and biology</p>
            </title>
            <aug>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Cheung</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Osborne</snm>
                  <fnm>LR</fnm>
               </au>
               <au>
                  <snm>Nakabayashi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Herbrick</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Carson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Parker-Katiraee</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Skaug</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Khaja</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>300</volume>
            <issue>5620</issue>
            <fpage>767</fpage>
            <lpage>772</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1083423</pubid>
                  <pubid idtype="pmpid" link="fulltext">12690205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Nested genes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>86</volume>
            <issue>4</issue>
            <fpage>414</fpage>
            <lpage>422</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2005.06.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">16084061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <issue>3</issue>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Simple cDNA normalization using kamchatka crab duplex-specific nuclease</p>
            </title>
            <aug>
               <au>
                  <snm>Zhulidov</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Bogdanova</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Shcheglov</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Vagner</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Khaspekov</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Kozhemyako</snm>
                  <fnm>VB</fnm>
               </au>
               <au>
                  <snm>Matz</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Meleshkevitch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Moroz</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Lukyanov</snm>
                  <fnm>SA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>3</issue>
            <fpage>e37</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373426</pubid>
                  <pubid idtype="pmpid" link="fulltext">14973331</pubid>
                  <pubid idtype="doi">10.1093/nar/gnh031</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
