<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2006-7-7-r67</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Fine scale structural variants distinguish the genomes of <it>Drosophila melanogaster </it>and <it>D. pseudoobscura</it></p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Macdonald</snm>
               <mi>J</mi>
               <fnm>Stuart</fnm>
               <insr iid="I1"/>
               <email>sjm@uci.edu</email>
            </au>
            <au id="A2">
               <snm>Long</snm>
               <mi>D</mi>
               <fnm>Anthony</fnm>
               <insr iid="I1"/>
               <email>tdlong@uci.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697-2525, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>7</issue>
         <fpage>R67</fpage>
         <url>http://genomebiology.com/2006/7/7/R67</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16872532</pubid>
               <pubid idtype="doi">10.1186/gb-2006-7-7-r67</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>7</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>30</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>27</day>
               <month>7</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>07</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Macdonald and Long; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Structural differences between Drosophila genomes</p>
      </shorttitle>
      <shortabs>
         <p>Comparative genomics reveals fine-scale structural variants, including microinversions, distinguishing two diverged Drosophila species</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A primary objective of comparative genomics is to identify genomic elements of functional significance that contribute to phenotypic diversity. Complex changes in genome structure (insertions, duplications, rearrangements, translocations) may be widespread, and have important effects on organismal diversity. Any survey of genomic variation is incomplete without an assessment of structural changes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We re-examine the genome sequences of the diverged species <it>Drosophila melanogaster </it>and <it>D. pseudoobscura </it>to identify fine-scale structural features that distinguish the genomes. We detect 95 large insertion/deletion events that occur within the introns of orthologous gene pairs, the majority of which represent insertion of transposable elements. We also identify 143 microinversions below 5 kb in size. These microinversions reside within introns or just upstream or downstream of genes, and invert conserved DNA sequence. The sequence conservation within microinversions suggests they may be enriched for functional genetic elements, and their position with respect to known genes implicates them in the regulation of gene expression. Although we found a distinct pattern of GC content across microinversions, this was indistinguishable from the pattern observed across blocks of conserved non-coding sequence.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p><it>Drosophila </it>has long been known as a genus harboring a variety of large inversions that disrupt chromosome colinearity. Here we demonstrate that microinversions, many of which are below 1 kb in length, located in/near genes may also be an important source of genetic variation in <it>Drosophila</it>. Further examination of other <it>Drosophila </it>genome sequences will likely identify an array of novel microinversion events.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A major aim of comparative and population genomics is to elucidate the inter- and intraspecific genetic variation that contributes to phenotypic change. Understandably, the community has focused on the most common source of genetic variation, substitutions at the nucleotide level <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. However, any catalog of genetic variation is incomplete without an examination of other, potentially more complex, forms of sequence-level variation, for example, large insertions and deletions of DNA, rearrangements, and translocations. Such events have been shown to be important in human disease susceptibility <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Using the tremendous genomic resources available for humans and chimpanzees, recent work has characterized the pattern of large deletions segregating within the human genome <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, polymorphic inversions in humans <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr></abbrgrp>, as well as structural genome differences between humans and chimps <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>Traditionally, species of the <it>Drosophila </it>genus have been an important system for examining variation in chromosome structure. This is largely due to the ability to directly observe such variation from the banding patterns of salivary gland polytene chromosomes <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Using this technique, it has been shown that large paracentric inversions (those that do not include the centromere) frequently segregate in <it>Drosophila </it>species <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Since inversions can become fixed during evolution, they can also drive architectural differences between the genomes of diverged species. The species <it>D. melanogaster </it>and <it>D. pseudoobscura </it>diverged 25 to 55 million years ago <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, and comparative analysis of the sequenced genomes of the two species has shown radical shuffling of regions within orthologous chromosome arms, likely via a series of overlapping paracentric inversions <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Similar observations have also been made in comparisons of other <it>Drosophila </it>species <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
         <p>Most of the work on <it>Drosophila </it>inversions has examined large events, much greater than a megabase in length, that disrupt chromosome colinearity and gene order. Nevertheless, very small paracentric inversions (below a few kilobases in length) that do not affect gene order may also be common in <it>Drosophila</it>. Indeed, Negre <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> recently demonstrated the existence of such microinversions in the <it>Drosophila </it>genes <it>labial </it>and <it>proboscipedia</it>. Here, we re-examine the <it>D. melanogaster </it>and <it>D. pseudoobscura </it>genomes to identify fine-scale structural differences between the species. Using a gene-by-gene sliding-window BLAST strategy we identify 95 large insertion/deletion events, the majority of which represent insertions of transposable elements into one of the two genomes. We also identify 143 microinversions, 77.6% of which are below 1 kb in size. Sequence conservation within the microinversions is high (mean nucleotide identity 74.9%), suggesting they may harbor functional elements. Since we find microinversions in introns and immediately upstream and downstream of transcribed regions, it is plausible that microinversions act as regulators of alternative splicing and gene expression. Our analyses further confirm the role of inversions as an important source of genome variation in <it>Drosophila </it>evolution, showing that inversions in <it>Drosophila </it>can act to rearrange sequences at a sub-genic level.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>Using the genome sequences of the two fruitfly species <it>D. melanogaster </it><abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> and <it>D. pseudoobscura </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, we identified 11,011 orthologous gene pairs. This is comparable to the 10,516 orthologs identified by Richards <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. For each orthologous pair, we used a sliding-window framework to BLAST short, overlapping 31 base-pair (bp) fragments of the <it>D. melanogaster </it>gene sequence against the <it>D. pseudoobscura </it>ortholog. Recording the details of each BLAST hit allowed us to identify fine-scale structural changes (inversions, insertion/deletion events) that have occurred since the separation of the <it>D. melanogaster </it>and <it>D. pseudoobscura </it>lineages.</p>
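         <p>The windowing step described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the analysis: the function name <it>sliding_windows </it>is hypothetical, the 31 bp window size comes from the text, and the 15 bp step is the value quoted in the legend of Figure 1 and is assumed here to apply generally. Each yielded fragment would then be submitted to BLAST against the ortholog.</p>

```python
def sliding_windows(sequence, size=31, step=15):
    """Yield (start, fragment) pairs for overlapping windows along a sequence.

    size and step default to the 31 bp window and 15 bp increment quoted in
    the text and in the Figure 1 legend; both are illustrative assumptions.
    """
    last_start = len(sequence) - size
    # range() is empty when the sequence is shorter than one window
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + size]


# Example: a 100 bp toy sequence gives windows starting at 0, 15, 30, 45, 60
toy_gene = "ACGT" * 25
window_starts = [start for start, _ in sliding_windows(toy_gene)]
```

         <p>With a 15 bp step and a 31 bp window, neighboring fragments overlap by 16 bp, so any conserved stretch of moderate length is covered by more than one query.</p>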
         <p>The bulk of transcribed DNA sequence in <it>Drosophila </it>does not code for protein, and may diverge rapidly between species. As <it>D. melanogaster </it>and <it>D. pseudoobscura </it>are diverged by 25 to 55 million years <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, many transcribed regions may show generally low sequence conservation. In such cases, the power of any approach to detect fine-scale structural variation will be limited. Although a pairwise whole-genome alignment of <it>D. melanogaster </it>and <it>D. pseudoobscura </it>is available, just 48% of bases can be reliably aligned <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Hence, to be confident that tested pairs of sequences are identical by descent, we examined only transcribed regions showing clear evidence for orthology. For analysis we retained 5,738/11,011 (52.1%) conserved orthologous gene pairs (see Materials and methods and Additional data file 1). These orthologs span 42.2 Mb of sequence in <it>D. melanogaster</it>, which represents 35.6% of the 118.4 Mb release 4.2.1 <it>D. melanogaster </it>genome sequence.</p>
         <sec>
            <st>
               <p>Intragenic insertion/deletion events</p>
            </st>
            <p>We detected 95 large, intronic insertion/deletion events (indels) distributed across 86 of the 5,738 (1.5%) orthologous gene pairs: 80 genes have a single indel, three genes have two indels, and three genes have three indels (Additional data file 2). Since the 5,738 genes span 42.2 Mb of sequence in <it>D. melanogaster</it>, this suggests the rate of large insertion/deletion events is around 2.3 per Mb. The observed number of indel-harboring genes on each of the five major <it>Drosophila </it>chromosome arms is not significantly different from expectation (Table <tblr tid="T1">1</tblr>). The size of the inserted sequence ranges from 1,372 bp to 46,889 bp (mean 7,869 bp; standard deviation (SD) 7,347 bp), and 79/95 (83.2%) of the indels have the insertion in the <it>D. melanogaster </it>genome.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Distribution of fine-scale structural features across chromosome arms</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Number of orthologous pairs</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chromosome arm*</p>
                     </c>
                     <c ca="center">
                        <p>Tested<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Harboring microinversions<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>Harboring intragenic indels<sup>&#8225;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>X</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>880</p>
                     </c>
                     <c ca="center">
                        <p>24 (0.111)</p>
                     </c>
                     <c ca="center">
                        <p>12 (0.889)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>2L</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>997</p>
                     </c>
                     <c ca="center">
                        <p>34 (0.003)</p>
                     </c>
                     <c ca="center">
                        <p>19 (0.294)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>2R</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1,120</p>
                     </c>
                     <c ca="center">
                        <p>7 (&lt;0.001)</p>
                     </c>
                     <c ca="center">
                        <p>15 (0.805)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>3L</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1,177</p>
                     </c>
                     <c ca="center">
                        <p>21 (0.752)</p>
                     </c>
                     <c ca="center">
                        <p>22 (0.279)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>3R</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1,560</p>
                     </c>
                     <c ca="center">
                        <p>26 (0.464)</p>
                     </c>
                     <c ca="center">
                        <p>18 (0.297)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All 5 major arms</p>
                     </c>
                     <c ca="center">
                        <p>5,734</p>
                     </c>
                     <c ca="center">
                        <p>112</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*The chromosome arms are given the <it>D. melanogaster </it>designations, <it>X</it>, <it>2L</it>, <it>2R</it>, <it>3L</it>, and <it>3R</it>. These arms are known to be orthologous to <it>D. pseudoobscura </it>arms, <it>XL</it>, <it>4</it>, <it>3</it>, <it>XR</it>, and <it>2</it>, respectively [58]. <sup>&#8224;</sup>The number of conserved orthologs residing on each arm. <sup>&#8225;</sup>Values in parentheses are <it>P </it>values from a two-sided Binomial test of whether the number of event-harboring orthologs per arm differs from expectation. For each test, the number of trials equals the number of conserved orthologs per arm, the number of successes equals the number of event-harboring genes per arm, and the probability of success is equal to the total number of event-harboring genes detected divided by the total number of conserved orthologs tested (5,738).</p>
               </tblfn>
            </tbl>
            <p>Large insertion/deletion events distinguishing orthologous genomic regions can indicate the presence/absence of transposable elements (TEs) <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. To examine whether the indels we detect represent TE insertions, we used TE annotations for the <it>D. melanogaster </it>genome sequence <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, and also compared insert sequences against <it>Drosophila </it>TE sequences using BLAST (see Materials and methods for details). Of the 79 indel events showing the insertion in the <it>D. melanogaster </it>genome, 70 (88.6%) map to an annotated TE, and 69 of these also BLAST against known <it>Drosophila </it>TE sequences. For those indels where the insertion is in the <it>D. pseudoobscura </it>genome, 6/16 (37.5%) insertion sequences BLAST to TEs. Since <it>D. pseudoobscura </it>TEs are less well curated than those of <it>D. melanogaster</it>, it is possible that some or most of the remaining ten indels with insertions in <it>D. pseudoobscura </it>are also TEs. Thus, the majority of the indels we identify likely represent TE insertions.</p>
            <p>In our analysis we detect TEs indirectly, and in an unbiased fashion, via the identification of large indels. Hence, our observation that the majority of indels have the insertion in the <it>D. melanogaster </it>genome suggests that <it>D. melanogaster </it>introns harbor more TEs than <it>D. pseudoobscura </it>introns. This corroborates the finding of Caspi and Pachter <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> that most of the identifiable TEs in a four <it>Drosophila </it>species genome alignment are present solely in the <it>D. melanogaster </it>lineage, and represent recent insertions in this species. Given these results, we might suspect that the size of orthologous introns would be greater in <it>D. melanogaster </it>than in <it>D. pseudoobscura</it>. Indeed, while the lengths of orthologous introns are highly correlated between these species <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, there is a very slight skew towards larger introns in <it>D. melanogaster </it>(see supplemental Figure S1 of Richards <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>). However, Yandell <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> note that while some orthologous introns with highly divergent lengths in the two species may be due to TE insertions (validated by results presented here), most of the differences in the size are subtle and not easily explained by transposons.</p>
         </sec>
         <sec>
            <st>
               <p>Microinversions</p>
            </st>
            <p>We detected 121 small inversions within 93/5,738 (1.6%) orthologous gene pairs: 75 genes harbor a single inversion, 10 genes have two inversions, six genes have three inversions, and two genes have four inversions (Additional data file 3). On average, there are 2.9 microinversion events per Mb of transcribed sequence, suggesting that the rate of microinversion may be similar to the rate of large insertion/deletion events, which are primarily TE insertions (2.3 events/Mb, see above). One of the intragenic inversions (CG31481_inv1) corresponds to the single <it>D. melanogaster</it>-<it>D. pseudoobscura </it>microinversion detected by Negre <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> in the <it>proboscipedia </it>gene. The top panel of Figure <figr fid="F1">1</figr> shows an example of a typical sliding-window BLAST profile, highlighting an inversion event. One possibility is that the events we identify as microinversions are in fact the result of genome assembly artifacts. To rule this out, three of the inversions (CG3578_inv1, CG3936_inv4, and CG32139_inv1) were confirmed by PCR/resequencing of the inversion breakpoints in both <it>D. melanogaster </it>and <it>D. pseudoobscura</it>. Also, for each of the 54 intragenic microinversion events less than 500 bp in size in both species we BLASTed the putatively inverted sequence, including 100 bp flanking each breakpoint, against databases of shotgun sequencing reads. When the orientation of the inversion observed in the assembled genome (relative to flanking sequences) is preserved in one or more reads, we can be confident that the microinversion events we detect are not due to errors in genome assembly. Of the 54 inversions, 51 (94.4%) correctly BLAST to at least one read for both species; on average, inversions correctly BLAST to 10.0 sequence reads in <it>D. melanogaster </it>and 6.6 sequence reads in <it>D. pseudoobscura</it>. There were no BLAST hits to reads with sequences inconsistent with the inversion orientation in the genome assembly. We conclude that the microinversions we detect are likely real, and not caused by genome assembly artifacts.</p>
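            <p>One way to picture how an inversion appears in a sliding-window BLAST profile is as a run of consecutive windows whose best hits lie on the opposite strand, flanked by same-strand hits. The sketch below illustrates that idea only; the hit representation, the function name <it>minus_strand_runs</it>, and the minimum run length of two windows are all assumptions and do not describe the criteria actually used to call the 121 events.</p>

```python
from itertools import groupby


def minus_strand_runs(hits, min_hits=2):
    """Return (first_start, last_start) for each run of opposite-strand hits.

    hits is a list of (query_start, strand) tuples, where strand is "+" for a
    hit in the same orientation in both genomes and "-" for an inverted hit.
    min_hits is a hypothetical threshold used to ignore isolated hits.
    """
    runs = []
    ordered = sorted(hits)  # order windows by their start position in the query
    for strand, group in groupby(ordered, key=lambda hit: hit[1]):
        members = list(group)
        if strand == "-" and len(members) >= min_hits:
            runs.append((members[0][0], members[-1][0]))
    return runs


# Toy profile: windows at 30, 45 and 60 hit the opposite strand
toy_hits = [(0, "+"), (15, "+"), (30, "-"), (45, "-"), (60, "-"), (75, "+")]
candidate_runs = minus_strand_runs(toy_hits)
```

            <p>A fuller treatment would also require the subject coordinates of the flanking same-strand hits to bracket the inverted block, which is the pattern visible in the top panel of Figure 1.</p>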
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Sequence similarity between <it>Drosophila melanogaster </it>(<it>D. mel</it>) and <it>D. pseudoobscura </it>(<it>D. pse</it>) for the <it>Sox21b </it>(CG32139) gene</p>
               </caption>
               <text>
                  <p>Sequence similarity between <it>Drosophila melanogaster </it>(<it>D. mel</it>) and <it>D. pseudoobscura </it>(<it>D. pse</it>) for the <it>Sox21b </it>(CG32139) gene. Top panel: sliding-window BLAST profile. We stepped through the <it>D. melanogaster Sox21b </it>gene in 15 bp increments, and at each position BLASTed a 31 bp segment against the <it>D. pseudoobscura </it>ortholog. Each line represents a BLAST hit with a score above 45, the endpoints show the position of the hit in each genome, and the color of the line represents the orientation of the hit (black = same sequence orientation in each genome, red = different orientations in each genome). Central panel: structure of the <it>Sox21b </it>gene in <it>D. melanogaster</it>. Filled boxes represent exons, and open boxes represent untranslated regions (UTRs). Bottom panel: VISTA plot. The appropriate region of the <it>D. melanogaster</it>-<it>D. pseudoobscura </it>genome alignment was downloaded from the VISTA Browser [44]. We stepped through the alignment in 5 bp increments, and for each 501 bp window calculated the percentage of identical nucleotides between the sequences. The plot is shown relative to the <it>D. melanogaster </it>sequence, and represents a smoothed curve through the data using the ksmooth function in the statistical programming language R [49]. Areas under the curve are painted if they show >70% nucleotide conservation (dark blue = within an exon, light blue = within a UTR, pink = intronic and >100 bp in size).</p>
               </text>
               <graphic file="gb-2006-7-7-r67-1"/>
            </fig>
            <p>Given our success identifying intronic microinversion events, we sought to examine those regions flanking the 5,738 conserved orthologs for microinversions that potentially disrupt upstream or downstream regulatory domains. We extended the sequence of each ortholog by 2 kb upstream and downstream in both <it>D. melanogaster </it>and <it>D. pseudoobscura</it>, and repeated our sliding-window BLAST procedure. In comparison with our scan of intragenic regions, an analysis of short regions flanking genes has lower power to detect microinversion events for three reasons. First, intergenic sequence is generally less conserved than transcribed intronic sequence, although this difference may be slight <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Second, we only scan 2 kb regions, and thus can detect only microinversions below this size. Third, outside of transcribed regions synteny between the two genomes can break down. Richards <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> report that the average number of genes within a <it>D. melanogaster</it>-<it>D. pseudoobscura </it>syntenic block is 10.7, or around 83 kb of sequence. Thus, the intergenic regions we compare may not always be orthologous.</p>
            <p>We discovered 22 microinversions in the 19.7 Mb of unique intergenic sequence tested, or 1.1 events/Mb (Additional data file 4). This is proportionally far fewer inversions than we found in intragenic regions (121 microinversions were detected in 42.2 Mb of transcribed sequence, or 2.9 events/Mb), most likely for the reasons stated above. Three of the 22 microinversions were upstream or downstream of genes also harboring an intragenic microinversion event. In total, over both of our sliding-window BLAST tests, we identify 143 unique microinversions distinguishing the genomes of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>. These 143 events are in/near 112 different genes.</p>
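            <p>The event densities quoted in this section follow directly from the counts and the amount of sequence scanned. The short calculation below simply restates that arithmetic using the figures given in the text; the helper name <it>events_per_mb </it>is hypothetical.</p>

```python
def events_per_mb(events, megabases):
    """Density of events per megabase of scanned sequence."""
    return events / megabases


# Counts and scanned sequence lengths (Mb) as reported in the text
rates = {
    "intragenic indels": events_per_mb(95, 42.2),
    "intragenic microinversions": events_per_mb(121, 42.2),
    "intergenic microinversions": events_per_mb(22, 19.7),
}

# Round to one decimal place, as in the text
rounded_rates = {name: round(rate, 1) for name, rate in rates.items()}
```

            <p>Because the flanking-region scan has lower power, the intergenic density is best read as a lower bound rather than a direct comparison with the intragenic value.</p>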
            <p>In <it>D. melanogaster </it>the frequency of nested genes, genes residing within introns of other genes, is around 7%, and the frequency of overlapping genes is around 15% <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. None of the microinversions overlap a host gene exon, but 7/143 microinversions overlap an annotated exon from a nested/overlapping gene in <it>D. melanogaster </it>(Additional data files <supplr sid="S3">3</supplr> and <supplr sid="S4">4</supplr>). These seven microinversions were not identified by direct scanning of the nested/overlapping genes, presumably due to low sequence conservation of these genes between <it>D. melanogaster </it>and <it>D. pseudoobscura</it>. It is unclear what, if any, effect these seven microinversions may have on the ability of the orthologous <it>D. pseudoobscura </it>nested/overlapping genes to function correctly. To verify that the inverted sequences are single-copy in each of the tested genomes, we BLASTed the sequence of all 143 microinversions against the appropriate genome assembly. The sequences of 142/143 are single copy, while the remaining intronic inversion, CG1794_inv1, BLASTs six times to the genomes of both <it>D. melanogaster </it>and <it>D. pseudoobscura</it>. The inverted region in this case encompasses the cytosolic tRNA gene <it>tRNA:met3:46A </it>(CR30003) that resides in an intron of the <it>Matrix metalloproteinase 2 </it>(<it>Mmp2</it>) gene. We detect multiple BLAST hits for this sequence because tRNA genes are present in multiple copies throughout the fly genome.</p>
            <p>The size of the 143 microinversions ranges from 46 bp to 4,006 bp (mean 628 bp; SD 635 bp) in <it>D. melanogaster</it>, and from 40 bp to 4,408 bp (mean 706 bp; SD 731 bp) in <it>D. pseudoobscura</it>. The difference in length between the species is due to insertion/deletion of nucleotides. There does not appear to be any strong directional change in microinversion length between the species, as the <it>D. pseudoobscura </it>arrangement is longer in just 86/143 (60.1%) of cases. Overall, the majority of microinversions are below 1 kb in both species (111/143, 77.6%). Using Clustalx version 1.83.1 <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> we aligned each <it>D. melanogaster </it>inversion event sequence with the corresponding, reverse complemented <it>D. pseudoobscura </it>sequence. Over the 143 events, ignoring alignment gaps, the average percent nucleotide identity is 74.9% (SD 12.8%). We expect a high level of conservation for the identified microinversions, as our ability to detect them was contingent on sequence conservation. Within the <it>D. melanogaster </it>and <it>D. pseudoobscura </it>genome alignment, only 46% of the <it>D. melanogaster </it>bases are identical <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and this may generally obscure the signature of historical inversion events. Thus, the 143 detectable, conserved microinversions likely represent only a fraction of the events that have occurred since the divergence of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>. Comparing the genomes of more closely related species of <it>Drosophila </it>may reveal much greater numbers of microinversions.</p>
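            <p>The identity calculation described above, in which one sequence is reverse complemented before alignment and gapped columns are ignored, can be sketched as follows. The alignment itself is assumed to have been produced elsewhere (for example by a multiple alignment program); the function names and the toy sequences are illustrative and are not taken from the data.</p>

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical nucleotides over aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy example: five ungapped columns, four of which match
toy_identity = percent_identity("ACGT-A", "ACCTTA")
toy_revcomp = reverse_complement("AAGC")
```

            <p>Excluding gapped columns means that length differences caused by small insertions or deletions do not lower the identity estimate, in line with the description in the text.</p>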
            <p>In total, 112 genes harbor a microinversion within the transcribed region or just upstream or downstream. From Table <tblr tid="T1">1</tblr> it is clear there is a significant excess of genes with microinversions on <it>D. melanogaster </it>chromosome <it>2L </it>(Binomial test, <it>P </it>= 0.003), and a significant dearth on chromosome <it>2R </it>(Binomial test, <it>P </it>&lt; 0.001). What is not clear is why this might be the case, as within the major chromosome arms genes containing microinversions appear to be evenly distributed (Figure <figr fid="F2">2</figr>). If we consider the position of the intragenic microinversions within the host genes, they appear to preferentially reside within larger introns. Of the 121 intragenic microinversions, 82 (67.2%) are within the largest host gene intron, and 104 (85.2%) are within one of the largest two introns. Similar values are found when considering only those genes with greater than four introns (data not shown). However, within the host intron, the inversions show no positional preference: over the 121 intronic inversions, the distribution of the distance between the inversion breakpoints and the flanking exons (weighted by the size of the host intron) is approximately uniform (Additional data file 5). These observations are particularly interesting in light of the recent observation that longer introns diverge more slowly than shorter introns in <it>Drosophila </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. If longer introns are under selective constraint, they may be expected to contain many functional motifs, which could be disrupted and/or shuffled around by an intronic microinversion event.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Positions of the 112 microinversion-harboring genes in the <it>D. melanogaster </it>genome</p>
               </caption>
               <text>
                  <p>Positions of the 112 microinversion-harboring genes in the <it>D. melanogaster </it>genome. Using data from release 4.2.1 of the <it>D. melanogaster </it>genome assembly, the physical position of each of the 112 microinversion-harboring genes is mapped onto the <it>D. melanogaster </it>chromosomes. The midpoint of each gene is used to map to chromosome. The centromeres for chromosomes 2 and 3 are represented by filled black circles, and the positions of microinversion-harboring genes are indicated by vertical blue lines.</p>
               </text>
               <graphic file="gb-2006-7-7-r67-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Impact of microinversions on gene regulation</p>
            </st>
            <p>Comparative genomics seeks to identify functional elements by examining the pattern of sequence conservation across species. The rationale behind this approach is that over evolutionary time sequences will diverge, unless they are under some form of functional or selective constraint. Thus, the maintenance of sequence conservation despite inversion makes the microinversion events we describe particularly interesting, as they may be enriched for functional motifs. Since the microinversions are present both within introns and upstream of genes, this brings up the possibility that inversions might impact the regulation of splicing and gene expression. For example, shuffling transcription factor binding sites within regulatory domains could alter the ability of sets of factors to bind in a coordinated fashion, and thereby up- or down-regulate expression, or alter the timing or tissue-specificity of transcription.</p>
            <p>We examined the position of the 143 microinversion events we identify relative to annotated regulatory regions in the <it>D. melanogaster </it>genome. We used two complementary resources: the DNase I footprint database is a systematically curated set of 1,362 <it>Drosophila </it>transcription factor binding sites <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, and the REDfly database is a comprehensive collection of 628 known <it>cis</it>-regulatory modules (CRMs; sequences sufficient to regulate gene expression) in <it>D. melanogaster </it><abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. None of the DNase I footprints overlap the sequence of any <it>D. melanogaster </it>microinversion. However, three microinversions are present within a CRM. Microinversion CG31481_inv1, initially detected by Negre <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, resides in intron 2 of the gene <it>proboscipedia </it>(<it>pb</it>), and is present within a 10.4 kb sequence showing enhancer activity <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Microinversion CG1030_inv1, situated just 3' of the gene <it>Sex combs reduced </it>(<it>Scr</it>), is present within a 6.7 kb region exhibiting enhancer activity <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Finally, the inversion CG12287_inv1 resides in intron 3 of the gene <it>POU domain protein 2 </it>(<it>pdm2</it>), and overlaps a 1.3 kb enhancer region detected and validated by Berman <it>et al</it>. <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
            <p>Of course, we do not know whether the microinversions we identify actually have an effect on transcriptional regulation in the two species. It is possible that in the three cases we describe the microinversions have no impact on the spacing/ordering of transcription factor binding sites. This may be particularly true for the two large enhancer regions, which at 10.4 kb and 6.7 kb likely do not represent the minimal enhancer. Work on the <it>Sox21b </it>gene, which shows a microinversion in intron 1 (Figure <figr fid="F1">1</figr>), has demonstrated that the pattern of <it>Sox21b </it>embryonic expression is conserved between <it>D. melanogaster </it>and <it>D. pseudoobscura </it><abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Thus, for this gene at a particular stage in development, the transcribed microinversion appears to be neutral with respect to expression pattern. As the community begins to understand more about binding site biology and the gene regulatory 'code', we may also be able to determine if the inversions we identify generally have a significant impact on gene regulation.</p>
         </sec>
         <sec>
            <st>
               <p>Genomic signature of microinversions</p>
            </st>
            <p>In analyzing the breakpoints between the syntenic blocks of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>, Richards <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> provided evidence for a <it>D. pseudoobscura</it>-specific breakpoint motif, which could in principle effect large inversions via ectopic exchange. The motif is virtually absent from intron sequences, and is thus unlikely to be the cause of the microinversion events we describe here. In bacteria, short (12 to 23 bp) inverted repeat elements have been shown to permit inversion of the intervening DNA segment <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. However, the precise mechanism by which very small inversion events occur in eukaryotes is unknown.</p>
            <p>As an initial investigation into this problem, we examined whether the DNA sequence about the microinversion events showed any detectable signature. Richards <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> noted that breakpoint junctions between syntenic blocks of <it>D. melanogaster </it>and <it>D. pseudoobscura </it>were AT rich. The top panel of Figure <figr fid="F3">3</figr> shows data from a sliding-window analysis of average GC content across the flanking regions and breakpoints for the 143 <it>D. melanogaster</it>-<it>D. pseudoobscura </it>microinversions. It is apparent that in both species, GC content in the flanking region increases slowly towards the inversion breakpoints, and drops dramatically in the first/last 20 bp of the inversion. The average GC content for introns (where we identify most microinversions) is 40.0% in <it>D. melanogaster </it>and 44.0% in <it>D. pseudoobscura</it>, and 200 bp from the microinversion, GC content returns to this genome-wide average. Also, the GC content of the inversions themselves is similar to the intronic average (the average GC content for <it>D. melanogaster </it>inverted sequence is 42.5%, and for <it>D. pseudoobscura </it>is 44.6%).</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>GC content across microinversion breakpoints and conserved sequence blocks</p>
               </caption>
               <text>
                  <p>GC content across microinversion breakpoints and conserved sequence blocks. Top panel: 143 <it>Drosophila melanogaster</it>-<it>D. pseudoobscura </it>microinversions. For each microinversion breakpoint we extracted 200 bp flanking the breakpoint and 20 bp internal to the inversion as a contiguous section (we examined just 20 bp internal to each inversion breakpoint as the minimum inversion size was 40 bp). For each species, across all sequences for a given inversion breakpoint, we calculated GC content for all overlapping 5 bp windows. Each point in the plot represents the mean GC content for a single window. Vertical dashed lines indicate the inversion breakpoints. Note that the distance between these lines is variable across inversion events. Bottom panel: 774 <it>Drosophila melanogaster</it>-<it>D. pseudoobscura </it>conserved non-coding blocks. Using sliding-window BLAST data we identified all blocks of conserved non-coding sequence from the 93 genes harboring intronic microinversions (see Materials and methods for details). Sequence data were extracted from in/around the conserved blocks and analyzed as described for the microinversion data.</p>
               </text>
               <graphic file="gb-2006-7-7-r67-3"/>
            </fig>
            <p>One possibility is that the GC content pattern we observe across microinversion breakpoints is due not to inversions <it>per se</it>, but instead to a change in GC content between conserved and non-conserved sequence: the microinversions we detect essentially represent conserved sequence, present in opposite orientation in the two genomes. We extracted sequence from all 774 conserved non-coding sequence blocks in the 93 genes harboring intronic microinversions (see Materials and methods for details), and subjected these to the same sliding-window GC content analysis we performed for the microinversions. As shown in Figure <figr fid="F3">3</figr>, the pattern of GC content across microinversion breakpoints (top panel) is identical to the pattern across junctions between conserved and non-conserved sequence (bottom panel). The GC content patterns across conserved <it>Drosophila </it>sequence are very similar to those recorded by Walter <it>et al</it>. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> for 1,373 blocks of non-coding sequence conserved between human and <it>Takifugu rubripes </it>(Fugu). The fact that the pattern is maintained across vertebrate and invertebrate systems is deserving of further work.</p>
            <p>In an attempt to distinguish microinversions from conserved blocks based on nucleotide sequence data, we investigated the frequency of all 5-mer sequence motifs across the boundaries of the events, and examined the nucleotide compositional bias at the edges of the events <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Neither test clearly distinguished microinversions from conserved blocks (data not shown), suggesting that if there is a general mechanism underlying <it>Drosophila </it>microinversion, it is not easily discernible from primary sequence data alone.</p>
         </sec>
         <sec>
            <st>
               <p>Phylogenetic distribution of microinversion events</p>
            </st>
            <p>It is of interest to ask when the microinversions we identify occurred in the <it>Drosophila </it>lineage, and which arrangement (standard or inverted) is the ancestral state. Using data from the 12 recently sequenced <it>Drosophila </it>genomes <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> we extracted the orthologous regions surrounding 15 of the intragenic microinversions. For each region we then performed the same sliding-window BLAST procedure we describe above, in each case testing the <it>D. melanogaster </it>and the <it>D. pseudoobscura </it>orthologs individually against each of the other 11 species' orthologs. Figure <figr fid="F4">4</figr> details the results of these analyses.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Phylogenetic distribution of fifteen microinversion events</p>
               </caption>
               <text>
                  <p>Phylogenetic distribution of fifteen microinversion events. For 15 microinversions distinguishing the genomes of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>, we examined orthologous regions from 10 other <it>Drosophila </it>species to determine whether they harbor the standard (St; <it>D. melanogaster</it>-like) or inverted (In; <it>D. pseudoobscura</it>-like) arrangement. Some species could not be reliably shown to have either arrangement (shown with a dash). A consensus phylogeny of the 12 species is provided. The microinversion events are grouped according to phylogenetic position, and mapped onto the consensus phylogeny.</p>
               </text>
               <graphic file="gb-2006-7-7-r67-4"/>
            </fig>
            <p>For nine of the events (CG6464_inv1, CG9019_inv1, CG9623_inv1, CG11354_inv1, CG12154_inv1, CG12287_inv1, CG31762_inv1, CG32139_inv1, and CG33529_inv1) the data are consistent with the inversion occurring prior to the divergence of the <it>melanogaster </it>group of species. For two events (CG3578_inv1 and CG3936_inv4) the inversion likely occurred prior to the divergence of the <it>melanogaster </it>subgroup of five species. Three microinversion events (CG2872_inv3, CG4220_inv1 and CG15455_inv1) occurred along the <it>obscura </it>group lineage. Finally, one event (CG4838_inv1) shows the inverted arrangement in the three species <it>D. willistoni</it>, <it>D. persimilis</it>, and <it>D. pseudoobscura</it>, and the standard arrangement in the remaining nine species. Three explanations are compatible with the phylogenetic distribution of CG4838_inv1. First, the same inversion may have occurred independently in the lineage leading to <it>D. willistoni </it>and in the lineage leading to the <it>obscura </it>group species. Second, the inversion may have occurred prior to the divergence of <it>D. willistoni </it>and the <it>obscura </it>group species, but subsequently re-inverted in the lineage leading to the <it>melanogaster </it>group of species. Third, the state of the CG4838_inv1 microinversion in <it>D. willistoni </it>may not be correct, and the inverted form may actually be present only in the pair of <it>obscura </it>group species. The third possibility is conceivable as the current draft assembly of the <it>D. willistoni </it>genome has not been subject to the same scrutiny as the genomes of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>.</p>
            <p>Due to ascertainment bias (the microinversion must distinguish <it>D. melanogaster </it>and <it>D. pseudoobscura</it>) we identify only a particular subset of <it>Drosophila </it>microinversions. It will be extremely interesting to extend our analyses to all pairs of <it>Drosophila </it>species, and place identified microinversions on the <it>Drosophila </it>phylogeny. We predict that many more microinversions will be identified between other <it>Drosophila </it>species pairs, and show different phylogenetic patterns.</p>
            <p>Finally, we note that the presence of both the standard and inverted arrangements of the 15 tested microinversions in multiple species provides independent support that microinversions are real features of <it>Drosophila </it>genome architecture.</p>
         </sec>
         <sec>
            <st>
               <p>Using BLAST to examine genome architecture</p>
            </st>
            <p>A widely used method to examine sequence differences between/among diverged species is to use VISTA plots of aligned sequence data <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. This highly informative method allows the local nucleotide conservation between species to be assessed, and VISTA plots can be generated for arbitrary regions of aligned genomes using a web-based utility <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. However, while the combination of genome alignment and VISTA plots has been widely employed, the approach may miss some architectural sequence features. For instance, in a VISTA plot comparing two genomes, one is marked as the reference sequence, and the plot is drawn relative to that sequence. Thus, insertions/deletions distinguishing the sequences are not easily seen. This is demonstrated in Figure <figr fid="F1">1</figr>: in the VISTA plot, using <it>D. melanogaster </it>as the reference sequence, it is not possible to determine that the <it>D. pseudoobscura Sox21b </it>gene is expanded relative to the <it>D. melanogaster </it>homolog. However, our BLAST approach shows that this is the case. Also, while there are methods available to identify rearrangements during genome alignment <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, these are not readily presented using the VISTA plot format. Generally, examining VISTA plots of aligned sequence data may capture many of the important differences between orthologous regions of diverged species. However, some ultrastructural features of the sequences may be missed in some cases. Sliding-window BLAST-based procedures such as that presented here, or those implemented in the GATA software package <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, are likely to prove a worthwhile addition to the armory of those examining the causes and effects of DNA sequence differences between diverged species.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We describe the use of a sliding-window BLAST-based approach to examine micro-scale genome architectural features. We almost certainly underestimate the actual number of microinversion events that have occurred since the most recent common ancestor of <it>D. melanogaster </it>and <it>D. pseudoobscura</it>, as in general there is considerable divergence between the genomes. Nevertheless, the microinversions we identify in this survey may be a particularly interesting class as they are conserved, and reside in introns or upstream of genes, and could have regulatory effects on gene expression and alternative exon splicing. We expect that microinversions will be fairly frequent in many organisms, not only <it>Drosophila</it>, and may be a particularly important source of genetic variation both among species and within populations.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Genome sequences and annotation</p>
            </st>
            <p>The genome sequences of <it>D. melanogaster </it>(release 4.2.1) and <it>D. pseudoobscura </it>(release 1.04), and the annotation features for <it>D. melanogaster </it>(in GFF v.3 format) were downloaded from FlyBase <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Details of all the <it>D. melanogaster </it>genes were extracted from the GFF annotation files using a custom Perl script. Orthologous regions of <it>D. pseudoobscura </it>were identified via BLAST, using the standalone BLAST executable function blastall <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Sliding-window BLAST comparison of orthologs</p>
            </st>
            <p>Release 4.2.1 of the <it>D. melanogaster </it>genome harbors 13,667 annotated protein-coding genes, each represented by a unique CG identifier. We identified 11,011 <it>D. melanogaster </it>protein-coding genes having orthologs in <it>D. pseudoobscura</it>. For each <it>D. melanogaster </it>gene sequence we scanned through the sequence in 15 bp steps, at each step BLASTing a 31 bp query sequence against the putative <it>D. pseudoobscura </it>ortholog. This was accomplished using a custom Perl script calling the standalone BLAST executable function bl2seq <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. For each 31 bp <it>D. melanogaster </it>query sequence, we recorded the position, score, orientation and sequence of the best BLAST hit within the <it>D. pseudoobscura </it>ortholog. Only BLAST hits with scores above 45 were considered in further analyses. There were 5,738 orthologous pairs with at least two above-threshold BLAST hits, and greater than 5% of the <it>D. melanogaster </it>gene sequence showing above-threshold hits. Only these genes were retained for further analysis (<supplr sid="S1">Additional data file 1</supplr>).</p>
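            <p>The windowing and filtering steps described above can be sketched as follows. This is a minimal illustration in Python rather than the custom script used in the study, and it does not call BLAST itself. The window size (31 bp), step (15 bp), score threshold (45), and gene-level filter (at least two hits, more than 5% of the gene) come from the text; the Hit record layout, the function names, and the choice to measure coverage as the union of query windows with above-threshold hits are assumptions made for illustration.</p>

```python
from typing import Iterator, List, NamedTuple, Tuple

WINDOW_BP = 31        # query length stated in the text
STEP_BP = 15          # step size stated in the text
SCORE_THRESHOLD = 45  # only hits scoring above this value are kept


class Hit(NamedTuple):
    """Best hit for one query window (hypothetical record layout)."""
    query_start: int    # 0-based start of the query in the first gene sequence
    subject_start: int  # 0-based start of the hit in the ortholog
    score: float
    strand: int         # +1 same orientation, -1 reverse complement


def sliding_queries(seq: str, window: int = WINDOW_BP,
                    step: int = STEP_BP) -> Iterator[Tuple[int, str]]:
    """Yield (start, query) for every full-length window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def above_threshold(hits: List[Hit],
                    threshold: float = SCORE_THRESHOLD) -> List[Hit]:
    """Keep hits whose score is strictly above the threshold."""
    return [h for h in hits if h.score > threshold]


def passes_gene_filter(hits: List[Hit], gene_length: int,
                       window: int = WINDOW_BP, min_hits: int = 2,
                       min_fraction: float = 0.05) -> bool:
    """Apply the gene-level retention rule to a list of best hits.

    Assumption: coverage is the union of query windows that produced
    an above-threshold hit, divided by the gene length.
    """
    kept = above_threshold(hits)
    covered = set()
    for h in kept:
        covered.update(range(h.query_start, h.query_start + window))
    return (len(kept) >= min_hits and gene_length > 0
            and len(covered) / gene_length > min_fraction)
```

            <p>Each query produced by sliding_queries would be passed to a pairwise BLAST run against the ortholog, and the best hit for that query stored as one Hit record.</p>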
         </sec>
         <sec>
            <st>
               <p>Identification of structural features</p>
            </st>
            <p>A custom script written in the freely available statistical programming language R <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> was applied to each of the resulting sliding-window ortholog BLAST files. Inversions were recognized as at least two consecutive, above-threshold BLAST hits, where the <it>D. melanogaster </it>query sequences BLAST <it>D. pseudoobscura </it>in reverse orientation, and the order of the hits in the two genomes is reversed (that is, the <it>D. melanogaster </it>query sequences A-B-C-D-E are reverse complemented in <it>D. pseudoobscura</it>, and in reverse order E-D-C-B-A). We placed no restriction on the distance between BLAST hits defining a microinversion to avoid identifying only small events with high levels of nucleotide conservation throughout their length. This means that the threshold of nucleotide conservation required to detect a microinversion is not a constant across the genome. Large insertion/deletion events distinguishing the two genomes were also identified. To be detected, the endpoints of the BLAST hits flanking the insertion had to be separated by greater than 1 kb, and be 10 times more distant than the endpoints flanking the deletion. Plots for all 5,738 genes were manually checked to ensure the accuracy of our automatic scripts (Additional data file 6 <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>). Also, since we analyzed each gene independently, and genes can overlap in the <it>Drosophila </it>genome <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, we ensured that the inversion and insertion events we describe are unique.</p>
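            <p>The detection rules described above can be sketched as follows. The study used a custom R script; this Python illustration is not that script. It assumes the hits have already been filtered to those above threshold, treats 'consecutive' as adjacent in that filtered list when ordered by query position, and reads '10 times more distant' as at least a tenfold difference. The Hit record and the function names find_inversions and is_large_indel are hypothetical.</p>

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """One above-threshold hit (hypothetical record layout)."""
    query_start: int    # position of the query in the first species
    subject_start: int  # position of the hit in the second species
    strand: int         # +1 same orientation, -1 reverse complement


def find_inversions(hits: List[Hit],
                    min_hits: int = 2) -> List[Tuple[int, int]]:
    """Return (first_query_start, last_query_start) for each candidate.

    A candidate is a run of at least min_hits consecutive hits that are
    all reverse-oriented and whose subject positions strictly decrease
    as the query position increases (order A-B-C becomes C-B-A).
    """
    ordered = sorted(hits, key=lambda h: h.query_start)
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, hit in enumerate(ordered):
        extends = (run_start is not None
                   and hit.strand == -1
                   and ordered[i - 1].subject_start > hit.subject_start)
        if extends:
            continue
        # The current run (if any) ends at the previous hit.
        if run_start is not None and i - run_start >= min_hits:
            runs.append((ordered[run_start].query_start,
                         ordered[i - 1].query_start))
        run_start = i if hit.strand == -1 else None
    if run_start is not None and len(ordered) - run_start >= min_hits:
        runs.append((ordered[run_start].query_start,
                     ordered[-1].query_start))
    return runs


def is_large_indel(gap_a_bp: int, gap_b_bp: int,
                   min_insert_bp: int = 1000, min_ratio: int = 10) -> bool:
    """Check two flanking-hit separations against the indel rule.

    gap_a_bp and gap_b_bp are the distances between the same pair of
    flanking hits in the two genomes; the larger is taken as the
    insertion side and the smaller as the deletion side.
    """
    longer, shorter = max(gap_a_bp, gap_b_bp), min(gap_a_bp, gap_b_bp)
    return longer > min_insert_bp and longer >= min_ratio * shorter
```

            <p>Because no limit is placed on the distance between hits within a run, this sketch, like the procedure in the text, can report long candidates supported by only a few conserved windows, which is one reason manual inspection of the plots remains useful.</p>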
         </sec>
         <sec>
            <st>
               <p>Testing for transposable element insertion</p>
            </st>
            <p>To test whether the large insertion/deletion events we observe are the result of TE insertion, we performed two tests. For those events where the insertion is in the <it>D. melanogaster </it>genome, we compared the position of each insertion with the positions of 6,013 TEs annotated in the <it>D. melanogaster </it>genome <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. No corresponding database exists for <it>D. pseudoobscura</it>. Second, using BLAST we compared each insertion sequence to a set of TE sequences identified in <it>Drosophila</it>. These sequences are present in the file 'D_mel_transposon_sequence_set.fasta' (version 9.4.1) available from the BDGP natural transposable element project website <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Confirmation of microinversion events</p>
            </st>
            <p>To ensure that inferred inversion events are not generally the result of genome assembly errors we designed 1 kb PCR amplicons about three of the inversion events: CG3578_inv1, CG3936_inv4, and CG32139_inv1. Products were amplified in the fly strains used for genome sequencing, that is, <it>D. melanogaster </it>stock number 2057 (Bloomington Stock Center) or <it>D. pseudoobscura </it>stock number 14011-0121.94 (Tucson Drosophila Species Stock Center). Accuracy of the genome assemblies was confirmed via dideoxy sequencing. PCR/sequencing oligos are available in Additional data file 7.</p>
            <p>The orientation of a putatively inverted sequence in a genome assembly is likely correct if, relative to the flanking sequences, the orientation is preserved within one or more single shotgun sequencing reads. Hence, for the 54 intragenic microinversion events less than 500 bp in size in both species, we extracted the sequence of the inversion and the 100 bp flanking each breakpoint, and BLASTed against the appropriate genome shotgun trace archive database using Mega BLAST <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>GC content analysis</p>
            </st>
            <p>For each breakpoint of the 143 <it>D. melanogaster</it>-<it>D. pseudoobscura </it>microinversions we extracted a contiguous segment of 220 bp (200 bp flanking the breakpoint, and 20 bp internal to the inversion) from each species. For each species, independently for each breakpoint, across all sequences we calculated GC content for all overlapping 5 bp windows.</p>
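            <p>The windowed GC calculation can be sketched as follows. This Python illustration assumes that all segments for a given breakpoint have the same length (220 bp in the text) and are oriented consistently, so that window positions are comparable across sequences; the function names window_gc and mean_gc_profile are hypothetical.</p>

```python
from typing import List


def window_gc(seq: str, window: int = 5) -> List[float]:
    """GC fraction for every overlapping window (step of 1 bp)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def mean_gc_profile(seqs: List[str], window: int = 5) -> List[float]:
    """Average the per-window GC fraction across equally long segments.

    Each element of the result is the mean GC content of one window
    position, which corresponds to one plotted point per window.
    """
    if not seqs:
        raise ValueError("at least one segment is required")
    profiles = [window_gc(s, window) for s in seqs]
    n_windows = len(profiles[0])
    if any(len(p) != n_windows for p in profiles):
        raise ValueError("all segments must have the same length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

            <p>A 220 bp segment yields 216 overlapping 5 bp windows, so each breakpoint profile produced this way contains 216 points per species.</p>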
            <p>Conserved blocks were defined on the basis of the <it>D. melanogaster</it>-<it>D. pseudoobscura </it>sliding-window BLAST procedure described above. In the microinversions we identify, the average number of above-threshold BLAST hits per 100 bp of <it>D. melanogaster </it>sequence is 1.6. We therefore defined a conserved block as a sequence having at least 1.6 BLAST hits per 100 bp of <it>D. melanogaster </it>sequence. All of these hits must be between sequences having the same orientation in the two genomes. Furthermore, the 200 bp flanking each edge of the block must be free of above-threshold BLAST hits. Finally, the conserved blocks must be at least 200 bp in length in both species, and no part of the conserved blocks or flanking sequence can be exonic. Using these rules we identified 774 blocks of non-coding sequence conserved between <it>D. melanogaster </it>and <it>D. pseudoobscura </it>in the 93 genes harboring intronic microinversions. To examine GC content change across the boundaries of conserved and non-conserved sequence, using the 774 blocks we performed an analysis identical to that described for the microinversion breakpoints above.</p>
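            <p>The rules for accepting a conserved block can be collected into a single check, sketched below in Python. The thresholds (1.6 hits per 100 bp, 200 bp minimum length, hit-free 200 bp flanks, same orientation, no exonic sequence) come from the text; the CandidateBlock record, its field names, and the function names are hypothetical, and the sketch assumes these quantities have already been computed for each candidate.</p>

```python
from dataclasses import dataclass

MIN_HITS_PER_100BP = 1.6  # average hit density observed in microinversions
MIN_BLOCK_BP = 200        # minimum block length in both species


@dataclass
class CandidateBlock:
    """Summary of one candidate block (hypothetical record layout)."""
    n_hits: int                 # above-threshold hits inside the block
    length_species_a_bp: int    # block length in the first species
    length_species_b_bp: int    # block length in the second species
    hits_in_flanks: int         # above-threshold hits in the 200 bp flanks
    all_same_orientation: bool  # every hit is in the same orientation
    overlaps_exon: bool         # block or flanks include exonic sequence


def hits_per_100bp(n_hits: int, length_bp: int) -> float:
    """Hit density relative to the first species' sequence length."""
    return 100.0 * n_hits / length_bp


def is_conserved_block(block: CandidateBlock) -> bool:
    """Apply every acceptance rule; all must hold."""
    if min(block.length_species_a_bp, block.length_species_b_bp) == 0:
        return False
    density = hits_per_100bp(block.n_hits, block.length_species_a_bp)
    return (density >= MIN_HITS_PER_100BP
            and block.all_same_orientation
            and block.hits_in_flanks == 0
            and block.length_species_a_bp >= MIN_BLOCK_BP
            and block.length_species_b_bp >= MIN_BLOCK_BP
            and not block.overlaps_exon)
```

            <p>Blocks passing this check would then be analyzed with the same windowed GC procedure used for the microinversion breakpoints.</p>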
         </sec>
         <sec>
            <st>
               <p>Phylogenetic distribution of microinversion events</p>
            </st>
            <p>For 15 intragenic microinversion events (CG2872_inv3, CG3578_inv1, CG3936_inv4, CG4220_inv1, CG4838_inv1, CG6464_inv1, CG9019_inv1, CG9623_inv1, CG11354_inv1, CG12154_inv1, CG12287_inv1, CG15455_inv1, CG31762_inv1, CG32139_inv1, CG33529_inv1), we identified the surrounding orthologous regions from 10 other <it>Drosophila </it>species using BLAST via the DroSpeGe website <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. For each of the 15 regions we performed the sliding-window BLAST protocol described above, testing the <it>D. melanogaster </it>sequence and the <it>D. pseudoobscura </it>sequence independently against sequence from every other species. The presence of the standard (<it>D. melanogaster</it>-like) or inverted (<it>D. pseudoobscura</it>-like) sequence arrangement was recorded in each case.</p>
            <p>Orthologous sequences were extracted from the following assemblies: dsim_wu050602 (<it>D. simulans</it>), dsec_br051028 (<it>D. sechellia</it>), dyak_caf051213 (<it>D. yakuba</it>), dere_caf051209 (<it>D. erecta</it>), dana_caf051209 (<it>D. ananassae</it>), dper_br051028 (<it>D. persimilis</it>), dmoj_caf051209 (<it>D. mojavensis</it>), dwil_caf060213 (<it>D. willistoni</it>), dvir_caf051209 (<it>D. virilis</it>), and dgri_caf051209 (<it>D. grimshawi</it>). The <it>D. simulans </it>and <it>D. yakuba </it>assemblies were provided by the Genome Sequencing Center, Washington University <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. The <it>D. erecta</it>, <it>D. ananassae</it>, <it>D. mojavensis</it>, <it>D. virilis</it>, and <it>D. grimshawi </it>assemblies were provided by Agencourt Bioscience Corporation <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The <it>D. sechellia </it>and <it>D. persimilis </it>assemblies were provided by the Broad Institute <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The <it>D. willistoni </it>assembly was provided by the J. Craig Venter Institute <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data files are available with the online version of this article. Additional data file <supplr sid="S1">1</supplr> is a spreadsheet providing details of all 5,738 genes that are sufficiently conserved between <it>Drosophila melanogaster </it>and <it>D. pseudoobscura </it>to be tested. The number of microinversions and insertion/deletion events detected within each gene is also indicated. Additional data file <supplr sid="S2">2</supplr> is a spreadsheet giving details of all 95 insertion/deletion events. Additional data files <supplr sid="S3">3</supplr> and <supplr sid="S4">4</supplr> are spreadsheets giving details of all 121 microinversions detected within genes, and all 22 microinversions detected upstream and downstream of genes, respectively. Additional data file <supplr sid="S5">5</supplr> is a PDF showing histograms of the distance between the microinversion breakpoints and the nearest flanking exon for the 121 intragenic microinversions. Additional data file <supplr sid="S6">6</supplr> is a zipped directory holding 5,738 PDFs, each showing a sliding-window <it>D. melanogaster</it>-<it>D. pseudoobscura </it>BLAST profile for a conserved pair of orthologs <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Additional data file 7 is a text file providing the sequences of the PCR/sequencing oligos used for microinversion validation.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Details of the 5,738 conserved orthologous gene pairs examined</p>
            </caption>
            <text>
               <p>Tabulated data on all 5,738 tested orthologous gene pairs. Each row of the table represents a gene. The columns of the table are as follows: Column 1 ('CG') holds the CG identifier for the gene. Column 2 ('gene.name') holds the name of the gene, if any. Column 3 ('Dmel.geneINFO') gives the position of the gene within release 4.2.1 of the <it>Drosophila melanogaster </it>genome. Column 4 ('Dpse.geneINFO') gives the position of the gene within release 1.04 of the <it>D. pseudoobscura </it>genome. Column 5 ('num.exons') gives the number of exons in the gene. Column 6 ('Dmel.gene.bp') provides the length in base pairs of the gene in <it>D. melanogaster</it>. Column 7 ('BLASThit.bp') gives the amount of <it>D. melanogaster </it>sequence, in base pairs, included in the set of above-threshold sliding-window BLAST hits. Column 8 ('num.BLASThits') holds the number of above-threshold sliding-window BLAST hits. Column 9 ('num.inserts') gives the number of large insertion/deletion events detected in the gene. Column 10 ('num.inversions') gives the number of microinversion events detected in the gene.</p>
            </text>
            <file name="gb-2006-7-7-r67-S1.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Details of the 95 large insertion/deletion events detected</p>
            </caption>
            <text>
               <p>Tabulated data on all 95 large insertion/deletion (indel) events detected. Each row of the table represents an indel. The columns of the table are as follows. Column 1 ('CG') holds the CG identifier for the gene. Column 2 ('Dmel.chr') gives the <it>D. melanogaster </it>chromosome on which the gene resides. Columns 3, 4, and 5 ('Dmel.geneSTART', 'Dmel.geneSTOP', and 'Dmel.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Column 6 ('Dpse.chr') gives the <it>D. pseudoobscura </it>chromosome on which the gene resides. Columns 7, 8, and 9 ('Dpse.geneSTART', 'Dpse.geneSTOP', and 'Dpse.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 1.04 of the <it>D. pseudoobscura </it>genome assembly. Columns 10 and 11 ('Dmel.insSTARTREL' and 'Dmel.insSTOPREL') give the position of the indel relative to the start of the host <it>D. melanogaster </it>gene. Columns 12 and 13 ('Dmel.insSTART.genome' and 'Dmel.insSTOP.genome') give the position of the indel in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Columns 14 and 15 ('Dpse.insSTARTREL' and 'Dpse.insSTOPREL') give the position of the indel relative to the start of the host <it>D. pseudoobscura </it>gene. Columns 16 and 17 ('Dmel.LEN' and 'Dpse.LEN') provide the lengths of the insertion/deletion in each genome. Column 18 ('insert.species') notes the species harboring the insert. Column 19 ('insert.amount') notes the amount of DNA inserted in base pairs. For those indels showing the insertion in <it>D. melanogaster </it>column 20 ('Dmel.insert.annot.TE') notes if the insertion overlaps the position of a known transposable element in the <it>D. melanogaster </it>genome. Column 21 ('insert.BLAST') shows the type of transposable element, if any, the inserted sequence BLASTs to.</p>
            </text>
            <file name="gb-2006-7-7-r67-S2.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Details of the 121 microinversions detected within genes</p>
            </caption>
            <text>
               <p>Tabulated data on all 121 microinversions detected by scanning regions within genes. Each row of the table represents a microinversion. The columns of the table are as follows. Column 1 ('CG') holds the CG identifier for the gene. Column 2 ('Dmel.chr') gives the <it>D. melanogaster </it>chromosome on which the gene resides. Columns 3, 4, and 5 ('Dmel.geneSTART', 'Dmel.geneSTOP', and 'Dmel.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Column 6 ('Dpse.chr') gives the <it>D. pseudoobscura </it>chromosome on which the gene resides. Columns 7, 8, and 9 ('Dpse.geneSTART', 'Dpse.geneSTOP', and 'Dpse.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 1.04 of the <it>D. pseudoobscura </it>genome assembly. Column 10 ('inv.num') assigns an arbitrary number to each microinversion so that independent events within a single gene can be distinguished. Columns 11 and 12 ('Dmel.invSTARTREL' and 'Dmel.invSTOPREL') give the positions of the microinversion breakpoints relative to the start of the host <it>D. melanogaster </it>gene. Columns 13 and 14 ('Dmel.invSTART.genome' and 'Dmel.invSTOP.genome') give the positions of the microinversion breakpoints in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Column 15 ('Dmel.exon.overlap') is a 0/1 vector indicating whether the microinversion overlaps with an annotated <it>D. melanogaster </it>exon. Column 16 ('Dmel.gene.overlap') gives the identifier of any nested gene that the microinversion overlaps. Column 17 ('Dmel.invLEN') is the length of the microinversion in <it>D. melanogaster</it>. Columns 18 and 19 ('Dpse.invSTARTREL' and 'Dpse.invSTOPREL') give the positions of the microinversion breakpoints relative to the start of the host <it>D. pseudoobscura </it>gene. Column 20 ('Dpse.invLEN') is the length of the microinversion in <it>D. pseudoobscura</it>. Column 21 ('num.Dmel.introns') is the number of introns in the <it>D. melanogaster </it>host gene. Column 22 ('Dmel.invINTRON') is the number of the intron within which the microinversion resides, column 23 ('Dmel.invINTRON.size') is the size of that intron, and column 24 ('Dmel.invINTRON.sizerank') is the size rank of that intron, with a 1 indicating that the microinversion lies within the largest intron. Columns 25 and 26 ('Dmel.invINTRON.leftdist' and 'Dmel.invINTRON.rightdist') give the distances from the 5'-breakpoint and the 3'-breakpoint, respectively, of the microinversion to the nearest flanking exon.</p>
            </text>
            <file name="gb-2006-7-7-r67-S3.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Details of the 22 microinversions detected up- and downstream of genes</p>
            </caption>
            <text>
               <p>Tabulated data on all 22 microinversions detected by scanning 2 kb regions upstream and downstream of each conserved orthologous gene pair. Each row of the table represents a microinversion. The columns of the table are as follows. Column 1 ('CG') holds the CG identifier for the gene. Column 2 ('Dmel.chr') gives the <it>D. melanogaster </it>chromosome on which the gene resides. Columns 3, 4, and 5 ('Dmel.geneSTART', 'Dmel.geneSTOP', and 'Dmel.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Column 6 ('Dpse.chr') gives the <it>D. pseudoobscura </it>chromosome on which the gene resides. Columns 7, 8, and 9 ('Dpse.geneSTART', 'Dpse.geneSTOP', and 'Dpse.geneORIENT') give the gene start position, stop position, and orientation, respectively, in release 1.04 of the <it>D. pseudoobscura </it>genome assembly. Column 10 ('inv.num') assigns an arbitrary number to each microinversion so that independent events associated with a single gene can be distinguished. Columns 11 and 12 ('Dmel.invSTARTREL' and 'Dmel.invSTOPREL') give the positions of the microinversion breakpoints relative to the start of the host <it>D. melanogaster </it>gene. Columns 13 and 14 ('Dmel.invSTART.genome' and 'Dmel.invSTOP.genome') give the positions of the microinversion breakpoints in release 4.2.1 of the <it>D. melanogaster </it>genome assembly. Column 15 ('Dmel.invPOS.relgene') indicates whether the microinversion is 5' or 3' of the test gene. Column 16 ('Dmel.exon.overlap') is a 0/1 vector indicating whether the microinversion overlaps with an annotated exon in <it>D. melanogaster</it>. Column 17 ('Dmel.gene.overlap') gives the identifier of any nested gene that the microinversion overlaps. Column 18 ('Dmel.invLEN') is the length of the microinversion in <it>D. melanogaster</it>. Columns 19 and 20 ('Dpse.invSTARTREL' and 'Dpse.invSTOPREL') give the positions of the microinversion breakpoints relative to the start of the host <it>D. pseudoobscura </it>gene. Column 21 ('Dpse.invLEN') is the length of the microinversion in <it>D. pseudoobscura</it>.</p>
            </text>
            <file name="gb-2006-7-7-r67-S4.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>The distribution of microinversion position within host gene introns</p>
            </caption>
            <text>
               <p>For all 121 microinversions identified by scanning within genes, the distance between each breakpoint and the closest flanking exon was extracted, and divided by the size of the host intron. The plots are histograms of the 121 weighted distances, considering the 5'- and 3'-breakpoints separately. To test the distances against a uniform distribution we used a two-sided Kolmogorov-Smirnov test, and the results are presented above the plots.</p>
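               <p>The calculation described above can be sketched in Python as follows. This is an illustrative reimplementation using only the standard library, not the code used to produce the plots: each breakpoint-to-exon distance is divided by the host intron size, and the resulting fractions are compared with a Uniform(0, 1) distribution using the two-sided one-sample Kolmogorov-Smirnov statistic. The P value here comes from the asymptotic Kolmogorov series, which is an assumption of this sketch and is only approximate for small samples. The function names are hypothetical.</p>

```python
import math


def weighted_distance(dist_to_exon, intron_size):
    """Breakpoint-to-exon distance expressed as a fraction of the intron size."""
    return dist_to_exon / intron_size


def ks_uniform(values):
    """Two-sided one-sample KS test of `values` against Uniform(0, 1).

    Returns (D, p), where p is taken from the asymptotic Kolmogorov
    series and is therefore approximate for small samples.
    """
    xs = sorted(values)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
    # checked just after (d_plus) and just before (d_minus) each data point.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    lam = math.sqrt(n) * d
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    p = min(1.0, max(0.0, 2.0 * series))
    return d, p
```

               <p>Under these assumptions, evenly spread fractions give a small D and a P value near 1, whereas fractions clustered close to an exon boundary give a large D and a small P value. Where SciPy is available, <it>scipy.stats.kstest</it> with the 'uniform' distribution is a commonly used alternative that computes the P value for the same statistic.</p>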
            </text>
            <file name="gb-2006-7-7-r67-S5.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Sliding-window BLAST profiles for all 5,738 orthologous gene pairs tested</p>
            </caption>
            <text>
               <p>This file is a zipped archive of the sliding-window BLAST profiles (as PDFs) for all 5,738 orthologous gene pairs tested. The file is available at <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Each plot is named with the CG identifier for the gene. We stepped through each <it>D. melanogaster </it>gene in 15 bp increments and, at each position, BLASTed a 31 bp segment against the <it>D. pseudoobscura </it>ortholog. Each line represents a BLAST hit with a score above 45; the endpoints show the position of the hit in each genome, and the color of the line represents the orientation of the hit (black = same sequence orientation in each genome, red = different orientations in each genome).</p>
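               <p>The windowing and plotting rules described above can be illustrated with a short Python sketch. It covers only the generation of 31 bp query segments at 15 bp steps and the choice of line color for a hit; it does not run BLAST. The names used, the 0-based coordinates, and the decision to drop a trailing segment shorter than 31 bp are assumptions of this sketch rather than details taken from the analysis.</p>

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """A hypothetical, simplified BLAST hit record."""
    score: float
    same_orientation: bool


def sliding_windows(
    sequence: str, window: int = 31, step: int = 15
) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, segment) for each full-length window."""
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


def line_color(hit: Hit, min_score: float = 45) -> Optional[str]:
    """Return the plot color for a hit, or None if its score is not above min_score."""
    if hit.score > min_score:
        return "black" if hit.same_orientation else "red"
    return None


# Made-up 100 bp sequence for illustration only.
gene = "ACGT" * 25
print([start for start, _ in sliding_windows(gene)])
```

               <p>In this sketch a hit in the same orientation in both genomes is drawn in black, a hit in opposite orientations is drawn in red, and hits scoring 45 or less are not drawn.</p>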
            </text>
            <file name="gb-2006-7-7-r67-S6.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank JD Gruber for help with various aspects of code development. This work was supported by National Institutes of Health grant GM 58564 to ADL.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A haplotype map of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Chakravarti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donnelly</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <cnm>International HapMap Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>1299</fpage>
            <lpage>1320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04226</pubid>
                  <pubid idtype="pmpid" link="fulltext">16255080</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Whole-genome patterns of common DNA variation in three human populations.</p>
            </title>
            <aug>
               <au>
                  <snm>Hinds</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Stuve</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Nilsen</snm>
                  <fnm>GB</fnm>
               </au>
               <au>
                  <snm>Halperin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eskin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ballinger</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>307</volume>
            <fpage>1072</fpage>
            <lpage>1079</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1105436</pubid>
                  <pubid idtype="pmpid" link="fulltext">15718463</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Risch</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>33</volume>
            <issue>Suppl</issue>
            <fpage>228</fpage>
            <lpage>237</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1090</pubid>
                  <pubid idtype="pmpid" link="fulltext">12610532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Structural variation in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Feuk</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Carson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>85</fpage>
            <lpage>97</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1767</pubid>
                  <pubid idtype="pmpid" link="fulltext">16418744</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Fine-scale structural variation of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Tuzun</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Kaul</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Morrison</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Pertz</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Haugen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hayden</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Albertson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Pinkel</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2005</pubdate>
            <volume>37</volume>
            <fpage>727</fpage>
            <lpage>732</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1562</pubid>
                  <pubid idtype="pmpid" link="fulltext">15895083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A high-resolution survey of deletion polymorphism in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Conrad</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Hurles</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Pritchard</snm>
                  <fnm>JK</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>75</fpage>
            <lpage>81</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16327808</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Common deletions and SNPs are in linkage disequilibrium in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Hinds</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Kloek</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Jen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>82</fpage>
            <lpage>85</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16327809</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Common deletion polymorphisms in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>McCarroll</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Hadnott</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Perry</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Sabeti</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Zody</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Dallaire</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gabriel</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>86</fpage>
            <lpage>92</lpage>
            <xrefbib>
               <pubid idtype="pmpid">16468122</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies.</p>
            </title>
            <aug>
               <au>
                  <snm>Feuk</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>MacDonald</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Carson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Khaja</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e56</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1270012</pubid>
                  <pubid idtype="pmpid" link="fulltext">16254605</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0010056</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A genome-wide survey of structural variation between human and chimpanzee.</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Tuzun</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Morrison</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Hayden</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Ventura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McGrath</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Rocchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Eichler</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1344</fpage>
            <lpage>1356</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1240076</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169929</pubid>
                  <pubid idtype="doi">10.1101/gr.4338005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A new method for the study of chromosome aberrations and the plotting of chromosome maps in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Painter</snm>
                  <fnm>TS</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1934</pubdate>
            <volume>19</volume>
            <fpage>175</fpage>
            <lpage>188</lpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Chromosomal polymorphism in natural and experimental populations.</p>
            </title>
            <aug>
               <au>
                  <snm>Sperlich</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Pfriem</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>The Genetics and Biology of Drosophila</source>
            <publisher>London: Academic Press</publisher>
            <editor>Ashburner M, Carson HL, Thompson JN Jr</editor>
            <pubdate>1986</pubdate>
            <volume>3c</volume>
            <fpage>257</fpage>
            <lpage>309</lpage>
         </bibl>
         <bibl id="B13">
            <aug>
               <au>
                  <snm>Powell</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Progress and Prospects in Evolutionary Biology: The Drosophila Model</source>
            <publisher>New York: Oxford University Press</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Molecular phylogeny and divergence times of drosophilid species.</p>
            </title>
            <aug>
               <au>
                  <snm>Russo</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Takezaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1995</pubdate>
            <volume>12</volume>
            <fpage>391</fpage>
            <lpage>404</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7739381</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Comparative genome sequencing of <it>Drosophila pseudoobscura</it>: chromosomal, gene, and <it>cis</it>-element evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bettencourt</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Letovsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hubisz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meisel</snm>
                  <fnm>RP</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540289</pubid>
                  <pubid idtype="pmpid" link="fulltext">15632085</pubid>
                  <pubid idtype="doi">10.1101/gr.3059305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Chromosomal homology and molecular organization of Muller's elements D and E in the <it>Drosophila repleta </it>species group.</p>
            </title>
            <aug>
               <au>
                  <snm>Ranz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Segarra</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1997</pubdate>
            <volume>145</volume>
            <fpage>281</fpage>
            <lpage>295</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9071584</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ranz</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Casals</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>230</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311025</pubid>
                  <pubid idtype="pmpid" link="fulltext">11157786</pubid>
                  <pubid idtype="doi">10.1101/gr.162901</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Assessing the impact of comparative genomic sequence data on the functional annotation of the <it>Drosophila </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Rincon-Limas</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gnirke</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pacleb</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0086</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151188</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537575</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0086</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Conservation of regulatory sequences and gene expression patterns in the disintegrating <it>Drosophila Hox </it>gene complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Negre</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Casillas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suzanne</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sanchez-Herrero</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Akam</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nefedov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barbadilla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ruiz</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>692</fpage>
            <lpage>700</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088297</pubid>
                  <pubid idtype="pmpid" link="fulltext">15867430</pubid>
                  <pubid idtype="doi">10.1101/gr.3468605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The genome sequence of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Amanatides</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Galle</snm>
                  <fnm>RF</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <fpage>2185</fpage>
            <lpage>2195</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.287.5461.2185</pubid>
                  <pubid idtype="pmpid" link="fulltext">10731132</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Finishing a whole-genome shotgun: release 3 of the <it>Drosophila melanogaster </it>euchromatic genome sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Kronmiller</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Champe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dugan</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Frise</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0079</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151181</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537568</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Identification of transposable elements using multiple alignments of related genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Caspi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>260</fpage>
            <lpage>270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1361722</pubid>
                  <pubid idtype="pmpid" link="fulltext">16354754</pubid>
                  <pubid idtype="doi">10.1101/gr.4361206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The REPET Database</p>
            </title>
            <url>http://dynagen.ijm.jussieu.fr/repet/dmel4/index.html</url>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Combined evidence annotation of transposable elements in genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Quesneville</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Andrieu</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Autard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nouaud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anxolabehere</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>166</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1185648</pubid>
                  <pubid idtype="pmpid" link="fulltext">16110336</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Large-scale trends in the evolution of gene structures within 11 animal genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Prochnik</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kaminker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e15</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1386723</pubid>
                  <pubid idtype="pmpid" link="fulltext">16518452</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020015</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Analysis of conserved noncoding DNA in <it>Drosophila </it>reveals similar constraints in intergenic and intronic sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1335</fpage>
            <lpage>1345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.178701</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Annotation of the <it>Drosophila melanogaster </it>euchromatic genome: a systematic review.</p>
            </title>
            <aug>
               <au>
                  <snm>Misra</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Hradecky</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kaminker</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Millburn</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Prochnik</snm>
                  <fnm>SE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0083</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151185</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537572</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Clustalx for Mac OS X</p>
            </title>
            <url>http://www.embl.de/~chenna/clustal/darwin/</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Multiple sequence alignment with the Clustal series of programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Chenna</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sugawara</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Koike</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3497</fpage>
            <lpage>3500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168907</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824352</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Patterns of intron sequence evolution in <it>Drosophila </it>are dependent upon length and GC content.</p>
            </title>
            <aug>
               <au>
                  <snm>Haddrill</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Charlesworth</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Halligan</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Andolfatto</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R67</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1273634</pubid>
                  <pubid idtype="pmpid" link="fulltext">16086849</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-8-r67</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The <it>Drosophila </it>DNase I Footprint Database</p>
            </title>
            <url>http://www.flyreg.org</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p><it>Drosophila </it>DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1747</fpage>
            <lpage>1749</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti173</pubid>
                  <pubid idtype="pmpid" link="fulltext">15572468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The REDfly Database</p>
            </title>
            <url>http://redfly.ccr.buffalo.edu</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>REDfly: a regulatory element database for <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Gallo</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Halfon</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>381</fpage>
            <lpage>383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti794</pubid>
                  <pubid idtype="pmpid" link="fulltext">16303794</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>A functional analysis of 5', intronic and promoter regions of the homeotic gene <it>proboscipedia </it>in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Kapoun</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1995</pubdate>
            <volume>121</volume>
            <fpage>2127</fpage>
            <lpage>2141</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7635058</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Characterization of the <it>cis</it>-regulatory region of the <it>Drosophila </it>homeotic gene <it>Sex combs reduced</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Gindhart</snm>
                  <fnm>JG</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1995</pubdate>
            <volume>139</volume>
            <fpage>781</fpage>
            <lpage>795</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7713432</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in <it>Drosophila melanogaster </it>and <it>Drosophila pseudoobscura</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Laverty</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R61</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522868</pubid>
                  <pubid idtype="pmpid" link="fulltext">15345045</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-9-r61</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Conserved genomic organisation of Group B Sox genes in insects.</p>
            </title>
            <aug>
               <au>
                  <snm>McKimmie</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Woerfel</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>26</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1166547</pubid>
                  <pubid idtype="pmpid" link="fulltext">15943880</pubid>
                  <pubid idtype="doi">10.1186/1471-2156-6-26</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>DNA inversions between short inverted repeats in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Schofield</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Agbunag</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>JH</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1992</pubdate>
            <volume>132</volume>
            <fpage>295</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1427029</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Striking nucleotide frequency pattern at the borders of highly conserved vertebrate non-coding sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Walter</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Abnizova</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Elgar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gilks</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>436</fpage>
            <lpage>440</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.06.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15979195</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>WebLogo: a sequence logo generator.</p>
            </title>
            <aug>
               <au>
                  <snm>Crooks</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Hon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1188</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419797</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173120</pubid>
                  <pubid idtype="doi">10.1101/gr.849004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Project to Sequence 12 <it>Drosophila </it>Genomes</p>
            </title>
            <url>http://rana.lbl.gov/drosophila</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>VISTA: visualizing global DNA sequence alignments of arbitrary length.</p>
            </title>
            <aug>
               <au>
                  <snm>Mayor</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Poliakov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>1046</fpage>
            <lpage>1047</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.11.1046</pubid>
                  <pubid idtype="pmpid" link="fulltext">11159318</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>VISTA Plot Web-based Genome-alignment Viewer</p>
            </title>
            <url>http://genome.lbl.gov/vista/index.shtml</url>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Glocal alignment: finding rearrangements during alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Malde</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Poliakov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Couronne</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Dubchak</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 1</issue>
            <fpage>i54</fpage>
            <lpage>i62</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1005</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855437</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>GATA: a graphic alignment tool for comparative sequence analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Nix</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>9</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">546196</pubid>
                  <pubid idtype="pmpid" link="fulltext">15655071</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-9</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>FlyBase <it>Drosophila </it>Genome Annotation</p>
            </title>
            <url>http://www.flybase.org/annot/</url>
         </bibl>
         <bibl id="B48">
            <title>
               <p>NCBI Standalone BLAST</p>
            </title>
            <url>ftp://ftp.ncbi.nih.gov/blast/executables/</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>The R Project for Statistical Computing</p>
            </title>
            <url>http://www.R-project.org</url>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Additional Data File 6</p>
            </title>
            <url>http://hjmuller.bio.uci.edu/~smacdonald/add_data_file_6_plots.zip</url>
         </bibl>
         <bibl id="B51">
            <title>
               <p>BDGP: Natural Transposable Element Project</p>
            </title>
            <url>http://www.fruitfly.org/p_disrupt/TE.html</url>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Mega BLAST Against Archive of Shotgun Genome Sequence Traces</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/blast/mmtrace.shtml</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>BLAST Against <it>Drosophila </it>Genome Assemblies</p>
            </title>
            <url>http://insects.eugenes.org/species/</url>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Genome Sequencing Center, Washington University</p>
            </title>
            <url>http://genome.wustl.edu/</url>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Agencourt Bioscience Corporation</p>
            </title>
            <url>http://www.agencourt.com/</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Broad Institute</p>
            </title>
            <url>http://www.broad.mit.edu/</url>
         </bibl>
         <bibl id="B57">
            <title>
               <p>J. Craig Venter Institute</p>
            </title>
            <url>http://www.venterinstitute.org/</url>
         </bibl>
         <bibl id="B58">
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Golic</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Hawley</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>Drosophila: A Laboratory Handbook</source>
            <publisher>Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press</publisher>
            <pubdate>2005</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
