<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2199-9-21</ui>
   <ji>1471-2199</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Comparative genomic analysis of the arthropod muscle myosin heavy chain genes allows ancestral gene reconstruction and reveals a new type of 'partially' processed pseudogene</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Odronitz</snm>
               <fnm>Florian</fnm>
               <insr iid="I1"/>
               <email>flod@nmr.mpibpc.mpg.de</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Kollmar</snm>
               <fnm>Martin</fnm>
               <insr iid="I1"/>
               <email>mako@nmr.mpibpc.mpg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Abteilung NMR basierte Strukturbiologie, Max-Planck-Institut f&#252;r Biophysikalische Chemie, Am Fassberg 11, D-37077 G&#246;ttingen, Germany</p>
            </ins>
         </insg>
         <source>BMC Molecular Biology</source>
         <issn>1471-2199</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>21</fpage>
         <url>http://www.biomedcentral.com/1471-2199/9/21</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18254963</pubid>
               <pubid idtype="doi">10.1186/1471-2199-9-21</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>06</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>06</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Odronitz and Kollmar; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Alternative splicing of mutually exclusive exons is an important mechanism for increasing protein diversity in eukaryotes. The insect <it>Mhc </it>(myosin heavy chain) gene produces all muscle myosin isoforms by alternative splicing. By contrast, most other metazoans have a family of muscle myosin genes, each coding for a protein specialized for a functional niche.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>The muscle myosin heavy chain genes of 22 arthropod species, ranging from the waterflea to the wasp and <it>Drosophila</it>, have been annotated. Analysis of the gene structures allowed the reconstruction of an ancestral muscle myosin heavy chain gene. It also showed that, during arthropod evolution, these genes have mainly lost introns, although intron gain might have occurred in a few cases. Surprisingly, the genome of <it>Aedes aegypti </it>contains one additional muscle myosin heavy chain gene and that of <it>Culex pipiens quinquefasciatus </it>contains two. These genes, called <it>Mhc3 </it>and <it>Mhc4</it>, contain only one variant of each group of alternative exons of the <it>Mhc1 </it>gene. <it>Mhc3 </it>transcription in <it>Aedes aegypti </it>is documented by EST data. <it>Mhc3 </it>and <it>Mhc4 </it>were inserted into the <it>Aedes </it>and <it>Culex </it>genomes in one of two ways. The first is gene duplication followed by the loss of all but one variant of the alternative exons. The second is incorporation of a transcript from which all other variants had been spliced out while the exon-intron structure was retained. The second, more likely possibility represents a new type of 'partially' processed pseudogene.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
                <p>Based on the comparative genomic analysis of the alternatively spliced arthropod muscle myosin heavy chain genes, we propose that the splicing process operates sequentially on the transcript. First, the mutually exclusive exons are spliced out until only one exon of the cluster remains, while the surrounding intronic sequence is retained. In a second step, the introns are spliced out. A related mechanism could be responsible for the splicing of other genes containing mutually exclusive exons.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Alternative splicing is an important and widespread mechanism that higher organisms use to express molecularly distinct mRNAs in response to developmental and cellular contexts <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. In mutually exclusive splicing, only one exon is chosen out of a cluster of alternative exons arranged in a tandem array. This is a very frequent alternative splicing event genome-wide <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Several mechanisms have been proposed to explain why only one of the two or more variants is included in the mature mRNA <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. In Metazoa, mutually exclusive exons mostly occur in pairs. Extreme cases of mutually exclusive splicing are the insect <it>Dscam </it>genes, which have arrays of up to 52 variants, as observed in the <it>Drosophila Dscam </it>gene <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A less dramatic example is the <it>Drosophila </it>muscle myosin heavy chain gene, whose mutually exclusive splicing can potentially produce 480 different mRNAs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>Myosins comprise a large superfamily of actin-based motors that fulfill a variety of cellular functions, from cell division, cellular locomotion, and vesicle transport to muscle contraction <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. To date, 35 classes of myosins have been identified, each class being responsible for a different function <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. The first myosin was identified in skeletal muscle tissue over a hundred years ago (for a review of the history of muscle myosin see <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>). Since other myosins were discovered, it has been referred to as conventional myosin or class-II myosin. Class-II myosins form the largest and most extensively studied class for several reasons. Muscles and muscle myosin genes have been the focus of biophysical and biochemical studies for decades, and metazoans are the most studied organisms. In addition, this class contains the most isoforms per organism <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p><it>Drosophila melanogaster </it>contains two class-II myosin genes, one encoding the muscle isoforms (<it>Mhc</it>) and one the nonmuscle isoform (<it>zipper</it>) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The <it>Mhc </it>gene produces all muscle myosin isoforms as a result of alternative RNA splicing <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. This contrasts with most other metazoan taxa, which have a family of muscle myosin heavy chain genes, each coding for a protein specialized for a functional niche. For example, the nematode <it>Caenorhabditis elegans </it>expresses six muscle myosins <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The ascidian <it>Ciona intestinalis </it>genome contains five muscle myosin heavy chain genes <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, and vertebrate genomes encode up to 22 muscle myosin heavy chain isoforms <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>The <it>Drosophila Mhc </it>gene consists of 30 exons, including five clusters of alternatively spliced exons and one differentially included penultimate exon. Thus, 480 combinations of alternative exons are possible. The four clusters of alternative exons in the motor domain part of the gene code for 120 different variants of the motor domain. In contrast to the muscle myosins of other metazoans, changes that modulate myosin function are thus limited to four regions in the head domain. These discrete regions of sequence variation have been shown to produce physiological differences among the various muscle types <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Many variants are possible, and all alternative exons are expressed at some point during the <it>Drosophila </it>life cycle. Even so, only a limited number of combinations seem to be used. For example, only seven <it>Mhc </it>transcripts have been found to be expressed during <it>Drosophila </it>embryogenesis <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
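         <p>The numbers 120 and 480 follow from multiplying the number of choices per cluster, if exactly one exon is taken from each cluster and the choices are independent. The sketch below assumes the cluster sizes usually reported for <it>D. melanogaster Mhc</it>: 2, 4, 3 and 5 variants for exons 3, 7, 9 and 11 in the motor domain, and 2 variants for exon 15. It also counts the penultimate exon as included or skipped. These sizes and all names in the code are illustrative assumptions, not data from this study.</p>
```python
from math import prod

# Assumed (illustrative) cluster sizes for D. melanogaster Mhc:
# four motor-domain clusters and one further cluster (exon 15).
MOTOR_CLUSTERS = {"exon3": 2, "exon7": 4, "exon9": 3, "exon11": 5}
OTHER_CLUSTERS = {"exon15": 2}
# The differentially included penultimate exon: included or skipped.
PENULTIMATE_STATES = 2


def combinations(cluster_sizes):
    """Number of possible mRNAs when exactly one exon is chosen per cluster."""
    return prod(cluster_sizes.values())


motor = combinations(MOTOR_CLUSTERS)
full = motor * combinations(OTHER_CLUSTERS) * PENULTIMATE_STATES
print(motor, full)  # 120 480
```
         <p>The same product rule underlies the motor-domain and full-length counts listed per species in Table 1. It gives an upper bound on possible transcripts, not the number actually expressed.</p>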
         <p>The genome of <it>Drosophila melanogaster </it>was the third eukaryotic genome to be completely sequenced <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Since then, the number of sequenced organisms has increased rapidly. Within the phylum Arthropoda, the genomes of the mosquitoes <it>Anopheles gambiae </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and <it>Aedes aegypti </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and of the silkworm <it>Bombyx mori </it><abbrgrp><abbr bid="B22">22</abbr></abbrgrp> have been published. Seventeen further insect genomes have been completed, eleven of which are <it>Drosophila </it>species <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>.</p>
         <p>Pseudogenes were originally defined as DNA sequences that are derived from functional genes but have acquired degenerative features, such as premature stop codons and frameshift mutations, that make them unable to produce functional proteins <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Non-processed pseudogenes are thought to result from tandem duplications of genes with subsequent accumulation of disabling mutations. Processed pseudogenes lack introns and the upstream regulatory regions of their parent genes. They presumably arise by retrotransposition of a mature messenger RNA (mRNA). Non-processed pseudogenes are commonly found near the functional parent gene, whereas processed pseudogenes are inserted at random positions in the genome. Partially processed pseudogenes, which sometimes contain the complete coding region, have also been reported <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. Recent studies have shown that pseudogenes are not just "junk" DNA but often have functional roles (for a review see <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>).</p>
         <p>Here, we report the comparative genomic analysis of the muscle myosin heavy chain genes of all arthropod species whose genomes have been completely sequenced so far. On this basis, we propose that the splicing process operates sequentially on the transcript. First, all unwanted alternative versions of an exon are spliced out, while the intronic sequence around the remaining variant is retained.</p>
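         <p>The proposed order of events can be illustrated with a toy model. This is a hypothetical sketch, not the authors' method. It represents a pre-mRNA as a list of exon and intron segments; the segment names E1, E2a to E2c, i1 and so on are invented. The first step removes the unselected variants of a cluster of mutually exclusive exons, together with the introns between them, but keeps the introns flanking the cluster. Only the second step removes the remaining introns. The intermediate after the first step, a single variant with its exon-intron structure intact, resembles the 'partially' processed structure discussed in this study.</p>
```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation: (kind, name, cluster).
# kind is "exon" or "intron"; cluster is None for constitutive exons and for
# introns flanking a cluster, and a cluster label for alternative exons and
# for the introns lying between those alternative exons.
Segment = Tuple[str, str, Optional[str]]


def splice_out_alternatives(pre_mrna: List[Segment], choice: Dict[str, str]) -> List[Segment]:
    """Step 1: keep only the chosen variant of each cluster.

    Unchosen variants and the introns between variants are removed;
    constitutive exons and the introns flanking each cluster are kept,
    so the exon-intron structure around the chosen variant survives.
    """
    kept = []
    for kind, name, cluster in pre_mrna:
        if cluster is None:
            kept.append((kind, name, cluster))
        elif kind == "exon" and name == choice[cluster]:
            kept.append((kind, name, cluster))
    return kept


def splice_introns(transcript: List[Segment]) -> List[str]:
    """Step 2: remove all remaining introns and return the exon order."""
    return [name for kind, name, _ in transcript if kind == "exon"]


# Toy gene: a constitutive exon, a cluster of three alternative exons, another exon.
pre_mrna = [
    ("exon", "E1", None),
    ("intron", "i1", None),
    ("exon", "E2a", "c2"),
    ("intron", "i2ab", "c2"),
    ("exon", "E2b", "c2"),
    ("intron", "i2bc", "c2"),
    ("exon", "E2c", "c2"),
    ("intron", "i2", None),
    ("exon", "E3", None),
]

partial = splice_out_alternatives(pre_mrna, {"c2": "E2b"})
print([name for _, name, _ in partial])  # ['E1', 'i1', 'E2b', 'i2', 'E3']
print(splice_introns(partial))           # ['E1', 'E2b', 'E3']
```
         <p>Keeping the two steps as separate functions makes the intermediate product explicit. In this toy model, that intermediate still contains intron i1 and intron i2 around the selected exon E2b.</p>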
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Identification and annotation of the muscle myosin heavy chains</p>
            </st>
            <p>The arthropod muscle myosin heavy chain genes were identified by TBLASTN searches against the genome data of each species, using the <it>Drosophila melanogaster </it>protein as the query (Figure <figr fid="F1">1</figr>, see Additional file <supplr sid="S1">1</supplr>). The species analysed were the mosquitoes <it>Aedes aegypti, Culex pipiens quinquefasciatus </it>and <it>Anopheles gambiae</it>, the silkworm <it>Bombyx mori</it>, the honeybee <it>Apis mellifera</it>, the jewel wasp <it>Nasonia vitripennis</it>, the waterflea <it>Daphnia pulex</it>, the rust-red flour beetle <it>Tribolium castaneum</it>, the body louse <it>Pediculus humanus corporis</it>, and thirteen <it>Drosophila </it>species (Table <tblr tid="T1">1</tblr>). According to the general nomenclature for myosin sequences <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, the alternatively spliced muscle myosin heavy chain genes are named <it>Mhc1</it>, and the non-muscle myosin heavy chain genes are denoted <it>Mhc2</it>. The sequences were annotated by manual inspection of the genomic DNA sequences. Exons were confirmed by identifying the flanking consensus donor and acceptor sequences at the intron-exon splice junctions (Figure <figr fid="F1">1</figr>) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Automatic identification of all exons failed because of the five to nine clusters of mutually exclusive exons and the optionally included penultimate exon. The genomic sequences of <it>Apis mellifera </it>and <it>Bombyx mori </it>contain several gaps, at least one of which must have contained missing exons. Expression of the myosin genes, including transcription of some of the mutually exclusive exons, was confirmed by analysis of the corresponding EST data.</p>
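            <p>The splice-junction check described above can be sketched as follows. The sketch tests only the canonical GT...AG dinucleotides at the start and end of each intron on the forward strand. The consensus sequences used for annotation extend beyond these dinucleotides, and non-canonical introns (for example GC-AG) exist. The helper functions and the toy sequence are therefore hypothetical illustrations, not the procedure used in this study.</p>
```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_intron_boundaries(genome: str, exons: list) -> list:
    """Check each intron implied by consecutive exon coordinates.

    exons: sorted (start, end) pairs, 0-based and end-exclusive, on the
    forward strand (a simplifying assumption of this sketch).
    """
    results = []
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        results.append(has_canonical_splice_sites(genome[end1:start2]))
    return results


# Toy sequence: exon (6 nt) + intron (13 nt) + exon (6 nt)
genome = "ATGAAA" + "GTAAGTTTTTCAG" + "GCTTAA"
print(check_intron_boundaries(genome, [(0, 6), (19, 25)]))  # [True]
print(check_intron_boundaries(genome, [(0, 5), (19, 25)]))  # [False]
```
            <p>Shifting the first exon boundary by one nucleotide moves the implied intron start off the GT donor. Such a check can therefore flag misplaced exon boundaries in a manual annotation.</p>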
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Mhc1 sequence alignment</b>. The file contains the aligned arthropod Mhc1 protein sequences. Also included are all variants of the alternatively spliced exons.</p>
               </text>
               <file name="1471-2199-9-21-S1.fas">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Nucleotide IDs and numbers of possible combinations of alternative exons for the motor domains and the full-length proteins.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Species</p>
                     </c>
                     <c ca="center">
                        <p>Species Abbr.</p>
                     </c>
                     <c ca="center">
                        <p>GenBank nucleotide IDs</p>
                     </c>
                     <c ca="center">
                        <p>Motor domain</p>
                     </c>
                     <c ca="center">
                        <p>Full-length protein</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Daphnia pulex</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dap</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1536</p>
                     </c>
                     <c ca="center">
                        <p>> 3072</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bombyx mori str. Dazao</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Bm</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AADK01001734">AADK01001734</ext-link>, <ext-link ext-link-type="gen" ext-link-id="BAAB01137479">BAAB01137479</ext-link></p>
                        <p><ext-link ext-link-type="gen" ext-link-id="BAAB01017092">BAAB01017092</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AV404226">AV404226</ext-link></p>
                        <p><ext-link ext-link-type="gen" ext-link-id="AADK01040535">AADK01040535</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AADK01049792">AADK01049792</ext-link></p>
                     </c>
                     <c ca="center">
                        <p>192</p>
                     </c>
                     <c ca="center">
                        <p>768</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Tribolium castaneum str. Georgia GA2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Tic</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAJJ01000118">AAJJ01000118</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>192</p>
                     </c>
                     <c ca="center">
                        <p>> 384</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Nasonia vitripennis str. SymAX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Nav</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AAZX01008059">AAZX01008059</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAZX01007288">AAZX01007288</ext-link></p>
                     </c>
                     <c ca="center">
                        <p>144</p>
                     </c>
                     <c ca="center">
                        <p>> 288</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Apis mellifera str. DH4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Am</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AADG05005753">AADG05005753</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AADG05005754">AADG05005754</ext-link></p>
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AADG05005757">AADG05005757</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>384</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila ananassae TSC#14024-0371.13</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Da</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAPP01015693">AAPP01015693</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila erecta TSC#14021-0224.01</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Der</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAPQ01007075">AAPQ01007075</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila grimshawi TSC#15287-2541.00</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dg</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAPT01021775">AAPT01021775</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila hydei</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dh</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="X77570">X77570</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dm</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="NM_165190">NM_165190</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila mojavensis TSC#15081-1352.22</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dmo</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAPU01010481">AAPU01010481</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila persimilis MSH-3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Drp</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AAIZ01000908">AAIZ01000908</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAIZ01000907">AAIZ01000907</ext-link></p>
                        <p><ext-link ext-link-type="gen" ext-link-id="AAIZ01000906">AAIZ01000906</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAIZ01000905">AAIZ01000905</ext-link></p>
                        <p><ext-link ext-link-type="gen" ext-link-id="AAIZ01000904">AAIZ01000904</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAIZ01024863">AAIZ01024863</ext-link></p>
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAIZ01000903">AAIZ01000903</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila pseudoobscura MV2-25</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dp</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAFS01000199">AAFS01000199</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila sechellia Rob3c</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dse</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAKO01001629">AAKO01001629</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila simulans str. white501</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dss</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila virilis TSC#15010-1051.87</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dv</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AANI01016210">AANI01016210</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AANI01016211">AANI01016211</ext-link></p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila yakuba Tai18E2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dy</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AAEU01002444">AAEU01002444</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAEU01002445">AAEU01002445</ext-link></p>
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAEU01002446">AAEU01002446</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Drosophila willistoni TSC#14030-0811.24</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Dw</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAQB01006734">AAQB01006734</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Anopheles gambiae str. PEST</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Ang</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAAB01008980">AAAB01008980</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>768</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Aedes aegypti str. Liverpool Mhc1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Aea</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAGE02009209">AAGE02009209</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>512</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Aedes aegypti str. Liverpool Mhc3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Aea</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><ext-link ext-link-type="gen" ext-link-id="AAGE02009019">AAGE02009019</ext-link>, <ext-link ext-link-type="gen" ext-link-id="AAGE02009018">AAGE02009018</ext-link></p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Pediculus humanus corporis str. USDA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Pdc</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAZO01001178">AAZO01001178</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Culex pipiens quinquefasciatus JHB Mhc1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Cpq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAWU01000999">AAWU01000999</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>512</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Culex pipiens quinquefasciatus JHB Mhc3</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Cpq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAWU01000999">AAWU01000999</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Culex pipiens quinquefasciatus JHB Mhc4</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Cpq</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="AAWU01000999">AAWU01000999</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Diagram of the arthropod <it>Mhc1 </it>genes with exon-intron structure</p>
               </caption>
               <text>
                   <p><b>Diagram of the arthropod <it>Mhc1 </it>genes with exon-intron structure</b>. The gene structures of the arthropod muscle myosin genes are shown using the following color code: light-gray, intron sequences; dark-gray, common exons; colored, alternatively spliced exons. The <it>Drosophila melanogaster Mhc1 </it>gene is shown as a representative of all <it>Drosophila sp. Mhc1 </it>genes, because their gene structures differ only in the lengths of the introns. Transcriptional and translational start sites, stop codons and polyadenylation sites are shown where they have been determined. Some genes are spread over several contigs. The corresponding gap positions are shown in black if no further exons are expected, and in red if exons are definitely missing. The genes are drawn to scale except for the <it>Aedes aegypti </it>genes, whose extremely long introns have been shortened. Gaps are drawn as 100 bp, although their exact lengths are unknown.</p>
               </text>
               <graphic file="1471-2199-9-21-1"/>
            </fig>
            <p>The untranslated first exons of the genes have been assigned by analysing EST data where possible. Because untranslated 5' exons were found for all species for which EST data covering the amino-termini of the genes is available, the other arthropod myosin genes are also expected to contain untranslated first exons. Accordingly, the unambiguously identified exons have been numbered starting with exon two. Duplicated exons were named in alphabetical order in the direction of transcription, with one exception: in the alternatively spliced exon 11 of the <it>Drosophila </it>Mhc1 gene, the first of the mutually exclusive exons was named 11e for historical reasons <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The differentially included penultimate exons of the <it>Drosophila </it>species have been predicted based on their similarity at the DNA level. Although this exon consists mainly of untranslated bases and its identity between the <it>Drosophila </it>species is almost as low as that found in intron regions, the exon borders are conserved enough to be recognised. The carboxy-terminal exons of the other arthropod <it>Mhc1 </it>genes have been confirmed by analysing EST data where possible. For <it>TicMhc1 </it>and <it>DapMhc1</it>, only one carboxy-terminal exon could be confirmed by EST data. However, given the exon conservation among all arthropod <it>Mhc1 </it>genes, both genes are expected to contain another carboxy-terminal exon. For <it>Nasonia</it>, no EST data is available. The carboxy-terminal exon of the <it>NavMhc1 </it>gene was identified based on its homology to the other <it>Mhc1 </it>exons; an exon corresponding to the penultimate exon of the other genes could not be identified.</p>
            <p>The <it>Drosophila sp. Mhc1</it>, <it>AeaMhc1 </it>and <it>CpqMhc1 </it>genes contain the consensus polyadenylation signal AATAAA, whereas the <it>Mhc1 </it>genes of <it>Ang</it>, <it>Am</it>, <it>Dap</it>, <it>Nav</it>, <it>Pdc</it>, and <it>Tic </it>contain polyadenylation signals of the type AAAAAA. For the <it>DmMhc1 </it>gene it has been shown that the use of either polyadenylation site is not regulated <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, and the same might be true for the two or more polyadenylation sites of the other arthropod genes.</p>
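            <p>As an illustration of how the two classes of signal can be told apart, the Python sketch below scans a sequence for the AATAAA and AAAAAA hexamers with a sliding window. It is not the procedure used in this study; the function name find_polya_signals and the 3' UTR fragment are hypothetical.</p>

```python
def find_polya_signals(seq, signals=("AATAAA", "AAAAAA")):
    """Return (position, hexamer) pairs for every occurrence of a signal.

    Positions are 0-based. Overlapping matches are reported, so a run of
    seven adenines yields two AAAAAA hits.
    """
    seq = seq.upper()
    hits = []
    # Slide a 6-nt window over the sequence.
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in signals:
            hits.append((i, hexamer))
    return hits


# Hypothetical 3' UTR fragment, for illustration only.
utr = "GCTTAATAAAGCCTGAAAAAAATC"
print(find_polya_signals(utr))
# -> [(4, 'AATAAA'), (15, 'AAAAAA'), (16, 'AAAAAA')]
```

            <p>Overlapping hits are reported deliberately, because a run of seven or more adenines contains several AAAAAA windows; a real analysis would typically restrict the search to a short window upstream of the cleavage site.</p>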
         </sec>
         <sec>
            <st>
               <p>Identification of further muscle myosin heavy chain genes in <it>Aedes aegypti </it>and <it>Culex pipiens quinquefasciatus</it></p>
            </st>
            <p>Surprisingly, a second muscle myosin heavy chain gene, named <it>Mhc3</it>, has been identified in <it>Aedes aegypti </it>(Figure <figr fid="F1">1</figr>). The <it>Mhc3 </it>gene has the same exon organisation as <it>Mhc1</it>, except that it has no clusters of alternatively spliced exons and lacks the two carboxy-terminal exons (Figure <figr fid="F1">1</figr>). Many EST clones provide supporting evidence for the deduced carboxy-terminus, the amino-terminal untranslated exon 1, and other parts of the gene. The exons corresponding to the alternatively spliced exons of <it>Mhc1 </it>are either identical to ("exon3b") or very similar to one of the <it>Mhc1 </it>variants. The protein sequence of Mhc3 has an overall sequence identity of 91.4% to Mhc1. Apart from the different carboxy-termini, the largest differences are in loop-1, which is three residues shorter in Mhc3, and in loop-2, which has only six instead of ten glycines and might therefore be structurally more restricted. The <it>Culex pipiens quinquefasciatus </it>genome encodes two further muscle myosin heavy chain genes that are very similar to each other and have been named <it>Mhc3 </it>and <it>Mhc4 </it>(Figure <figr fid="F1">1</figr>). Both have the same exon organisation as the <it>CpqMhc1 </it>gene, except that they have no clusters of alternatively spliced exons and lack the two carboxy-terminal exons. A further difference is that the homologs of the alternative exons 8 are fused to the following constitutive exons in the <it>Mhc3 </it>and <it>Mhc4 </it>genes. The protein sequence identity between <it>CpqMhc3 </it>and <it>CpqMhc4 </it>is 92.0%, and their identities to <it>CpqMhc1 </it>are 84.4% and 90.4%, respectively. Notably, <it>AeaMhc3</it>, <it>CpqMhc3 </it>and <it>CpqMhc4 </it>retained identical variants of the alternatively spliced exons of the corresponding <it>Mhc1 </it>genes.</p>
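            <p>The identity values above are pairwise percentages. The section does not state how alignment gaps were handled, so the following Python sketch assumes one common convention: identical residues are counted only over columns in which neither sequence has a gap. The function name and the aligned fragments are hypothetical, not Mhc sequences.</p>

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical glycine-rich aligned fragments, not real Mhc data.
print(percent_identity("GKGGRSGAF", "GKG-RSAAF"))  # -> 87.5
```

            <p>Other conventions (for example dividing by the length of the shorter ungapped sequence) give somewhat different values, so percentages are only directly comparable when computed the same way.</p>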
         </sec>
         <sec>
            <st>
               <p>The <it>BmMhc1</it>, <it>TicMhc1</it>, <it>PdcMhc1 </it>and <it>DapMhc1 </it>genes contain further clusters of alternatively spliced exons</p>
            </st>
            <p>The analysis of the <it>BmMhc1</it>, <it>TicMhc1</it>, <it>PdcMhc1</it>, and <it>DapMhc1 </it>genes revealed further clusters of alternatively spliced exons compared to the <it>DmMhc1 </it>gene. All of these additional sets of alternative exons encode parts of the motor domain. The additional alternative exon of <it>Bm</it>, <it>Pdc </it>and <it>Tic </it>is conserved among these three organisms and is also present in the <it>DapMhc1 </it>gene. It is located between the alternatively spliced exons 11 and 17 (<it>Bm</it>), between alternative exon 13 and constitutive exon 19 (<it>Pdc</it>), and between alternative exons 12 and 16 (<it>Tic</it>), and in each case it is separated from the neighbouring alternatively spliced exons by constitutively expressed exons (Figure <figr fid="F1">1</figr>). In contrast to the other alternatively spliced exons, these variants differ in length and in amino acid conservation (see Additional file <supplr sid="S2">2</supplr>, figure S6A). The first part of the exon encodes part of loop-2 (see below), a highly flexible loop involved in actin binding. In the arthropod genes, loop-2 consists mainly of glycines, arginines, and lysines, and the alternatively spliced exons of <it>Bm</it>, <it>Tic</it>, <it>Pdc</it>, and <it>Dap </it>encode different numbers and compositions of these residues. The second part of the alternatively spliced exon encodes part of the following alpha-helix and is therefore completely conserved in length and strongly conserved in composition. In addition to this cluster of alternatively spliced exons, the <it>DapMhc1 </it>gene contains three further sets of alternatively spliced exons, bringing its total number of clusters of alternatively spliced exons to nine (compared with five in <it>Drosophila</it>). 
Alternative exon 6 encodes an alternative version of the region from the P-loop to loop-1, alternative exon 11 directly follows the alternative exon encoding a structural element near the ATP-binding site, and alternative exon 18 encodes an alternative version of the sequence following loop-2 (Figure <figr fid="F1">1</figr>).</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Sequence alignment and analysis of the alternatively spliced exons</b>. The file contains the aligned alternative exons of the arthropod Mhc1 protein sequences. Also included are the graphical representations of the sequence identities.</p>
               </text>
               <file name="1471-2199-9-21-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>The <it>PdcMhc1 </it>gene encodes a strongly reduced set of possible transcripts</p>
            </st>
            <p>The <it>Pediculus humanus corporis Mhc1 </it>gene contains the smallest set of alternative exons (Figure <figr fid="F1">1</figr>). It has four sets of alternative exons, each comprising two variants. The sequence encoding part of the converter domain, which is encoded by sets of three to five alternative exons in the other arthropod genes, has been fused to the following exon to form one constitutive exon in the <it>PdcMhc1 </it>gene (exon 19, Figure <figr fid="F1">1</figr>). Likewise, the part of the tail domain that is encoded by a set of two alternative exons in all other arthropod genes is represented by a single exon in the <it>PdcMhc1 </it>gene (exon 25). Altogether, the alternative exons of the <it>PdcMhc1 </it>gene encode 16 different versions of the motor domain and 32 different mRNAs, compared with potentially 120 different combinations of alternative exons for the motor domain alone in the <it>Drosophila Mhc1 </it>gene.</p>
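            <p>These transcript numbers follow from multiplying the numbers of variants per cluster, because exactly one variant of each mutually exclusive cluster is included in a transcript. In the Python sketch below, the <it>PdcMhc1</it> motor-domain inputs follow the text (four clusters of two variants), while treating the remaining factor of two as the optional penultimate exon is an assumption. The <it>Drosophila</it> cluster sizes (2, 4, 3 and 5 variants for exons 3, 7, 9 and 11, plus two tail variants of exon 15) are taken from earlier descriptions of <it>DmMhc1</it> rather than from this section; the function name is hypothetical.</p>

```python
from math import prod


def isoform_counts(motor_clusters, other_factors=()):
    """Return (motor-domain versions, full-length transcripts).

    motor_clusters: number of mutually exclusive variants in each cluster
        lying within the motor domain.
    other_factors: multiplicative choices outside the motor domain, e.g. 2
        for a two-variant tail cluster or 2 for an optional exon.
    """
    motor = prod(motor_clusters)
    return motor, motor * prod(other_factors)


# PdcMhc1: four motor-domain clusters with two variants each; the extra
# factor of 2 is assumed to be the optional penultimate exon.
print(isoform_counts([2, 2, 2, 2], [2]))     # -> (16, 32)

# DmMhc1 (cluster sizes assumed from the literature): exons 3, 7, 9, 11 in
# the motor domain, tail exon 15 and the optional penultimate exon.
print(isoform_counts([2, 4, 3, 5], [2, 2]))  # -> (120, 480)
```

            <p>Both results agree with the values listed for these genes in the table above. The model counts possible combinations only; it says nothing about which combinations are actually expressed.</p>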
         </sec>
         <sec>
            <st>
               <p>Conservation of alternatively spliced exons</p>
            </st>
            <p>The number of variants differs between the arthropod species for many of the alternatively spliced exons (Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>). For the first set of alternatively spliced exons, two variants have been found in all <it>Mhc1 </it>genes. They differ at two absolutely conserved positions: the alanine and aspartate at positions 25 and 26 of the "a" variants are substituted by serine and asparagine in the "b" variants (Figure <figr fid="F3">3</figr>). A slightly less conserved marker of the "b" variants is a cysteine at position 21. Variant 3a of <it>DapMhc1 </it>is an exception, as it has an additional residue at the N-terminus compared with the other <it>Mhc1 </it>"a" variants. The <it>DapMhc1 </it>gene encodes three clusters of alternatively spliced exons not found in the other arthropod <it>Mhc1 </it>genes. For all three clusters, variant "b" is more similar to the corresponding amino acid sequences of the other Mhc1 proteins than variant "a" (see Additional file <supplr sid="S2">2</supplr>, figures S2, S4, and S6B). The alternatively spliced exons of <it>BmMhc1</it>, <it>DapMhc1</it>, <it>PdcMhc1 </it>and <it>TicMhc1 </it>covering loop-2 differ in length and starting position (see Additional file <supplr sid="S2">2</supplr>, figure S6A). However, the "a" variants are more similar to each other than to the "b" variants and to the corresponding amino acid sequences of the other Mhc1 proteins. Thus, the common ancestor of <it>Bm</it>, <it>Dap</it>, and <it>Tic </it>most probably already contained an "a" and a "b" variant. Completely conserved residues characterizing the "a" variant are a serine at the end of loop-2, a glutamine at position 3, and a leucine at position 8 of the following helix ([G/K/R 8-9]<b>S </b>[G/A]F [<b>Q</b>/M]TVS [S/A]<b>L</b>YR). 
Except for <it>PdcMhc1</it>, all arthropod <it>Mhc1 </it>genes have two variants of the mutually exclusive exon in the tail (Figure <figr fid="F2">2</figr>; see also Additional file <supplr sid="S2">2</supplr>, figure S8). The most conserved differences between the two variants are an aspartate at position 14 in variant "b" (an asparagine or a glutamine in variant "a") and an asparagine at position 24 (an arginine in variant "a"). In addition, at position 15 the "b" variants have a large hydrophobic residue (leucine, methionine, or phenylalanine), whereas the "a" variants have a small polar residue (serine or threonine). In contrast to the other <it>Mhc1 </it>genes, the "a" variant of <it>DapMhc1 </it>is more closely related to the "b" variants than to the other "a" variants.</p>
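            <p>The consensus given above for the loop-2 "a" variants can be written as a regular expression. The Python sketch below is one possible translation, assuming that "[G/K/R 8-9]" denotes a run of eight or nine residues each drawn from glycine, lysine or arginine, and that bracketed pairs denote alternative residues at a single position. The pattern name, function name and test sequences are hypothetical.</p>

```python
import re

# Assumed translation of [G/K/R 8-9] S [G/A] F [Q/M] T V S [S/A] L Y R.
LOOP2_A_MOTIF = re.compile(r"[GKR]{8,9}S[GA]F[QM]TVS[SA]LYR")


def matches_variant_a(protein):
    """True if the loop-2 'a' variant consensus occurs anywhere in protein."""
    return LOOP2_A_MOTIF.search(protein.upper()) is not None


print(matches_variant_a("PKGGKGGRGSGFQTVSALYRNE"))  # -> True (hypothetical)
print(matches_variant_a("PKGGKGGRGNGFQTVSALYRNE"))  # -> False (S replaced by N)
```

            <p>Because the conserved serine is anchored directly after the basic, glycine-rich run, a single substitution at that position is enough to break the match.</p>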
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                   <p>Relationships between alternatively spliced exons</p>
               </caption>
               <text>
                   <p><b>Relationships between alternatively spliced exons</b>. Sections of the <it>Mhc1 </it>genes of Figure 1 have been aligned to show the relationships between the exon-intron structures of the regions containing alternatively spliced exons. Continuous lines connect variants that are almost identical and are therefore expected to derive from a common ancestor. Bold lines connecting alternative exons in regions containing multiple variants per <it>Mhc1 </it>gene highlight particularly conserved exons in these sets. Dotted lines represent putative connections between certain variants whose identity at the protein level is not very strong.</p>
               </text>
               <graphic file="1471-2199-9-21-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Sequence conservation in the first set of the alternatively spliced exons</p>
               </caption>
               <text>
                   <p><b>Sequence conservation in the first set of the alternatively spliced exons</b>. Top: protein sequence alignment of the alternative exons. The upper sequences, labelled Mhc1, Mhc3, and Mhc4, represent the variant "a" exons. Bottom: comparison of the sequence identity between each exon and variants "a" and "b" of every other Mhc1 protein. The graphic should be read column by column. For each exon listed at the top and each Mhc1 protein listed on the left, the higher of the two identities (to variant "a" or to variant "b") has been set to 1 (red), while for the other combination the difference between the lower and the higher identity is plotted. Thus, each column shows to which variant of every other Mhc1 protein the named exon is more similar.</p>
               </text>
               <graphic file="1471-2199-9-21-3"/>
            </fig>
            <p>The situation is more complex for the remaining clusters of mutually exclusive exons, which contain three to six variants. The exon encoding a loop-helix motif adjacent to the ATP-binding site (blue color in Figure <figr fid="F1">1</figr>) is less conserved than the other alternatively spliced exons (Figure <figr fid="F2">2</figr>; see also Additional file <supplr sid="S2">2</supplr>, figure S3). It is therefore difficult to identify characteristic residues or motifs for the respective variants. Except for the <it>PdcMhc1 </it>and <it>TicMhc1 </it>genes, all genes contain four variants. The variant with the most characteristic residues is variant "c", which has a positively charged residue at position 8 (arginine or histidine), a conserved arginine at position 21, and a conserved asparagine at position 26. None of these residues appears at the respective position in any of the other variants. The <it>TicMhc1</it>, <it>PdcMhc1</it>, and <it>DapMhc1 </it>genes have lost this variant. The only strong characteristic of variant "d" is a conserved isoleucine or valine at position 20 that is found in all <it>Mhc1 </it>genes. Variants "a" and "b" do not contain any distinguishing residues.</p>
            <p>The alternatively spliced exons spanning the relay helix and the relay loop are the longest and most conserved of the mutually exclusive exons (see Additional file <supplr sid="S2">2</supplr>, figure S5). The number of variants ranges from two in the <it>Pediculus Mhc1 </it>gene to six in the <it>Nasonia </it>gene (Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>). The least conserved part of the exon is the relay loop, which is not embedded in the motor domain, and it is in this region that characteristic residues of certain variants are found. Variant "c" is characterized by a conserved glutamine at position 49 and either a glutamine or an asparagine at position 50; a copy of this variant is present in all <it>Mhc1 </it>genes except that of <it>Tic</it>. Another conserved variant is variant "d", characterized by a glutamine at position 49 followed by a proline at position 50. This variant appears in the <it>Mhc1 </it>genes of <it>Aea</it>, <it>Ang</it>, <it>Cpq</it>, <it>Tic</it>, and <it>Bm</it>. As for the alternatively spliced exon at the ATP-binding site, the other variants are not conserved enough to define characteristic residues, so it is not clear which of them were present in the ancient arthropod gene and which arose through exon duplication in the individual genes. Again, <it>DapMhc1 </it>is the exception, because its first two variants, characterized by two conserved methionines at positions 42 and 55, differ from all other variants.</p>
            <p>The variants of the cluster of alternative exons encoding part of the converter domain also show a high degree of variability (Figure <figr fid="F2">2</figr>; see also Additional file <supplr sid="S2">2</supplr>, figure S7). Two of the variants have characteristic features. Variant "a" is the most conserved variant at the protein level, with a conserved methionine at position 9 and a conserved cysteine at position 26; these residues do not appear in any of the other variants of this cluster. Variant "a" of this cluster is conserved in the <it>Mhc1 </it>genes of all species and therefore must have been present in their common ancestor. The last variant has a characteristic feature at the DNA level: the intron following it always has a GT 5' splice site, whereas the introns following all other variants of this exon have a GC 5' splice site. At the amino acid level, this variant is characterized by a lysine at position 2, a cysteine at position 5 and a glutamate at position 20.</p>
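            <p>To make the GT/GC distinction concrete, the Python sketch below classifies introns by their 5' donor dinucleotide. It assumes intron sequences are given 5' to 3' on the coding strand; the function name, the cluster and its intron fragments are hypothetical and only mimic the pattern described above (GC after all but the last variant, GT after the last).</p>

```python
def donor_class(intron):
    """Classify an intron by the dinucleotide at its 5' (donor) end."""
    donor = intron[:2].upper()
    if donor == "GT":
        return "GT (consensus)"
    if donor == "GC":
        return "GC (non-canonical)"
    return donor + " (rare)"


# Hypothetical introns following the variants of one alternative cluster.
cluster = {
    "variant a": "GCAAGTATTT",
    "variant b": "GCAAGTAACG",
    "last variant": "GTAAGTTTAC",
}
for name, intron in cluster.items():
    print(name, donor_class(intron))
```

            <p>Only the donor end is checked here; a fuller check would also confirm the AG acceptor at the 3' end of each intron.</p>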
            <p>Wherever EST and/or cDNA data were available, a differentially included penultimate exon could be identified. These exons are very short (one to thirteen residues) and not conserved (see Additional file <supplr sid="S2">2</supplr>, figure S9); therefore, similar exons have not been predicted for species for which no EST data is available. For <it>Ang</it>, three carboxy-termini have been identified: based on EST data, the <it>AngMhc1 </it>transcript may also end with a short extension of the antepenultimate exon. This C-terminus is similar to that found for <it>AeaMhc3 </it>and <it>CpqMhc4 </it>and might be used in combination with a similar set of variants of the other alternatively spliced exons.</p>
         </sec>
         <sec>
            <st>
               <p>Phylogenetic analysis of the arthropod muscle myosin heavy chain genes</p>
            </st>
            <p>A phylogenetic tree of all arthropod Mhc1 protein sequences has been generated, always incorporating the first variant of each cluster of alternatively spliced exons and excluding the differentially included penultimate exon (Figure <figr fid="F4">4</figr>). In general, the tree reflects the phylogenetic relationships between the species. The <it>Aea</it>Mhc3 sequence is most closely related to the <it>Cpq</it>Mhc3 and <it>Cpq</it>Mhc4 sequences, implying that the last common ancestor of <it>Aedes </it>and <it>Culex </it>already had one of these genes. The phylogeny of the <it>Drosophila </it>species differs slightly from that obtained in other analyses <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>: according to those analyses, the <it>Da</it>Mhc1 sequence would have been expected to separate after the divergence of the <it>Dp</it>Mhc1 sequence, and the <it>Dse</it>Mhc1 sequence would have been expected to be the closest relative of the <it>Dss</it>Mhc1 sequence. Overall, sequence identity is very high: it is 70.6&#8211;77.9% between <it>Dap</it>Mhc1 and the other sequences, and 77.0&#8211;99.7% among the other sequences.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Phylogenetic tree of the arthropod muscle myosin heavy chain proteins</p>
               </caption>
               <text>
                   <p><b>Phylogenetic tree of the arthropod muscle myosin heavy chain proteins</b>. The amino acid sequences of the full-length proteins were aligned manually. Because they are incomplete, the sequences of <it>Drosophila persimilis </it>and <it>Drosophila yakuba </it>have been omitted from the tree calculation. Support values for each internal branch were obtained from 1,000 bootstrap replicates. The scale bar corresponds to 0.1 estimated amino acid substitutions per site.</p>
               </text>
               <graphic file="1471-2199-9-21-4"/>
            </fig>
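            <p>In the standard nonparametric bootstrap, support values such as those in Figure 4 come from resampling alignment columns with replacement and rebuilding a tree for each pseudo-replicate. The legend does not name the tree-building program, so this Python sketch shows only the resampling step, using the standard-library random module; tree inference and the counting of clade frequencies are omitted, and the function name and three-taxon alignment are hypothetical.</p>

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Each replicate keeps the original length but draws columns at random,
    so some columns appear several times and others not at all.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # The same column indices are used for every sequence.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][c] for c in cols) for n in names}


# Hypothetical three-taxon alignment, for illustration only.
aln = {"taxonA": "MKLVG", "taxonB": "MKIVG", "taxonC": "MRIVA"}
reps = list(bootstrap_replicates(aln, n_replicates=1000))
print(len(reps), reps[0]["taxonA"])
```

            <p>Sampling whole columns, rather than individual residues, preserves the homology statement made by each alignment position, which is what the support values are meant to test.</p>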
         </sec>
         <sec>
            <st>
               <p>Predicting the gene structure of an ancient <it>Mhc1 </it>gene</p>
            </st>
            <p>Whenever intron positions are shared between the genes, the type of splice site is also conserved, with the exceptions of the homologous exons 9 (<it>AmMhc1</it>), 10 (<it>TicMhc1</it>) and 9 (<it>BmMhc1</it>), and of the alternatively spliced exon 11 of <it>DapMhc1 </it>(Figure <figr fid="F5">5</figr>). All introns have consensus dinucleotide borders except those downstream of the variants of the cluster of alternative exons encoding part of the converter region of the motor domain (homologs of exon 11 in <it>DmMhc1</it>), which have a GC dinucleotide at the 5' donor site instead of the consensus GT; the intron following the 3'-most variant of this cluster again has a consensus GT site. Exon '10a' of <it>AeaMhc3 </it>is almost identical to exon 10a of <it>AeaMhc1</it>, and the following intron also has a GC dinucleotide at the 5' donor site. Whereas the introns following exon 9 of <it>AmMhc1</it>, <it>NavMhc1</it>, and <it>BmMhc1 </it>and exon 10 of <it>PdcMhc1 </it>have a consensus GT site, the intron following exon 10 of <it>TicMhc1 </it>has a GC 5' donor site. The intron following exon 11a of <it>DapMhc1 </it>starts with a consensus GT site, while the intron following exon 11b starts with the extremely rare GA dinucleotide. In addition, all split codons are shared between the genes.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Diagram of the arthropod Mhc1 proteins</p>
               </caption>
               <text>
                   <p><b>Diagram of the arthropod Mhc1 proteins</b>. The exon-intron structure of the <it>Mhc1 </it>genes is shown based on the protein sequence. Exons are shown as boxes, while introns are represented by spaces. The same colour scheme has been used as in Figure 1. Numbers on alternative exons denote the number of variants. The exons are drawn such that the intron positions align between the different <it>Mhc1 </it>genes; the exon lengths are therefore not drawn to scale (e.g. the exons encoding the variable loop-2 differ in length). On the right, the protein sequence of <it>Drosophila melanogaster </it>Mhc1 is shown as a reference. Dotted lines connect amino acids that are encoded by split codons.</p>
               </text>
               <graphic file="1471-2199-9-21-5"/>
            </fig>
            <p>In the part encoding the motor and the neck domains, all intron positions are shared by at least two genes (Figure <figr fid="F5">5</figr>). In the coiled-coil tail domain, all genes have lost several introns, so that the exons are considerably longer and the intron positions are in many cases not identical. Assuming that introns have in most cases been lost rather than gained during evolution <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, an ancient arthropod <it>Mhc1 </it>gene can be reconstructed (Figure <figr fid="F5">5</figr>). The ancient <it>Mhc1 </it>gene is expected to contain all intron positions that appear in at least one of the analysed <it>Mhc1 </it>genes. In the motor domain, the proposed ancient <it>Mhc1 </it>gene structure is identical to that of the <it>DapMhc1 </it>gene, with exons between 30 and 210 bp long. The exons in the tail domain are considerably longer (up to 480 bp).</p>
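            <p>Under the intron-loss assumption, reconstructing the ancestral intron set amounts to taking the union of the intron positions of the extant genes once they are mapped onto shared alignment coordinates, as in Figure 5. The Python sketch below uses hypothetical gene labels and positions; it also reports how many genes retain each position, the quantity behind the statement that every motor-domain intron is shared by at least two genes.</p>

```python
from collections import Counter


def reconstruct_ancestral_introns(intron_positions):
    """Union of intron positions across genes (intron-loss-only model).

    intron_positions: dict gene -> iterable of intron positions expressed
        in a shared coordinate (e.g. residue number in a reference
        alignment). Returns the sorted ancestral positions and, for each
        position, how many extant genes still have it.
    """
    support = Counter()
    for positions in intron_positions.values():
        support.update(set(positions))  # count each gene once per position
    return sorted(support), dict(support)


# Hypothetical positions in aligned residue coordinates.
genes = {
    "geneA": [31, 88, 140, 210],
    "geneB": [31, 88, 175, 210],
    "geneC": [31, 140, 175],
}
ancestral, support = reconstruct_ancestral_introns(genes)
print(ancestral)    # -> [31, 88, 140, 175, 210]
print(support[31])  # -> 3
```

            <p>Positions supported by a single gene are the ones most sensitive to the loss-only assumption, since they could equally reflect a lineage-specific intron gain.</p>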
         </sec>
         <sec>
            <st>
               <p>Structural implications of the alternatively spliced exons</p>
            </st>
            <p>The locations of the alternatively spliced exons of <it>DmMhc1 </it>in the motor domain have been discussed in detail elsewhere <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The positions of the additional alternatively spliced exons of the <it>BmMhc1</it>, <it>TicMhc1</it>, <it>PdcMhc1</it>, and <it>DapMhc1 </it>genes within the structure of the motor domain are shown in Figure <figr fid="F6">6</figr>. The alternative exons of <it>DapMhc1 </it>encoding the region from the P-loop to loop-1 have identical P-loop sequences. The loop-1 sequences are identical in length but differ significantly in composition. Studies have shown that the flexibility of this loop affects the rates of ADP and phosphate release, with greater flexibility enhancing the rate of product release <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Although the amino acid compositions of the alternative variants differ, both contain two glycines and have a similar overall charge. The alternative exons of <it>DapMhc1 </it>that include loop-4 are similar in length and composition. This region of the motor domain has not yet been investigated, and therefore no conclusions can be drawn about the functional consequences of the differences between the two variants. Loop-4 has been postulated to be important for the proper localization of class-I myosins, which contain elongated loops that sterically interact with actin-binding proteins <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. However, the loop-4 sequences are almost identical between the two <it>DapMhc1 </it>variants, so the two variants must modulate a different property of the motor domain. The loop-2 sequence is modulated by alternative exons in the <it>BmMhc1</it>, <it>DapMhc1</it>, <it>PdcMhc1</it>, and <it>TicMhc1 </it>genes. 
Loop-2 was studied in the <it>Dictyostelium </it>class-2 myosin by replacing it with the analogous loop from four other myosins with different enzymatic activities. These studies showed that loop-2 is involved in both the weak and the strong binding interactions with actin <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. It also plays an important role in the rate-limiting step of P<sub>i </sub>release <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. The exon variants of the <it>BmMhc1</it>, <it>DapMhc1</it>, <it>PdcMhc1</it>, and <it>TicMhc1 </it>genes encoding the loop-2 sequence have identical numbers of lysine and arginine residues. The "a" variants are always one residue shorter and have only four instead of five glycines. These differences are, however, very subtle, and their influence on actin binding is expected to be very small. The variants of the alternative exon in <it>DapMhc1 </it>following loop-2 are very similar. This part of the motor domain has not yet been investigated either.</p>
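            <p>The composition comparisons above cover glycine counts, lysine and arginine counts, overall charge and the length of the alternative loop variants. Each can be reproduced with a few per-sequence counts. The Python sketch below is illustrative only. The two loop sequences are hypothetical stand-ins, not the actual <it>DapMhc1 </it>or loop-2 sequences. The net charge is a crude approximation that counts only Lys, Arg, Asp and Glu.</p>

```python
# Sketch: compare the composition of two alternative loop variants.
# The sequences are hypothetical placeholders, not real Mhc1 loop sequences.


def composition_summary(seq):
    """Length, glycine count, Lys+Arg count and a crude net charge."""
    seq = seq.upper()
    basic = seq.count("K") + seq.count("R")
    acidic = seq.count("D") + seq.count("E")
    return {
        "length": len(seq),
        "glycines": seq.count("G"),
        "lys_arg": basic,
        # Approximation: ignores His, the termini and local pKa shifts.
        "net_charge": basic - acidic,
    }


variant_a = "KGGKAGGKE"   # hypothetical "a" variant
variant_b = "KGGKGAGGKE"  # hypothetical "b" variant
for name, seq in (("a", variant_a), ("b", variant_b)):
    print(name, composition_summary(seq))
```

            <p>The two toy variants differ by one residue and one glycine but share the same Lys+Arg count and net charge. This mirrors the pattern described for the loop-2 "a" and "b" variants.</p>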
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Structure of the myosin motor domain</p>
               </caption>
               <text>
                  <p><b>Structure of the myosin motor domain</b>. The structure of the motor domain of the class-II myosin of <it>Dictyostelium discoideum </it>has been used to highlight the regions encoded by alternatively spliced exons in arthropod <it>Mhc1 </it>genes. The colour coding is the same as in Figure 1, allowing corresponding regions to be identified.</p>
               </text>
               <graphic file="1471-2199-9-21-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Twenty-five muscle myosin heavy chain genes have been identified in 22 species of the Arthropoda. All sequences share strong homology with the alternatively spliced <it>Mhc1 </it>gene that was first described in <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The genes contain five to nine clusters of mutually exclusive exons and a penultimate exon that can be either included in or excluded from the mRNA. All exons were assigned by manual inspection of the genomic DNA sequences (Figure <figr fid="F1">1</figr>). Because of the many clusters of alternatively spliced exons, automatic identification of all exons failed. This is probably also the main reason for the incorrect prediction of the exon organisation of the <it>Anopheles Mhc1 </it>gene (supplementary material of <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>).</p>
         <p>Altogether, alternative splicing of <it>Mhc1 </it>transcripts could result in several hundred differently spliced mRNAs (Table <tblr tid="T1">1</tblr>). The <it>Pediculus Mhc1 </it>gene has the fewest alternatives for its alternatively spliced exons, resulting in a theoretical maximum of 32 different mRNAs, while the water flea gene could give rise to at least 3072 different mRNAs. Except for <it>Pediculus</it>, <it>Nasonia</it>, and <it>Apis mellifera</it>, all arthropod <it>Mhc1 </it>genes for which all exons could be identified exceed the 480 possible mRNAs of <it>Drosophila melanogaster</it>. The number of possible transcripts seems vast compared with the number of different muscle myosin heavy chain genes in other metazoan species. Nevertheless, the number of regions in which the function of the protein can be modulated is limited to five to nine. In <it>Drosophila melanogaster</it>, all alternative exons are expressed depending on the developmental stage, but only a limited number of combinations seem to be employed <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Whether all alternative exons are expressed in the other arthropod species, and which combinations are used, remains to be determined.</p>
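         <p>The theoretical maximum number of mRNAs follows from simple counting. One variant is chosen from each cluster of mutually exclusive exons, and an optional exon is either included or skipped. The cluster sizes are therefore multiplied, and the product is doubled for each optional exon. The Python sketch below makes an assumption about cluster sizes: 2, 4, 3, 5 and 2 for the five <it>Drosophila melanogaster </it>clusters (exon sets 3, 7, 9, 11 and 15 in the usual <it>DmMhc1 </it>numbering), plus one optional penultimate exon. These sizes are chosen because they reproduce the 480 figure quoted above. Like the numbers in the text, the result is an upper bound, not a count of the combinations actually expressed.</p>

```python
# Sketch: theoretical maximum number of distinct mRNAs from alternative splicing.
from math import prod


def max_transcripts(cluster_sizes, optional_exons=0):
    """One variant per mutually exclusive cluster; each optional exon in or out."""
    return prod(cluster_sizes) * 2 ** optional_exons


# Assumed Drosophila melanogaster cluster sizes (exon sets 3, 7, 9, 11, 15).
dm_clusters = [2, 4, 3, 5, 2]
print(max_transcripts(dm_clusters, optional_exons=1))  # 480
```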
         <p>The phylogenetic analysis of the Mhc1 protein sequences agrees with the expected phylogenetic relationships between the species, with two notable exceptions in the <it>Drosophila </it>section of the tree. The <it>Dse</it>Mhc1 sequence would have been expected to be the closest relative of the <it>Dss</it>Mhc1 sequence. The <it>Da</it>Mhc1 sequence would have been expected to branch off after the split of the <it>Dp</it>Mhc1 and <it>Drp</it>Mhc1 sequences. There are two possible explanations for this observation. Either the <it>Mhc1 </it>genes have evolved asynchronously, as has been found for many yeast genes <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, or the genes might have incorporated back-mutations. The sequence identities of 96.1 to 99.7% are very high, and thus only a few mutations would lead to a different phylogenetic classification.</p>
         <p>Compared with the <it>Drosophila melanogaster </it>gene, the <it>Tribolium castaneum</it>, <it>Pediculus humanus corporis</it>, and <it>Bombyx mori Mhc1 </it>genes contain one additional cluster of alternatively spliced exons, and the <it>Daphnia pulex Mhc1 </it>gene contains four additional clusters (Figure <figr fid="F1">1</figr>, Figure <figr fid="F2">2</figr>). All additional alternatively spliced exons are mutually exclusive and encode parts of the motor domain. The additional exons of the <it>Tic</it>, <it>Pdc</it>, and <it>Bm Mhc1 </it>genes encode alternative versions of the loop-2 sequence, while the additional exons of the <it>Dap Mhc1 </it>gene are spread over the entire motor domain. In each case, the 3' variant is more similar to the corresponding sequences in the other <it>Mhc1 </it>genes than the 5' variant is (Figure <figr fid="F2">2</figr>).</p>
         <p>A similar conservation pattern is found for alternative exons with multiple variants (Figure <figr fid="F2">2</figr>). In almost all cases, the most 3' variant is the most conserved one. Consider the cluster of alternative exons encoding the part of the motor domain near the ATP-binding site (exon 7 in <it>DmMhc1</it>). Here, the last variant is the only one that is conserved in all species. The other variants are either missing in certain species or are very similar to each other as well as to those of other species. It is therefore not clear whether they were derived from independent variant duplications or were already present in a common ancestor. This suggests that all variants except the most 3' variant evolved after the separation of <it>Daphnia </it>from the other species. The variants encoding the relay-helix and the relay-loop are highly conserved; conserved differences are confined to only one or two residues. The penultimate variant seems to be the most conserved, although mutation of a single residue might change this. The exon encoding part of the converter domain has two highly conserved variants, the most 5' and the most 3' variants. At the DNA level, the most 3' variant is distinguished from all other variants of this set of alternative exons because the following intron starts with a GT donor site. The most 5' exon is the most important, though not the only, determinant of flight capability <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>.</p>
         <p>Based on the exon-intron patterns of the 21 <it>Mhc1 </it>genes, the gene structure of the ancient arthropod <it>Mhc1 </it>gene can be predicted. The prediction rests on the assumption that species distributed over a broad taxonomic range are very unlikely to have gained introns at the same positions independently of each other. In the first half of the genes, which encodes the motor and neck domains, all intron positions are shared by at least two genes (Figure <figr fid="F5">5</figr>). The exons encoding the coiled-coil tail domain, starting at amino acid 850, are considerably longer, and the intron positions in almost all genes are not identical. Further sequencing of arthropod <it>Mhc1 </it>genes will very probably reveal different exon-intron patterns in the tail region, while some intron positions will be shared with one or more of the genes already analysed. Comparing the intron-rich <it>DapMhc1 </it>and <it>PdcMhc1 </it>genes with the mosquito and <it>Drosophila Mhc1 </it>genes, it is apparent that intron loss is a major determinant of arthropod <it>Mhc1 </it>gene evolution. Intron loss events have also been found for many other arthropod genes <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. However, as long as data from further arthropod species are missing, it cannot be excluded that some of the introns in the tail region that are not shared between the analysed arthropods were gained during evolution. Very recently, an analysis of eleven <it>Drosophila </it>genomes showed that a small number of introns have been gained in these species <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. The ancient <it>Mhc1 </it>gene is expected to contain all intron positions that appear in at least one of the analysed <it>Mhc1 </it>genes. Analysis of <it>Mhc1 </it>genes from further species might add intron positions, especially in the tail region. 
The exon lengths of the ancient <it>Mhc1 </it>gene are between 30 and 210 bp in the motor domain and up to 480 bp in the tail region. These exons are short compared with, for example, those of the <it>Drosophila Mhc1 </it>gene, and they resemble exon lengths in vertebrates. Further comparative analysis with vertebrate muscle myosin heavy chain genes should reveal the gene structure of the ancient metazoan gene.</p>
         <p>In addition to the <it>Mhc1 </it>gene, <it>Aedes aegypti </it>encodes a further muscle myosin heavy chain gene, named <it>Mhc3</it>, which encodes only one variant of each of the alternatively spliced exons of the <it>Mhc1 </it>gene. The presence of this gene is not an artefact of sequencing or of the assembly process. Although the translated exons show high identity, the two genes, <it>Mhc1 </it>and <it>Mhc3</it>, differ considerably at the DNA level, and both are supported by several EST clones. This also means that the <it>Mhc3 </it>gene, which does not encode any alternatively spliced exons, is expressed during the life cycle of <it>Aedes aegypti</it>. However, there are not enough data to show that the <it>Mhc3 </it>gene is expressed in a biologically important (e.g. muscle-specific) manner. Note that the combination of alternative exons retained in <it>Mhc3 </it>does not correspond to any of the tissue-specific combinations found in <it>Drosophila </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In addition to the <it>Mhc1 </it>gene, the <it>Culex pipiens quinquefasciatus </it>genome contains two further muscle myosin heavy chain genes, named <it>Mhc3 </it>and <it>Mhc4</it>. Similarly to <it>AeaMhc3</it>, these encode only one variant of most of the alternatively spliced exons of the <it>Mhc1 </it>gene. In one case, the intron between the presumed retained variant of an alternative exon cluster and the following constitutive exon has been lost. Unfortunately, there are not enough EST data available for <it>Culex pipiens quinquefasciatus </it>to support any of its myosin heavy chain genes. <it>AeaMhc3</it>, <it>CpqMhc3</it>, and <it>CpqMhc4 </it>retained the same variants of the alternative exons of the corresponding <it>Mhc1 </it>genes. 
The presence of these further muscle myosin heavy chain genes is very surprising, because the number of alternatively spliced exons in the <it>Mhc1 </it>genes already allows the transcription of several hundred different muscle myosin isoforms. How did the genomes of <it>Aedes aegypti </it>and <it>Culex pipiens quinquefasciatus </it>come to encode such genes? According to the phylogenetic tree of the myosin heavy chain genes, the <it>Mhc3 </it>and <it>Mhc4 </it>genes appeared in the common ancestor of <it>Aedes </it>and <it>Culex </it>after its divergence from <it>Anopheles gambiae</it>. In addition, there is no evidence for a (partial) second muscle myosin heavy chain gene in the <it>Anopheles gambiae </it>genome. Also, the carboxy-terminal ends of <it>AeaMhc3 </it>and <it>CpqMhc4</it>, which are 3' extensions of the last constitutive exon, do not exist in the <it>AeaMhc1 </it>and <it>CpqMhc1 </it>genes. They do, however, have an identical counterpart in the <it>AngMhc1 </it>gene that is also supported by several EST clones. It is unlikely that these three organisms developed such a carboxy-terminal end of the myosin gene independently of each other. It is more probable that the ancestral <it>AeaMhc1 </it>and <it>CpqMhc1 </it>genes lost this specific carboxy-terminus after the <it>Mhc3 </it>and <it>Mhc4 </it>genes had been incorporated into the genome. This would mean that this carboxy-terminus is only used in the specific combination of alternatively spliced exons found in the <it>AeaMhc3 </it>and <it>CpqMhc4 </it>genes. Whether this is also true for the <it>AngMhc1 </it>gene remains to be verified. Given their identity in sequence and gene structure, <it>CpqMhc3 </it>was most probably derived by gene duplication of <it>CpqMhc4</it>, or <it>CpqMhc4 </it>by duplication of <it>CpqMhc3</it>.</p>
         <p>There are two possibilities as to how the <it>Mhc3 </it>and <it>Mhc4 </it>genes could have appeared in the common ancestor of <it>Aedes </it>and <it>Culex</it>. Either the genes were derived from a duplication of the <it>Mhc1 </it>gene as part of a single-gene or chromosomal-region duplication event, or a partially spliced transcript of <it>Mhc1 </it>was reincorporated into the genome (Figure <figr fid="F7">7</figr>). Suppose the <it>Mhc3 </it>and <it>Mhc4 </it>genes had been derived by duplication. Then one of the (then) two <it>Mhc </it>genes, and only one, would have had to lose all but one variant of its alternative exons, in addition to losing both terminal exons in <it>Mhc3</it>. Given the number of possible transcripts of the <it>Mhc1 </it>gene and the possibility of duplicating alternative exons, there would very probably be no need for a second gene with the same set of alternative exons. Even if it were advantageous to keep two almost identical genes, it would be very unlikely that only one of them lost all but one of its alternative exons. In addition, there must have been very strong evolutionary pressure to keep exactly this particular combination of alternative exons. The second possibility implies that the splicing process has two steps. In the first step, all alternatively spliced exons that are not needed are removed, leaving introns between the remaining alternatively spliced and constitutive exons (Figure <figr fid="F7">7</figr>). In the second step, all introns are spliced out to yield the mRNA for translation. In the case of the <it>Mhc3 </it>and <it>Mhc4 </it>genes, a transcript containing one combination of alternative exons but all introns would have been integrated into the genome, probably after reverse transcription. What should this type of gene be called? 
At least the <it>AeaMhc3 </it>gene is completely transcribed, and <it>CpqMhc3 </it>and <it>CpqMhc4 </it>do not contain any premature stop codons or frameshift mutations either. However, compared with the corresponding <it>Mhc1 </it>genes, they retain only one variant of each of the alternative exons. Thus, they do not belong to the non-processed pseudogenes. Nor are they classical processed pseudogenes, because they retain their introns. We would rather regard them as a new type of pseudogene, the 'partially' processed pseudogene.</p>
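         <p>The proposed two-step splicing model can be made concrete with a toy representation of a pre-mRNA. The Python sketch below is a hypothetical illustration, not an implementation from this study. Step 1 keeps one chosen variant per cluster of mutually exclusive exons and retains the flanking introns. After reverse transcription and reintegration, this intermediate would correspond to a partially processed pseudogene such as <it>AeaMhc3</it>. Step 2 removes the remaining introns. The segment names are invented. As a simplification, each cluster is treated as a single segment whose internal introns are discarded together with the unused variants.</p>

```python
# Toy model of the proposed two-step splicing of Mhc1 pre-mRNA.
# All segment names are hypothetical; each cluster of mutually exclusive
# variants is collapsed into one segment for simplicity.

PRE_MRNA = [
    ("exon", "E1"),
    ("intron", "i1"),
    ("cluster", ("E2a", "E2b", "E2c")),
    ("intron", "i2"),
    ("exon", "E3"),
    ("intron", "i3"),
    ("cluster", ("E4a", "E4b")),
    ("intron", "i4"),
    ("exon", "E5"),
]


def select_variants(pre_mrna, choices):
    """Step 1: keep one variant per cluster; flanking introns are retained."""
    picks = iter(choices)
    transcript = []
    for kind, label in pre_mrna:
        if kind == "cluster":
            transcript.append(("exon", label[next(picks)]))
        else:
            transcript.append((kind, label))
    return transcript


def remove_introns(transcript):
    """Step 2: splice out the remaining introns to give the mature mRNA."""
    return [label for kind, label in transcript if kind == "exon"]


partial = select_variants(PRE_MRNA, [1, 0])  # intron-containing intermediate
mrna = remove_introns(partial)
print(partial)
print(mrna)  # ['E1', 'E2b', 'E3', 'E4a', 'E5']
```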
         <fig id="F7">
            <title>
               <p>Figure 7</p>
            </title>
            <caption>
               <p>Model for the process of alternative splicing</p>
            </caption>
            <text>
               <p><b>Model for the process of alternative splicing</b>. The model describes the three different origins of pseudogenes. Non-processed pseudogenes are often found adjacent to their paralogous functional gene and retain the same exon-intron structure. Processed pseudogenes are marked by the absence of both a 5' promoter sequence and introns and by the presence of flanking direct repeats, and are randomly integrated into the genome. In the case of the arthropod <it>Mhc </it>genes, the gene is first transcribed. In a second step, the unused alternative exons are spliced out, resulting in a particular combination of alternative exons while the exon-intron structure is retained. In the case of <it>AeaMhc3</it>, <it>CpqMhc3</it>, and <it>CpqMhc4</it>, such transcripts have been integrated into the genome. Normally, in a third step, the introns are spliced out, yielding the final mRNA ready for translation. Dark grey bars represent constitutive exons and coloured bars alternatively spliced exons. Light grey bars represent non-coding sequence.</p>
            </text>
            <graphic file="1471-2199-9-21-7"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Twenty-five arthropod muscle myosin heavy chain genes have been identified and analysed. Compared with the well-studied gene of <it>Drosophila melanogaster</it>, other arthropod genes contain up to four additional clusters of alternatively spliced exons encoding parts of the motor domain. This considerably extends the possibilities for other arthropod species to fine-tune myosin and thus muscle characteristics. An ancient arthropod muscle myosin heavy chain gene has been reconstructed whose gene structure is best explained if introns were lost rather than gained during the evolution of this gene. <it>Aedes aegypti </it>and <it>Culex pipiens quinquefasciatus </it>even encode further muscle myosin heavy chain genes, which, however, have lost all but one variant of the alternatively spliced exons. These genes most probably entered the genome through the reincorporation of a particular partially processed transcript, and not via a gene or genomic region duplication event. If these genes were derived from a partially processed transcript, then splicing of alternative exons must involve a first step in which all other variants are spliced out, leaving intronic sequence around the variant of choice. In a second step, all introns are spliced out.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Identification and annotation of the arthropod muscle myosin heavy chains</p>
            </st>
            <p>The <it>Aea</it>, <it>Ang</it>, <it>Am</it>, <it>Bm</it>, <it>Cpq</it>, <it>Dm</it>, <it>Drp</it>, <it>Dp</it>, <it>Dse</it>, <it>Dss</it>, <it>Dy</it>, <it>Dw</it>, <it>Pdc</it>, and <it>Tic Mhc1 </it>and <it>Mhc3 </it>genes were obtained by TBLASTN searches against the insect section of the NCBI wgs database (Table <tblr tid="T1">1</tblr>)<abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The <it>Da</it>, <it>Der</it>, <it>Dg</it>, <it>Dmo</it>, and <it>Dv Mhc1 </it>genes were obtained using the BLAT alignment tool <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> against the UCSC Genome Browser database <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>. The <it>DhMhc1 </it>sequence was derived from the NCBI nonredundant database. The <it>DapMhc1 </it>sequence was obtained by a TBLASTN search against the 9&#215; assembly of the <it>Daphnia pulex </it>genome provided by the DOE Joint Genome Institute <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and the <it>Daphnia </it>Genomics Consortium <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. The <it>NavMhc1 </it>gene was derived from version 1.0 of the <it>Nasonia vitripennis </it>assembly provided by the Human Genome Sequencing Center at Baylor College of Medicine <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. The exons of the genes were predicted by manual inspection of the nucleotide sequences. For the correct prediction of the transcriptional start and the 3' terminal exons, it was necessary to analyse cDNA and EST data, which were obtained from the EST section of NCBI's nucleotide database. In particular, the following data were obtained. For <it>TicMhc1</it>, only a small amount of EST data is available, confirming the prediction of exon2. There are not enough data to exclude a further untranslated 5' exon or further C-terminal exons. For <it>AngMhc1</it>, several EST and cDNA clones support exon1 and the different C-termini. 
The C-termini of <it>AeaMhc1 </it>are also supported by several EST clones (e.g. GenBank ID <ext-link ext-link-type="gen" ext-link-id="DV384821">DV384821</ext-link>). Exon1 of <it>AeaMhc3 </it>is supported by EST data. Because there is no direct EST evidence for exon1 of <it>AeaMhc1</it>, exon1 of <it>AeaMhc3 </it>was used to identify it. Surprisingly, it is located 26,432 bp upstream of the ATG translation start codon. For <it>AmMhc1</it>, the N-terminus is not supported by EST or cDNA data; therefore, it is not clear whether there is an additional 5' untranslated exon. The C-termini are supported by several EST and cDNA clones (e.g. GenBank ID <ext-link ext-link-type="gen" ext-link-id="CK629939">CK629939</ext-link>). The C-terminus of <it>DapMhc1 </it>is supported by EST data (e.g. GenBank ID <ext-link ext-link-type="gen" ext-link-id="BJ927473">BJ927473</ext-link>), while there are no EST data for the N-terminus. For <it>BmMhc1</it>, exon2 is supported by EST data. However, the corresponding EST clones are not long enough to exclude a further 5' untranslated exon. Both C-termini of <it>BmMhc1 </it>are supported by EST clones (e.g. GenBank ID <ext-link ext-link-type="gen" ext-link-id="BP179837">BP179837</ext-link>). The genomic DNA of the <it>BmMhc1 </it>gene contains a gap in the coiled-coil tail region. The missing amino acid sequence was derived from EST data, but the exon/intron structure in the corresponding region remains unresolved.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the relationship of the alternatively spliced exons</p>
            </st>
            <p>All alternatively spliced exons were aligned manually. Some relationships are already apparent from these sequence alignments. To obtain a more quantitative description, sequence identity matrices were calculated for each set of aligned exons. Subsequently, sets of homologous exons from all <it>Mhc1 </it>genes were clustered by sequence similarity. We have visualized the results in graphs that are read column by column. The highest identity between an exon listed at the top and any variant of a given Mhc1 protein listed on the left side has been set to 1 (red colour). For the other combinations of exons, the differences between their lower identity values and the highest identity value have been plotted. Thus, every column shows, for the named exon, its highest identity to a variant of each of the other Mhc1 proteins.</p>
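            <p>The column-wise display described above can be expressed as a small normalisation step. The Python sketch below rests on two assumptions not stated in the text. First, identity is computed over aligned positions where neither sequence has a gap. Second, the plotted value for non-best variants is taken literally as the difference between the best identity and that variant's identity. The protein and exon labels and the identity values are hypothetical.</p>

```python
# Sketch: identity of aligned exons and the column-wise display described above.
# Assumptions: '-' marks gaps; identity is computed over gap-free columns;
# non-best variants are shown as their shortfall from the best identity.


def identity(a, b):
    """Fractional identity of two pre-aligned, equal-length sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def normalise_column(column):
    """column: {protein: {variant: identity}} for one query exon."""
    result = {}
    for protein, variants in column.items():
        best = max(variants.values())
        result[protein] = {
            name: 1.0 if value == best else round(best - value, 3)
            for name, value in variants.items()
        }
    return result


column = {
    "ProtA": {"7a": 0.62, "7b": 0.81, "7c": 0.55},
    "ProtB": {"7a": 0.90, "7b": 0.70},
}
print(identity("MKV-LE", "MRVQLE"))  # 0.8
print(normalise_column(column))
```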
         </sec>
         <sec>
            <st>
               <p>Building trees</p>
            </st>
            <p>The phylogenetic tree was generated using the neighbour-joining method with bootstrapping (1,000 replicates) as implemented in ClustalW (standard settings) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> and drawn using TreeView <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. The sequence of <it>DapMhc1 </it>was used as the outgroup.</p>
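            <p>For readers unfamiliar with neighbour joining, the Python sketch below implements the textbook algorithm of Saitou and Nei on a distance dictionary. It is not the ClustalW code used here. Bootstrap resampling (1,000 replicates in this study) and rooting with <it>DapMhc1 </it>as the outgroup are omitted. The five-taxon distance matrix is a standard toy example, not Mhc1 data.</p>

```python
# Sketch: textbook neighbour joining on a toy distance matrix.
import itertools


def neighbour_joining(names, dist):
    """names: taxon labels; dist: {frozenset({x, y}): distance}.

    Returns an unrooted tree as a Newick string with 3-decimal branch lengths.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            itertools.combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        new = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from the new internal node to the remaining nodes.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.3f});"


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    frozenset((labels[x], labels[y])): matrix[x][y]
    for x, y in itertools.combinations(range(len(labels)), 2)
}
print(neighbour_joining(labels, dist))
```

            <p>Neighbour joining produces an unrooted tree, so an outgroup such as <it>DapMhc1 </it>is needed afterwards to place the root. Bootstrap support values would come from repeating the procedure on resampled alignment columns.</p>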
         </sec>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations</p>
         </st>
         <p>Mhc, myosin heavy chain; for abbreviations of species names see Table <tblr tid="T1">1</tblr>.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>F.O. performed data analysis. M.K. assembled all sequences, performed data analysis and wrote the manuscript. Both authors read and approved the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>M.K. was supported by a Liebig Stipendium of the Fonds der Chemischen Industrie, which was in part financed by the BMBF. This work has been funded by grant I80798 of the VolkswagenStiftung and grant KO 2251/3-1 of the Deutsche Forschungsgemeinschaft. We thank the DOE Joint Genome Institute <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> and the <it>Daphnia </it>Genomics Consortium <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> for providing access to the assembly of the <it>Daphnia pulex </it>genome. We also thank the Human Genome Sequencing Center at Baylor College of Medicine for providing access to the assembly of the <it>Nasonia vitripennis </it>genome prior to publication. Finally, we would like to thank all the reviewers for their very helpful and thoughtful comments, which improved the manuscript considerably.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Alternative splicing: increasing diversity in the proteomic world</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>2</issue>
            <fpage>100</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02176-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11173120</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology</p>
            </title>
            <aug>
               <au>
                  <snm>Black</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>103</volume>
            <issue>3</issue>
            <fpage>367</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)00128-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">11081623</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>ASD: the Alternative Splicing Database</p>
            </title>
            <aug>
               <au>
                  <snm>Thanaraj</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Stamm</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Riethoven</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Le Texier</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Muilu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database issue</issue>
            <fpage>D64</fpage>
            <lpage>D69</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308764</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681360</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh030</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Origin of alternative splicing by tandem exon duplication</p>
            </title>
            <aug>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2001</pubdate>
            <volume>10</volume>
            <issue>23</issue>
            <fpage>2661</fpage>
            <lpage>2669</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/10.23.2661</pubid>
                  <pubid idtype="pmpid" link="fulltext">11726553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Variable window binding for mutually exclusive alternative splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Anastassiou</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Varadan</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>1</issue>
            <fpage>R2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1431710</pubid>
                  <pubid idtype="pmpid" link="fulltext">16507134</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-1-r2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Mutually exclusive splicing of the insect Dscam pre-mRNA directed by competing intronic RNA secondary structures</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>123</volume>
            <issue>1</issue>
            <fpage>65</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2005.07.028</pubid>
                  <pubid idtype="pmpid" link="fulltext">16213213</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Common exon duplication in animals and its role in alternative splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>13</issue>
            <fpage>1561</fpage>
            <lpage>1567</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/11.13.1561</pubid>
                  <pubid idtype="pmpid" link="fulltext">12045209</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The organization and evolution of the dipteran and hymenopteran Down syndrome cell adhesion molecule (Dscam) genes</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Kaur</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gunning</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zipursky</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Rowen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Clemens</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>10</issue>
            <fpage>1499</fpage>
            <lpage>1506</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370636</pubid>
                  <pubid idtype="pmpid" link="fulltext">15383675</pubid>
                  <pubid idtype="doi">10.1261/rna.7105504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Functional domains of the Drosophila melanogaster muscle myosin heavy-chain gene are encoded by alternatively spliced exons</p>
            </title>
            <aug>
               <au>
                  <snm>George</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Ober</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Emerson</snm>
                  <fnm>CP</fnm>
                  <suf>Jr.</suf>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1989</pubdate>
            <volume>9</volume>
            <issue>7</issue>
            <fpage>2957</fpage>
            <lpage>2974</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">362764</pubid>
                  <pubid idtype="pmpid" link="fulltext">2506434</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The molecular motor toolbox for intracellular transport</p>
            </title>
            <aug>
               <au>
                  <snm>Vale</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>112</volume>
            <issue>4</issue>
            <fpage>467</fpage>
            <lpage>480</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(03)00111-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12600311</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Molecular motors</p>
            </title>
            <aug>
               <au>
                  <snm>Schliwa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Woehlke</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>422</volume>
            <issue>6933</issue>
            <fpage>759</fpage>
            <lpage>765</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01601</pubid>
                  <pubid idtype="pmpid" link="fulltext">12700770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Drawing the tree of eukaryotic life based on the analysis of 2269 manually annotated myosins from 328 species</p>
            </title>
            <aug>
               <au>
                  <snm>Odronitz</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kollmar</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>9</issue>
            <fpage>R196</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2007-8-9-r196</pubid>
                  <pubid idtype="pmpid" link="fulltext">17877792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A millennial myosin census</p>
            </title>
            <aug>
               <au>
                  <snm>Berg</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Cheney</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Mol Biol Cell</source>
            <pubdate>2001</pubdate>
            <volume>12</volume>
            <issue>4</issue>
            <fpage>780</fpage>
            <lpage>794</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">32266</pubid>
                  <pubid idtype="pmpid" link="fulltext">11294886</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Tails of unconventional myosins</p>
            </title>
            <aug>
               <au>
                  <snm>Oliver</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Cheney</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>1999</pubdate>
            <volume>56</volume>
            <issue>3-4</issue>
            <fpage>243</fpage>
            <lpage>257</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s000180050426</pubid>
                  <pubid idtype="pmpid" link="fulltext">11212352</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Introduction</p>
            </title>
            <aug>
               <au>
                  <snm>Holmes</snm>
                  <fnm>KC</fnm>
               </au>
            </aug>
            <source>Philos Trans R Soc Lond B Biol Sci</source>
            <pubdate>2004</pubdate>
            <volume>359</volume>
            <issue>1452</issue>
            <fpage>1813</fpage>
            <lpage>1818</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1693463</pubid>
                  <pubid idtype="pmpid" link="fulltext">15647157</pubid>
                  <pubid idtype="doi">10.1098/rstb.2004.1581</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Identification and analysis of the myosin superfamily in Drosophila: a database approach</p>
            </title>
            <aug>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Sellers</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>J Muscle Res Cell Motil</source>
            <pubdate>2000</pubdate>
            <volume>21</volume>
            <issue>6</issue>
            <fpage>491</fpage>
            <lpage>505</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1026589626422</pubid>
                  <pubid idtype="pmpid" link="fulltext">11206129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A genomewide survey of developmentally relevant genes in Ciona intestinalis. IX. Genes for muscle structural proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Chiba</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Awazu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chin-Bow</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Satoh</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Satou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hastings</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Dev Genes Evol</source>
            <pubdate>2003</pubdate>
            <volume>213</volume>
            <issue>5-6</issue>
            <fpage>291</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00427-003-0324-x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12740698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Spatially and temporally regulated expression of myosin heavy chain alternative exons during Drosophila embryogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bernstein</snm>
                  <fnm>SI</fnm>
               </au>
            </aug>
            <source>Mech Dev</source>
            <pubdate>2001</pubdate>
            <volume>101</volume>
            <issue>1-2</issue>
            <fpage>35</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0925-4773(00)00549-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">11231057</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The genome sequence of Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Amanatides</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Scherer</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Galle</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Henderson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>LX</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Blazej</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Champe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pfeiffer</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Wan</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Doyle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Baxter</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Gabor</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agbayani</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Andrews-Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ballew</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Basu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baxendale</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bayraktaroglu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Beasley</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Beeson</snm>
                  <fnm>KY</fnm>
               </au>
               <au>
                  <snm>Benos</snm>
                  <fnm>PV</fnm>
               </au>
               <au>
                  <snm>Berman</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Bhandari</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bolshakov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Borkova</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Botchan</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Bouck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brokstein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brottier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Burtis</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Busam</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cadieu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Center</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chandra</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Cawley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dahlke</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Davenport</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>de Pablos</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Mays</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Dew</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dietz</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doup</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Downes</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dugan-Rocha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dunkov</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Evangelista</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Ferraz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ferriera</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fosler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gabrielian</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Garg</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Glasser</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gong</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gorrell</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Guan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Harvey</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Heiman</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Hernandez</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Houck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hostin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Houston</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Howland</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Ibegwam</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jalali</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kalush</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Karpen</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Ke</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Kennison</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Ketchum</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Kimmel</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kravitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lasko</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lei</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Levitsky</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Mattei</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>McIntosh</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>McLeod</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>McPherson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Merkulov</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Milshina</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Mobarry</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moshrefi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mount</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Moy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Muzny</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Nixon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nusskern</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Pacleb</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Palazzolo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pittman</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pollard</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Puri</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Reese</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Reinert</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Saunders</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Scheeler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shue</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Siden-Kiamos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Skupski</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Spier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Spradling</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Stapleton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Strong</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Svirskas</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tector</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>ZY</fnm>
               </au>
               <au>
                  <snm>Wassarman</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Woodage</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Worley</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>QA</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Zaveri</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Zhan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>FN</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Gibbs</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <issue>5461</issue>
            <fpage>2185</fpage>
            <lpage>2195</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.287.5461.2185</pubid>
                  <pubid idtype="pmpid" link="fulltext">10731132</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The genome sequence of the malaria mosquito Anopheles gambiae</p>
            </title>
            <aug>
               <au>
                  <snm>Holt</snm>
                